Issue status update for
http://smalltalk.gnu.org/node/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
 Assigned to:  Unassigned
 Reported by:  elmex
 Updated by:   elmex
 Status:       active
 Attachment:   http://smalltalk.gnu.org/files/issues/unitest2.st.txt (849 bytes)

Take the attached program, which prints here:

3 44
E3 <-> EF
81 <-> BF
AA <-> BE
E3 <-> E6
81 <-> A8
BE <-> B0
E3 <-> E7
81 <-> B8
9F <-> B0

But it should print (at least as far as my understanding of Unicode and
encodings goes):

3 33
E3 <-> E3
81 <-> 81
AA <-> AA
E3 <-> E3
81 <-> 81
BE <-> BE
E3 <-> E3
81 <-> 81
9F <-> 9F

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
Issue status update for
http://smalltalk.gnu.org/project/issue/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
 Assigned to:  Unassigned
 Reported by:  elmex
 Updated by:   bonzinip
 Status:       active
 Attachment:   http://smalltalk.gnu.org/files/issues/gst-encoding-lazy.patch (594 bytes)

EF-BF-BE is the UTF-8 encoding of U+FFFE, the byte-swapped form of the
Unicode "byte order mark" (BOM, U+FEFF). The BOM was born as a way to
distinguish big- and little-endian UTF-16. Since it's not really a
character, Iconv tries to strip it when converting to a UnicodeString,
but it is failing to do so in this case.

Now, under Mac OS X I get the expected result, under Linux I get yours.
The reason is that my Mac is big-endian, so Iconv produces big-endian
UTF-16, while Linux produces little-endian UTF-16. Since the default
encoding of UTF-16 is big-endian, the Mac happens to get the right thing,
while Linux reads the code units byte-swapped. So later on the
"pipe peekFor: $<16rFEFF>" statement that should strip the BOM does not
match.

The attached patch fixes this by making EncodedString look for a BOM
when retrieving the encoding, rather than when setting it.
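
For readers following the thread, the effect described above can be reproduced outside GNU Smalltalk. The following is only a sketch in Python, not the gst code or the attached patch: the three test characters are inferred from the bytes in the report (E3 81 AA, E3 81 BE, E3 81 9F), and `decode_utf16` is a hypothetical helper that stands in for "look at the BOM before choosing the byte order".

```python
import codecs

# The three characters from the report, inferred from their UTF-8 bytes:
# E3 81 AA, E3 81 BE, E3 81 9F.
text = "\u306a\u307e\u305f"

# What a little-endian converter emits: a BOM (FF FE) followed by the text.
le_bytes = ("\ufeff" + text).encode("utf-16-le")

# Misreading those bytes as big-endian swaps every code unit, so the BOM
# U+FEFF turns into U+FFFE and a check for U+FEFF no longer matches.
misread = le_bytes.decode("utf-16-be")
print(len(text), len(misread))            # character counts differ by one
print(misread.encode("utf-8").hex(" "))   # UTF-8 bytes of the misread string


def decode_utf16(data: bytes) -> str:
    """Hypothetical helper: pick the byte order from the BOM, then strip it."""
    if data[:2] == codecs.BOM_UTF16_LE:   # FF FE
        return data[2:].decode("utf-16-le")
    if data[:2] == codecs.BOM_UTF16_BE:   # FE FF
        return data[2:].decode("utf-16-be")
    # No BOM: fall back to big-endian, the default mentioned in the thread.
    return data.decode("utf-16-be")


print(decode_utf16(le_bytes) == text)
```

Checking the first two bytes before decoding is what makes the result independent of the host's endianness; a string consisting only of a BOM, in either byte order, then decodes to zero characters.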
In reply to this post by Robin Redeker-2
Issue status update for
http://smalltalk.gnu.org/project/issue/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
-Assigned to:  Unassigned
+Assigned to:  bonzinip
 Reported by:  elmex
 Updated by:   bonzinip
-Status:       active
+Status:       fixed

Fixed in patch-612, which is the same patch I posted plus this test case:

    str := EncodedString fromString: (String new: 2) encoding: 'UTF-16'.
    str valueAt: 1 put: 254; valueAt: 2 put: 255.
    self assert: str numberOfCharacters = 0.
    str valueAt: 1 put: 255; valueAt: 2 put: 254.
    self assert: str numberOfCharacters = 0

Thanks!
In reply to this post by Paolo Bonzini
On Mon, Oct 22, 2007 at 02:01:23AM -0700, Paolo Bonzini wrote:
> Issue status update for
> http://smalltalk.gnu.org/project/issue/113
> [.snip.]
>
> The attached patch fixes this by making EncodedString look for a BOM
> when retrieving the encoding, rather than when setting it.

Thanks, it works now! I hope you don't mind me filing so many bug reports :)
I've recently been working on my chat implementation, which uses JSON, and
I'm eager to support Unicode.

Robin