I added a hook to Xtreams files this evening.
For now the hook is in FileDirectory (no comment... I'll be happy when a replacement is integrated in Squeak/Pharo).

    ((FileDirectory default / 'toto.txt') writing encoding: #ascii) write: 'hello world'; close.
    ((FileDirectory default / 'toto.txt') reading encoding: #ascii) rest.

I didn't hook into StandardFileStream, but rather into a simplified version of David Lewis's IOHandle.
I separated this stuff into Xtreams-SqueakExternals.
Anyway, this is not really related to Xtreams; most of the functionality should be in core (apart from reading/writing/appending, which might better return to Xtreams-Terminals).

I also slightly modified XTFileReadStream and XTFileWriteStream to fit some Squeak-specific API.
Maybe I could have just used the VW API and provided a compatibility layer... We'll see later.

Nicolas
On 12 Oct 2010, at 21:26, Nicolas Cellier wrote:

> I added a hook to Xtreams files this evening.
> [...]

Thanks again, Nicolas, it is impressive to see you make so much progress.

For me, all UTF16 tests seem to fail. These tests have the following code:

    Smalltalk isBigEndian ifTrue: [
        bytes := (bytes reading transforming: [ :in :out || first |
            first := in get.
            out put: in get; put: first ]) rest ].

In my image, Smalltalk isBigEndian is false, but it seems that the switching should have been done. If I force it in the debugger, the tests succeed as far as I can see.

Sven
On Tue, 12 Oct 2010, Sven Van Caekenberghe wrote:

> For me, all UTF16 tests seem to fail.
> [...]
> In my image, Smalltalk isBigEndian is false, but it seems that the switching should have been done. If I force it in the debugger, the tests would succeed as far as I can see.

Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters are not initialized to the platform's endianness, but the test expects that.

Levente
Levente,
On 12 Oct 2010, at 23:08, Levente Uzonyi wrote:

> Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters are not initialized to the platform's endianness, but the test expects that.

OK, so UTF16TextConverter>>#useLittleEndian: should be called with Smalltalk isLittleEndian as argument, yes?
But by whom? The client, XTSqueakEncoder>>#encoding:, could do it, but not very elegantly. Or would it be better done in an initialize method (which does not exist yet)?

Sven
At Tue, 12 Oct 2010 23:35:36 +0200,
Sven Van Caekenberghe wrote:

> OK, so UTF16TextConverters>>#useLittleEndian: should be called with Smalltalk isLittleEndian as argument, yes ?
> Who ? The client, XTSqueakEncoder>>#encoding: could do it, but not very elegantly. Or would it be better done in an initialize (that is not there) ?

Hmm, doesn't it sound like the test is wrong? The endianness in UTF-16 means the order of the two octets within each 16-bit code unit. The external data comes as Byte(Array|String) and the internal representation is UTF-32-ish data, so the platform endianness should not matter.

-- Yoshiki
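Yoshiki's point can be tried in a workspace. What follows is only a minimal sketch, not code from Xtreams or from Squeak's converters: the block name decodeUnit is made up for illustration, and nothing beyond basic Integer and ByteArray messages is assumed. It shows that a UTF-16 code unit is assembled arithmetically from two octets, so the result depends on the byte order chosen for the data and never on the host CPU.

```smalltalk
"Assemble one 16-bit code unit from a high and a low octet.
 decodeUnit is a hypothetical helper, not part of any library."
| decodeUnit bytes |
decodeUnit := [:hi :lo | (hi bitShift: 8) + lo].

"The letter A (U+0041) serialized big-endian."
bytes := #[16r00 16r41].

"Read as big-endian: first octet is the high one."
decodeUnit value: (bytes at: 1) value: (bytes at: 2).	"65, i.e. 16r0041"

"Read as little-endian: second octet is the high one."
decodeUnit value: (bytes at: 2) value: (bytes at: 1).	"16640, i.e. 16r4100"
```

Both expressions give the same answers on a big-endian and on a little-endian machine, which is why a test keyed on Smalltalk isBigEndian looks suspicious.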
On Tue, 12 Oct 2010, Yoshiki Ohshima wrote:

> Hmm, doesn't it sound like the test is wrong? The endianness in
> UTF16 means the order in two-octet for each code-point. The external
> data comes as Byte(Array|String) and internal is UTF-32-ish data, so
> the platform endianness should not matter.

According to rfc2781 the test is wrong and Squeak's implementation is right:

"4.3 Interpreting text labelled as UTF-16

Text labelled with the "UTF-16" charset might be serialized in either big-endian or little-endian order. If the first two octets of the text is 0xFE followed by 0xFF, then the text can be interpreted as being big-endian. If the first two octets of the text is 0xFF followed by 0xFE, then the text can be interpreted as being little-endian. If the first two octets of the text is not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as being big-endian."

Levente
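The rule quoted from rfc2781 section 4.3 can be written down as a short workspace sketch. This is an illustration under stated assumptions, not the way UTF16TextConverter or the Xtreams encoders are implemented: the block name byteOrderOf and the symbols #bigEndian and #littleEndian are invented here, and only basic ByteArray messages are used.

```smalltalk
"Pick a byte order for text labelled UTF-16, following rfc2781 section 4.3.
 byteOrderOf and the result symbols are hypothetical names."
| byteOrderOf |
byteOrderOf := [:bytes |
	(bytes size >= 2
		and: [(bytes at: 1) = 16rFF and: [(bytes at: 2) = 16rFE]])
			ifTrue: [#littleEndian]
			"An FE FF mark and a missing mark both mean big-endian."
			ifFalse: [#bigEndian]].

byteOrderOf value: #[16rFE 16rFF 16r00 16r41].	"#bigEndian (marked)"
byteOrderOf value: #[16rFF 16rFE 16r41 16r00].	"#littleEndian (marked)"
byteOrderOf value: #[16r00 16r41].	"#bigEndian (no mark, default)"
```

Note that Smalltalk isBigEndian appears nowhere in the sketch: without a mark the answer is big-endian on every platform.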
Martin Kobetic fixed that in the VW repository. Be patient :)
Here is his message about the changes:

The new version in Store includes the following:

* changed default UTF16 encoder setup to be always big-endian, regardless of current platform; apparently Unicode says that's the default assumption without BOM and that's also what Squeak does. Ultimately we probably want our own portable encoder.
* consequently changed the UTF16 tests to use big-endian unconditionally as well
* a bunch of tests neglected to close the transforming write streams they created, leaving collectable but lingering processes behind.
* simplified Encoder class initialization (the registerEncodingsIn: setup was overkill)
* Encoder class>>for: (and therefore also #encoding:) now accepts either a Symbol or anything that understands #streamingAsEncoder. This allows passing in a preconfigured Encoder instance, for example.

Nicolas

2010/10/13 Levente Uzonyi <[hidden email]>:

> According to rfc2781 the test is wrong and Squeak's implementation is right:
> [...]