Xtreams files

Xtreams files

Nicolas Cellier
I added a hook to Xtreams files this evening.
For now the hook is in FileDirectory (no comment... I'll be happy when
a replacement is integrated in Squeak/Pharo).

((FileDirectory default / 'toto.txt') writing encoding: #ascii) write: 'hello world'; close.
((FileDirectory default / 'toto.txt') reading encoding: #ascii) rest.

I didn't hook into StandardFileStream, but rather into a simplified version
of David Lewis's IOHandle.
I separated this stuff into Xtreams-SqueakExternals.
Anyway, this is not really related to Xtreams; most of the
functionality should be in core (apart from reading/writing/appending, which
might better belong in Xtreams-Terminals).

I also slightly modified XTFileReadStream and XTFileWriteStream to fit
some Squeak-specific API.
Maybe I could have just used the VW API and provided a compatibility
layer... We'll see later.

Nicolas


Re: Xtreams files

Sven Van Caekenberghe

On 12 Oct 2010, at 21:26, Nicolas Cellier wrote:

> I added a hook to Xtreams files this evening.
> For now the hook is in FileDirectory (no comment... I'll be happy when
> a replacement is integrated in Squeak/Pharo).
>
> ((FileDirectory default / 'toto.txt') writing encoding: #ascii) write: 'hello world'; close.
> ((FileDirectory default / 'toto.txt') reading encoding: #ascii) rest.
>
> I didn't hook into StandardFileStream, but rather into a simplified version
> of David Lewis's IOHandle.
> I separated this stuff into Xtreams-SqueakExternals.
> Anyway, this is not really related to Xtreams; most of the
> functionality should be in core (apart from reading/writing/appending, which
> might better belong in Xtreams-Terminals).
>
> I also slightly modified XTFileReadStream and XTFileWriteStream to fit
> some Squeak-specific API.
> Maybe I could have just used the VW API and provided a compatibility
> layer... We'll see later.
>
> Nicolas

Thanks again, Nicolas, it is impressive to see you make so much progress.

For me, all UTF16 tests seem to fail. These tests have the following code:

        Smalltalk isBigEndian ifTrue: [
                bytes := (bytes reading transforming: [ :in :out | | first |
                        first := in get.
                        out put: in get; put: first ]) rest ].

In my image, Smalltalk isBigEndian is false, but it seems that the byte swap should have been done anyway. If I force it in the debugger, the tests succeed as far as I can see.
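
For readers without Xtreams loaded, the effect of that transformation can be sketched with plain collection code. This is only an illustration under assumptions: the sample bytes and the `swapped` variable are made up here, and the test itself uses the Xtreams `transforming:` stream shown above.

```smalltalk
"Swap the two octets of every 16-bit unit, which is what the test's
 transforming block does. The sample data is 'hi' in UTF-16 big-endian."
| bytes swapped |
bytes := #[0 104 0 105].
swapped := ByteArray new: bytes size.
1 to: bytes size - 1 by: 2 do: [:i |
	swapped at: i put: (bytes at: i + 1).
	swapped at: i + 1 put: (bytes at: i)].
swapped  "the same text in little-endian order: 104 0 105 0"
```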

Sven




Re: Xtreams files

Levente Uzonyi-2
On Tue, 12 Oct 2010, Sven Van Caekenberghe wrote:

> Thanks again, Nicolas, it is impressive to see you make so much progress.
>
> For me, all UTF16 tests seem to fail. These tests have the following code:
>
> Smalltalk isBigEndian ifTrue: [
> bytes := (bytes reading transforming: [ :in :out || first | first := in get. out put: in get; put: first ]) rest ].
>
> In my image, Smalltalk isBigEndian is false, but it seems that the switching should have been done. If I force it in the debugger, the tests would succeed as far as I can see.

Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters are
not initialized to the platform's endianness, but the test expects that.


Levente



Re: Xtreams files

Sven Van Caekenberghe
Levente,

On 12 Oct 2010, at 23:08, Levente Uzonyi wrote:

> Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters are not initialized to the platform's endianness, but the test expects that.

OK, so UTF16TextConverter>>#useLittleEndian: should be called with Smalltalk isLittleEndian as argument, yes?
But who should call it? The client, XTSqueakEncoder>>#encoding:, could do it, but not very elegantly. Or would it be better done in an initialize method (which does not exist yet)?

Sven





Re: Xtreams files

Yoshiki Ohshima-2
At Tue, 12 Oct 2010 23:35:36 +0200,
Sven Van Caekenberghe wrote:
>
> Levente,
>
> On 12 Oct 2010, at 23:08, Levente Uzonyi wrote:
>
> > Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters are not initialized to the platform's endianness, but the test expects that.
>
> OK, so UTF16TextConverters>>#useLittleEndian: should be called with Smalltalk isLittleEndian as argument, yes ?
> Who ? The client, XTSqueakEncoder>>#encoding: could do it, but not very elegantly. Or would it be better done in an initialize (that is not there) ?

  Hmm, doesn't it sound like the test is wrong?  The endianness in
UTF16 means the order of the two octets in each 16-bit code unit.  The
external data comes as Byte(Array|String) and the internal data is
UTF-32-ish, so the platform endianness should not matter.

-- Yoshiki


Re: Xtreams files

Levente Uzonyi-2
On Tue, 12 Oct 2010, Yoshiki Ohshima wrote:

> At Tue, 12 Oct 2010 23:35:36 +0200,
> Sven Van Caekenberghe wrote:
>>
>> Levente,
>>
>> On 12 Oct 2010, at 23:08, Levente Uzonyi wrote:
>>
>>> Oh. I "fixed" those yesterday. The problem is that UTF16TextConverters are not initialized to the platform's endianness, but the test expects that.
>>
>> OK, so UTF16TextConverters>>#useLittleEndian: should be called with Smalltalk isLittleEndian as argument, yes ?
>> Who ? The client, XTSqueakEncoder>>#encoding: could do it, but not very elegantly. Or would it be better done in an initialize (that is not there) ?
>
>  Hmm, doesn't it sound like the test is wrong?  The endianness in
> UTF16 means the order in two-octet for each code-point.  The external
> data comes as Byte(Array|String) and internal is UTF-32-ish data, so
> the platform endianness should not matter.

According to RFC 2781, the test is wrong and Squeak's implementation is
right:

"4.3 Interpreting text labelled as UTF-16

    Text labelled with the "UTF-16" charset might be serialized in either
    big-endian or little-endian order. If the first two octets of the
    text is 0xFE followed by 0xFF, then the text can be interpreted as
    being big-endian. If the first two octets of the text is 0xFF
    followed by 0xFE, then the text can be interpreted as being little-
    endian. If the first two octets of the text is not 0xFE followed by
    0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be
    interpreted as being big-endian."
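
The rule quoted above can be sketched in a few lines of plain Smalltalk. This is a minimal sketch, not the Squeak or Xtreams implementation: the `isBigEndian` block is a hypothetical helper made up for illustration, and it only looks at the first two octets.

```smalltalk
"Hypothetical helper, not part of Squeak or Xtreams: decide the byte order
 of UTF-16 data as described in RFC 2781, section 4.3."
| isBigEndian |
isBigEndian := [:bytes |
	"Little-endian only when the data starts with the octets FF FE;
	 a FE FF mark or no mark at all means big-endian."
	(bytes size >= 2
		and: [(bytes at: 1) = 16rFF
		and: [(bytes at: 2) = 16rFE]]) not].

isBigEndian value: #[254 255 0 104].  "starts with FE FF: true"
isBigEndian value: #[255 254 104 0].  "starts with FF FE: false"
isBigEndian value: #[0 104 0 105].    "no byte order mark: true"
```

Nothing in this decision depends on Smalltalk isBigEndian, which matches the point that the platform byte order should not matter.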


Levente



Re: Xtreams files

Nicolas Cellier
Martin Kobetic fixed that in the VW repository. Be patient :)
Here is his message about the changes:


The new version in Store includes the following:

* changed default UTF16 encoder setup to be always big-endian,
regardless of current platform; apparently Unicode says that's the
default assumption without BOM and that's also what Squeak does.
Ultimately we probably want our own portable encoder.
* consequently changed the UTF16 tests to use big-endian unconditionally as well
* bunch of tests neglected to close transforming write streams they
created leaving collectable, but lingering processes behind.
* simplified Encoder class initialization (the registerEncodingsIn:
setup was overkill)
* Encoder class>>for: (and therefore also #encoding:) now accepts either
a Symbol or anything that understands #streamingAsEncoder. This allows
passing in a preconfigured Encoder instance, for example.

Nicolas
