MemoryFileSystemFile>>#readStream forces String

Udo Schneider
All,

is there any reason why MemoryFileSystemFile>>#readStream forces its
content to a String (#asString)?

readStream
        ^ ReadStream on: self bytes  asString from: 1 to: size

I'm parsing XML files from an in-memory ZIP archive and had real
problems with non-ASCII characters. It took me a while to figure out
that reading from the in-memory archive returns a String. This prevents
the XML parser from doing PI-based decoding (UTF-8 in this case).
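
To make the failure mode concrete, here is a minimal playground sketch (not part of the original post) of what #asString does to UTF-8 bytes. It assumes an image where String>>#utf8Encoded and ByteArray>>#utf8Decoded are available; the sample character is made up for illustration.

```smalltalk
| text bytes forced decoded |
"A one-character string holding e-acute (code point 233)."
text := String with: (Character value: 233).

"UTF-8 encodes that character as the two bytes 195 and 169."
bytes := text utf8Encoded.

"#asString maps every byte to one character, so two bytes become two characters."
forced := bytes asString.

"Decoding the same bytes as UTF-8 recovers the original single character."
decoded := bytes utf8Decoded.
```

A parser that receives `forced` no longer sees bytes and cannot apply the encoding named in the XML declaration.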

As a side note: although the GT-Spotter/XML integration relies on
FileReference, it assumes that the file is in the DiskFilesystem (some
methods only pass a Path). Is this intentional? If not, I'd try to fix
it along the way.

Thanks,

Udo



Re: MemoryFileSystemFile>>#readStream forces String

Sven Van Caekenberghe-2

> On 29 Feb 2016, at 20:27, Udo Schneider <[hidden email]> wrote:
>
> All,
>
> is there any reason why MemoryFileSystemFile>>#readStream forces its content to a String (#asString)?
>
> readStream
> ^ ReadStream on: self bytes  asString from: 1 to: size
>
> I'm parsing XML files from an in-memory ZIP archive and had real problems with non-ASCII characters. It took me a while to figure out that reading from the in-memory archive returns a String. This prevents the XML parser from doing PI-based decoding (UTF-8 in this case).

I think it is not good to do this, better stick with bytes.

Primitive streams should be binary only; interpreting them as characters is easily done by wrapping a ZnCharacter[Read|Write]Stream around them.
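
A minimal sketch of that wrapping, assuming the Zinc character encoding classes are loaded in the image. The byte values are made-up sample data standing in for real file content.

```smalltalk
| bytes binaryStream characterStream text |
"Sample UTF-8 bytes: e-acute (195 169) followed by $a (97)."
bytes := #[195 169 97].

"The primitive stream only knows about bytes."
binaryStream := bytes readStream.

"Decoding is layered on top, only where characters are needed."
characterStream := ZnCharacterReadStream on: binaryStream encoding: 'utf8'.

"Three bytes in, two characters out."
text := characterStream upToEnd.
```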

But this whole situation is a mess.


Re: MemoryFileSystemFile>>#readStream forces String

Max Leske
We introduced #binaryReadStream and friends as a workaround in Pharo 4. The reason we didn’t fix the problem properly is that we’ve been waiting for the introduction of Xtreams, in hopes of getting a cleaner stream API.
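
For readers who want to try the workaround, a rough sketch follows. It assumes the image provides #binaryWriteStreamDo: and #binaryReadStreamDo: on FileReference (the exact set of "friends" may differ between Pharo versions); the file name is hypothetical.

```smalltalk
| file bytes |
"Hypothetical file in a fresh in-memory file system."
file := FileSystem memory / 'sample.xml'.

"Store raw UTF-8 bytes (e-acute = 195 169)."
file binaryWriteStreamDo: [ :out | out nextPutAll: #[195 169] ].

"Read them back as bytes, bypassing the #asString conversion in #readStream."
bytes := file binaryReadStreamDo: [ :in | in upToEnd ].
```

The resulting bytes can then be handed to a parser or decoder that picks the encoding itself.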

Max

> On 29 Feb 2016, at 23:06, Sven Van Caekenberghe <[hidden email]> wrote:
> I think it is not good to do this, better stick with bytes.
>
> Primitive streams should be binary only; interpreting them as characters is easily done by wrapping a ZnCharacter[Read|Write]Stream around them.
>
> But this whole situation is a mess.

Re: MemoryFileSystemFile>>#readStream forces String

stepharo
In reply to this post by Sven Van Caekenberghe-2
Hi Sven,

I would like to get Xtreams (with a slightly modified API, no ++ and
--) into Pharo 6.0.
So I hope that we will be able to clean up the stream part of Pharo.

Stef



Re: MemoryFileSystemFile>>#readStream forces String

Sven Van Caekenberghe-2

> On 01 Mar 2016, at 07:07, stepharo <[hidden email]> wrote:
>
> Hi sven
>
> I would like to get Xtreams (with a slightly modified API - no ++ and --) in Pharo 6.0.

Yes, that would be a good idea.

> So I hope that we will be able to clean the stream part of Pharo.

But the problem is the users of streams. The stream API is too wide: people expect hundreds of methods to be there, mixing characters and bytes, encodings, line end conventions, infinite buffering, arbitrary positioning, and so on. Many of these characteristics can be elegantly composed, just when you need them.

Right now we have some cool streams in the image, with minimal APIs, each targeted at just one function, like Guile's new file streams, the Zn streams, and the Zdc socket streams. We just have to use them.
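
As an illustration of composing just what is needed, here is a small sketch using the Zn streams. It assumes ZnCharacterWriteStream, ZnCharacterReadStream and ZnBufferedReadStream are present in the image; the sample text is invented.

```smalltalk
| text out in roundTrip |
"Sample text: e-acute followed by $a."
text := String with: (Character value: 233) with: $a.

"Write side: characters go in, UTF-8 bytes end up in the wrapped byte stream."
out := ByteArray new writeStream.
(ZnCharacterWriteStream on: out encoding: 'utf8') nextPutAll: text.

"Read side: buffering and decoding are separate layers stacked on a plain byte stream."
in := ZnCharacterReadStream
	on: (ZnBufferedReadStream on: out contents readStream)
	encoding: 'utf8'.
roundTrip := in upToEnd.
```

Each layer does one job, so a caller who does not need buffering or decoding simply leaves that wrapper out.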


Re: MemoryFileSystemFile>>#readStream forces String

Udo Schneider
In reply to this post by Sven Van Caekenberghe-2
On 29/02/16 23:06, Sven Van Caekenberghe wrote:
> I think it is not good to do this, better stick with bytes.
Should I fix it and submit a slice or use a workaround in my code?

CU,

Udo





Re: MemoryFileSystemFile>>#readStream forces String

Max Leske

> On 01 Mar 2016, at 20:51, Udo Schneider <[hidden email]> wrote:
>
> On 29/02/16 23:06, Sven Van Caekenberghe wrote:
>> I think it is not good to do this, better stick with bytes.
> Should I fix it and submit a slice or use a workaround in my code?

I don’t think it makes sense to change #readStream. I do think, though, that the tools would profit from improved XML handling.

Max


Re: MemoryFileSystemFile>>#readStream forces String

Sven Van Caekenberghe-2
Using #asString there is wrong, IMHO. If the memory stream is on bytes, it should return a read stream reading bytes; if it is on characters, the read stream should return characters.
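
A playground sketch of that behaviour, using the same ReadStream constructor as the method quoted at the start of the thread but without #asString. The byte values and the limit of 2 are made-up sample data; in the method itself this would presumably read `^ ReadStream on: self bytes from: 1 to: size`, which is an assumption and not a tested patch.

```smalltalk
| bytes stream result |
"Stand-in for the file's internal byte storage; only the first 2 bytes are in use."
bytes := #[195 169 97].

"Without #asString, a stream on bytes answers bytes."
stream := ReadStream on: bytes from: 1 to: 2.
result := stream upToEnd.
```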

> On 01 Mar 2016, at 21:13, Max Leske <[hidden email]> wrote:
> I don’t think it makes sense to change #readStream. I do think though, that the tools would profit from an improved XML handling.
>
> Max

Re: MemoryFileSystemFile>>#readStream forces String

stepharo
In reply to this post by Sven Van Caekenberghe-2

> Yes that would be a good idea.
>
>> So I hope that we will be able to clean the stream part of Pharo.
> But the problem is the users of streams. The stream API is too wide: people expect hundreds of methods to be there, mixing characters and bytes, encodings, line end conventions, infinite buffering, arbitrary positioning, and so on. Many of these characteristics can be elegantly composed, just when you need them.
>
> Right now we have some cool streams in the image, with minimal APIs, each targeted at just one function, like Guile's new file streams, the Zn streams, and the Zdc socket streams. We just have to use them.
So what could be a plan?
Where should we spend energy?

Stef