Pharo 7 file streams guideline


Re: Pharo 7 file streams guideline

Damien Pollet-2
On Tue, 31 Jul 2018 at 18:28, Damien Pollet <[hidden email]> wrote:
Hi Sven… a couple questions:

For context, I'm considering options in Clap, for providing accessors to Stdio that:
- are convenient in most cases
- discourage users from explicitly referencing the Stdio global (so that one can inject other streams instead if needed)

For instance, it seems to make sense to provide stdout already wrapped with ZnNewLineWriterStream, but that precludes users from buffering.
 


- is there a preferred order of composition between the encoding and buffering streams? If yes, is it the same for read and write streams, or reversed?
E.g. if Stdio binaryStdin was implemented, Stdio stdin should be decoded, but buffering it as well would be a problem for interactive applications.

- what's your opinion on convenience composition messages, e.g. aBinaryStream buffered decoded: 'utf-8' ?



On Tue, 24 Jul 2018 at 10:13, Sven Van Caekenberghe <[hidden email]> wrote:


> On 23 Jul 2018, at 12:07, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Stdio stdout and friends just return a binary stream, hence they need wrapping for encoding.
>
> Maybe
>
>  Stdio stdoutAsText
>
> might be an idea, but this is so uncommon that I am not sure this is a good idea.

Given all remarks and comments (thanks BTW), I now think that

- textual stdio streams are the more common case
- binary stdio streams are the primitive ones that are seldom used
- another encoding than UTF-8 seems uncommon
- these are streams that exist and need no real opening/closing

So,

  Stdio stdout

should return a character write stream with UTF-8 encoding while

  Stdio binaryStdout

should be the lower level binary one.
This would be more in line with the other streams.
A non-UTF-8 encoding can be used as per Pavel's example.
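
A minimal sketch of the wrapping discussed here, run against an in-memory byte stream so that it does not depend on what Stdio stdout currently answers. It assumes the Zinc class ZnCharacterWriteStream and its on:encoding: constructor as shipped with Pharo 7. Replacing the in-memory stream with the binary stdout stream is an assumption, since its name (stdout or binaryStdout) is exactly what is being discussed.

```smalltalk
| eAcute utf8Bytes latin1Bytes |
"A one-character string holding e-acute (code point 233)."
eAcute := String with: (Character value: 233).

"UTF-8 wrapping of a binary stream: the proposed default for text."
utf8Bytes := ByteArray streamContents: [ :binary |
	(ZnCharacterWriteStream on: binary encoding: 'utf8')
		nextPutAll: eAcute ].

"The same kind of binary stream, wrapped with a non-UTF-8 encoder instead."
latin1Bytes := ByteArray streamContents: [ :binary |
	(ZnCharacterWriteStream on: binary encoding: 'latin1')
		nextPutAll: eAcute ].
```

The character takes two bytes in UTF-8 and one byte in Latin-1, which is why the binary stream has to stay reachable for anyone who needs another encoding.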

Re: Pharo 7 file streams guideline

Guillermo Polito
In reply to this post by Damien Pollet-2


On Tue, Jul 31, 2018 at 6:29 PM Damien Pollet <[hidden email]> wrote:
Hi Sven… a couple questions:

- is there a preferred order of composition between the encoding and buffering streams? If yes, is it the same for read and write streams, or reversed?
E.g. if Stdio binaryStdin was implemented, Stdio stdin should be decoded, but buffering it as well would be a problem for interactive applications.

Well, I'd say that we could check if performance-wise there is a difference... I don't think there will be much of a difference, but, who knows ^^.
 

- what's your opinion on convenience composition messages, e.g. aBinaryStream buffered decoded: 'utf-8' ?

Check what we did in FileReference & co.
Opening a File reference returns by default a utf8 buffered stream (in that order). And we have convenience methods to specify other encodings and to get directly the binary stream (which will be buffered).

The idea is that FileSystem (among others like managing memory file systems) provides a high-level API with convenience methods to avoid putting the burden of the composition on the user.
The File package stays small and flexible providing direct access to the physical file system.

Following this same idea, to me Clap should define several convenient ways to access standard input/output.
Like that, other Stdio users can define their own too.

Also, maybe Clap can provide the same API as FileSystem (#writeStreamEncoded:do:, #readStreamEncoded:do: & co) just for coherence?
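
A small sketch of the FileSystem convenience API mentioned above, using an in-memory file system so that nothing touches the disk. The selectors #writeStreamEncoded:do: and #readStreamEncoded:do: are the ones named in this message; #binaryReadStreamDo: and the file name demo.txt are assumptions made for the illustration, and behaviour on a memory file system may differ between Pharo versions.

```smalltalk
| file text bytes |
"An in-memory file reference; demo.txt is a made-up name."
file := FileSystem memory / 'demo.txt'.

"Scoped write with an explicit encoding; the stream is closed when the block ends."
file writeStreamEncoded: 'latin1' do: [ :out | out nextPutAll: 'abc' ].

"Scoped read with the same encoding."
text := file readStreamEncoded: 'latin1' do: [ :in | in upToEnd ].

"Direct access to the bytes, bypassing any decoding (assumed selector)."
bytes := file binaryReadStreamDo: [ :in | in upToEnd ].
```

The block-based variants keep the composition and the closing of the streams out of user code, which is the coherence argument made for Clap.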
 





--

   

Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: Pharo 7 file streams guideline

Guillermo Polito
In reply to this post by Damien Pollet-2


On Tue, Jul 31, 2018 at 6:41 PM Damien Pollet <[hidden email]> wrote:
On Tue, 31 Jul 2018 at 18:28, Damien Pollet <[hidden email]> wrote:
Hi Sven… a couple questions:

For context, I'm considering options in Clap, for providing accessors to Stdio that:
- are convenient in most cases
- discourage users from explicitly referencing the Stdio global (so that one can inject other streams instead if needed)
 
Yes. And to me it's OK if Clap defines its own good usages of streams. Clap's usage of streams is not typical file access.
 
For instance, it seems to make sense to provide stdout already wrapped with ZnNewLineWriterStream,

Also, ZnNewLineWriterStream only needs to be used when interacting with the external world.
The rationale for keeping it separate is twofold:
- Zn* streams are meant to be reusable with other kinds of streams that are not files (such as in-memory streams), where newline conventions do not *always* need to be enforced because the image has its own internal convention (CRs).
- Zn* streams are meant to be reusable with other kinds of streams that are not files (such as sockets), where we may *also* want to enforce a line-ending convention.
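
A hedged sketch of what the separate wrapper does, shown on an in-memory string stream rather than a file or socket. It assumes ZnNewLineWriterStream class>>on: and the #forLf configuration message as found in Pharo 7; treat the exact selectors as assumptions if your image differs.

```smalltalk
| converted |
converted := String streamContents: [ :inner |
	| out |
	"Wrap any character stream; here it is an in-memory one."
	out := ZnNewLineWriterStream on: inner.
	"Ask for LF line endings regardless of the platform."
	out forLf.
	"The image-internal CR is rewritten on the way out."
	out
		nextPutAll: 'first';
		nextPut: Character cr;
		nextPutAll: 'second' ].
```

Code that writes to the inner stream directly bypasses the conversion, which is the flexibility the two points above argue for.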
 
but that precludes users from buffering.

Yep. This reminds me that I once tried a buffered file stream on /dev/urandom and the buffering introduced some funny effects :)
 



Re: Pharo 7 file streams guideline

Guillermo Polito


On Wed, Aug 1, 2018 at 11:19 AM Guillermo Polito <[hidden email]> wrote:



Also, ZnNewLineWriterStream only needs to be used when interacting with the external world.
The rationale for keeping it separate is twofold:
- Zn* streams are meant to be reusable with other kinds of streams that are not files (such as in-memory streams), where newline conventions do not *always* need to be enforced because the image has its own internal convention (CRs).
- Zn* streams are meant to be reusable with other kinds of streams that are not files (such as sockets), where we may *also* want to enforce a line-ending convention.

And I'll just add that keeping it separate also makes it simple to "skip" line-ending convention transformations altogether.
 
 

Re: Pharo 7 file streams guideline

Henrik Sperre Johansen
In reply to this post by Guillermo Polito
Guillermo Polito wrote
> On Tue, Jul 31, 2018 at 6:29 PM Damien Pollet <[hidden email]> wrote:
>
>> Hi Sven… a couple questions:
>>
>> - is there a preferred order of composition between the encoding and
>> buffering streams? If yes, is it the same for read and write streams, or
>> reversed?
>> E.g. if Stdio binaryStdin was implemented, Stdio stdin should be decoded,
>> but buffering it as well would be a problem for interactive applications.
>>
>
> Well, I'd say that we could check if performance-wise there is a
> difference... I don't think there will be much of a difference, but, who
> knows ^^.

It can be a world of difference, depending on what operations are expensive
on the terminal stream.
For example, with a buffer size N,
file <-> buffer <-> utf8 encoding does one N-byte write/read whenever the
buffer fills or needs filling, while
file <-> utf8 encoding <-> buffer does N writes/reads of 1 to 4 bytes each
whenever the buffer fills or needs filling.

TL;DR: Always put the buffering between where small reads/writes occur
(encoding, code doing looped #nextPut: sends) and where reads/writes are
expensive (files, sockets, etc.).
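
The recommended order can be sketched as follows, with an in-memory byte stream standing in for the expensive file or socket. ZnBufferedWriteStream class>>on:do: and ZnCharacterWriteStream class>>on:encoding: are assumed to be available as in Pharo 7 Zinc; the variable names are made up for the illustration.

```smalltalk
| bytes |
bytes := ByteArray streamContents: [ :expensive |
	"Buffer sits directly on top of the expensive stream;
	 on:do: flushes the buffer when the block ends."
	ZnBufferedWriteStream on: expensive do: [ :buffered |
		| text |
		"Encoder sits on top of the buffer, so its small writes stay in memory."
		text := ZnCharacterWriteStream on: buffered encoding: 'utf8'.
		1 to: 1000 do: [ :i | text nextPut: $x ] ] ].
```

The thousand one-character writes land in the buffer, and the underlying stream only sees them once the buffer is flushed.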


Guillermo Polito wrote

>>
>> - what's your opinion on convenience composition messages, e.g.
>> aBinaryStream buffered decoded: 'utf-8' ?
>>
>
> Check what we did in FileReference & co.
> Opening a File reference returns by default a utf8 buffered stream (in that
> order). And we have convenience methods to specify other encodings and to
> get directly the binary stream (which will be buffered).
>
> The idea is that FileSystem (among others like managing memory file
> systems) provides a high-level API with convenience methods to avoid
> putting the burden of the composition on the user.
> The File package stays small and flexible providing direct access to the
> physical file system.
>
> Following this same idea, to me Clap should define several convenient ways
> to access standard input/output.
> Like that, other Stdio users can define their own too.
>
> Also, maybe Clap can provide the same API as FileSystem
> (#writeStreamEncoded:do:, #readStreamEncoded:do: & co) just for coherence?

The problem with using buffered write stream composition in a non-scoped
manner is that you have to remember to explicitly #close the streams in order
for the buffers to be flushed correctly.

That is, unless one could have a #finalize on buffered streams and ensure
that it runs before the finalizer of the wrapped stream (which would close
the terminal stream and make a subsequent #flush fail). I'm not sure whether
that's possible, or how it already works...
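
To illustrate the pitfall, here is a small sketch on an in-memory sink, assuming the Pharo 7 Zinc classes and their default buffer size (much larger than the three bytes written here). The variable names are hypothetical.

```smalltalk
| sink buffered text sizeBeforeFlush bytesAfterFlush |
"The in-memory sink stands in for a file or terminal stream."
sink := WriteStream on: ByteArray new.
buffered := ZnBufferedWriteStream on: sink.
text := ZnCharacterWriteStream on: buffered encoding: 'utf8'.

text nextPutAll: 'abc'.
"Nothing has reached the sink yet; the bytes sit in the buffer."
sizeBeforeFlush := sink contents size.

"An explicit flush (or close) pushes the buffered bytes down."
text flush.
bytesAfterFlush := sink contents.
```

If the composed stream is simply dropped without #flush or #close, the buffered bytes are lost, which is why scoped, block-based accessors are attractive.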

Cheers,
Henry




Re: Pharo 7 file streams guideline

Sven Van Caekenberghe-2
In reply to this post by Damien Pollet-2


> On 31 Jul 2018, at 18:28, Damien Pollet <[hidden email]> wrote:
>
> Hi Sven… a couple questions:

Very interesting questions, Damien.

> - is there a preferred order of composition between the encoding and buffering streams? If yes, is it the same for read and write streams, or reversed?
> E.g. if Stdio binaryStdin was implemented, Stdio stdin should be decoded, but buffering it as well would be a problem for interactive applications.

Buffering typically makes a huge (an order of magnitude) difference, since in most cases batching up a number of more expensive operations is a good idea. How big the difference is depends on the access pattern: many small operations are more costly, while big IO operations will not see much difference (ZnBuffered[Read|Write]Stream actually bypasses the buffer if more than half the buffer size is requested).

Buffering makes sense both at the lower binary level as well as at the character level. Benchmarks will tell.

I never thought about having a different buffering approach to reading and writing, but indeed that could make sense too.

There is also a cost to buffering: an additional indirection, management and memory usage.
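
For the read side, a minimal sketch of buffering at the binary level underneath the decoder, using a byte array in place of a real input stream. It assumes ZnBufferedReadStream and ZnCharacterReadStream as shipped with Pharo 7. Whether this composition suits an interactive stdin is a separate question, since a buffer may wait for more input than the user has typed.

```smalltalk
| binaryIn text |
"UTF-8 bytes for e-acute followed by the letter a."
binaryIn := #[195 169 97] readStream.

"Decoder on top, buffer directly above the binary source."
text := (ZnCharacterReadStream
	on: (ZnBufferedReadStream on: binaryIn)
	encoding: 'utf8') upToEnd.
```

The decoder pulls one to four bytes per character from the buffer, and the buffer refills from the source in larger chunks.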

> - what's your opinion on convenience composition messages, e.g. aBinaryStream buffered decoded: 'utf-8' ?

I like these kinds of messages, but I keep saying that I prefer smaller stream APIs, not larger ones, hence I am hesitant to add more API (since every new stream has to understand these new cool messages).
