Pharo 7 file streams guideline

Pavel Krivanek-3

Pharo 7 file streams guideline

Hello,

I've prepared a draft of a short document that should help you with the transition to the "new" file streams API in Pharo 7.

https://github.com/pavel-krivanek/pharoMaterials/blob/master/Filestreams.MD

Pull requests are welcome.

Cheers,

-- Pavel

Ben Coman

Re: Pharo 7 file streams guideline

On 23 July 2018 at 15:38, Pavel Krivanek <[hidden email]> wrote:

> Hello,
>
> I've prepared a draft of a short document that should help you with the
> transition to the "new" file streams API in Pharo 7.
>
> https://github.com/pavel-krivanek/pharoMaterials/blob/master/Filestreams.MD
>
> Pull requests are welcome.
>
> Cheers,
> -- Pavel
>
>
>

Some discussion prior to any pull request (and while I'm away from
a Pharo machine).

I like all the new code examples until "Write a UTF-8 text to STDOUT",
and I wonder whether "Stdio stdout writeStreamDo: [ :stream | stream
nextPutAll: 'a ≠ b' ]" would better fit the pattern of the other new
code
(presuming "Stdio stdout" returns a FileReference, otherwise maybe
"Stdio stdoutRef" or "Stdio stdout asFileReference").

Under "Positionable streams", rather than needing "3 timesRepeat: [
stream next ]",
would it be worth having #utf8position: to be more intention revealing?

cheers -ben

Sven Van Caekenberghe-2

Re: Pharo 7 file streams guideline

> On 23 Jul 2018, at 11:13, Ben Coman <[hidden email]> wrote:
>
> On 23 July 2018 at 15:38, Pavel Krivanek <[hidden email]> wrote:
>> Hello,
>>
>> I've prepared a draft of a short document that should help you with the
>> transition to the "new" file streams API in Pharo 7.
>>
>> https://github.com/pavel-krivanek/pharoMaterials/blob/master/Filestreams.MD
>>
>> Pull requests are welcome.
>>
>> Cheers,
>> -- Pavel
>>
>>
>>
>
> Some discussion prior to any pull request (and while I'm away from
> Pharo machine)
>
> I like all the new code examples until "Write a UTF-8 text to STDOUT"
> and I wonder "Stdio stdout writeStreamDo: [ :stream | stream
> nextPutAll: 'a ≠ b' ]" would better fit the pattern of the other new
> code.
> (presuming "Stdio stdout" returns a FileReference, oherwise maybe
> "Stdio stdoutRef" or "Stdio stdout asFileReference")

Stdio stdout and friends just return a binary stream, hence they need wrapping for encoding.

Maybe

Stdio stdoutAsText

might be an idea, but this is so uncommon that I am not sure this is a good idea.
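
The wrapping mentioned above can be sketched as follows. This is a minimal illustration, not a proposed API: `wrap` is a hypothetical helper block introduced only for this example, and the second half writes to an in-memory byte stream so that the encoded result can be inspected.

```smalltalk
| wrap bytes |
"wrap is a hypothetical helper: put a UTF-8 character stream on top of any binary write stream"
wrap := [ :binaryStream | ZnCharacterWriteStream on: binaryStream encoding: 'utf8' ].

"Stdio stdout is binary, so it is wrapped before text is written to it"
(wrap value: Stdio stdout)
	nextPutAll: 'a ≠ b';
	nextPut: Character lf;
	flush.

"The same wrapping over an in-memory binary stream, to look at the bytes produced"
bytes := ByteArray streamContents: [ :binary |
	(wrap value: binary) nextPutAll: 'a ≠ b' ].
bytes size. "5 characters become 7 bytes, because ≠ needs 3 bytes in UTF-8"
```

A different encoding can be selected by passing another encoder name in place of 'utf8'.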

> Under "Positionable streams", rather than needing "3 timesRepeat: [
> stream next ]"
> would it be worth having #utf8position: to more intention revealing?

no more new API, please.

Positioning in a variable length encoded stream is plain hard - see my other mail.

> cheers -ben
>

Sven Van Caekenberghe-2

Re: Pharo 7 file streams guideline

In reply to this post by Pavel Krivanek-3

> On 23 Jul 2018, at 09:38, Pavel Krivanek <[hidden email]> wrote:
>
> Hello,
>
> I've prepared a draft of a short document that should help you with the transition to the "new" file streams API in Pharo 7.
>
> https://github.com/pavel-krivanek/pharoMaterials/blob/master/Filestreams.MD
>
> Pull requests are welcome.
>
> Cheers,
> -- Pavel

Great contribution, thanks !

I see no factual errors, everything is clear, cookbook style.

The last two sections might be more confusing than helpful. I would go more for 'all streams are buffered, don't worry' and 'beware positioning is byte based'.

Regarding positioning, some additional points that might not be so well known.

- Positioning in a variable length encoded stream is plain hard (and basically comes down to linear searching)

- All ZnCharacterEncoders understand #backOnStream: that does the right thing

- There is also ZnPositionableReadStream that can help (read the class comment)
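
A small sketch of the first two points, using an in-memory byte stream rather than a file. The variable names are only for illustration; the point is that stream positions count bytes, and that #backOnStream: steps back over a whole encoded character.

```smalltalk
| bytes stream encoder positionOfSymbol symbol |
bytes := 'a ≠ b' utf8Encoded.     "5 characters, 7 bytes: ≠ takes 3 bytes"
stream := bytes readStream.       "binary and positionable, in memory"
encoder := ZnCharacterEncoder utf8.

stream setToEnd.                  "byte position 7"
"step back over $b, the space and ≠, one whole character at a time"
3 timesRepeat: [ encoder backOnStream: stream ].
positionOfSymbol := stream position.        "a byte offset, not a character index"
symbol := encoder nextFromStream: stream.   "decode one complete character from there"
```

Setting the position to an arbitrary byte offset instead (for example the middle of the three bytes of ≠) would leave the stream inside a character, which is why positioning is described as byte based.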

Sven

Richard O'Keefe

Re: Pharo 7 file streams guideline

In reply to this post by Pavel Krivanek-3

I am a little confused by Filestreams.MD

#position and #position: report and set the number of past items.
So when you open a stream, #position is 0.  So why does the first
"Positionable streams" example use ... position: 4 ... while the
second one skips 3 characters?  ... position: 3 ... would be
just as broken.

Oh, I think a clarification is needed when talking about UTF-8.
To the best of my knowledge you don't need a Byte-Order-Mark at
the beginning of a UTF-8 stream because there is no byte order
issue to result, but apparently many Windows programs like to
add one.  Does/will Pharo add one when writing a UTF-8 file?
Does/will it skip one when reading a UTF-8 file?

I find the new approach produces unattractive code.  I can do
all of the examples simply in a system that
(a) implements the ANSI Smalltalk FileStream class methods
(b) supports '/dev/stdin' and '/dev/stdout' as file names
(c) interprets the external type #'text' either as #'utf8'
    or as "whatever $LC_CTYPE says" and also supports #'utf8'
    and #'cp1250' as external types.

Oh, and defines

Object>>bindOwn: aBlock
   ^(self respondsTo: #close)
      ifTrue:  [aBlock ensure: [self close]]
      ifFalse: [aBlock value]

What document should I read to get a mental model of the new
system and understand its rationale?

Sven Van Caekenberghe-2

Re: Pharo 7 file streams guideline

> On 23 Jul 2018, at 18:52, Richard O'Keefe <[hidden email]> wrote:
>
> Oh, I think a clarification is needed when talking about UTF-8.

Why ?

> To the best of my knowledge you don't need a Byte-Order-Mark at
> the beginning of a UTF-8 stream because there is no byte order
> issue to result,

Nothing was said about BOMs.

> but apparently many Windows programs like to add one.

Apparently yes.

> Does/will Pharo add one when writing a UTF-8 file?

No

> Does/will it skip one when reading a UTF-8 file?

Yes

> I find the new approach produces unattractive code. I can do
> all of the examples simply in a system that
> (a) implements the ANSI Smalltalk FileStream class methods

Given a FileReference object of some kind you ask for the streams you need. Seems pretty OO if you ask me.

Do you prefer a global class side factory facade ?

Both approaches are not exclusive per se, we just want a clear break/difference (because the resulting streams are not 100% the same).
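
As an illustration of asking a FileReference for the stream you need, here is a minimal sketch. The file name is hypothetical and the file is created in the working directory only for this example.

```smalltalk
| file text bytes |
file := 'filestreams-demo.txt' asFileReference.   "hypothetical file name"
file ensureDelete.

"Character stream: text is encoded as UTF-8 on the way out"
file writeStreamDo: [ :stream | stream nextPutAll: 'a ≠ b' ].

"Character stream: bytes are decoded back into a String"
text := file readStreamDo: [ :stream | stream upToEnd ].

"Binary stream: the raw bytes stored in the file"
bytes := file binaryReadStreamDo: [ :stream | stream upToEnd ].

file ensureDelete.
```

The reference is the single entry point; whether you get characters or bytes depends only on which stream message you send to it.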

> (b) supports '/dev/stdin' and '/dev/stdout' as file names

Maybe, with special casing. I like 'Stdio stdout' better as it is more cross platform.

> (c) interprets the external type #'text' either as #'utf8' LC_CTYPE

Maybe, but that is not very Smalltalk-like, is it? You want users to set environment variables? Anyway, this is a less important point, as UTF-8 is (should be) the general default.

> What document should I read to get a mental model of the new
> system and understand its rationale?

ML discussions over the year, I guess.

We are generally against big hairy complex classes and prefer simpler ones.

alistairgrant

Re: Pharo 7 file streams guideline

In reply to this post by Sven Van Caekenberghe-2

Hi Pavel & Sven,

Thanks for writing this, it is a great quick reference.

On Mon, 23 Jul 2018 at 12:08, Sven Van Caekenberghe <[hidden email]> wrote:

>
>
>
> > On 23 Jul 2018, at 11:13, Ben Coman <[hidden email]> wrote:
> >
> > I like all the new code examples until "Write a UTF-8 text to STDOUT"
> > and I wonder "Stdio stdout writeStreamDo: [ :stream | stream
> > nextPutAll: 'a ≠ b' ]" would better fit the pattern of the other new
> > code.
> > (presuming "Stdio stdout" returns a FileReference, oherwise maybe
> > "Stdio stdoutRef" or "Stdio stdout asFileReference")
>
> Stdio stdout and friends just return a binary stream, hence they need wrapping for encoding.
>
> Maybe
>
> Stdio stdoutAsText
>
> might be an idea, but this is so uncommon that I am not sure this is a good idea.

I've written this code enough times that I'd like to see it included. :-)

Maybe

Stdout utf8Stdout

(following the pattern of ByteArray>>utf8Decoded, String>>utf8Encoded)

?

Thanks again,
Alistair

Denis Kudriashov

Re: Pharo 7 file streams guideline

Hi.

I wonder, aren't stdout and stdin always about text input/output?

I never saw examples where somebody explicitly writes raw bytes into these streams.

If I am right, then it is better to introduce binaryStdout and binaryStdin messages, and make stdout and stdin use the most common encoding by default. How is it done in Java?

Ben Coman

Re: Pharo 7 file streams guideline

On 24 July 2018 at 02:38, Denis Kudriashov <[hidden email]> wrote:

>
> 2018-07-23 19:19 GMT+01:00 Alistair Grant <[hidden email]>:
>>
>> Hi Pavel & Sven,
>>
>> Thanks for writing this, it is a great quick reference.
>>
>>
>> On Mon, 23 Jul 2018 at 12:08, Sven Van Caekenberghe <[hidden email]> wrote:
>> >
>> >
>> >
>> > > On 23 Jul 2018, at 11:13, Ben Coman <[hidden email]> wrote:
>> > >
>> > > I like all the new code examples until "Write a UTF-8 text to STDOUT"
>> > > and I wonder "Stdio stdout writeStreamDo: [ :stream | stream
>> > > nextPutAll: 'a ≠ b' ]" would better fit the pattern of the other new
>> > > code.
>> > > (presuming "Stdio stdout" returns a FileReference, oherwise maybe
>> > > "Stdio stdoutRef" or "Stdio stdout asFileReference")
>> >
>> > Stdio stdout and friends just return a binary stream, hence they need
>> > wrapping for encoding.
>> >
>> > Maybe
>> >
>> > Stdio stdoutAsText
>> >
>> > might be an idea, but this is so uncommon that I am not sure this is a
>> > good idea.
>>
>> I've written this code enough times that I'd like to see it included. :-)
>>
>> Maybe
>>
>> Stdout utf8Stdout
>>
>> (following the pattern of ByteArray>>utf8Decoded, String>>utf8Encoded)
>>
>> ?
>>
>> Thanks again,
>> Alistair
>>
>
> I wonder does not stdout and stdin are always about text input/output?
> I never saw examples when somebody explicitly write raw bytes into these
> streams.

It's done... https://subosito.com/posts/imagemagick/
but I guess it's not the usual case.

> If I am right then it is better to introduce binaryStdout and binaryStdin
> messages. And make stdout and stdin use most common encoding by default. How
> it is done in Java?
>
>

It might be nice to further the new convention "use file references as
entry points to file streams"
so the new API can be reused...

StdioRef stdout writeStreamDo: [ :stream | stream nextPutAll: 'a ≠ b' ].
StdioRef stdout writeStreamEncoded: 'cp-1250' do: [ :stream | stream
nextPutAll: 'Příliš žluťoučký kůň úpěl ďábelské ódy.' ].
StdioRef stdout binaryWriteStreamDo: [ :stream | stream nextPutAll: #[1 2 3] ].

cheers -ben

David T. Lewis

Re: Pharo 7 file streams guideline

In reply to this post by Denis Kudriashov

On Mon, Jul 23, 2018 at 07:38:16PM +0100, Denis Kudriashov wrote:
> Hi.
>
> I wonder does not stdout and stdin are always about text input/output?

No. Consider the case of reading and writing serialized objects on stdin
and stdout, possibly between two images sending serialized objects to one
another.

> I never saw examples when somebody explicitly write raw bytes into these
> streams.

http://wiki.squeak.org/squeak/6176

I have not checked recently, but it should still work on Pharo with Fuel
serialization.

>
> If I am right then it is better to introduce binaryStdout and binaryStdin
> messages. And make stdout and stdin use most common encoding by default.
> How it is done in Java?
>

You are probably right that this is the best default, since it would be the
most common case.

Dave

Sven Van Caekenberghe-2

Re: Pharo 7 file streams guideline

In reply to this post by Sven Van Caekenberghe-2

> On 23 Jul 2018, at 12:07, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Stdio stdout and friends just return a binary stream, hence they need wrapping for encoding.
>
> Maybe
>
> Stdio stdoutAsText
>
> might be an idea, but this is so uncommon that I am not sure this is a good idea.

Given all remarks and comments (thanks BTW), I now think that

- textual stdio streams are the more common case
- binary stdio streams are the primitive ones that are seldom used
- another encoding than UTF-8 seems uncommon
- these are streams that exist and need no real opening/closing

So,

Stdio stdout

should return a character write stream with UTF-8 encoding while

Stdio binaryStdout

should be the lower level binary one.
This would be more in line with the other streams.
A non-UTF-8 encoding can be used as per Pavel's example.

alistairgrant

Re: Pharo 7 file streams guideline

On Tue., 24 Jul. 2018, 10:13 Sven Van Caekenberghe, <[hidden email]> wrote:

> > On 23 Jul 2018, at 12:07, Sven Van Caekenberghe <[hidden email]> wrote:
> >
> > Stdio stdout and friends just return a binary stream, hence they need wrapping for encoding.
> >
> > Maybe
> >
> > Stdio stdoutAsText
> >
> > might be an idea, but this is so uncommon that I am not sure this is a good idea.
>
> Given all remarks and comments (thanks BTW), I now think that
>
> - textual stdio streams are the more common case
> - binary stdio streams are the primitive ones that are seldom used
> - another encoding than UTF-8 seems uncommon
> - these are streams that exist and need no real opening/closing
>
> So,
>
> Stdio stdout
>
> should return a character write stream with UTF-8 encoding while
>
> Stdio binaryStdout
>
> should be the lower level binary one.
> This would be more in line with the other streams.
> A non-UTF-8 encoding can be used as per Pavel's example.

I didn't suggest this earlier because it isn't backward compatible. But I do think it is the better solution.

Cheers,
Alistair
(on phone)

Damien Pollet-2

Re: Pharo 7 file streams guideline

On Tue, 24 Jul 2018 at 11:39, Alistair Grant <[hidden email]> wrote:

> > On 23 Jul 2018, at 12:07, Sven Van Caekenberghe <[hidden email]> wrote:
> > So,
> >
> > Stdio stdout
> >
> > should return a character write stream with UTF-8 encoding while
> >
> > Stdio binaryStdout
> >
> > should be the lower level binary one.
> > This would be more in line with the other streams.
> > A non-UTF-8 encoding can be used as per Pavel's example.

+1

> I didn't suggest this earlier because it isn't backward compatible. But I do think it is the better solution.

I had a look at Stdio recently for Clap. The current implementation with Stdio stdout returning the binary stream is a bit confusing, but at least you can wrap it.

The above proposition with an explicit binaryStdout for the lower level uncommon case would be much clearer indeed.

Related issue: command line arguments come from VM system attributes as ByteStrings… and thus interpreted as iso-8859-1, which is incorrect in most cases nowadays, even though it seems to work as long as you only use ASCII. Decoding them is easy enough, but it requires two copies (asByteString utf8Decoded)
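
The effect described above can be reproduced without any VM involvement. This is a sketch under the assumption that the OS passes UTF-8 bytes; the string 'café' and the variable names are made up for illustration.

```smalltalk
| rawBytes asSeenInImage decoded |
"The bytes an OS using UTF-8 would hand over for the argument café"
rawBytes := 'café' utf8Encoded.

"Shoving those bytes into a ByteString treats each byte as one Latin-1 character"
asSeenInImage := rawBytes asString.

"Going back to bytes and decoding them properly recovers the intended text"
decoded := asSeenInImage asByteArray utf8Decoded.
```

Here `asSeenInImage` has five characters instead of four, because the two bytes of é are shown as two separate characters; the round trip through a ByteArray is the extra copy mentioned above.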

Sven Van Caekenberghe-2

Re: Pharo 7 file streams guideline

> On 25 Jul 2018, at 13:39, Damien Pollet <[hidden email]> wrote:
>
> On Tue, 24 Jul 2018 at 11:39, Alistair Grant <[hidden email]> wrote:
> > On 23 Jul 2018, at 12:07, Sven Van Caekenberghe <[hidden email]> wrote:
> So,
>
> Stdio stdout
>
> should return return a character write stream with UTF-8 encoding while
>
> Stdio binaryStdout
>
> should be the lower level binary one.
> This would be more in line with the other streams.
> A non-UTF-8 encoding can be used as per Pavel's example.
>
> +1
>
> I didn't suggest this earlier because it isn't backward compatible. But I do think it is the better solution.
>
> +2
>
> I had a look at Stdio recently for Clap. The current implementation with Stdio stdout returning the binary stream is a bit confusing, but at least you can wrap it.
> The above proposition with an explicit binaryStdout for the lower level uncommon case would be much clearer indeed.
>
> Related issue: command line arguments come from VM system attributes as ByteStrings… and thus interpreted as iso-8859-1, which is incorrect in most cases nowadays, even though it seems to work as long as you only use ASCII. Decoding them is easy enough, but it requires two copies (asByteString utf8Decoded)

Yes, this is a really big issue. Anything coming in as a command line argument or environment variable (or clipboard) is in a basically unknown, OS-determined encoding. I would assume/hope that UTF-8 is the sensible default today, but apparently not. And it is hard to find a cross-platform solution.

We've had serious issues already with this, like $HOME set to a non-ASCII path that then breaks almost everything.

Damien Pollet-2

Re: Pharo 7 file streams guideline

On Wed, 25 Jul 2018 at 13:48, Sven Van Caekenberghe <[hidden email]> wrote:

> On 25 Jul 2018, at 13:39, Damien Pollet <[hidden email]> wrote:
> Related issue: command line arguments come from VM system attributes as ByteStrings… and thus interpreted as iso-8859-1, which is incorrect in most cases nowadays, even though it seems to work as long as you only use ASCII. Decoding them is easy enough, but it requires two copies (asByteString utf8Decoded)

Yes this is a really big issue. Anything coming in as command line arg or environment variable (or clipboard) is in a basically unknown OS determined encoding. I would assume/hope the UTF-8 is the sensible default today, but apparently not. And it is hard to find a cross platform solution.

My point here was that it would make more sense for those to be passed into the image as ByteArrays, revealing the fact that their encoding is unknown. Currently the bytes are correct, but since they've been shoved into ByteStrings by the VM, the characters will be wrong unless your system happens to be using Latin 1.

I suppose we can either have a setting for decoding (since it's pretty much arbitrary), or heuristics like checking LC_CTYPE or whatever. Pablo mentioned the Locale class, but it doesn't seem to detect anything correct from the environment.

David T. Lewis

Re: Pharo 7 file streams guideline

On Wed, Jul 25, 2018 at 02:20:30PM +0200, Damien Pollet wrote:

> On Wed, 25 Jul 2018 at 13:48, Sven Van Caekenberghe <[hidden email]> wrote:
>
> > > On 25 Jul 2018, at 13:39, Damien Pollet <[hidden email]>
> > wrote:
> > > Related issue: command line arguments come from VM system attributes as
> > ByteStrings… and thus interpreted as iso-8859-1, which is incorrect in most
> > cases nowadays, even though it seems to work as long as you only use ASCII.
> > Decoding them is easy enough, but it requires two copies (asByteString
> > utf8Decoded)
> >
> > Yes this is a really big issue. Anything coming in as command line arg or
> > environment variable (or clipboard) is in a basically unknown OS determined
> > encoding. I would assume/hope the UTF-8 is the sensible default today, but
> > apparently not. And it is hard to find a cross platform solution.
> >
>
> My point here was that it would make more sense for those to be passed into
> the image as ByteArrays, revealing the fact that their encoding is unknown.
> Currently the bytes are correct, but since they've been shoved into
> ByteStrings by the VM, the characters will be wrong unless your system
> happens to be using Latin 1.

That sounds right to me.

Having said that, there should be no need to change the VM interface to do
this. A ByteString is by definition an array of 8 bit wide characters, and
conversion between ByteString and ByteArray is trivial. Any necessary changes
can be done without touching the VM.

Dave

>
> I suppose we can either have a setting for decoding (since it's pretty much
> arbitrary), or heuristics like checking LC_CTYPE or whatever. Pablo
> mentioned the Locale class, but it doesn't seem to detect anything correct
> from the environment.

Richard O'Keefe

Re: Pharo 7 file streams guideline

In reply to this post by Sven Van Caekenberghe-2

Yes there was no mention of BOMs, but there *was* mention of
#position, and the presence or absence of byte order marks
makes a difference.

As for mailing list discussions over a year, that is not the
kind of single coherent source I was hoping for.

As someone *using* the system classes, I don't give a damn
how big they are.  What I care about is the complexity of
*my* code, and it looks as though the new interface will
make my code bigger, more error-prone, and less portable.
As for the "big hairy classes" sneer, the file stream
classes put together in my system are about the same size
as the Dictionary class proper.

I don't "want to make users set environment variables".
What I was suggesting was that if a user does go to the
trouble of setting relevant environment variables, the
system should have the decency to pay attention to them.

Handling /dev/std... is pretty trivial in UNIX; in Windows
nothing about stdin is easy (XP, Vista, 7, 8, and 10 differ
amongst themselves in several annoying ways).

I can make no sense of the comment that interpreting #'text'
as #utf8 (always, as a design choice) or as whatever the
user chose for $LC_CTYPE (always, as a design choice) is
"not very Smalltalk like".  Both design choices are fully
consistent with the standard -- to the limited extent that
UTF8 processing *can* be consistent with the standard.

Sven Van Caekenberghe-2

Re: Pharo 7 file streams guideline

Richard,

I am only engaging in a discussion with you in order to explain what we did and why. The changes that we made were years in the making and are now being pushed into the system. The discussions happened long ago; we are not going to revert them.

Of course you are entitled to have your own opinion and disagree.

I do not know what 'your system' is that you are referring to.

Pharo has a particular philosophy that includes a complex relationship with the concepts of Smalltalk and especially ANSI Smalltalk. In one sentence, we want to have the liberty to make (breaking) changes and don't want to be stuck in the past. At the same time we want our user base to be able to follow by adapting their code.

Regards,

Sven

NorbertHartl

Re: Pharo 7 file streams guideline

> Am 26.07.2018 um 17:59 schrieb Sven Van Caekenberghe <[hidden email]>:
>
> Richard,
>
> I am only engaging in a discussion with you in order to explain what we did and why. The changes that we did were years in the making and are now being pushed into the system. The discussions happened long ago, we are not going to revert them.
>
> Of course you are entitled to have your own opinion and disagree.
>
> I do not know what 'your system' is that you are referring to.
>
> Pharo has a particular philosophy that includes a complex relationship with the concepts of Smalltalk and especially ANSI Smalltalk. In one sentence, we want to have to liberty to make (breaking) changes and don't want to be stuck in the past. At the same time we want our user base to be able to follow by adapting their code.
>

+1

Norbert

Damien Pollet-2

Re: Pharo 7 file streams guideline

In reply to this post by Sven Van Caekenberghe-2

Hi Sven… a couple questions:

- is there a preferred order of composition between the encoding and buffering streams? If yes, is it the same for read and write streams, or reversed?

E.g. if Stdio binaryStdin was implemented, Stdio stdin should be decoded, but buffering it as well would be a problem for interactive applications.

- what's your opinion on convenience composition messages, e.g. aBinaryStream buffered decoded: 'utf-8' ?
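
For reference, one possible composition for the read side is sketched below, with the buffer directly on the binary stream and the decoder on top. This is only an illustration over in-memory bytes; whether this is the preferred order, and whether it matches what FileReference builds internally, is an assumption and not something confirmed in this thread.

```smalltalk
| bytes binary buffered text contents |
bytes := 'a ≠ b' utf8Encoded.
binary := bytes readStream.                    "stands in for a binary file or stdin stream"
buffered := ZnBufferedReadStream on: binary.   "buffering works on bytes"
text := ZnCharacterReadStream on: buffered encoding: 'utf8'.   "decoding sits on top"
contents := text upToEnd.
```

With the layers in this order the buffer holds raw bytes and the decoder pulls from it as needed; for an interactive stdin the buffering layer could simply be left out.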
