Smalltalk › Squeak › Squeak - Dev

FileStream and TextConverters etc (reposting from another thread)

Göran Krampe

FileStream and TextConverters etc (reposting from another thread)

Hi!

I decided to repost this bit below since I think most people missed it
(it was in the SqueakMap thread):
---------
Btw, yesterday I was staring at the MultiByteFileStream stuff and...
well, IMHO it would have been better *for me* (other users may have
other stories to tell) if the default was binary and not ascii. The
principle of least surprise. If I open a filestream and don't tell it
*anything*, then I would expect it to just feed me the bits and bytes -
as Strings or ByteArrays, but not doing any conversions or line end
mumbo jumbo or any other non expected "nice things". An example of this
is inspecting a file in the file list - I really appreciated the fact
that filelist didn't do *any* conversion on the stuff it showed me - now
it does. And I also wonder where the hex view went... anyway:

Yesterday my colleague wanted to save stuff with platform specific line
endings (wantsLineEndConversion: true etc) but NOT doing any other
conversions. We realized that you can't set converter to nil - it will
lazily set itself to the default platform converter (seems to me at
least). And if you tell the stream to be binary it will not do line end
conversions.

What I ended up doing was creating NullTextConverter (which does no
conversion at all, trivial to write) and then it worked fine. It seems
to me that a cleaner approach here would be to:

1. Do line end conversions or not regardless of the 2 choices below.
2. Binary or ascii - only decides if we use ByteArrays or Strings,
doesn't concern conversions or line ends.
3. Selection of converter where we also have a NullConverter that does
nothing.

IMHO (having not dissected this in total detail) the above three options
should be combinable. So for example, in our case we have utf8 strings
that we want to write out *as is* and use #cr to get platform specific
line endings.
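
A minimal sketch of how the three options could be combined independently. It is written in Python purely for illustration and is not Squeak code: the names `emit`, `null_converter` and `utf8_converter` are hypothetical, and it assumes (as in Squeak) that strings use CR as the internal line end.

```python
def null_converter(text: str) -> bytes:
    """Pass-through: each character 0..255 becomes the byte with the same value."""
    return text.encode("latin-1")


def utf8_converter(text: str) -> bytes:
    """Encode characters as UTF-8."""
    return text.encode("utf-8")


def emit(text, converter=null_converter, line_end=None, binary=False):
    """Apply three independent choices to a chunk of text.

    line_end  -- None leaves line ends alone; otherwise every internal CR
                 is replaced by the given sequence (option 1)
    binary    -- only selects the result type: bytes, or a str holding
                 one character per byte (option 2)
    converter -- the encoding step, chosen separately (option 3)
    """
    if line_end is not None:
        text = text.replace("\r", line_end)
    data = converter(text)
    return data if binary else data.decode("latin-1")


# The case described above: a string that already holds UTF-8 bytes,
# written out as is, with only the line end converted.
already_utf8 = "na\xc3\xafve\r"
on_disk = emit(already_utf8, converter=null_converter, line_end="\r\n", binary=True)
```

Here `on_disk` keeps the two UTF-8 bytes untouched and only the trailing CR becomes CR LF.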
-------------

I also think that a default FileStream should not do any line end
conversions or conversions at all by default (but still use Strings
instead of ByteArrays). In other words - I would like the "least
surprise" principle to hold. Am I alone in this idea? I love the work of
Yoshiki and friends in this area - I just want to iron out the small
"gotchas" with it.

Now, Yoshiki and all the rest of you - feel free to correct me with the
real facts. :)

regards, Göran

Andreas.Raab

Re: FileStream and TextConverters etc (reposting from another thread)

See http://minnow.cc.gatech.edu/squeak/3342

Cheers,
- Andreas

PS. The above is a way of saying KISS. [Standard|CrLf|MultiByte]
FileStream is *heavily* overloaded with lots of dependent
responsibilities and the best way to deal with these issues top to
bottom is to refactor the dimensions somewhat - for example by making
"text writing" (lf conversion, encodings) an independent dimension from
"byte writing" (file i/o).


Yoshiki Ohshima

Re: FileStream and TextConverters etc (reposting from another thread)

In reply to this post by Göran Krampe

Göran,

> Btw, yesterday I was staring at the MultiByteFileStream stuff and...
> well, IMHO it would have been better *for me* (other users may have
> other stories to tell) if the default was binary and not ascii. The
> principle of least surprise. If I open a filestream and don't tell it
> *anything*, then I would expect it to just feed me the bits and bytes -
> as Strings or ByteArrays, but not doing any conversions or line end
> mumbo jumbo or any other non expected "nice things". An example of this
> is inspecting a file in the file list - I really appreciated the fact
> that filelist didn't do *any* conversion on the stuff it showed me - now
> it does. And I also wonder where the hex view went... anyway:

Again, "Strings" now include WideStrings, so "no conversion" would not
work for the users of such strings.

> What I ended up doing was creating NullTextConverter (which does no
> conversion at all, trivial to write) and then it worked fine.

Sorry about that, but we actually have it.... It is just called
Latin1TextConverter. (There was some argument for intention-revealing
names and we were almost about to add an empty subclass of
Latin1TextConverter, but we didn't get around to it.)
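
As an illustration of both points (a Latin-1 converter behaving as a "null" converter, and "no conversion" breaking down for wide strings), here is a short Python sketch. Python's `latin-1` codec is only an analogy for Latin1TextConverter, and `can_pass_through` is a hypothetical helper name.

```python
# Latin-1 maps each code point 0..255 to the byte with the same value,
# so encoding and decoding are the identity on that range.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")


def can_pass_through(text: str) -> bool:
    """True if every character fits in one byte, so no real conversion is needed."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        # A character above 255 (what a WideString would hold) has no
        # single-byte form, so some real encoding has to be chosen.
        return False
```

A string such as "abc" passes through unchanged, while a string containing a character like U+3042 cannot be written without picking an actual encoding.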

> It seems
> to me that a
> cleaner approach here would be to:
>
> 1. Do line end conversions or not regardless of the 2 choices below..
> 2. Binary or ascii - only decides if we use ByteArrays or Strings,
> doesn't concern conversions or line ends.
> 3. Selection of converter where we also have a NullConverter that does
> nothing.
>
> IMHO (having not dissected this in total detail) the above three options
> should be combinable. So for example, in our case we have utf8 strings
> that we want to write out *as is* and use #cr to get platform specific
> line endings.

Mostly I agree, as we do have almost independent choice of 1. and
2., as well as NullConverter under the name of Latin1TextConverter.

But, isn't the combination of #binary and a line end conversion confusing?

> I also think that a default FileStream should not do any line end
> conversions or conversions at all by default (but still use Strings
> instead of ByteArrays). In other words - I would like the "least
> surprise" principle to hold. Am I alone in this idea? I love the work of
> Yoshiki and friends in this area - I just want to iron out the small
> "gotchas" with it.
>
> Now, Yoshiki and all the rest of you - feel free to correct me with the
> real facts. :)

I wrote a reply to you in this regard last week. For the least
surprise principle, I would say using UTF8 conversion for text would
make sense.

And, as Andreas wrote, the best thing is to separate the concerns.
If somebody manages to separate the fileOut and fileIn aspect from
FileStream (there were discussions to move to XML-based external
format...), it would be a great advance on that front.

-- Yoshiki

Yoshiki Ohshima

Re: FileStream and TextConverters etc (reposting from another thread)

In reply to this post by Göran Krampe

Göran,

In 3.9a 7021 and 3.8-6665, we have "view as hex" item in the file
list content pane menu. Also, we have "view as encoded text" to
choose different encoding.

But, I agree, at least the file list has to do sensible stuff
because people may edit the content and save it. The default encoding
for the file list content pane should be non-conversion, as there can
be some binary files (or it could come from a preference). And once the
user sets a certain encoding for a file, it should somehow be remembered
(perhaps until the file list instance is closed) and used
for subsequent actions.

Also, it might make more sense that setting the converter to nil
sets it to Latin1TextConverter (NullTextConverter), as you expected.
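
A rough sketch of that behaviour, in Python for illustration only. `SketchStream` and its `DEFAULT` are hypothetical names, and the string "latin-1" merely stands in for a Latin1TextConverter instance.

```python
class SketchStream:
    """Hypothetical stream whose converter falls back to a pass-through default."""

    DEFAULT = "latin-1"  # stand-in for Latin1TextConverter

    def __init__(self):
        self._converter = None

    @property
    def converter(self):
        # Lazy initialisation: callers never observe an unset converter.
        if self._converter is None:
            self._converter = self.DEFAULT
        return self._converter

    @converter.setter
    def converter(self, value):
        # Setting None explicitly selects the pass-through converter
        # instead of some platform-dependent one.
        self._converter = self.DEFAULT if value is None else value


stream = SketchStream()
stream.converter = "utf-8"
stream.converter = None  # now back to the pass-through default
```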

-- Yoshiki

Göran Krampe

Re: FileStream and TextConverters etc (reposting from another thread)

In reply to this post by Andreas.Raab

Hi!

Andreas Raab <[hidden email]> wrote:
> See http://minnow.cc.gatech.edu/squeak/3342

Ah. Right, had forgotten about that. Nice.

> Cheers,
> - Andreas
>
> PS. The above is a way of saying KISS. [Standard|CrLf|MultiByte]
> FileStream is *heavily* overloaded with lots of dependent
> responsibilities and the best way to deal with these issues top to
> bottom is to refactor the dimensions somewhat - for example by making
> "text writing" (lf conversion, encodings) an independent dimension from
> "byte writing" (file i/o).

Sounds right to me. So essentially I assume you are saying we should
ideally "clean" the FileStream tree so that it doesn't overlap with
TextFile?

regards, Göran

Göran Krampe

Re: FileStream and TextConverters etc (reposting from another thread)

In reply to this post by Yoshiki Ohshima

Hi Yoshiki!

Yoshiki Ohshima <[hidden email]> wrote:

> Göran,
>
> > Btw, yesterday I was staring at the MultiByteFileStream stuff and...
> > well, IMHO it would have been better *for me* (other users may have
> > other stories to tell) if the default was binary and not ascii. The
> > principle of least surprise. If I open a filestream and don't tell it
> > *anything*, then I would expect it to just feed me the bits and bytes -
> > as Strings or ByteArrays, but not doing any conversions or line end
> > mumbo jumbo or any other non expected "nice things". An example of this
> > is inspecting a file in the file list - I really appreciated the fact
> > that filelist didn't do *any* conversion on the stuff it showed me - now
> > it does. And I also wonder where the hex view went... anyway:
>
> Again, "Strings" now include WideStrings, so "no conversion" would not
> work for the users of such strings.

Hmmmm. Looking at defs of instvar converter in MultiByteFileStream I
can't say it looks fully "sound" (in a 7022 image).

First of all it is lazily set in #converter using logic that looks
"right" to me - it ends up with the Latin1TextConverter for me, which
you below explain indeed is the Null converter (and yes, looking at it,
it sure is - not sure why you do the "(Character value: aCharacter
charCode)" though).

But then #open:forWrite: has logic setting it to MacRoman or Utf8 - and
#reset sets it lazily to utf8. I don't get it. :)

> > What I ended up doing was creating NullTextConverter (which does no
> > conversion at all, trivial to write) and then it worked fine.
>
> Sorry about that, but we actually have it.... It is just called
> Latin1TextConverter. (There was some argument for intention-revealing
> names and we were almost about to add an empty subclass of
> Latin1TextConverter, but we didn't get around to it.)

As always a slightly more "revealing" class comment would have saved me.
:) It doesn't mention that the internal encoding these days actually is
iso8859-1 (right?) and that this converter does no conversion at all.

> > It seems
> > to me that a
> > cleaner approach here would be to:
> >
> > 1. Do line end conversions or not regardless of the 2 choices below..
> > 2. Binary or ascii - only decides if we use ByteArrays or Strings,
> > doesn't concern conversions or line ends.
> > 3. Selection of converter where we also have a NullConverter that does
> > nothing.
> >
> > IMHO (having not dissected this in total detail) the above three options
> > should be combinable. So for example, in our case we have utf8 strings
> > that we want to write out *as is* and use #cr to get platform specific
> > line endings.
>
> Mostly I agree, as we do have almost independent choice of 1. and
> 2., as well as NullConverter under the name of Latin1TextConverter.
>
> But, isn't the combination of #binary and a line end conversion confusing?

Possibly. :) Given that I can trust the stream to not muck about with
the strings I feed it (which Latin1TextConverter indeed seems to make
sure) then sure, perhaps we could say that binary means no line end
conversions at all.

Ehm, btw, does this mean that the only way to make it do no conversions
and still operate using Strings and not ByteArrays is to use the latin1
converter which operates one character at a time? Because that is way
too slow.

And yes, Andreas is probably right that most of all these issues should
better be dealt with in the TextFiles package.

> > I also think that a default FileStream should not do any line end
> > conversions or conversions at all by default (but still use Strings
> > instead of ByteArrays). In other words - I would like the "least
> > surprise" principle to hold. Am I alone in this idea? I love the work of
> > Yoshiki and friends in this area - I just want to iron out the small
> > "gotchas" with it.
> >
> > Now, Yoshiki and all the rest of you - feel free to correct me with the
> > real facts. :)
>
> I wrote a reply to you in this regard last week.

Yes, I just thought it was a reply to my first mentioning of this and
not the second, sorry. And I also think most people didn't read that
thread too closely. :)

> For the least
> surprise principle, I would say using UTF8 conversion for text would
> make sense.

Not to me. Least surprise to me is "no conversion" - which indeed the
system default converter would have given me (albeit one char at a
time).

> And, as Andreas wrote, the best thing is to separate the concerns.
> If somebody manages to separate the fileOut and fileIn aspect from
> FileStream (there were discussions to move to XML-based external
> format...), it would be a great advance on that front.
>
> -- Yoshiki

Right.

regards, Göran

Andreas.Raab

Re: FileStream and TextConverters etc (reposting from another thread)

In reply to this post by Göran Krampe

[hidden email] wrote:
> Sounds right to me. So essentially I assume you are saying we should
> ideally "clean" the FileStream tree so that it doesn't overlap with
> TextFile?

I'd start out with the new functionality and leave the various file
stream classes alone for the time being. There is no reason to break other
code more than necessary since a reasonable text stream can always just
operate on a binary FileStream backend.
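
To make that layering concrete, here is a minimal sketch in Python (not Squeak code). `TextWriter` is a hypothetical name, `io.BytesIO` stands in for a binary FileStream, and the CR-to-platform line end step assumes Squeak's convention of CR as the internal line end.

```python
import io


class TextWriter:
    """Text concerns (encoding, line ends) layered over any binary stream."""

    def __init__(self, backend, encoding="latin-1", line_end="\n"):
        self.backend = backend      # anything with a write(bytes) method
        self.encoding = encoding    # "latin-1" acts as a pass-through here
        self.line_end = line_end    # what an internal CR becomes on output

    def write(self, text):
        # Line end conversion and encoding happen here; the backend
        # only ever sees bytes and knows nothing about text.
        converted = text.replace("\r", self.line_end)
        self.backend.write(converted.encode(self.encoding))


backend = io.BytesIO()
writer = TextWriter(backend, encoding="utf-8", line_end="\r\n")
writer.write("h\u00e9llo\r")
written = backend.getvalue()
```

The existing stream classes stay untouched in this arrangement, since the text layer only needs a byte sink. Python's own `io.TextIOWrapper` over a binary stream follows the same split.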

Cheers,
- Andreas

Edgar J. De Cleene

Re: FileStream and TextConverters etc (reposting from another thread)

In reply to this post by Göran Krampe

[hidden email] wrote:

> And I also wonder where the hex view went
I complained about this too a while ago...

Very informative mail about String, ByteArrays and other PrematureGrayHair
classes

Congratulations on the hard work on SM!!

Edgar
