Hi!
I decided to repost this bit below since I think most people missed it (it was in the SqueakMap thread):

---------
Btw, yesterday I was staring at the MultiByteFileStream stuff and... well, IMHO it would have been better *for me* (other users may have other stories to tell) if the default was binary and not ascii. The principle of least surprise. If I open a filestream and don't tell it *anything*, then I would expect it to just feed me the bits and bytes - as Strings or ByteArrays, but not doing any conversions or line end mumbo jumbo or any other unexpected "nice things". An example of this is inspecting a file in the file list - I really appreciated the fact that the file list didn't do *any* conversion on the stuff it showed me - now it does. And I also wonder where the hex view went... anyway:

Yesterday my colleague wanted to save stuff with platform-specific line endings (wantsLineEndConversion: true etc.) but NOT do any other conversions. We realized that you can't set the converter to nil - it will lazily set itself to the default platform converter (so it seems to me, at least). And if you tell the stream to be binary it will not do line end conversions.

What I ended up doing was creating NullTextConverter (which does no conversion at all, trivial to write) and then it worked fine. It seems to me that a cleaner approach here would be to:

1. Do line end conversions or not, regardless of the two choices below.
2. Binary or ascii - this only decides whether we use ByteArrays or Strings; it doesn't concern conversions or line ends.
3. Selection of converter, where we also have a NullConverter that does nothing.

IMHO (having not dissected this in total detail) the above three options should be combinable. So for example, in our case we have utf8 strings that we want to write out *as is* and use #cr to get platform-specific line endings.
-------------

I also think that a default FileStream should not do any line end conversions or character conversions at all by default (but still use Strings instead of ByteArrays). In other words - I would like the "least surprise" principle to hold. Am I alone in this idea? I love the work of Yoshiki and friends in this area - I just want to iron out the small "gotchas" with it.

Now, Yoshiki and all the rest of you - feel free to correct me with the real facts. :)

regards, Göran
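For concreteness, the workaround described above might look roughly like this in a workspace. This is only a sketch: NullTextConverter is Göran's own class (not in the stock image), the file name and string are placeholders, and the wantsLineEndConversion:/converter: accessors are the ones named in this thread.

"Sketch of the workaround described above. NullTextConverter is
 Göran's own class, not part of the stock image."
| stream |
stream := FileStream newFileNamed: 'out.txt'.
stream wantsLineEndConversion: true.       "do translate #cr to the platform line end"
stream converter: NullTextConverter new.   "but pass the characters through untouched"
stream nextPutAll: 'pre-encoded utf8 data here'; cr.
stream close.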
See http://minnow.cc.gatech.edu/squeak/3342
Cheers,
  - Andreas

PS. The above is a way of saying KISS. [Standard|CrLf|MultiByte] FileStream is *heavily* overloaded with lots of dependent responsibilities and the best way to deal with these issues top to bottom is to refactor the dimensions somewhat - for example by making "text writing" (lf conversion, encodings) an independent dimension from "byte writing" (file i/o).

[hidden email] wrote:
> [...]
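To illustrate the kind of split Andreas means, a hypothetical "text writing" layer over a plain byte stream could look something like the following. Everything here is invented for the sketch - the class, the selectors and the encode block stand in for whatever real design one would choose; nothing of this exists in the image.

"Hypothetical sketch of a text layer kept separate from byte writing.
 The encode block stands in for whatever encoding machinery is plugged in."
Object subclass: #TextWriteStream
    instanceVariableNames: 'byteStream encodeBlock lineEnd'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'TextFiles-Sketch'

on: aByteStream encodeBlock: aBlock lineEnd: aString
    "Configure the text layer; the backend only ever sees bytes."
    byteStream := aByteStream.
    encodeBlock := aBlock.
    lineEnd := aString

nextPutAll: aString
    "Encode the text and hand the result to the byte backend."
    byteStream nextPutAll: (encodeBlock value: aString)

cr
    "Emit the configured line-end sequence."
    self nextPutAll: lineEnd

close
    byteStream close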
In reply to this post by Göran Krampe
Göran,
> Btw, yesterday I was staring at the MultiByteFileStream stuff and...
> well, IMHO it would have been better *for me* (other users may have
> other stories to tell) if the default was binary and not ascii. The
> principle of least surprise. If I open a filestream and don't tell it
> *anything*, then I would expect it to just feed me the bits and bytes -
> as Strings or ByteArrays, but not doing any conversions or line end
> mumbo jumbo or any other unexpected "nice things". An example of this
> is inspecting a file in the file list - I really appreciated the fact
> that the file list didn't do *any* conversion on the stuff it showed me -
> now it does. And I also wonder where the hex view went... anyway:

Again, "Strings" now include WideStrings, so "no conversion" would not work for the users of such strings.

> What I ended up doing was creating NullTextConverter (which does no
> conversion at all, trivial to write) and then it worked fine.

Sorry about that, but we actually have it... It is just called Latin1TextConverter. (There was some argument for intention-revealing names and we were almost about to add an empty subclass of Latin1TextConverter, but we didn't get around to it.)

> It seems to me that a cleaner approach here would be to:
>
> 1. Do line end conversions or not, regardless of the two choices below.
> 2. Binary or ascii - this only decides whether we use ByteArrays or
> Strings; it doesn't concern conversions or line ends.
> 3. Selection of converter, where we also have a NullConverter that does
> nothing.
>
> IMHO (having not dissected this in total detail) the above three options
> should be combinable. So for example, in our case we have utf8 strings
> that we want to write out *as is* and use #cr to get platform-specific
> line endings.

Mostly I agree, as we do have an almost independent choice of 1. and 2., as well as a NullConverter under the name of Latin1TextConverter.

But isn't the combination of #binary and a line end conversion confusing?

> I also think that a default FileStream should not do any line end
> conversions or character conversions at all by default (but still use
> Strings instead of ByteArrays). In other words - I would like the "least
> surprise" principle to hold. Am I alone in this idea? I love the work of
> Yoshiki and friends in this area - I just want to iron out the small
> "gotchas" with it.
>
> Now, Yoshiki and all the rest of you - feel free to correct me with the
> real facts. :)

I wrote a reply to you in this regard last week. For the least surprise principle, I would say using UTF8 conversion for text would make sense.

And, as Andreas wrote, the best thing is to separate the concerns. If somebody manages to separate the fileOut and fileIn aspects from FileStream (there were discussions about moving to an XML-based external format...), it would be a great advance on that front.

-- Yoshiki
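So with the classes already in the image, the pass-through setup presumably reduces to something like the snippet below. The file name is a placeholder, and it assumes the converter: accessor discussed in this thread behaves as expected on a read stream.

"Pass-through reading with the existing Latin1TextConverter, per Yoshiki's note."
| stream contents |
stream := FileStream readOnlyFileNamed: 'data.txt'.
stream converter: Latin1TextConverter new.   "effectively no conversion"
contents := stream upToEnd.
stream close.
contents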
In reply to this post by Göran Krampe
Göran,
> [...] An example of this is inspecting a file in the file list - I really
> appreciated the fact that the file list didn't do *any* conversion on the
> stuff it showed me - now it does. And I also wonder where the hex view
> went... anyway:

In 3.9a 7021 and 3.8-6665 we have a "view as hex" item in the file list content pane menu. Also, we have "view as encoded text" to choose a different encoding.

But I agree, at least the file list has to do sensible stuff, because people may edit the content and save it. The default encoding for the file list content pane should be no conversion, as there can be some binary files (or it could be taken from a preference). And once the user sets a certain encoding for a file, the file list should somehow remember it (perhaps until the file list instance is closed) and use that encoding for subsequent actions.

Also, it might make more sense that setting the converter to nil sets it to Latin1TextConverter (NullTextConverter), as you expected.

-- Yoshiki
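That last suggestion might amount to no more than a guard in the setter, roughly as sketched below; the instance variable and accessor names are the ones used elsewhere in this thread.

"MultiByteFileStream>>converter: - sketch of treating nil as 'no conversion'
 instead of lazily installing the platform default later."
converter: aTextConverter
    converter := aTextConverter ifNil: [Latin1TextConverter new]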
In reply to this post by Andreas.Raab
Hi!
Andreas Raab <[hidden email]> wrote:
> See http://minnow.cc.gatech.edu/squeak/3342

Ah. Right, had forgotten about that. Nice.

> PS. The above is a way of saying KISS. [Standard|CrLf|MultiByte]
> FileStream is *heavily* overloaded with lots of dependent
> responsibilities and the best way to deal with these issues top to
> bottom is to refactor the dimensions somewhat - for example by making
> "text writing" (lf conversion, encodings) an independent dimension from
> "byte writing" (file i/o).

Sounds right to me. So essentially I assume you are saying we should ideally "clean" the FileStream tree so that it doesn't overlap with TextFile?

regards, Göran
In reply to this post by Yoshiki Ohshima
Hi Yoshiki!
Yoshiki Ohshima <[hidden email]> wrote:
> Göran,
>
> > Btw, yesterday I was staring at the MultiByteFileStream stuff and...
> > [...]
>
> Again, "Strings" now include WideStrings, so "no conversion" would not
> work for the users of such strings.

Hmmmm. Looking at the definitions ("defs") of the instance variable converter in MultiByteFileStream, I can't say it looks fully "sound" (in a 7022 image). First of all, it is lazily set in #converter using logic that looks "right" to me - it ends up with Latin1TextConverter for me, which you explain below is indeed the Null converter (and yes, looking at it, it sure is - not sure why you do the "(Character value: aCharacter charCode)" though). But then #open:forWrite: has logic setting it to MacRoman or Utf8 - and #reset sets it lazily to utf8. I don't get it. :)

> > What I ended up doing was creating NullTextConverter (which does no
> > conversion at all, trivial to write) and then it worked fine.
>
> Sorry about that, but we actually have it... It is just called
> Latin1TextConverter. (There was some argument for intention-revealing
> names and we were almost about to add an empty subclass of
> Latin1TextConverter, but we didn't get around to it.)

As always, a slightly more "revealing" class comment would have saved me. :) It doesn't mention that the internal encoding these days actually is iso8859-1 (right?) and that this converter does no conversion at all.

> > It seems to me that a cleaner approach here would be to:
> > [...]
>
> Mostly I agree, as we do have an almost independent choice of 1. and 2.,
> as well as a NullConverter under the name of Latin1TextConverter.
>
> But isn't the combination of #binary and a line end conversion confusing?

Possibly. :) Given that I can trust the stream not to muck about with the strings I feed it (which Latin1TextConverter indeed seems to ensure), then sure, perhaps we could say that binary means no line end conversions at all.

Ehm, btw, does this mean that the only way to make it do no conversions and still operate using Strings and not ByteArrays is to use the latin1 converter, which operates one character at a time? Because that is way too slow.

And yes, Andreas is probably right that most of these issues would better be dealt with in the TextFiles package.

> > I also think that a default FileStream should not do any line end
> > conversions or character conversions at all by default (but still use
> > Strings instead of ByteArrays). [...]
>
> I wrote a reply to you in this regard last week.

Yes, I just thought it was a reply to my first mention of this and not the second, sorry. And I also think most people didn't read that thread too closely. :)

> For the least surprise principle, I would say using UTF8 conversion for
> text would make sense.

Not to me. Least surprise to me is "no conversion" - which indeed the system default converter would have given me (albeit one char at a time).

> And, as Andreas wrote, the best thing is to separate the concerns.
> If somebody manages to separate the fileOut and fileIn aspects from
> FileStream (there were discussions about moving to an XML-based external
> format...), it would be a great advance on that front.

Right.

regards, Göran
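Regarding the one-character-at-a-time worry above, a bulk fast path for the pass-through case might look roughly like this. It is purely a sketch: the selectors, the behaviour of super, and the class check are assumptions, and line-end conversion is ignored for brevity.

"MultiByteFileStream>>nextPutAll: - hypothetical fast path when the
 converter does no real work; assumptions only, line ends ignored."
nextPutAll: aString
    (converter isKindOf: Latin1TextConverter)
        ifTrue: [^ super nextPutAll: aString]               "write the bytes in one go"
        ifFalse: [aString do: [:each | self nextPut: each]] "per-character conversion"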
In reply to this post by Göran Krampe
[hidden email] wrote:
> Sounds right to me. So essentially I assume you are saying we should
> ideally "clean" the FileStream tree so that it doesn't overlap with
> TextFile?

I'd start out with the new functionality and leave the various file stream classes alone for the time being. There is no reason to break other code more than necessary, since a reasonable text stream can always just operate on a binary FileStream backend.

Cheers,
  - Andreas
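Continuing the earlier hypothetical TextWriteStream sketch, usage over a binary backend might then read as follows; squeakToUtf8 here stands in for whichever UTF-8 encoding call the image provides, and String cr is just one choice of line-end sequence.

"Hypothetical usage of the sketched text layer over a binary FileStream.
 TextWriteStream is the invented class from the earlier sketch."
| file text |
file := FileStream newFileNamed: 'log.txt'.
file binary.
text := TextWriteStream new
            on: file
            encodeBlock: [:s | s squeakToUtf8]   "stand-in for the encoding call"
            lineEnd: String cr.                  "or the platform-specific sequence"
text nextPutAll: 'hello'; cr.
text close.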
In reply to this post by Göran Krampe
[hidden email] wrote in his mail:
> And I also wonder where the hex view went

I complained about this too a while ago...

Very informative mail about String, ByteArrays and other PrematureGrayHair classes.

Congratulations for the hard work on SM!!

Edgar