MultiByteFileStream and CrLfFileStream (and Windows)

cbc
Hi.

So, for a while now, I've been annoyed that the newer Squeaks don't really handle Windows nicely - at least, not when writing files that are meant to be handled outside of Squeak.  Today I finally dug into the code to find out why.

Many moons ago, MultiByteFileStream was created as the replacement for CrLfFileStream, and most references to CrLfFileStream went away.  This is even commented in the code(!), as:
It also combined the good old CrLfFileStream.  CrLfFileStream class>>new now returns an instance of MultiByteFileStream.

However, the conversion wasn't complete.  In particular:
FileDirectory>>newFileNamed: (and related methods) just gets a new MultiByteFileStream, which doesn't activate any of the CrLf lineEnding magic - it assumes that no line ending conversion is wanted at all.
In fact, no way of creating a MultiByteFileStream (except CrLfFileStream new, from that obsoleted class) really uses it (the exceptions being FileList and BDFFontReader).  This is annoying on a system that doesn't use cr as the default line ending (are there any left?).

Further, if you do set it up to detect line endings, as far as I can tell, it never actually does any conversions!

Would anyone mind if I finished putting the CrLfFileStream functionality into MultiByteFileStream?  And if I did it, would anyone else use it?

-cbc


Re: MultiByteFileStream and CrLfFileStream (and Windows)

Eliot Miranda-2



On Wed, Aug 13, 2014 at 1:57 PM, Chris Cunningham <[hidden email]> wrote:
> [snip]
> Would anyone mind if I finished putting the CrLfFileStream functionality into MultiByteFileStream?  And if I did it, would anyone else use it?

I would be pleased!  I can't promise to use it, but I /want/ to produce a build server for Windows that includes producing VM source, building it, etc., and as part of that I would test that sources can be produced *without* CR-LF, but with plain LF line endings.

--
best,
Eliot


Re: MultiByteFileStream and CrLfFileStream (and Windows)

cbc
Good.  In that case I'll try to be a little more formal in the work, and share it, too.
Can't promise it will be fast, though - off hours only.

-cbc


Re: MultiByteFileStream and CrLfFileStream (and Windows)

cbc
Hmm.  So, I was going back to find an old version of CrLfFileStream from before MultiByteFileStream - that was a long time ago.  While looking, though, I found where at least the writing seems to have stopped doing the conversion:

Name: Multilingual-ul.146
Author: ul
Time: 17 May 2011, 5:31:08.561 pm
UUID: 8c0359ba-6c47-a946-9707-cc7c37dc8d44
Ancestors: Multilingual-ul.145

MultiByteFileStream changes:
- assume that wantsLineEndConversion is properly initialized
- removed the line end conversion code from #nextPut:

Levente, do you happen to remember why you removed the line end conversion code from MultiByteFileStream>>nextPut: 3 years ago?

(And apparently I've been annoyed about this for 3 years - but not enough to actually ask until now - bad me)

-cbc


MultiByteFileStream and CrLfFileStream (and Windows)

Louis LaBrunda
Hi Chris,

I don't know if this will help but VA Smalltalk has a global variable (a
string in a pool dictionary) called "LineDelimiter".  When the image starts
up it checks to see what platform (OS) it is running on and sets it to CrLf
or Lf as appropriate.  This allows developers to use LineDelimiter wherever
they want (where they need to match the platform) and not need to check the
platform all over the place.  Maybe it will make life a little easier for
you.

Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com


Re: MultiByteFileStream and CrLfFileStream (and Windows)

Frank Shearar-3
On 14 August 2014 14:23, Louis LaBrunda <[hidden email]> wrote:
> Hi Chris,
>
> I don't know if this will help but VA Smalltalk has a global variable (a
> string in a pool dictionary) called "LineDelimiter".  When the image starts
> up it checks to see what platform (OS) it is running on and sets it to CrLf
> or Lf as appropriate.  This allows developers to use LineDelimiter wherever
> they want (where they need to match the platform) and not need to check the
> platform all over the place.  Maybe it will make life a little easier for
> you.

Having a global setting's OK, as long as it's possible to override the
setting. That's so that (a) one can write tests that will pass on any
OS and (b) when you write to files in a git repository you can
use LF line endings and not have to configure autocrlf conversion (on
Windows machines).

frank

Re: MultiByteFileStream and CrLfFileStream (and Windows)

Levente Uzonyi-2
In reply to this post by cbc
On Wed, 13 Aug 2014, Chris Cunningham wrote:

> Hmm.  So, I was going back to find an old version of CrLfFileStream from before MultiByteFileStream - that was a long time ago.  While looking, though, I found where at least the writing seems to have stopped
> doing the conversion:
> Name: Multilingual-ul.146
> Author: ul
> Time: 17 May 2011, 5:31:08.561 pm
> UUID: 8c0359ba-6c47-a946-9707-cc7c37dc8d44
> Ancestors: Multilingual-ul.145
>
> MultiByteFileStream changes:
> - assume that wantsLineEndConversion is properly initialized
> - removed the line end conversion code from #nextPut:
>
> Levente, do you happen to remember why you removed the line end conversion code from MultiByteFileStream>>nextPut: 3 years ago?

Because line end conversion is also done by the TextConverter (see
#installLineEndConventionInConverter), so it created double line endings
(cr -> crlf -> crlflf) in some cases, and it was unnecessary anyway.
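
The doubling can be reproduced outside of any file stream. The following is a minimal sketch, not the actual MultiByteFileStream or TextConverter code: it assumes that each conversion layer simply replaces every cr with crlf, and it models one such layer with a hypothetical block named convert. Applying the block twice turns a single cr into cr lf lf:

```smalltalk
| convert once twice |
"Hypothetical stand-in for one conversion layer: replace every cr with crlf."
convert := [ :string | string copyReplaceAll: String cr with: String crlf ].
"One layer: cr -> crlf"
once := convert value: String cr.
"A second layer converts the cr again: crlf -> crlflf"
twice := convert value: once.
self assert: once asByteArray = #[13 10].
self assert: twice asByteArray = #[13 10 10]
```

With a single layer doing the work, the extra lf never appears, which is consistent with leaving the conversion to the converter alone.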

I just checked whether it works, and it does on Linux:

| filename result |
filename := UUID new asString36, '.txt'.
FileStream newFileNamed: filename do: [ :file |
  file
    lineEndConvention: #crlf;  "each cr is written as 13 10"
    nextPut: Character cr;
    nextPutAll: String cr;
    lineEndConvention: #lf;  "each cr is written as 10"
    nextPut: Character cr;
    nextPutAll: String cr;
    lineEndConvention: #cr;  "each cr is written as 13"
    nextPut: Character cr;
    nextPutAll: String cr;
    lineEndConvention: nil;  "no conversion, cr stays 13"
    nextPut: Character cr;
    nextPutAll: String cr ].
"Read the raw bytes back to see what was actually written."
result := FileStream readOnlyFileNamed: filename do: [ :file | file binary upToEnd ].
FileDirectory default deleteFileNamed: filename.
self assert: result = #[13 10 13 10 10 10 13 13 13 13]

Does it work on your machine?


Levente

Re: MultiByteFileStream and CrLfFileStream (and Windows)

cbc
In reply to this post by Louis LaBrunda
Hi Louis,

Yes, this is good, and Squeak already has this.  It is accessible as MultiByteFileStream class>>lineEndDefault (or through the class variable LineEndDefault).

Except that it isn't working - a startUp routine is supposed to determine what it should be, but that routine does not appear to be called.  I'll need to look into that.
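
A quick way to see what the image currently believes is to print that default. This is only a sketch that relies on the accessor named above; the value shown depends on the image and the platform, so no particular result is assumed:

```smalltalk
"Show the image-wide default line end convention in the Transcript."
Transcript
	show: MultiByteFileStream lineEndDefault printString;
	cr
```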

-cbc


Re: MultiByteFileStream and CrLfFileStream (and Windows)

cbc
In reply to this post by Levente Uzonyi-2

On Thu, Aug 14, 2014 at 8:30 AM, Levente Uzonyi <[hidden email]> wrote:
> On Wed, 13 Aug 2014, Chris Cunningham wrote:
> [snip]
>> Levente, do you happen to remember why you made MultiByteFileStream>>nextPut: not do the line end conversion code 3 years ago?
>
> Because line end conversion is also done by the TextConverter (see
> #installLineEndConventionInConverter), so it created double line endings
> (cr -> crlf -> crlflf) in some cases, and it was unnecessary anyway.
>
> I just checked if it works, and it does on linux:
> [snip]
> Does it work on your machine?

Yes, I get the same results on my system.  It appears to work just fine IF you tell the file each time that you open it what line ending you want. So the mechanism works.
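
In practice that means setting the convention right after opening the file. A minimal sketch, assuming a #crlf target and using only selectors that already appear in Levente's test (the content 'one' is just an illustration):

```smalltalk
| filename result |
filename := UUID new asString36, '.txt'.
FileStream newFileNamed: filename do: [ :file |
	file
		lineEndConvention: #crlf;  "set explicitly on every newly opened stream"
		nextPutAll: 'one';
		nextPut: Character cr ].
"Read the raw bytes back: 'one' followed by the converted line end."
result := FileStream readOnlyFileNamed: filename do: [ :file | file binary upToEnd ].
FileDirectory default deleteFileNamed: filename.
self assert: result = #[111 110 101 13 10]
```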

If you let it default, however, it does no conversion at all:

| filename result |
filename := UUID new asString36, '.txt'.
FileStream newFileNamed: filename do: [ :file |
        file
                nextPut: Character cr;
                nextPutAll: String cr ].
result := FileStream readOnlyFileNamed: filename do: [ :file | file binary upToEnd ].
FileDirectory default deleteFileNamed: filename.
self assert: result = #[13 13]

(I assume you get the same, right?)

There seem to be at least three things wrong that I see:
1. Determining the right default line ending for the platform isn't happening (something is going wrong with the startUp code, at least for MultiByteFileStream).
2. The variable wantsLineEndConversion is not set.
3. The variable lineEndConvention is not set either.
I think these need to be fixed.  When these aren't set, nothing gets installed in the converter.

| filename |
filename := UUID new asString36, '.txt'.
FileStream newFileNamed: filename do: [ :file |
        self
                assert: file converter notNil;
                assert: file wantsLineEndConversion;
                assert: file lineEndConvention notNil ].
FileDirectory default deleteFileNamed: filename.

And, Levente, thank you for the response, this helps.

-cbc

Re: MultiByteFileStream and CrLfFileStream (and Windows)

cbc
On Thu, Aug 14, 2014 at 9:02 AM, Chris Cunningham <[hidden email]> wrote:
[snip] 

> There seem to be at least three things wrong that I see:
> 1. Determining the right default line ending for the platform isn't happening (something is going wrong with the startUp code, at least for MultiByteFileStream).
Fixed in The Inbox: Files-cbc.137.mcz

> 2. The variable wantsLineEndConversion is not set.
> 3. The variable lineEndConvention is not set either.