problems with line separators in Linux


Ralph Boland
Ever since I started using Squeak with Squeak 3.6
(I only use Linux, currently Ubuntu 9.10)
I have always had trouble with line separators.
I am checking out Squeak 4.1 and things have
changed, though I am not sure they are any better.

If I have FileStream>>concreteStream return MultiByteFileStream
(the default), then when I fileOut code the .st file consists of a single
line, so I cannot use utilities such as wc and vi on these files.
If I modify concreteStream to return CrLfFileStream then the problem
goes away but a host of other problems occur.

1) It used to be that if you looked at the versions of a method
each version would be written on a single line  (I believe linefeeds
were used instead of carriage returns).
This no longer happens.  Instead every line of a version is separated
by a blank line.  An improvement I suppose; it is more readable.
(I believe what is happening is that the end-of-line separator contains
both a line feed and a carriage return, and each is treated as a line
separator.)

2) It used to be that if you wrote out a file (with concreteStream returning
CrLfFileStream) then when you filed in the file using:
  (FileStream oldFileNamed:  'filename.st')  fileIn
you got carriage returns for line separators.
Now you get linefeeds.
This causes problems with Menu labels as in:

aMenu labels:
'find class... (f)
recent classes... (r)
browse all
browse
...

because carriage returns are expected.
Consequently your Menu has a single entry. :-(
I expect there are other problems as well.
Now, even if you set concreteStream back to the default the same
problem occurs.  This works in 3.10.2 so we have gone backwards here.

3) If I cut and paste from a different 4.1 image I lose
my line separators altogether.  This is unchanged from before.

4?) Finally, I used to have problems loading .mcz files.  So far in 4.1 I
haven't had a problem, but I have only loaded 2 .mcz files.
In prior versions of Squeak, loading a .mcz file would only sometimes succeed.
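
The menu symptom from point 2) can be reproduced in a workspace without any file involved. This is only a sketch: findTokens: stands in for whatever the menu code really does to split its label string (an assumption, not checked against the Menu sources), and the LF-separated string is built by hand.

```smalltalk
| labels |
"Labels separated by LF, as they would be after an unconverted fileIn."
labels := 'find class... (f)', String lf,
	'recent classes... (r)', String lf,
	'browse all'.

"Splitting on CR only finds no separator, so there is a single entry."
(labels findTokens: String cr) size.                          "1"

"Normalising LF and CR-LF to CR first restores the three entries."
(labels withSqueakLineEndings findTokens: String cr) size.    "3"
```

If the second expression answers 3 while a freshly filed-in menu still shows one entry, that would suggest the LF characters arrived with the fileIn rather than from the menu code itself.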

Are other Linux users having similar experiences?

Frankly, I think only one of Cr and Lf should be accepted in
Smalltalk code, the other generating a syntax error except inside
strings, and inside strings it should have to be escaped somehow.

If Cr is the character chosen for line separators then it should be
impossible to write:

returnAString
      ^'a two line string where
the line separator is a linefeed'

The fact that the above code is legal leads to subtle errors such
as those above.  A blatant compiler error is preferred.


One final curiosity:  Why is the following method written as it is
(in both 4.1 and 3.10.2)?

Method  CrLfFileStream>>new

        ^ (MultiByteFileStream new) wantsLineEndConversion: true; yourself.


I presume it is correct but a comment explaining why wouldn't hurt.

Regards,

Ralph Boland



--
Quantum Theory cannot save us from the tyranny of a deterministic universe
but it does give God something to do


Re: problems with line separators in Linux

Nicolas Cellier
Hi Ralph,

Here is a little highlight on the CR/LF strategy.

1) CR+LF (windows)  or CR only (mac) or LF only (unix) exists in the
external world whether we like it or not.

2) Given 1), Smalltalk main strategy - and in particular Squeak - has
always been this one:
2.a) convert every input from external-world to CR,
2.b) convert every output to external world to platform-specific preference.

Historically, this was implemented in CrLfFileStream.
But it has been superseded by MultiByteFileStream.
If you inspect its API, you'll see it provides either an automatically
guessed platform convention or a programmable lineEndConvention.

3) If all tools used to develop applications
(Smalltalk/ruby/python/perl/javascript/.Net/etc...)
provided APIs making these applications insensitive to line-end
conventions, then we would be in a better world and would have to
care less about line-end conventions.
This is easier than imposing a uniform line-end convention on the
world, because it enables a smooth transition.

4) Until strategy 2) is perfect and absolutely no LF-in-image or
CR-out-of-image leakage occurs, Squeak/Pharo are bad citizens:
they are sensitive to line-end conventions and break chains made of
multiple heterogeneous tools.

5) I observed, you observed, everyone observed recurrent deficiencies
in either 2.a) or 2.b) or both...

6) So my logical conclusion is to propose a complementary strategy:
6.a) Let Smalltalk algorithms work across all line-ending conventions.

Observe how any decent file editor (notepad, vim, etc...) works
transparently whatever the line-end convention.
IMO, it's a shame that the so-called reference Object-Oriented
language cannot deal with mixed line-end conventions.

7) So I started to implement 6.a) in Squeak 4.1 and Pharo 1.1 in order
to reach goal 3). This is two-fold:
7-a) let CR-LF, LF or CR display as a single line break (changes in
CharacterScanner and co)
7-b) let Stream and String handle CR-LF, LF or CR delimited lines.

Note that I took care to provide decently optimized implementations (often
more optimized than the previous CR-only algorithms).

8) Of course, in order to benefit from the new 7-b) facilities, there's a
little change of API.
We need to replace some old-fashioned idioms (myStream upTo: Character
cr) with modernized line-ending-agnostic ones (myStream nextLine).

9) I did not apply these changes very deeply to Squeak or Pharo, only
in a few places here and there...
So there is still a bit of work to reach goal 3)
(parsing the menu specs is just one example of it)

10) This 6.a) strategy could eventually replace 2.a), but it does not
have to, and we didn't go this way...
So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
has been in this respect.

11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
applications are line-ending sensitive, then WE must take care to produce
the expected convention.
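
As an illustration of 2.b), here is a minimal workspace sketch of asking a file stream to write LF line ends for the benefit of Unix tools. It assumes that MultiByteFileStream answers wantsLineEndConversion: and lineEndConvention: (with #cr, #lf and #crlf as possible values) in your image; the file name is made up, so check the selectors against your own version before relying on this.

```smalltalk
| file |
"'line-end-demo.txt' is a hypothetical name used only for this illustration."
file := MultiByteFileStream forceNewFileNamed: 'line-end-demo.txt'.
[file
	wantsLineEndConversion: true;
	lineEndConvention: #lf.          "ask for LF on disk, whatever the platform default"
 file
	nextPutAll: 'first line'; cr;    "the image keeps using CR internally"
	nextPutAll: 'second line'; cr]
		ensure: [file close].
```

If the conversion is active, wc -l on the resulting file should report two lines instead of zero.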

Conclusions

So my opinion is that 6.a) did not make our life worse.
On the contrary, Squeak and Pharo are moving toward what I would call
a better behaved I.T. world citizen.
They now offer an API to handle line endings transparently inside the image.
This comes at the price of little added complexity, and no noticeable slowdown.
But now we have to learn new idioms (and I don't see nextLine as more
complex than upTo: Character cr)...
... and apply them where due (like parsing menu specs) to obtain
homogeneous behaviour, i.e. goal 3).
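
The difference between the two idioms can be seen in a workspace. This is a small sketch, assuming an image in which nextLine already accepts CR, LF and CR-LF (in an older image it may behave like upTo: Character cr); the sample text and variable names are made up.

```smalltalk
| text stream oldStyle newStyle |
"Four lines, separated by three different conventions."
text := 'one', String cr, 'two', String lf, 'three', String crlf, 'four'.

"Old idiom: only CR ends a line."
oldStyle := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [oldStyle add: (stream upTo: Character cr)].

"Line-ending-agnostic idiom."
newStyle := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [newStyle add: stream nextLine].

oldStyle size.   "3, with LF characters left embedded in the last two elements"
newStyle size.   "4, provided nextLine handles all three conventions"
```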

We still have to care of 2.b), and a bit less of 2.a) once 4.b) will
be achieved.
And maybe in the future we will be able to get rid of 2.b) too, when
all applications are line-ending-insensitive.
In the meantime, nothing prevents us from improving 2.a) and 2.b) to
avoid LF leaking into or CR leaking out of the image.
But until strategy 2) is perfect, we just act as one of the bad
world citizens perpetuating line-ending problems.
IMO reaching goal 3) is easier than reaching goal 2).

That's only my personal opinion, but it's based on years of pragmatic
experience using apps that behave badly with line endings and trying to
program slightly better ones.

There are alternative possible strategies, like in CUIS: display a boxed
[LF] explicitly in text editors so as to provide visual control to
programmers...

Not sure I sold my POV. It's quite opposite to your proposition.
You don't have to adhere, but at least you have some rationale.

Cheers

Nicolas



Re: problems with line separators in Linux

Nicolas Cellier
2010/6/11 Nicolas Cellier <[hidden email]>:

> We still have to care of 2.b), and a bit less of 2.a) once 4.b) will
> be achieved.

Oops, that should read: once 3) is achieved.



Re: problems with line separators in Linux

Nicolas Cellier
So, I just committed a few trunk changes toward the goal of making
Squeak immune to line endings.
There might be a few more idioms to fix, but that's certainly not that
difficult.
See also a shorter manifest at
http://code.google.com/p/pharo/issues/detail?id=2538.
That does not prevent us from continuing to improve the conversion strategy.

I had better stop talking to myself now ;)

Nicolas



Re: problems with line separators in Linux

Ralph Boland
...
>>> 10) This 6.a) strategy could eventually replace 2.a), but it does not
>>> have to, and we didn't go this way...
>>> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
>>> has been in this respect.
>>
>> Except that now the conversion of Lf in Linux files to Cr in Squeak no longer
>> occurs and this breaks things such as Menu labels.  Thus things that used
>> to work now don't.
>
> I don't see what change could cause this problem...


I tried loading a .st file in both Squeak 3.10.2 and Squeak 4.1,
filing in the following file:

'From Squeak4.1 of 17 April 2010 [latest update: #9957] on 7 June 2010
at 9:30:22 pm'!
Object subclass: #Junk
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Objects'!

!Junk methodsFor: 'as yet unclassified' stamp: 'rpb 6/7/2010 21:27'!
junk

        | a |
        a := 'abc
        def
        ghi'.
        self halt.
        a := a.! !

In Squeak 3.10.2
  a)  If  concreteStream returns  CrLfFileStream:
       ClassCategoryReader  eventually calls  scanFrom: aStream where  aStream
       is a  MultiByteFileStream and the next chunk of text is:

junk

        | a |
        a := 'abc
        def
        ghi'.
        self halt.
        a := a.! !

At this point 'aStream nextChunkText' is called,
which does:  string := self nextChunk.
nextChunk then does a 'self skipSeparators' and then calls 'self
next' in a loop.

The 'self next' reads the next character and does a 'self doConversion' test,
which returns true, so if the character read is a Lf character it is
converted into a Cr character.

b)  If  concreteStream returns  MultiByteFileStream:
     The same thing happens except  doConversion returns false and so
      Lf characters are NOT converted into Cr characters.
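
To make the read-side conversion described in a) concrete, here is a workspace sketch of the idea only. It is not the actual MultiByteFileStream code: the variable names are made up, and the loop merely mimics what a doConversion-style step would do to each character when the conversion is switched on.

```smalltalk
| raw in out ch |
"Input mixing CR-LF and bare LF, as a file written by other tools might."
raw := 'abc', String crlf, 'def', String lf, 'ghi'.
in := ReadStream on: raw.
out := WriteStream on: String new.
[in atEnd] whileFalse: [
	ch := in next.
	"A CR followed by LF counts as one separator: swallow the LF."
	ch = Character cr ifTrue: [in peekFor: Character lf].
	"A lone LF becomes the CR that the image expects."
	ch = Character lf ifTrue: [ch := Character cr].
	out nextPut: ch].

out contents = ('abc', String cr, 'def', String cr, 'ghi').   "true"
```

Case b) corresponds to skipping both ifTrue: lines, which leaves every LF in the resulting string.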

In  Squeak 4.1 everything is the same up to the point  'aStream
nextChunkText' is called.
nextChunkText calls:

        '^converter nextChunkTextFromStream: self'

where converter is a MacRomanTextConverter.

From there on the trail looks completely different from the
Squeak 3.10.2 code.
In particular, I could not find where the conversion of Lf
characters to Cr characters is supposed to occur, let alone why it
fails if concreteStream returns CrLfFileStream.


Note that in 3.10.2, if concreteStream returns MultiByteFileStream,
Lf characters are NOT converted into Cr characters.  I would have
expected Lf characters and Cr,Lf character pairs to be converted to
Cr characters regardless of what concreteStream returns.  At this
point we do know we are reading Squeak code, so Lfs are inappropriate.
From my point of view there is no need for the 'doConversion' test at
all, except in strings, where the user may intentionally want Lf or
Cr,Lf for some odd reason and we shouldn't break his/her code.  In
that case no conversion should be done under any circumstances, so
the code is wrong both ways: it fails to convert when it should and
converts when it shouldn't.

Since I couldn't figure out how 4.1 handles things I can't say if it
does any better.

Hope this explains a few things.

> The recent commit should solve the menu problem in presence of LF leakage.

How do I install the version with this fix rather than the version of  4.1 found
on the Squeak page?

Regards,

Ralph Boland


Re: problems with line separators in Linux

Nicolas Cellier
2010/6/12 Ralph Boland <[hidden email]>:

> ...
>> >> 10) This 6.a) strategy could eventually replace 2.a), but it does not
>> >> have to, and we didn't went this way...
>> >> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak > > always
>> >> has been with this respect.
>> >
>> > Except that now the conversion of Lf in Linux files to  Cr in Squeak no longer
>> > occurs and this breaks things such as Menu labels.  Thus things that used
>> > to work now don't.
>> >
>
>> I don't see what change could cause this problem...
>
>
> I checked out loading a .st file in both  Squeak 10.2 and Squeak 4.1.
> filing in the
> following file:
>
> 'From Squeak4.1 of 17 April 2010 [latest update: #9957] on 7 June 2010
> at 9:30:22 pm'!
> Object subclass: #Junk
>        instanceVariableNames: ''
>        classVariableNames: ''
>        poolDictionaries: ''
>        category: 'Kernel-Objects'!
>
> !Junk methodsFor: 'as yet unclassified' stamp: 'rpb 6/7/2010 21:27'!
> junk
>
>        | a |
>        a := 'abc
>        def
>        ghi'.
>        self halt.
>        a := a.! !
>
> In  Squeak 10.2
>  a)  If  concreteStream returns  CrLfFileStream:
>       ClassCategoryReader  eventually calls  scanFrom: aStream where  aStream
>       is a  MultiByteFileStream and the next chunk of text is:
>
> junk
>
>        | a |
>        a := 'abc
>        def
>        ghi'.
>        self halt.
>        a := a.! !
>
> At this point 'aStream nextChunkText' is called.
> which does:  string := self nextChunk.
> nextChunk then does a 'self skipSeparators' and then calls  'self
> next' in a loop.
>
> The 'self next'  reads the next character and does a 'self doConversion' test
> which returns true so if the character read is a  Lf character it is
>  converted into a Cr character.
>
> b)  If  concreteStream returns  MultiByteFileStream:
>     The same thing happens except  doConversion returns false and so
>      Lf characters are NOT converted into Cr characters.
>
> In  Squeak 4.1 everything is the same up to the point  'aStream
> nextChunkText' is called.
> nextChunkText calls:
>
>        '^converter nextChunkTextFromStream: self'
>
> where converter is a MacRomanTextConverter.
>
> Following the trail from here looks completely different than the
> Squeak 10.2 code.
> In particular I could not find where an attempt to convert  Lf
> characters to Cr characters
> was supposed to occur let alone why it failed if  concreteStream
> returns CrLfFileStream.
>
>
> Note that in  10.2  if   concreteStream returns  MultiByteFileStream:
> then  Lf  characters
> are NOT converted into  Cr characters.   I would have expected  Lf
> characters and
> Cr,Lf  character pairs to be converted to  Cr characters regardless of
> what  concreteStream
> returns.  We do at this point know we are reading  Squeak code so Lfs
> are inappropriate.
> From my point of view there is no need for the 'doConversion' test at
> all except in strings
> where the user may intentionally want  Lf or  Cr,Lf for some odd
> reason and we shouldn't
> break his/her code.  In that case no conversion should be done under
> any circumstances
> so the code is wrong both ways:  it fails to convert when it should
> and converts when
> it shouldn't.
>
> Since I couldn't figure out how 4.1 handles things I can't say if it
> does any better.
>
> Hope this explains a few things.
>
>> The recent commit should solve the menu problem in presence of LF leakage.
>
> How do I install the version with this fix rather than the version of  4.1 found
> on the Squeak page?
>
> Regards,
>
> Ralph Boland
>
>

ABOUT THE IMPLEMENTATION:
------------------------------------------------

The two inst.vars of interest in MultiByteFileStream are
- wantsLineEndConversion
- lineEndConvention

There are two ways to set the line ending convention.
1) myStream wantsLineEndConversion: true.
  if you follow the code, you will discover that:
        lineEndConvention ifNil: [ self detectLineEndConvention ].

  detectLineEndConvention scans the input file to guess the line
ending convention at the first line break
  (this won't work if you have mixed conventions in your file...)

  detectLineEndConvention sets the lineEndConvention to LineEndDefault
for output files (if empty).
  This default is guessed at image startup based on the underlying OS (see
guessDefaultLineEndConvention).

2) myStream lineEndConvention: aSymbol (#cr #lf #crlf or nil).
  handling of line endings does not happen entirely here though...
        self installLineEndConventionInConverter.
  this leads you to:
        TextConverter>>installLineEndConvention:

This is an optimization.
Since TextConverter already converts characters with proper encoding,
it also handles line ending with no extra cost.
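
The detection step described under 1) can be modelled roughly as follows. This is a Python sketch, not the Squeak implementation: the function name and the `default` parameter are assumptions, with `default` standing in for LineEndDefault.

```python
def detect_line_end_convention(data: bytes, default: str = "cr") -> str:
    """Guess 'cr', 'lf' or 'crlf' from the first line break in data."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: check whether an LF follows immediately
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                return "crlf"
            return "cr"
        if byte == 0x0A:  # lone LF
            return "lf"
    # No line break found (for example an empty output file).
    return default


print(detect_line_end_convention(b"first\nsecond\r\nthird\r"))
```

Only the first line break is inspected, so a file with mixed conventions is classified by whichever one happens to come first.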

For output, you can see part of the job is performed in
MultiByteFileStream>>nextPut: too...
This handling assumes the Smalltalk String is made of CR only... (so it
is not immune to LF !!!).
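
To illustrate why output conversion is not immune to LF, here is a rough Python model of a writer that rewrites only CR characters. The names are hypothetical and the real MultiByteFileStream>>nextPut: may differ in detail.

```python
LINE_ENDS = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}


def write_with_convention(text: str, convention: str) -> str:
    """Replace each CR with the target line end; leave everything else alone."""
    line_end = LINE_ENDS[convention]
    return "".join(line_end if ch == "\r" else ch for ch in text)


clean = "abc\rdef\rghi"
leaky = "abc\rdef\nghi"  # an LF has leaked into the in-image string
print(repr(write_with_convention(clean, "crlf")))
print(repr(write_with_convention(leaky, "crlf")))
```

With a CR-only string the output is consistent; with a leaked LF the file ends up with a mix of CR LF and bare LF line ends.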

Anyway, the MultiByteFileStream is an ugly part of the system, so I
don't encourage you to lose time in its tricky implementation.


ABOUT THE DEFAULT BEHAVIOUR:
-----------------------------------------------------

The first method (#wantsLineEndConversion:) is invoked when you use
CrLfFileStream to create your stream.
The second method (#lineEndConvention:) is not sent.

So by default, NO conversion is performed, unless you EXPLICITLY use
  - CrLfFileStream
  - or #wantsLineEndConversion: message after stream creation
  - or #lineEndConvention: message
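
The effect of this default on the menu problem can be sketched with a small Python model. The class and its flag are hypothetical stand-ins for a stream with and without line end conversion, not Squeak code.

```python
class LineEndReader:
    """Toy reader: converts line ends to CR only when explicitly asked to."""

    def __init__(self, data: str, wants_line_end_conversion: bool = False):
        self.data = data
        self.wants_line_end_conversion = wants_line_end_conversion

    def read(self) -> str:
        if not self.wants_line_end_conversion:
            return self.data  # default: the contents pass through untouched
        return self.data.replace("\r\n", "\r").replace("\n", "\r")


source = "find class... (f)\nrecent classes... (r)\nbrowse all"
default_labels = LineEndReader(source).read().split("\r")
converted_labels = LineEndReader(source, True).read().split("\r")
print(len(default_labels), len(converted_labels))
```

Without the explicit opt-in the LF characters survive and a split on CR yields a single menu entry, which matches the symptom reported earlier in the thread.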

Not sure it was much different in 3.10.2, no time to inquire...

Hope this helps

Nicolas