Re: problems with line separators in Linux (Nicolas Cellier)


Ralph Boland
...
> 10) This 6.a) strategy could eventually replace 2.a), but it does not
> have to, and we didn't go this way...
> So both Squeak 4.1 and Pharo 1.1 are no worse than Squeak always
> has been in this respect.

Except that now the conversion of Lf in Linux files to Cr in Squeak no longer
occurs, and this breaks things such as menu labels.  Thus things that used
to work now don't.

> 11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
> applications are line-ending sensitive, then WE must take care of producing
> the expected convention.

> Conclusions

> So my opinion is that 6.a) did not make our life worse.
> On the contrary, Squeak and Pharo are moving toward what I would call
> a better behaved I.T. world citizen.
> They now offer an API to handle line-endings transparently inside the image.
> This comes at the price of little added complexity and no noticeable slowdown.
> But now we have to learn new idioms (and I don't see nextLine as more
> complex than upTo: Character cr)...
> ... and apply them where due (like parsing menu specs) to obtain
> homogeneous behaviour: goal 3)

> We still have to take care of 2.b), and a bit less of 2.a) once 4.b)
> is achieved.
> And maybe in the future, we will be able to get rid of 2.b) too, when
> all applications are line-ending-insensitive.
> In the meantime, nothing prevents us from improving 2.a) and 2.b) to
> avoid LF leaking into or CR leaking out of the image.
> But until the 2) strategy is perfect, we just act as one of the bad
> world citizens perpetuating line-ending problems.
> IMO reaching goal 3) is easier than reaching goal 2).

> That's only my personal opinion, but it's based on years of pragmatic
> experience using apps that behave badly with line endings and trying to
> program somewhat better ones.

> There are alternative strategies, like in CUIS: display a boxed
> [LF] explicitly in text editors so as to provide visual control to
> programmers...

> Not sure I sold my POV. It's quite opposite to your proposition.
> You don't have to adhere, but at least you have some rationale.

I consider getting both 2.a) and 2.b) to work quite important, and
much more important than 6).  I suggest getting 2) to work first
and then worrying about 6).

Also, beware with 6) that you don't want to fileIn a file from Linux
(or another operating system) and then fileOut the same file, only to
find that a diff or cmp of the original and new versions of the file
reports that they are different.  Similarly, you don't want to fileOut a
method/Class/etc. and then file it in again, only to find that any
diff-like utility now reports that the original and new versions are
different because the Cr-Lf representation of line separators has changed.
Of course, if you do 6) properly, it will make this problem less likely
rather than more.
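
To make the round-trip concern concrete, here is a minimal sketch in Python (not Squeak code, and not how fileIn/fileOut are actually implemented). The helper names `detect_convention`, `file_in` and `file_out` are hypothetical. It assumes the image keeps a lone CR internally and that the writer remembers which separator the source file used.

```python
def detect_convention(data: bytes) -> bytes:
    """Return the first line separator found (CRLF, LF or CR); LF if none."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR, possibly the first half of CRLF
            return b"\r\n" if data[i + 1:i + 2] == b"\n" else b"\r"
        if byte == 0x0A:  # LF
            return b"\n"
    return b"\n"


def file_in(data: bytes) -> bytes:
    """Hypothetical import step: turn every CRLF or LF into a lone CR."""
    return data.replace(b"\r\n", b"\r").replace(b"\n", b"\r")


def file_out(internal: bytes, convention: bytes) -> bytes:
    """Hypothetical export step: write every internal CR as `convention`."""
    return internal.replace(b"\r", convention)


# A Linux-style (LF) source file survives the round trip only if the
# original convention is restored on output.
original = b"Object subclass: #Foo\n\tinstanceVariableNames: ''\n"
internal = file_in(original)
round_trip = file_out(internal, detect_convention(original))
naive_out = file_out(internal, b"\r")  # always writes CR, so cmp would differ

# A file that mixes separators cannot be restored from a single remembered
# convention, so it does not round-trip byte for byte.
mixed = b"a\r\nb\nc\r"
mixed_round_trip = file_out(file_in(mixed), detect_convention(mixed))
```

In this sketch `round_trip` is byte-identical to `original`, while `naive_out` and `mixed_round_trip` differ from their inputs, which is the situation a diff or cmp would report.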

Beware too that any files (not necessarily .st files or Squeak files)
that use both Cr and Lf with distinct meanings will expect both to be
loaded or filed out without any conversion of either character.

> Cheers

> Nicolas

How can I be cheerful when after 6 years of using Squeak I am still running
into these problems!  :-(  :-)

Regards,

Ralph Boland

Re: problems with line separators in Linux (Nicolas Cellier)

Nicolas Cellier
2010/6/11 Ralph Boland <[hidden email]>:

> ...
>> 10) This 6.a) strategy could eventually replace 2.a), but it does not
>> have to, and we didn't go this way...
>> So both Squeak 4.1 and Pharo 1.1 are no worse than Squeak always
>> has been in this respect.
>
> Except that now the conversion of Lf in Linux files to Cr in Squeak no longer
> occurs, and this breaks things such as menu labels.  Thus things that used
> to work now don't.
>

I don't see what change could cause this problem...
The recent commit should solve the menu problem in presence of LF leakage.

>> 11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
>> applications are line-ending sensitive, then WE must take care of producing
>> the expected convention.
>
>> Conclusions
>
>> So my opinion is that 6.a) did not make our life worse.
>> On the contrary, Squeak and Pharo are moving toward what I would call
>> a better behaved I.T. world citizen.
>> They now offer an API to handle line-endings transparently inside the image.
>> This comes at the price of little added complexity and no noticeable slowdown.
>> But now we have to learn new idioms (and I don't see nextLine as more
>> complex than upTo: Character cr)...
>> ... and apply them where due (like parsing menu specs) to obtain
>> homogeneous behaviour: goal 3)
>
>> We still have to take care of 2.b), and a bit less of 2.a) once 4.b)
>> is achieved.
>> And maybe in the future, we will be able to get rid of 2.b) too, when
>> all applications are line-ending-insensitive.
>> In the meantime, nothing prevents us from improving 2.a) and 2.b) to
>> avoid LF leaking into or CR leaking out of the image.
>> But until the 2) strategy is perfect, we just act as one of the bad
>> world citizens perpetuating line-ending problems.
>> IMO reaching goal 3) is easier than reaching goal 2).
>
>> That's only my personal opinion, but it's based on years of pragmatic
>> experience using apps that behave badly with line endings and trying to
>> program somewhat better ones.
>
>> There are alternative strategies, like in CUIS: display a boxed
>> [LF] explicitly in text editors so as to provide visual control to
>> programmers...
>
>> Not sure I sold my POV. It's quite opposite to your proposition.
>> You don't have to adhere, but at least you have some rationale.
>
> I consider getting both 2.a) and 2.b) to work quite important, and
> much more important than 6).  I suggest getting 2) to work first
> and then worrying about 6).
>

Yes, they are important!
But I bet that making Squeak immune to line endings is an easier task.
This is because we are in control of in-image behaviour, but not that
much of external world standards.

I see the two tasks as ones to pursue in parallel rather than in sequence.

> Also, beware with 6) that you don't want to fileIn a file from Linux
> (or another operating system) and then fileOut the same file, only to
> find that a diff or cmp of the original and new versions of the file
> reports that they are different.  Similarly, you don't want to fileOut a
> method/Class/etc. and then file it in again, only to find that any
> diff-like utility now reports that the original and new versions are
> different because the Cr-Lf representation of line separators has changed.
> Of course, if you do 6) properly, it will make this problem less likely
> rather than more.
>

Sure, I even got problems because an LF was missing at the end of the
last line, which upset some Unix tools...
Making Squeak immune to line ending conventions does not solve this
problem at all.
Also note that Smalltalk code still uses #cr to produce line
breaks in many places inside the image. I don't think that would be
easily changed, and I'm not at all sure we should take this path.
So we have to take care of platform conventions/network conventions at
least on output, and this is supposed to be handled by
MultiByteFileStream.

Alternatively, you can also secure your tool chain using proper filters.
That's not that hard in Unix (dos2unix, tr '\r' '\n', ...).
Also check whether diff has the right options to deal with
foreign line endings...
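
As an illustration of the filter idea, here is a small Python sketch (an assumption for illustration, not one of the Unix tools named above) that compares two files while ignoring which separator each one uses. The function name `same_lines` is hypothetical. It relies on `bytes.splitlines()`, which treats CR, LF and CRLF alike, and as a side effect it also tolerates a missing separator after the last line.

```python
def same_lines(a: bytes, b: bytes) -> bool:
    """Compare two byte strings line by line, ignoring the line separator.

    bytes.splitlines() splits on CR, LF and CRLF and drops the separators,
    so only the line contents take part in the comparison.
    """
    return a.splitlines() == b.splitlines()


squeak_style = b"foo\rbar\r"     # CR, as produced by #cr inside the image
unix_style = b"foo\nbar\n"       # LF, as expected by most Unix tools
dos_style = b"foo\r\nbar\r\n"    # CRLF
no_final_newline = b"foo\nbar"   # last line lacks its separator
changed = b"foo\nbaz\n"          # a real content difference

all_equivalent = (
    same_lines(squeak_style, unix_style)
    and same_lines(unix_style, dos_style)
    and same_lines(unix_style, no_final_newline)
)
content_differs = not same_lines(unix_style, changed)
```

A check like this only hides separator differences from the comparison; the files themselves stay as they are.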

> Beware too that any files (not necessarily .st files or Squeak files)
> that use both Cr and Lf with distinct meanings will expect both to be
> loaded or filed out without any conversion of either character.
>

I'm not aware of any such case. I presume this is rare.
Since this is application dependent, it cannot be Smalltalk's job to
make a guess, possibly wrong, based on the host platform.
Instead it relies entirely on the programmer's knowledge.
All we can do is provide facilities for handling line endings
transparently, and those are currently available through the
MultiByteFileStream API.
Also, the #crlf and #lf facilities were recently added wherever #cr was
implemented, so the programmer can be in control.
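
To show why a convention-agnostic line reader helps with things like menu specs, here is a rough Python analogue of the two idioms discussed in this thread. It is only a sketch: `next_line` and `up_to` are hypothetical stand-ins written for this example, not the Squeak methods themselves.

```python
import io


def next_line(stream):
    """Return the next line without its terminator, or None at end of input.

    A line may end in CR, LF or CRLF.
    """
    line = bytearray()
    while True:
        ch = stream.read(1)
        if ch == b"":
            return bytes(line) if line else None
        if ch == b"\n":
            return bytes(line)
        if ch == b"\r":
            following = stream.read(1)
            if following not in (b"\n", b""):
                # Not a CRLF pair: give the byte back to the stream.
                stream.seek(-1, io.SEEK_CUR)
            return bytes(line)
        line += ch


def up_to(stream, delimiter):
    """Read up to (and consume) a single-byte delimiter, like upTo:."""
    out = bytearray()
    while True:
        ch = stream.read(1)
        if ch == b"" or ch == delimiter:
            return bytes(out)
        out += ch


# A menu spec into which LF has leaked: splitting on CR alone finds one
# long "label", while the tolerant reader finds each entry.
menu_spec = b"Open\nSave\nQuit\n"
first_label_cr_only = up_to(io.BytesIO(menu_spec), b"\r")

labels = []
reader = io.BytesIO(b"Open\r\nSave\nQuit\rExit")
while (label := next_line(reader)) is not None:
    labels.append(label)
```

With the CR-only idiom the first "label" is the whole spec, line feeds included, which matches the kind of menu breakage reported earlier in the thread; the tolerant reader recovers the four labels from mixed input.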

>> Cheers
>
>> Nicolas
>
> How can I be cheerful when after 6 years of using Squeak I am still running
> into these problems!  :-(  :-)
>

:)
Sure, it's a shame!
But you know things are getting more complex, because network
conventions are not necessarily those of our local file system/local OS
tools.
So basing the whole strategy on a guess from the host platform can't work...
My bet is that more and more tools will be able to handle mixed conventions.

> Regards,
>
> Ralph Boland
>
>