Fixing CRLF issues

Fixing CRLF issues

K. K. Subramaniam
Hi,

I bumped into CRLF pollution in the SqueakV39.sources file and also saw
Andrew's bug report:
http://bugs.squeak.org/view.php?id=6173

IMHO, line endings are platform-specific (not filesystem-specific). A fileout
on a FAT32 filesystem on a Linux box should put LF in the file, not CRLF.

Readers should be prepared to handle any convention, while writers should
use the platform-native convention for text files.
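
A minimal sketch of that reader/writer split, written in Python rather than Squeak purely for illustration. The function names `read_lines` and `write_lines` are hypothetical, and `os.linesep` stands in for whatever the platform reports as its native line ending:

```python
import os


def read_lines(data: bytes) -> list[str]:
    """Split raw bytes into lines, accepting CRLF, lone CR, or lone LF."""
    text = data.decode("ascii")
    # Collapse CRLF first so it is not counted as two separate line breaks.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.split("\n")


def write_lines(lines: list[str], eol: str = os.linesep) -> bytes:
    """Join lines with the platform-native ending unless told otherwise."""
    return eol.join(lines).encode("ascii")
```

The reader never asks which platform produced the file; only the writer consults the platform default.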

Shouldn't this be handled in a primitive or a System Attribute? Attempting to
guess the line ending in Squeak code sounds regressive.

Regards .. Subbu


Re: Fixing CRLF issues

Prof. Andrew P. Black
Subbu, 

I think that you are right when you say "line endings is platform-specific (not filesystem specific). A fileout on a FAT32 filesystem on a Linux box, should put LF in the file not CRLF."

My proposed fix would have made line endings system-specific for Mac, but not for Linux.  Hard to test this one unless you have all the systems.  Please embellish my fix so that it is right for Linux, and add it to the bug report.

> Reader should be prepared to handle any convention while writers should be
> using platform-native convention for text files.

That is in fact what the code does.  For existing files, the CRLF code will detect what is there.  The CRLF platform-specific detection applies only to new files, and only to CRLF text files.
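
The policy described here (keep whatever an existing file already uses, fall back to the platform default only for new files) can be sketched as follows. This is an illustration in Python under stated assumptions, not the Squeak code under discussion; `detect_eol` is a hypothetical helper, and the "most frequent marker wins" rule is an assumption:

```python
import os


def detect_eol(sample: bytes,
               default: bytes = os.linesep.encode("ascii")) -> bytes:
    """Return the line ending found in sample, or default if there is none."""
    crlf = sample.count(b"\r\n")
    # Subtract CRLF pairs so their CR and LF are not counted twice.
    lone_cr = sample.count(b"\r") - crlf
    lone_lf = sample.count(b"\n") - crlf
    counts = {b"\r\n": crlf, b"\r": lone_cr, b"\n": lone_lf}
    best = max(counts, key=counts.get)
    # An empty or single-line sample has no marker: use the platform default.
    return best if counts[best] > 0 else default
```

A new file corresponds to an empty sample, so it is the only case in which the platform default is consulted.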

> Shouldn't this be handled in a primitive or a System Attribute? Attempting to
> guess line ending in Squeak code sounds regressive.

I don't see why there is a need for a primitive, when a couple of lines of Smalltalk can do the right thing easily enough.  The only problem seems to be getting these fixes into the release.  The bug database is in danger of becoming a black hole.

Andrew

On 7 May 2007, at 10:52, subbukk wrote:

Hi,

I bumped into CRLF pollution in SqueakV39.sources file and also saw the bug 
report
by Andrew.

IMHO, line endings is platform-specific (not filesystem specific). A fileout 
on a FAT32 filesystem on a Linux box, should put LF in the file not CRLF.

Reader should be prepared to handle any convention while writers should be 
using platform-native convention for text files.

Shouldn't this be handled in a primitive or a System Attribute? Attempting to 
guess line ending in Squeak code sounds regressive.

Regards .. Subbu


Andrew P. Black
Department of Computer Science
Portland State University
+1 503 725 2411






Re: Fixing CRLF issues

K. K. Subramaniam
On Tuesday 08 May 2007 12:35 pm, Andrew P. Black wrote:
> Subbu,
>...
> My proposed fix would have made line-endings system specific for Mac,
> but not for Linux.  Hard to test this one unless you have all the
> systems....
I am not sure that putting platform-specific logic in the Squeak image is a
good idea. The code soon turns ugly as more platforms get added. Such
assumptions are best encapsulated in VMs.

The confusion with line endings arises because there are two text line
objects - say SqueakLine and MachLine. SqueakLine uses '\r' as its EOL (end
of line) marker, while the MachLine EOL marker is specific to the underlying
VM. Since Squeak objects can cross machine boundaries, all lines in Squeak
must be SqueakLines. We could use methods like:
Stream>>isEOL
         "check for EOL marker in current position for the current VM"
Stream>>skipEOL
         "skip EOL marker if present. return number of octets skipped"
Stream>>putEOL
         "put EOL marker on a stream"
to deal with MachLines. I wish I knew enough Squeak to put out a patch
file :-(.
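
For illustration only, the three proposed methods could look roughly like this. It is a Python sketch, not a Squeak patch; the `EolStream` class is hypothetical, and its `eol` argument stands in for the marker that the VM would supply:

```python
import io
import os


class EolStream:
    """Hypothetical wrapper mirroring Stream>>isEOL, skipEOL and putEOL."""

    def __init__(self, raw: io.BytesIO,
                 eol: bytes = os.linesep.encode("ascii")):
        self.raw = raw
        self.eol = eol  # assumption: in Squeak the VM would provide this

    def is_eol(self) -> bool:
        """True if the platform EOL marker starts at the current position."""
        pos = self.raw.tell()
        ahead = self.raw.read(len(self.eol))
        self.raw.seek(pos)  # peek only; restore the position
        return ahead == self.eol

    def skip_eol(self) -> int:
        """Skip the marker if present; return the number of octets skipped."""
        if self.is_eol():
            self.raw.seek(len(self.eol), io.SEEK_CUR)
            return len(self.eol)
        return 0

    def put_eol(self) -> None:
        """Write the platform EOL marker at the current position."""
        self.raw.write(self.eol)
```

Callers never mention '\r' or '\n' themselves; the marker is fixed once, when the stream is created.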

> > Reader should be prepared to handle any convention while writers
> > should be
> > using platform-native convention for text files.
>
> That is in fact what the code does.  For existing files, the CRLF
> code will detect what is there.  The CRLF platform-specific detection
> applies only to new files, and only to CRLF text files.
I meant text *lines* and not text *files*. Sorry. Files should be treated as
ByteArrays, and ASCII is one of the interpretations of ByteArray sequences.
ASCII sequences can come embedded in "binary" files too.

> > Shouldn't this be handled in a primitive or a System Attribute?
> > Attempting to
> > guess line ending in Squeak code sounds regressive.
>
> I don't see why there is need for a primitive, when a couple of lines
> of Smalltalk can do the right thing easily enough...
Code like beginsWith('darwin') soon gets to be unmanageable. The file
read/write system primitive should be able to check for EOL or EOF markers in
a much more portable way (like writing to /dev/stdout or /dev/console).
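
A small Python sketch of the contrast, for illustration. The platform names and branches in `eol_by_name` are hypothetical, and `os.linesep` plays the role of a value supplied by the runtime rather than guessed in application code:

```python
import os


def eol_by_name(platform_name: str) -> str:
    """Name matching: every newly supported platform needs another branch."""
    name = platform_name.lower()
    if name.startswith("win"):
        return "\r\n"
    if name.startswith("darwin") or name.startswith("linux"):
        return "\n"
    raise ValueError("unknown platform: " + platform_name)


def eol_from_runtime() -> str:
    """Ask the runtime once; no platform names appear in the caller."""
    return os.linesep
```

The first function has to be edited for each new platform, while the second leaves that knowledge with the layer that already has it.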

This is just my $0.02,
Subbu