Re: [Newbies] Display artifacts in comments/desc.


K. K. Subramaniam
On Monday 30 April 2007 7:12 pm, Edgar J. De Cleene wrote:

> On 4/30/07 10:22 AM, "subbukk" <[hidden email]> wrote:
> > Hi,
> >
> > I see some artifacts (like []) displayed in comments and descriptions in
> > Squeak (3.7-7 vm on Linux, 3.8 image and SqueakV39.sources). I suspect
> > the '\r' line terminator in the *.sources file could be causing it.
> >
> > Is this a bug?
> >
> > TIA .. Subbu
>
> I think that could be eliminated if you do Smalltalk removeAllLineFeeds.
I did this and it reported that 44806 methods were stripped of LFs, but now I
see two [] glyphs instead of one. The method seems to replace CR with CRLF,
which makes it worse.

Strangely,
'hello
world' displayAt: 100@100.

shows up correctly as two lines of text. So why does the [] appear in the browser?

Regards .. Subbu


Re: [Newbies] Display artifacts in comments/desc.

K. K. Subramaniam
On Monday 30 April 2007 7:12 pm, Edgar J. De Cleene wrote:

> On 4/30/07 10:22 AM, "subbukk" <[hidden email]> wrote:
> > Hi,
> >
> > I see some artifacts (like []) displayed in comments and descriptions in
> > Squeak (3.7-7 vm on Linux, 3.8 image and SqueakV39.sources). I suspect
> > the '\r' line terminator in the *.sources file could be causing it.
> >
> > Is this a bug?
> >
> > TIA .. Subbu
>
> I think that could be eliminated if you do Smalltalk removeAllLineFeeds.
>
> On the Mac, I have seen this often.

Squeak 3.9's handling of CRLF sequences in the sources file is defective.
The Squeak 3.8 image correctly strips the LF from CRLF while reading text
from the SqueakV3.sources file. For instance, the editStartPage method in
Scamper uses CRLF in SqueakV3.sources, but its Text in the code pane has
the LF stripped out.

BTW, my original mail should have said "3.9 image and SqueakV39.sources".
Sorry for the typo.

Regards .. Subbu


Re: [Newbies] Display artifacts in comments/desc.

Bert Freudenberg

On May 7, 2007, at 6:14, subbukk wrote:

> On Monday 30 April 2007 7:12 pm, Edgar J. De Cleene wrote:
>> On 4/30/07 10:22 AM, "subbukk" <[hidden email]> wrote:
>>> Hi,
>>>
>>> I see some artifacts (like []) displayed in comments and descriptions
>>> in Squeak (3.7-7 vm on Linux, 3.8 image and SqueakV39.sources). I
>>> suspect the '\r' line terminator in the *.sources file could be
>>> causing it.
>>>
>>> Is this a bug?
>>>
>>> TIA .. Subbu
>>
>> I think that could be eliminated if you do Smalltalk removeAllLineFeeds.
>>
>> On the Mac, I have seen this often.
>
> Squeak 3.9's handling of CRLF sequences in the sources file is defective.

I do not think it is.

> The Squeak 3.8 image correctly strips the LF from CRLF while reading
> text from the SqueakV3.sources file. For instance, the editStartPage
> method in Scamper uses CRLF in SqueakV3.sources, but its Text in the
> code pane has the LF stripped out.

I do not think this is the case.

It is just that images up to 3.8 did not *display* LF chars embedded
in a String because the corresponding glyph in the fonts was blank.
This has been fixed (we should not have invisible characters anymore,
which are very annoying), and this just exposes the problem of LFs
in some method sources. They have been there all along; they just
were not visible.

Reading in code must not strip LFs unless specifically told to,
because it is perfectly valid to embed an LF in a String in source
code if you want to.
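A minimal illustrative sketch (not from the thread; it uses only basic String and Character protocol) of the point above: a String can legitimately contain an LF (Character value 10), so a source reader that silently stripped LFs would change the value the code defines. The variable names are made up for the example.

```smalltalk
"Illustrative only: a String with an embedded LF, like one a method's
 source might legitimately define, compared with an LF-stripped copy."
| withLf stripped |
withLf := 'line one', (String with: Character lf), 'line two'.
stripped := withLf reject: [:each | each = Character lf].
Transcript show: withLf size printString; cr.          "17"
Transcript show: stripped size printString; cr.        "16"
Transcript show: (withLf = stripped) printString; cr.  "false"
```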

- Bert -




Re: [Newbies] Display artifacts in comments/desc.

K. K. Subramaniam
On Monday 07 May 2007 10:53 pm, Bert Freudenberg wrote:
> > Squeak3.9 handling of CRLF sequences in sources file is defective.
>
> I do not think it is.
SqueakV39.sources contains the following sequence, as seen in a hex editor,
for the DateAndTime commentStamp:
I have zero duration\r\n\r\n\r\n
When I browse this class and inspect the Text object in the annotation pane,
the same sequence shows up in its string variable. Since annotations are read
in as line-oriented text from files, the CRLFs should have been replaced with
CRs. Of course, we could take the stance that the sources file is corrupt
since it uses mixed line endings, but then what about text read in from a
changes file or from a file-in that came through email?

> > The Squeak 3.8 image correctly strips the LF from CRLF while reading
> > text from the SqueakV3.sources file. For instance, the editStartPage
> > method in Scamper uses CRLF in SqueakV3.sources, but its Text in the
> > code pane has the LF stripped out.
>
> I do not think this is the case.
SqueakV3.sources contains the sequence:
editStartPage\r\n\t
But the Text object in the codepane shows the sequence:
editStartPage\r\t

But I did notice that the CRLF was retained in the comments for B3DRotation.
Is it because code strings are parsed while comment strings are not
interpreted?

Regards .. Subbu



Re: [Newbies] Display artifacts in comments/desc.

Bert Freudenberg

On May 7, 2007, at 14:47, subbukk wrote:

> On Monday 07 May 2007 10:53 pm, Bert Freudenberg wrote:
>>> Squeak3.9 handling of CRLF sequences in sources file is defective.
>>
>> I do not think it is.
> SqueakV39.sources contains the following sequence, as seen in a hex
> editor, for the DateAndTime commentStamp:
> I have zero duration\r\n\r\n\r\n
> When I browse this class and inspect the Text object in the annotation
> pane, the same sequence shows up in its string variable. Since
> annotations are read in as line-oriented text from files

They are not. The sources and changes files are *not* text files, even
though they might look like them to the uninitiated. Each is a database of
data chunks, and the image actually stores byte offsets into it. When you
move the files to a different platform you *must not* change the
line-ending convention.
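A small sketch, with made-up chunk text, of why the line-ending convention must not change: assuming (as described above) that the image remembers positions into the file, converting each CR to CRLF shifts every position that follows a line ending. Only copyReplaceAll:with: and indexOf: from the standard collection protocol are used; the variable names are illustrative.

```smalltalk
"Illustrative only: the image remembers where a chunk ends in the file.
 Here the position of the '!' chunk terminator stands in for such a
 stored offset."
| crString crlfString original converted |
crString := String with: Character cr.
crlfString := String with: Character cr with: Character lf.
original := 'foo', crString, 'bar', crString, 'baz!'.
"A 'helpful' transfer tool rewrites every CR as CRLF."
converted := original copyReplaceAll: crString with: crlfString.
Transcript show: (original indexOf: $!) printString; cr.   "12"
Transcript show: (converted indexOf: $!) printString; cr.  "14"
"Each LF inserted before a position shifts it by one, so offsets
 recorded for the original file no longer point at the right byte."
```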

> , the CRLFs should have been replaced with CRs. Of course, we could take
> the stance that the sources file is corrupt since it uses mixed line
> endings, but then what about text read in from a changes file or from a
> file-in that came through email?

We might be more tolerant when filing in, this is true. But this
should be an explicit action, because there actually are file-ins that
contain binary data which we do *not* want to mess with.
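If one did want the explicit, opt-in conversion mentioned above, a minimal sketch might look like the following. It is meant only for a String the caller already knows to be text (never a whole .sources or .changes file), and it turns CRLF and lone LF into Squeak's CR. The block name normalize is hypothetical, not an existing Squeak method; only basic ReadStream and WriteStream protocol is used.

```smalltalk
"Hypothetical opt-in normalization for a String known to be text:
 CRLF -> CR and lone LF -> CR; every other character is copied as-is."
| normalize sample result |
normalize := [:aString | | in out ch |
    in := ReadStream on: aString.
    out := WriteStream on: (String new: aString size).
    [in atEnd] whileFalse: [
        ch := in next.
        ch = Character cr
            ifTrue: [
                out nextPut: Character cr.
                "Swallow the LF of a CRLF pair."
                (in atEnd not and: [in peek = Character lf]) ifTrue: [in next]]
            ifFalse: [
                ch = Character lf
                    ifTrue: [out nextPut: Character cr]
                    ifFalse: [out nextPut: ch]]].
    out contents].
sample := 'I have zero duration', (String with: Character cr with: Character lf).
result := normalize value: sample.
Transcript show: sample size printString; cr.   "22"
Transcript show: result size printString; cr.   "21"
```

Because it is a separate, explicitly invoked step, file-ins carrying binary data are left untouched unless the caller chooses to apply it.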

>>> The Squeak 3.8 image correctly strips the LF from CRLF while reading
>>> text from the SqueakV3.sources file. For instance, the editStartPage
>>> method in Scamper uses CRLF in SqueakV3.sources, but its Text in the
>>> code pane has the LF stripped out.
>>
>> I do not think this is the case.
> SqueakV3.sources contains the sequence:
> editStartPage\r\n\t
> But the Text object in the codepane shows the sequence:
> editStartPage\r\t

Let's see.

        (Scamper>>#editStartPage) fileIndex "2"

which means it is in the changes file, not the sources file.

        (Scamper>>#editStartPage) filePosition "10310306"

which tells you the file offset.

        (Scamper>>#editStartPage) getSourceFromFile asString asByteArray
                "a ByteArray(101 100 105 116 83 116 97 114 116 80 97 103 101 13 9 ...)"

which is the source code as retrieved by the browser; note there is only
a 13 (CR), no LF (10).

        | f |
        [(f := FileStream readOnlyFileNamed: Smalltalk changesName) binary;
                position: 10310306;
                next: 40] ensure: [f close]

        "a ByteArray(101 100 105 116 83 116 97 114 116 80 97 103 101 13 9 ...)"

which confirms that this is actually in the file.

- Bert -




Re: [Newbies] Display artifacts in comments/desc.

K. K. Subramaniam
On Tuesday 08 May 2007 12:41 am, Bert Freudenberg wrote:

> On May 7, 2007, at 14:47, subbukk wrote:
> > SqueakV39.sources contains the following sequence, as seen in a hex
> > editor, for the DateAndTime commentStamp:
> > I have zero duration\r\n\r\n\r\n
> > When I browse this class and inspect the Text object in the
> > annotation pane, the same sequence shows up in its string variable.
> > Since annotations are read in as line-oriented text from files
>
> They are not. The sources and changes file is *not* a text file even
> though it might look like one to the uninitiated. It's a database of
> data chunks and the image actually stores byte offsets into this
> file. When you move the file to a different platform you *must not*
> change the line ending convention.
By text, I only meant portions of a chunk, not the whole file. That is why
I used a hex editor on the whole file. But it is good that you pointed out
the special nature of these files.

> > , the CRLFs should have been replaced with CRs. Of course, we could
> > take the stance that the sources file is corrupt since it uses mixed
> > line endings, but then what about text read in from a changes file or
> > from a file-in that came through email?
>
> We might be more tolerant when filing in, this is true. But this
> should be an explicit action, because there actually are file-ins that
> contain binary data which we do *not* want to mess with.
Yes, I understand this. But there are contexts where we know the byteArray to
be a text sequence.

> > SqueakV3.sources contains the sequence:
> > editStartPage\r\n\t
> > But the Text object in the codepane shows the sequence:
> > editStartPage\r\t
>..
> (Scamper>>#editStartPage) getSourceFromFile asString asByteArray
> "a ByteArray(101 100 105 116 83 116 97 114 116 80 97 103 101 13 9 ...)"
>
> which is the source code as retrieved by the browser; note there is only
> a 13 (CR), no LF (10).
I stand corrected. I poked directly into the sources file and forgot to check
the changes file, so this is a wrong example. There are other examples in the
3.8 image, such as:
(String>>#asDateAndTime) getSourceFromFile asString asByteArray
    "a ByteArray(97 115 68 97 116 101 65 110 100 84 105 109 101 13 10 13 10 9 34 67 ...)"
where the CRLF line ending pops up.
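A minimal sketch for spotting such CRLF pairs: it reports the index of every CR (13) that is immediately followed by an LF (10). The sample string is a hypothetical stand-in built from the first bytes of the ByteArray above (13 bytes of "asDateAndTime", then 13 10 13 10 9 34 67); in the image one could instead assign the result of (String>>#asDateAndTime) getSourceFromFile asString, as in the expression above.

```smalltalk
"Find every CR that is immediately followed by an LF."
| source positions |
source := 'asDateAndTime',
    (String with: Character cr with: Character lf),
    (String with: Character cr with: Character lf),
    (String with: Character tab), '"C'.
positions := OrderedCollection new.
1 to: source size - 1 do: [:i |
    ((source at: i) = Character cr and: [(source at: i + 1) = Character lf])
        ifTrue: [positions add: i]].
"positions now holds 14 and 16, the two CRLF pairs after the selector."
Transcript show: positions printString; cr.
```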

I am curious how these CRLFs got into the chunks in the first place. I
don't know Squeak well enough to track this down quickly, so when I saw the
artifacts, I seized the opportunity to dig into the internals.

Bert, thank you very much for explaining your reasoning in detail and in
Squeak code. It helps me learn the internals faster.

Regards .. Subbu


Re: [Newbies] Display artifacts in comments/desc.

stephane ducasse
Do not read the code using Emacs or vi.
Use the tools in Squeak; the sources and changes files are Squeak's internal
format for saving code.

Stef



Re: [Newbies] Display artifacts in comments/desc.

Bert Freudenberg
In reply to this post by K. K. Subramaniam
On May 8, 2007, at 2:34, subbukk wrote:

> I am curious about how these CRLFs got into the chunks in the first
> place?

By people and software who think changing and even adding some bytes  
in a file is a jolly good idea.

Traditionally, only CR was used. All was fine. Then some people
insisted on storing fileouts the "right way" with platform-dependent
line endings by using CrLfStream (or whatever it was named). Or ftp
tools tried to be "helpful" by converting CRs to CRLFs in fileouts,
at least on one particular platform that Squeak happens to run on.
Anyway, when people filed these in, the bad characters went unnoticed
because LF was shown as a zero-width (hence invisible) character.
Only now, with the fixed fonts, do these show up.

- Bert -




Re: [Newbies] Display artifacts in comments/desc.

Edgar J. De Cleene
In reply to this post by Bert Freudenberg



Working on the Mac and copying data from Mantis, I cooked up this quick and
dirty solution, which works on the Mac.
I select the text with artifacts (in my case the preamble of a new .cs) and
paste it again into the PluggableTextMorph.

Edgar





Attachment: Clipboard-clipboardText.st (1K)