[BUG] Timestamps don't work for classes with special character names

Christoph Thiede

Hi all, I just found another bug. If you get tired of them, just tell me :-)


Steps to reproduce:

Print it:

class := Object subclass: #CTTèstClass "sic (with accent in name)!"
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'CT-Experiments'.
class compile: 'foo ^ #foo'.
(class >> #foo) timeStamp


Expected output:

Something like 'ct 12/21/2019 15:13'.


Actual output:

''.


Please note that everything would have worked fine if we had named the class #CTTestClass (without the accent) instead.


Do we want to support special class names in general? If yes, this is a bug in my opinion. If no, we should raise an error in the first statement.


Cause of infection not yet investigated.


Best,

Christoph



Carpe Squeak!
Re: [BUG] Timestamps don't work for classes with special character names

Tobias Pape

> On 21.12.2019, at 15:16, Thiede, Christoph <[hidden email]> wrote:
>
> Hi all, found just another bug. If you get tired of them, just tell me :-)
>
> Steps to reproduce:
> Print it:
> class := Object subclass: #CTTèstClass "sic (with accent in name)!"
> instanceVariableNames: ''
> classVariableNames: ''
> poolDictionaries: ''
> category: 'CT-Experiments'.
> class compile: 'foo ^ #foo'.
> (class >> #foo) timeStamp
>
> Expected output:
> Something like 'ct 12/21/2019 15:13'.
>
> Actual output:
> ''.
>
> Please note that everything would have worked fine if we named class #CTTestClass (without accent) instead.
>
> Do we want to support special class names in general? If yes, this is a bug in my opinion. If no, we should raise an error in the first statement.
>
> Cause of infection not yet investigated.

Please check whether \00 bytes appear at some point in your .changes file.

Best regards
        -Tobias

Re: [BUG] Timestamps don't work for classes with special character names

Christoph Thiede

Hi Tobias,


What exactly do you mean?


If I create the class via the System Browser and add the method, my .changes file ends with:


Object subclass: #CTTéstClass
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'CT-Experiments'!
!CTTéstClass methodsFor: 'no messages' stamp: 'ct 12/21/2019 17:18'!
foo! !

However, CompiledMethod >> #timeStamp returns ''.


Here is a snapshot of the #timeStamp stackframe:



Please note that "tokens at: tokenCount" returns the correct timestamp, yet stamp is nil. What is going on here?


I'm not sure I understand you correctly, but if you meant that I should search a hex dump of my .changes file for a "zero word", the only occurrence I could find is:


Which led me to this:


It does not seem related, but it still looks somehow wrong ^^


Best,

Christoph



From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
Sent: Saturday, 21 December 2019, 15:44
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
 

Re: [BUG] Timestamps don't work for classes with special character names

Christoph Thiede

Ah OK, the latter was already fixed in Multilingual-nice.249 from the Inbox, never mind :)


From: Squeak-dev <[hidden email]> on behalf of Thiede, Christoph
Sent: Saturday, 21 December 2019, 17:36:26
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
 

Re: [BUG] Timestamps don't work for classes with special character names

Tobias Pape
In reply to this post by Christoph Thiede

> On 21.12.2019, at 17:36, Thiede, Christoph <[hidden email]> wrote:
>
> Hi Tobias,
>
> what do you mean in detail?
>
> If I create the class via System Browser and add the method, my change file ends with:
>
> Object subclass: #CTTéstClass
> instanceVariableNames: ''
> classVariableNames: ''
> poolDictionaries: ''
> category: 'CT-Experiments'!
> !CTTéstClass methodsFor: 'no messages' stamp: 'ct 12/21/2019 17:18'!
> foo! !


Good. That was what I thought was important.


>
> However, CompiledMethod >> #timeStamp returns ''.

What is the result of the following?

        (CTTéstClass compiledMethodAt: #foo) preamble


>
> Here is a snapshot of the #timeStamp stackframe:
>
>
>
> Please note that "tokens at: tokenCount" returns the correct timestamp, but however, stamp is nil. What is this???


I see what the problem is. The .changes file is apparently written UTF-8 encoded but read as Latin-1.
This is BAD.

You end up with 7 tokens because the class name yields three tokens instead of one. The second byte of the UTF-8 encoded é reads as the Latin-1 copyright symbol, which is classified as a binary selector character and thus separates the first part of the class name from the second. This happens only because of the UTF-8 vs. Latin-1 mismatch.

But the code path for 7-element tokens is different, and it looks for the #stamp: at a different position.

Hence stamp is nil.

A wrong but easy fix would be to call #utf8ToSqueak on the preamble.
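
The mismatch can be reproduced in a workspace without touching the .changes file. This is only a minimal sketch, assuming a Squeak image in which String >> #squeakToUtf8 answers a String holding one character per UTF-8 byte. The preamble text is copied from the .changes excerpt quoted above, and the variable names are purely illustrative.

```smalltalk
| name asBytes preamble |
name := 'CTTéstClass'.

"One character per UTF-8 byte, which is how the raw file content looks
 when it is read without decoding."
asBytes := name squeakToUtf8.

asBytes size - name size.              "1, because é occupies two bytes"
(asBytes at: 4) asInteger = 16rC3.     "true; 16rC3 is a letter in Latin-1"
(asBytes at: 5) asInteger = 16rA9.     "true; 16rA9 is the copyright sign in Latin-1"

"Scanning the undecoded preamble splits the class name at the copyright sign."
preamble := asBytes, ' methodsFor: ''no messages'' stamp: ''ct 12/21/2019 17:18'''.
(Scanner new scanTokens: preamble) size.   "7 according to the analysis above, instead of 5"
```

Evaluate the expressions one at a time with "print it" to see each intermediate result.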

Best regards
        -Tobias



Re: [BUG] Timestamps don't work for classes with special character names

Tobias Pape

> On 21.12.2019, at 19:11, Tobias Pape <[hidden email]> wrote:
>
> I see what the problem is. The .changes file is apparently written UTF-8 coded, but read Latin-1 coded.
> This is BAD.

Oh, and we were warned:

CompiledMethod
getPreambleFrom: aFileStream at: endPosition
        "This method is an ugly hack. This method assumes that source files have ASCII-compatible encoding and that preambles contain no non-ASCII characters."

        | chunkSize chunk |
        chunkSize := 160 min: endPosition.
        [
                | index |
                chunk := aFileStream
                        position: (endPosition - chunkSize + 1 max: 0);
                        basicNext: chunkSize.
                (index := chunk lastIndexOf: $! startingAt: chunk size) ~= 0 ifTrue: [
                        ^chunk copyFrom: index + 1 to: chunk size ].
                chunkSize := chunkSize * 2.
                chunkSize <= endPosition ] whileTrue.
        ^chunk


I have the feeling that the problematic send is #basicNext: in line 10 or so. This seems to circumvent the conversion done by MultiByteFileStream.
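
The "wrong but easy fix" from earlier in this thread can be tried in a workspace to check this suspicion. The snippet below is a hedged sketch, not a proposed patch: it assumes that CTTéstClass and its method #foo from the steps to reproduce still exist in the image, and that #preamble answers the text read by the method quoted above.

```smalltalk
| method raw decoded |
"Assumes CTTéstClass and #foo from the steps to reproduce exist in the image."
method := CTTéstClass compiledMethodAt: #foo.

"If #basicNext: bypasses the converter, this String still holds one
 character per UTF-8 byte."
raw := method preamble.

"Decoding afterwards is the quick workaround, not a proper fix."
decoded := raw utf8ToSqueak.

(Scanner new scanTokens: raw) size.      "token count for the undecoded text"
(Scanner new scanTokens: decoded) size.  "token count after decoding"
```

If the two counts differ, the undecoded read is what splits the class name.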

Best regards
        -Tobias


Re: [BUG] Timestamps don't work for classes with special character names

Christoph Thiede

Hi Tobias, thanks for the pointers!


> (CTTéstClass compiledMethodAt: #foo) preamble


Like you said:


I made the following change:

This seems to fix the conversion issues.

Outputs are:


The next problem is the trailing ! in the CTTéstClass preamble.
Here, the integer returned by expandedSourceFileArray >> #filePositionFromSourcePointer: is too large by one.
I have no idea where these constants come from, but as this is a constant method, I don't see how this calculation could be wrong.

I also tried the following:
yielding correctly:

But that seems hacky again.


Looking forward to your reply!


Best,

Christoph


From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
Sent: Saturday, 21 December 2019, 19:22:38
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
 







Attachment: pastedImage.png (307K)
Re: [BUG] Timestamps don't work for classes with special character names

Tobias Pape

> On 21.12.2019, at 20:23, Thiede, Christoph <[hidden email]> wrote:
>
> Hi Tobias, thanks for the pointers!
>
> > (CTTéstClass compiledMethodAt: #foo) preamble
>
> Like you said:
>
>
> I made the following change:
>
> This seems to fix the conversion issues.
>
> Outputs are:
>
>
> The next problem is the trailing ! for the CTTéstClass preamble.
> Here, the integer returned by expandedSourceFileArray >> #filePositionFromSourcePointer: is too large by one.
> If have no idea where these constants come from, but as this is a constant method, I don't see how this calculation could be wrong.

Because of UTF-8. The position counts raw bytes, but the text comes back counted in Unicode code points, hence the + 1...
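
The off-by-one can be checked with the preamble from this thread. This is a small workspace sketch under the same assumption as before, namely that #squeakToUtf8 answers one character per encoded byte; the variable name is illustrative.

```smalltalk
| preamble |
preamble := 'CTTéstClass methodsFor: ''no messages'' stamp: ''ct 12/21/2019 17:18'''.

preamble size.                               "length in characters"
preamble squeakToUtf8 size.                  "length in bytes on disk"
preamble squeakToUtf8 size - preamble size.  "1: é is one character but two bytes"
```

Every additional non-ASCII character in a preamble would widen the gap between the two counts accordingly.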

>
> I also tried the following:
>
> yielding correctly:

Seems lucky..

>
> But that seems hacky again.
>
> Looking forward to your reply!


Best regards
        -Tobias

PS: Maybe copy the code instead of images? It's easier to see things then, for me at least :)


Re: [BUG] Timestamps don't work for classes with special character names

Christoph Thiede

Hi Tobias, sorry for the long delay!


> PS: maybe copy the code instead of images? its easier to see things then, for me at least :)


Sorry, you're right. Code is bad for showing the diffs, screenshots are bad for editability :(
Please find the attachment.

Best,
Christoph


From: Squeak-dev <[hidden email]> on behalf of Tobias Pape <[hidden email]>
Sent: Saturday, 21 December 2019, 20:47:50
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] [BUG] Timestamps don't work for classes with special character names
 







Attachment: CompiledMethod-getPreambleFromat.st (1K)
Re: [BUG] Timestamps don't work for classes with special character names

Christoph Thiede
Is anyone willing to look into this? I have been testing this for the last few
months and have not seen any errors from it. :)



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html
