Parsing privateAuthorsRaw for a changes browser

Eliot Miranda-2
Hi All,

    I had reason to condense changes and then was curious to look for older versions.  But when I came to open a changes browser on the newly condensed changes file the UTF-8 decoder failed to parse the source for SystemNavigation class>>privateAuthorsRaw.  Something breaks the string at the e acute in Stéphane, and then the decoder gets hopelessly confused.

To reproduce:
In a trunk 6.x image do
    Smalltalk condenseChanges
then open a file list, select the changes file, and then click the recent changes button.

Here's the SqueakDebug.log:

InvalidUTF8: Invalid utf8: ©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTLYSTEPHANIE!Stephen Smith#sst!Stephen Travis Pope#stp!Stephen Vincent Pair#svp!Steve Davies#sld!Steve Elkins#sge!Steve Fuller#snf!Steve Gilbert#slg!Steve Hunter#skh!Steve Knight#knighty!Steve Mccusker#smcc!Steve Messamore#slm!Steve Sanderson#sms!Steve Wart#swart!Steve Wessels#!Steven Darcy#SMD!Steven Greenberg#greenbes!Steven Rodriguez#optionshiftk!Steven Swerling#sps!Sudheendra Hangal#hangal!Sungjin Chun#chunsj!Suzuki Tetsuya#tetsuya!Syed Abid#taxman!Syed Masoodahmad#masden56!Sylvia Sharma#sharma!Symon Chalk#symonc!Takashi Yamamiya#tak!Tansel Ersavas#mte#MTE!Tarek Demiati#TD!Ted Bracht#TB#TB1!Ted Kaehler#tk!Terry Jenkins#TCJ!Thierry Reignier#TREG!Thijs Janssen#TJ!Thomas Bernitt#tber!Thomas Fröb#thf!Thomas Hemme#Namamazu!Thomas J Keller#TJK!Thomas Kowark#tk!Thomas M. Breuel#tmb!Thomas Mahler#ThMa!Thomas Stambaugh#tms!Thomas Zimmermann#TZ!Tim Cuthbertson#tec!Tim Felgentreff#tfel!Tim Lewis#TimLewis!Tim Olson#tao!Tim Rowledge#TPR#tpr!Timm Knape#tik!Timothy Falconer#teefal!Timothy M#tty!Timothy Retz#tgr!Tobias Isenberg#ti!Tobias Pape#topa!Todd Blanchard#tb!Tom Counsell#tamc!Tom Dailey#td!Tom Koenig#tlk!Tom Plick#tap!Tom Rushworth#tbr!Tommy Thorn#tt!Tomohiro Oda#TO!Tony Garnock-Jones#tonyg!Tony Zampogna#zamp!Torge Husfeldt#th!Torsten Bergmann#tbn#TBN!Torsten Sadowski#ts!Travis Kay#tkay#tlk!Trygve Reenskaug#TRee!Tyler Coumbes#mtc!Tzaddi Beltaine#tsb!Udo Schneider#udos!Vaidotas Didžbalis#vd!Vassili Bykov#vb!Vernon Marsden#vmars!Vijay Mathew Pandyalakal#vmp!Vladimir Janousek#vj!Volker Bäcker#volker!Wally Cash#wac!Walter Wilhelm#ww!Ward Cunningham#ward!Wayne Braun#wb!Wayne D. 
Elias#wdelias!Webb Mcdonald#wxm!Wilkes Joiner#dwj!Willem van Asperen#wva!William Hess#WFH!William Hidden#whidden!Wolfgang Eder#edw!Wolfgang Helbig#whg!Woon Yeo#!Wuilmer Olaya Bardales#wob!Yagendra Dutt Tripathi#yd!Yang Ha Nguyen#yhm!Yann Monclair#YM!Yanni Chiu#yj!Yasuji Nakayama#yasuji!Yoshiki Ohshima#yo!Yuji Ichikawa#ich!Yunhee Lee#yhl!Yutaka Kamite#yk!Zdenek Novy#Zdenye#ZN!Zeljko Nesic#Poparasan!Zeynep Besen#zeyno'
12 July 2017 9:42:40.918319 am

VM: Mac OS - Smalltalk
Image: Squeak6.0alpha [latest update: #17347]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Users/eliot/Squeak/Squeak5.1
Trusted Dir /foobar/tooBar/forSqueak/bogus/
Untrusted Dir /Users/eliot/Library/Preferences/Squeak/Internet/My Squeak/

UTF8TextConverter class>>errorMalformedInput:
Receiver: UTF8TextConverter
Arguments and temporary variables: 
aString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTL...etc...
Receiver's instance variables: 
superclass: TextConverter
methodDict: a MethodDictionary(#backFromStream:->(UTF8TextConverter>>#backFromS...etc...
format: 65538
instanceVariables: nil
organization: ('conversion' backFromStream: decodeString: encodeString: errorMalformedInput:...etc...
subclasses: nil
name: #UTF8TextConverter
classPool: a Dictionary(#StrictUtf8Conversions->nil )
sharedPools: nil
environment: Smalltalk
category: #'Multilingual-TextConversion'
latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc...
latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...

UTF8TextConverter class>>decodeByteString:
Receiver: UTF8TextConverter
Arguments and temporary variables: 
aByteString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#M...etc...
outStream: a WriteStream
lastIndex: 1
nextIndex: 1
byte1: 169
byte2: nil
byte3: nil
byte4: nil
unicode: nil
Receiver's instance variables: 
superclass: TextConverter
methodDict: a MethodDictionary(#backFromStream:->(UTF8TextConverter>>#backFromS...etc...
format: 65538
instanceVariables: nil
organization: ('conversion' backFromStream: decodeString: encodeString: errorMalformedInput:...etc...
subclasses: nil
name: #UTF8TextConverter
classPool: a Dictionary(#StrictUtf8Conversions->nil )
sharedPools: nil
environment: Smalltalk
category: #'Multilingual-TextConversion'
latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc...
latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...

UTF8TextConverter>>decodeString:
Receiver: an UTF8TextConverter
Arguments and temporary variables: 
aString: '©phane Rollandin#spfa!Stephane Schitter#stefs!Stephanie Hamburg#MUTTL...etc...
result: nil
Receiver's instance variables: 
latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc...
latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...

UTF8TextConverter>>nextChunkFromStream:
Receiver: an UTF8TextConverter
Arguments and temporary variables: 
input: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.ch...etc...
Receiver's instance variables: 
latin1Map: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...etc...
latin1Encodings: #(nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ...etc...

MultiByteFileStream>>nextChunk
Receiver: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.changes'
Arguments and temporary variables: 

Receiver's instance variables: 


ChangeList class>>browseRecentLogOn:
Receiver: ChangeList
Arguments and temporary variables: 
origChangesFile: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6proj...etc...
end: 13286751
done: false
block: 7195999
pos: 7198297
changesFile: MultiByteFileStream: '/Users/eliot/Squeak/Squeak5.1/trunk6projectL...etc...
position: nil
prevBlock: 7197023
chunk: #('privateAuthorsRaw

^ ''Aaron Reichow#ajr!Abigail Sanchez#as!Adam Eng...etc...
Receiver's instance variables: 
superclass: CodeHolder
methodDict: a MethodDictionary(#acceptFrom:->(ChangeList>>#acceptFrom: "a CompiledMethod...etc...
format: 65548
instanceVariables: #('changeList' 'list' 'listIndex' 'listSelections' 'file' 'l...etc...
organization: ('accessing' changeList changes:file: currentChange file listHasSingleEntry...etc...
subclasses: {ChangeListForProjects . VersionsBrowser}
name: #ChangeList
classPool: nil
sharedPools: nil
environment: nil
category: #'Tools-Changes'

ChangeList class>>browseRecentLogOnPath:
Receiver: ChangeList
Arguments and temporary variables: 
fullName: '/Users/eliot/Squeak/Squeak5.1/trunk6projectLoad.changes'
Receiver's instance variables: 
superclass: CodeHolder
methodDict: a MethodDictionary(#acceptFrom:->(ChangeList>>#acceptFrom: "a CompiledMethod...etc...
format: 65548
instanceVariables: #('changeList' 'list' 'listIndex' 'listSelections' 'file' 'l...etc...
organization: ('accessing' changeList changes:file: currentChange file listHasSingleEntry...etc...
subclasses: {ChangeListForProjects . VersionsBrowser}
name: #ChangeList
classPool: nil
sharedPools: nil
environment: nil
category: #'Tools-Changes'
_,,,^..^,,,_
best, Eliot
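
A note on the `byte1: 169` entry in the log above: in UTF-8, é is the two-byte sequence 195 169, so a string whose first byte is 169 begins at the second byte of that character. The minimal sketch below is in Python rather than Squeak, purely to show the byte arithmetic; the shortened name string and the variable names are illustrative. It shows that such a tail cannot be decoded as UTF-8 and that it prints with a leading © when read as Latin-1. It does not establish why the decoder was handed a string starting there; one possibility, not confirmed by the log, is a read that began at a byte offset inside the character.

```python
# Shortened stand-in for the relevant part of the author list.
name = "Stéphane Rollandin#spfa"
encoded = name.encode("utf-8")

# 'S' and 't' take one byte each; 'é' takes the next two.
e_acute = encoded[2:4]

# Start reading one byte too late: at the continuation byte of 'é'.
tail = encoded[3:]

try:
    tail.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as err:
    # A continuation byte (binary 10xxxxxx) is not a valid start byte.
    decoded_ok = False
    print("UTF-8 decode failed:", err.reason)

# Read as Latin-1 instead, byte 169 is the copyright sign.
as_latin1 = tail.decode("latin-1")

print(list(e_acute), tail[0], as_latin1)
```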



Re: Parsing privateAuthorsRaw for a changes browser

Patrick R.

Hi Eliot,

I started looking into this. So far I could not reproduce it locally,
either with a new trunk image or with a trunk image from May that I
then updated. It looks like a mixture of double encoding and wrong
decoding. The character sequence 'ä' further down (in
Volker Bäcker) would be ä when interpreted as UTF-8, which in
turn when interpreted as UTF-8 is ä, which would be expected in the
string. To get to 'ä', though, would require interpreting the ä in
UTF-8 as CP1252, then encoding it again in UTF-8 and decoding it once
again using CP1252.

Sanity check before I continue: Does the source code in the method look
right in that image?

(I hope all these weird characters will come through to you :) )

Bests
Patrick


From: Squeak-dev <[hidden email]> on behalf of Eliot Miranda <[hidden email]>
Sent: Wednesday, July 12, 2017 18:51
To: The general-purpose Squeak developers list
Subject: [squeak-dev] Parsing privateAuthorsRaw for a changes browser
 



Re: Parsing privateAuthorsRaw for a changes browser

Patrick R.

Well, as feared, it did not come through. Let me try this again: the string 'ÃƒÂ¤' would be 'Ã¤' when interpreted as bytes which encode UTF-8. In turn, 'Ã¤' as bytes encoding UTF-8 is 'ä', which is what we actually want. The rest is as described in my previous message.
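
The two decoding steps described here can be checked mechanically. The sketch below is in Python rather than Squeak and starts from the byte values 195 131 194 164 that Patrick gives later in the thread. Using Latin-1 to turn characters back into bytes is an assumption made for the illustration; for these particular byte values CP1252 would give the same result in the second step.

```python
# Bytes reported later in the thread for the garbled form of 'ä'.
double_encoded = bytes([195, 131, 194, 164])

# First pass: read the four bytes as UTF-8. This yields two characters,
# U+00C3 and U+00A4.
once_decoded = double_encoded.decode("utf-8")

# Treat those two characters as raw byte values again (Latin-1 maps
# code points 0-255 straight onto single bytes) ...
inner_bytes = once_decoded.encode("latin-1")

# ... and read the result as UTF-8 a second time.
twice_decoded = inner_bytes.decode("utf-8")

print(repr(once_decoded), list(inner_bytes), repr(twice_decoded))
```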




Re: Parsing privateAuthorsRaw for a changes browser

Bert Freudenberg
On Wed, Jul 19, 2017 at 2:22 PM, Rein, Patrick <[hidden email]> wrote:

Well as feared it did not come through. Let me try this again: The string 'ÃƒÂ¤' would be 'Ã¤' when interpreted as bytes which encode UTF-8. In turn 'Ã¤' as bytes encoding UTF-8 is 'ä' which is what we actually want. The rest is as described below.

​In my image (updated from some trunk version) the method looks fine. As for the weird encodings, I think you mean:

'ä' squeakToUtf8
=> 'Ã¤'
'ä' squeakToUtf8 asByteArray
=> #[195 164]

'Ã¤' utf8ToSqueak
=> 'ä'

#[195 164] asString utf8ToSqueak
=> 'ä'

I assume this is a copy-paste error? E.g. I cannot copy+paste the result of

'ä' squeakToUtf8 squeakToUtf8

- Bert -




Re: Parsing privateAuthorsRaw for a changes browser

Patrick R.

I meant that these:

'ä' squeakToUtf8 squeakToUtf8 asByteArray => #[195 131 194 164]

are the characters which are printed instead of 'ä' in the debug output.

I will look into this again tomorrow. I have not yet investigated the concrete trace to the ChangeList coming from the FileList (so far I have directly opened a ChangeList).
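
For readers without an image at hand, the double application of squeakToUtf8 can be mimicked outside Squeak. The sketch below is in Python; `encode_twice` is a hypothetical helper name, and modelling the intermediate misreading as Latin-1 is an assumption chosen because it reproduces the byte values quoted above.

```python
def encode_twice(text: str) -> bytes:
    """Encode text as UTF-8, misread the bytes as characters, encode again."""
    utf8_bytes = text.encode("utf-8")
    # Assumption: each byte is taken to be one character with the same
    # code point, which is what decoding as Latin-1 does.
    misread = utf8_bytes.decode("latin-1")
    return misread.encode("utf-8")


for ch in ("ä", "é"):
    print(ch, list(encode_twice(ch)))
```

Plain ASCII text passes through unchanged, which fits the observation that only the names with accented characters are affected.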


From: Squeak-dev <[hidden email]> on behalf of Bert Freudenberg <[hidden email]>
Sent: Wednesday, July 19, 2017 15:15
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Parsing privateAuthorsRaw for a changes browser
 




Re: Parsing privateAuthorsRaw for a changes browser

Nicolas Cellier

I noticed that #setConverterForCode still relies on a BOM, but my current .changes file does not have one...
Note that there are not many senders of #writeBOMOn:, mainly those that file out a class, method, etc.
So that explains why I do not have a BOM...

Though (SourceFiles at: 2) has a UTF8TextConverter... Why?
That could be a direct send of #converter:, but I rather think that UTF8 is the default converter when we open the file.
So things work only because we don't send #setConverterForCode to the .changes or .sources files...
Except that the path you used does...

IMO, it's not related to condenseChanges; it should equally fail if you pretend to be an author named Stéphane, modify a method, and browse recent changes from the file list...
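
To illustrate the general hazard of choosing a converter from the BOM alone, here is a minimal sketch in Python. It is not a rendering of what #setConverterForCode does: `pick_codec` is a hypothetical helper, and the Latin-1 fallback is an assumption picked only to make the effect visible.

```python
import codecs


def pick_codec(data: bytes, fallback: str) -> str:
    """Choose UTF-8 only when the data starts with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    return fallback


payload = "Stéphane".encode("utf-8")
with_bom = codecs.BOM_UTF8 + payload
without_bom = payload

# With a BOM the text is recovered intact.
text_with_bom = with_bom.decode(pick_codec(with_bom, "latin-1"))

# Without one, the same UTF-8 bytes go through the fallback and garble.
text_without_bom = without_bom.decode(pick_codec(without_bom, "latin-1"))

print(text_with_bom, text_without_bom)
```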


2017-07-19 18:55 GMT+02:00 Rein, Patrick <[hidden email]>:
