Loading code with accent in the method or variable name

HilaireFernandes
Hi,

I am coming back to Alain's MathOntologie package.

For reference, it is a set of classes and methods that lets high school students use Smalltalk in French. It lets students solve math problems with a computer language in their native tongue.
It is a nice effort to ease the entry into programming, and a nice opportunity for Pharo as well.

Sadly, since Pharo 3 (or 2) it no longer loads in Pharo:
http://forum.world.st/Gettext-with-pharo-3-and-seaside-3-1-2-tp4776814p4776893.html

In Pharo 4, I tried again: I extracted the .st file from the Monticello package and converted it to UTF-8. Both the file in its original 8-bit format and the UTF-8 file failed to load.

I will try to explore the case further. Any hints are more than welcome.

Hilaire
-- 
Dr. Geo
http://drgeo.eu
http://google.com/+DrgeoEu

Attachment: Syntax Error: Unknown character ->.png (43K)

Re: Loading code with accent in the method or variable name

Sven Van Caekenberghe-2
I created a class with one method:

Hilaire>>#français
        "élève"
       
        ^ self résoudre

I can file that out and back in again in Pharo 4.

Here is the .st file:




I would guess something was done wrong with the original MCZ.

But I did not test adding the above to a real Monticello package.


Attachment: Hilaire-français.st (346 bytes)

Re: Loading code with accent in the method or variable name

HilaireFernandes
So there is hope :)

When debugging the import, I see that at some point the
MultiByteFileStream gets its converter changed to MacRomanTextConverter.
It seems to happen somewhere in #parseNextDeclaration. Initially, just
before the import, at the start of fileIn, its converter is UTF8...

Hilaire


Re: Loading code with accent in the method or variable name

Sven Van Caekenberghe-2

> On 07 Sep 2015, at 17:03, Hilaire <[hidden email]> wrote:
>
> So there is hope :)

Did you try my file out?
Can you load it OK?
Can you file the method out again?

> When debugging the import, I see that at some point the
> MultiByteFileStream gets its converter changed to MacRomanTextConverter.
> It seems to happen somewhere in #parseNextDeclaration. Initially, just
> before the import, at the start of fileIn, its converter is UTF8...

You mean using MCZ?


Re: Loading code with accent in the method or variable name

HilaireFernandes
In reply to this post by HilaireFernandes
Sven, I see your .st is a UTF-8 file:

hilaire@pchome /tmp $ file Hilaire-français.st
Hilaire-français.st: UTF-8 Unicode (with BOM) text, with CR line terminators


So I guess it should be OK to file in the MathOntologie source after
converting it to UTF-8.

hilaire@pchome ~/Travaux/ $ file snapshot/source2.st
snapshot/source2.st: UTF-8 Unicode text, with very long lines

But it turns out there are problems in the parsing, and the stream gets its
converter set to MacRoman...


Now, looking at the users of MacRomanTextConverter is funny; we have this
unfactored code:


MultiByteBinaryOrTextStream>>setConverterForCode

    | current |
    current := converter saveStateOf: self.
    self position: 0.
    self binary.
    ((self next: 3) =  #[239 187 191]) ifTrue: [
        self converter: UTF8TextConverter new
    ] ifFalse: [
        self converter: MacRomanTextConverter new.
    ].
    converter restoreStateOf: self with: current.
    self text.


MultiByteBinaryOrTextStream>>setEncoderForSourceCodeNamed: streamName

    | l |
    l := streamName asLowercase.
    ((l endsWith: 'cs') or: [
        (l endsWith: 'st') or: [
            (l endsWith: ('st.gz')) or: [
                (l endsWith: ('st.gz'))]]]) ifTrue: [
                    self converter: MacRomanTextConverter new.
                    ^ self.
    ].
    self converter: UTF8TextConverter new.


MultiByteFileStream>>setConverterForCode

    | current |
    (SourceFiles at: 2)
        ifNotNil: [self fullName = (SourceFiles at: 2) fullName ifTrue: [^ self]].
    current := self converter saveStateOf: self.
    self position: 0.
    self binary.
    ((self next: 3) = #[ 16rEF 16rBB 16rBF ]) ifTrue: [
        self converter: UTF8TextConverter new
    ] ifFalse: [
        self converter: MacRomanTextConverter new.
    ].
    converter restoreStateOf: self with: current.
    self text.


CodeImporter>>selectTextConverterForCode
    self flag: #fix.  "This should not be here probably."
    "We need to see the first three bytes in order to see the origin of the file"
    readStream binary.
    ((readStream next: 3) = #[ 16rEF 16rBB 16rBF ]) ifTrue: [
        readStream converter: UTF8TextConverter new
    ] ifFalse: [
        readStream converter: MacRomanTextConverter new.
    ].

    "we restore the position to the start of the file again"
    readStream position: 0.
   
    "We put the file in text mode for the file in"
    readStream text.


AND THE WINNER IS...

#selectTextConverterForCode, where the file stream is not detected as UTF-8
and the converter used is MacRoman...


Forcing the converter to UTF-8 there lets the code be imported. But
there are many questions: what should the detection method for the
encoding be, and why does the original source.st, which is ISO-8859, not
get imported?
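
To make the failure concrete, here is a minimal sketch in Python (not Pharo code) of the rule the methods above share: if the first three bytes are the UTF-8 BOM, use UTF-8, otherwise assume MacRoman. The function names are hypothetical, and Python's mac_roman codec is only a stand-in for MacRomanTextConverter. It suggests why a UTF-8 file without a BOM, like the converted source2.st, ends up with garbled identifiers.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the same three bytes as #[16rEF 16rBB 16rBF]


def select_codec_like_importer(data: bytes) -> str:
    """Mimic the quoted check: BOM means UTF-8, anything else MacRoman."""
    return "utf-8-sig" if data[:3] == UTF8_BOM else "mac_roman"


def decode_like_importer(data: bytes) -> str:
    """Decode the bytes with whichever codec the BOM check selected."""
    return data.decode(select_codec_like_importer(data))


source = "Hilaire>>#français"
with_bom = UTF8_BOM + source.encode("utf-8")  # like Sven's file
without_bom = source.encode("utf-8")          # like a plain converted file

print(decode_like_importer(with_bom))     # Hilaire>>#français
print(decode_like_importer(without_bom))  # Hilaire>>#fran√ßais
```

The second result contains a square-root sign where the "ç" was. Since that is not a letter, it seems plausible that this is what the "Unknown character" syntax error complains about, although that is only a guess.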

Hilaire


Re: Loading code with accent in the method or variable name

HilaireFernandes
In reply to this post by Sven Van Caekenberghe-2
On 07/09/2015 17:15, Sven Van Caekenberghe wrote:
> Did you try my file out?

No, I blindly trusted you.

> Can you load it OK?

So I tried it now and it does not work, but apparently not for an encoding
reason. Your source is not complete, as the class definition is not included.

> Can you file the method out again?
>
>> When debugging the import, I see that at some point the
>> MultiByteFileStream gets its converter changed to MacRomanTextConverter.
>> It seems to happen somewhere in #parseNextDeclaration. Initially, just
>> before the import, at the start of fileIn, its converter is UTF8...
> You mean using MCZ?

Not when using the snapshot/source.st file included in the MCZ.

Hilaire


Re: Loading code with accent in the method or variable name

Sven Van Caekenberghe-2

> On 07 Sep 2015, at 17:43, Hilaire <[hidden email]> wrote:
>
> On 07/09/2015 17:15, Sven Van Caekenberghe wrote:
>> Did you try my file out?
>
> No, I blindly trusted you.
>> Can you load it OK?
>
> So I tried it now and it does not work, but apparently not for an encoding
> reason. Your source is not complete, as the class definition is not included.

Yes, it is only one method; the class is called Hilaire and is empty ;-)

The point is, maybe your environment is different from mine.

The code you showed is quite horrible.

(1) encoding detection is theoretically impossible
(2) it should certainly not happen there
(3) MacRoman is totally ancient anyway


Re: Loading code with accent in the method or variable name

HilaireFernandes
On 07/09/2015 18:59, Sven Van Caekenberghe wrote:
> The code you showed is quite horrible.
>
> (1) encoding detection is theoretically impossible
> (2) it should certainly not happen there
> (3) MacRoman is totally ancient anyway

It looks like MCZ now saves .st as UTF-8, so assuming the source is UTF-8
could simplify this code.
I don't know the implications of such an assumption, however.
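
One implication can be sketched in Python (an illustration only, not Pharo code; the function name is hypothetical): if every source file is decoded as UTF-8, files that really are UTF-8 load with or without a BOM, but bytes written in an older 8-bit encoding such as MacRoman are rejected instead of being silently misread.

```python
def decode_assuming_utf8(data: bytes) -> str:
    """Always treat source as UTF-8; a leading BOM, if present, is dropped."""
    return data.decode("utf-8-sig")


modern = "^ self résoudre".encode("utf-8")
legacy = "^ self résoudre".encode("mac_roman")  # stand-in for an old file

print(decode_assuming_utf8(modern))

try:
    decode_assuming_utf8(legacy)
except UnicodeDecodeError as error:
    # The accented byte is not a valid UTF-8 sequence on its own.
    print("legacy file rejected:", error.reason)
```

Whether a hard failure on old files is acceptable is a policy question that the sketch does not answer.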

At least I recovered the MathOntologie code.

Hilaire


Re: Loading code with accent in the method or variable name

Stephan Eggermont-3
On 07-09-15 21:02, Hilaire wrote:
> It looks like MCZ now saves .st as UTF-8, so assuming the source is UTF-8
> could simplify this code.
> I don't know the implications of such an assumption, however.

Not being able to load old code in MacRoman would be bad.
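
A possible middle ground, sketched here in Python rather than Pharo and with a hypothetical function name, is to prefer UTF-8 and fall back to a legacy encoding only when the bytes are not valid UTF-8. This is a heuristic under stated assumptions, not a real detection method: it relies on the fact that accented text in an 8-bit encoding rarely happens to be valid UTF-8.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def decode_source(data: bytes, legacy_codec: str = "mac_roman") -> str:
    """Prefer UTF-8; fall back to a legacy codec only if UTF-8 decoding fails."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):].decode("utf-8")
    try:
        return data.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return data.decode(legacy_codec)


text = 'Hilaire>>#français "élève"'

# New UTF-8 files (with or without BOM) and old MacRoman files all load.
for raw in (text.encode("utf-8"),
            UTF8_BOM + text.encode("utf-8"),
            text.encode("mac_roman")):
    print(decode_source(raw) == text)
```

The limitation is the one already raised in this thread: the fallback has to guess one legacy encoding, so an ISO-8859-1 file would still go through the MacRoman branch and come out with the wrong characters.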

Stephan



Re: Loading code with accent in the method or variable name

stepharo
In reply to this post by HilaireFernandes
Hilaire, can you check in Pharo 5.0? Together with Guillermo we reintroduced the saving (which was broken) and loading of non-ASCII methods (support for the ]lang[ annotation in methods).

Stef


Re: Loading code with accent in the method or variable name

HilaireFernandes
It produces the same error with the latest Pharo 5.0, whether I try to
install the original .mcz file or the UTF-8-converted source.st file.

Hilaire
