CommandLine handler and UTF8 path

HilaireFernandes
Hi,

I have discovered a problem with my DrGeo command line handler. Its
purpose, under Linux only for now, is to let the user open a DrGeo
sketch file from the desktop file manager by double-clicking on the file.

Sadly, it does not work when non-ASCII characters are present
in the path or the filename.

The command line handler parameters are correctly set from the bash
script; I get this output:

drgeo --sketch=/home/hilaire/Axe-Symétrie/axes de symétrie.fgeo
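
One way to confirm that the launcher really hands UTF-8 bytes to the VM is to look at the argument at the byte level. The sketch below is only an illustration: the helper names `arg_hex` and `is_utf8` are made up for this example and are not part of DrGeo.sh, and it assumes `od` and `iconv` are installed and that the script itself is saved as UTF-8.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DrGeo.sh.

# Print the bytes of the first argument as lowercase hex, without separators.
arg_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Succeed only if the first argument is a valid UTF-8 byte sequence.
# iconv exits with a non-zero status when it meets an invalid sequence.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Example: check the path before passing it on to the VM.
filename='/home/hilaire/Axe-Symétrie/axes de symétrie.fgeo'
if is_utf8 "$filename"; then
    echo "argument is valid UTF-8"
else
    echo "argument is not valid UTF-8" >&2
fi
```

In the hex dump, a UTF-8 encoded `é` appears as the two bytes `c3a9`. If the shell side passes this check, the corruption must happen later, when the bytes are turned into a string.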


But once in Pharo, the string representing the path to the file is
misinterpreted; I get this error from Pharo:

Error: /home/hilaire/Axe-Symétrie/axes de symétrie.fgeo does not exist!

In the DrGeo.sh startup script, Pharo is started with this command:

exec "$VM/pharo" \
    --plugins "$VM" \
    --encoding utf-8 \
    -vm-display-X11 \
    -title "Dr.Geo" \
    $image \
    $DRGEO_OPT"$filename"


Opening this same file directly from DrGeo works correctly.


Any idea?

Thanks

Hilaire


--
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu



Re: CommandLine handler and UTF8 path

HilaireFernandes
Replying to myself for the archive.

I forgot again that Pharo's internal strings are not UTF-8 but ByteString, so
the string needs to be converted back to UTF-8.
For example, one of the handler methods can be modified as follows:

DrGeoCommandLineHandler>>loadSketch: aString
 "aString holds one character per UTF-8 byte, so decode it before use."
 | utfString |
 utfString := aString convertFromWithConverter: UTF8TextConverter new.
 self checkForFile: utfString.
 DrGeo fileFullscreen: utfString

Et voilà.

Hilaire

On 20/01/2015 at 15:20, Hilaire wrote:
> But once in Pharo, the string representing the path to the file is
> wrongly interpreted, I have this error from Pharo:
>
> Error: /home/hilaire/Axe-Symétrie/axes de symétrie.fgeo does not exist!
>



Re: CommandLine handler and UTF8 path

Sven Van Caekenberghe-2

> On 20 Jan 2015, at 16:21, Hilaire <[hidden email]> wrote:
>
> I forgot again that Pharo's internal strings are not UTF-8 but ByteString, so
> the string needs to be converted back to UTF-8.

No they are not - Strings and Characters in Pharo are using plain Unicode encoding internally.

What you see in your particular case is that an external string comes in, originally encoded in UTF-8, yet the VM, or maybe some code at the image level, took the incoming bytes and converted them directly to a Pharo string without using the proper encoding.

Your fix corrects that fault by re-interpreting the wrongly decoded characters as bytes using the correct encoder.

  http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/
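
The effect described above can be reproduced outside Pharo. The following sketch assumes that the faulty step maps each incoming byte to the character with the same code point, which is what decoding as ISO-8859-1 does. The function names are invented for the illustration, `iconv` is assumed to be available, and the script is assumed to be saved as UTF-8.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the mis-decoding; this is not Pharo code.

# Simulate the fault: read UTF-8 bytes as if each byte were one
# ISO-8859-1 character, then write the resulting characters as UTF-8.
misdecode() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

# Undo it: turn each character back into its single byte, which
# restores the original UTF-8 byte sequence.
repair() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1
}

broken=$(misdecode 'symétrie')
echo "$broken"              # prints symÃ©trie
echo "$(repair "$broken")"  # prints symétrie
```

The `repair` step plays the same role as the conversion with `UTF8TextConverter` in the handler fix: the wrongly decoded characters are taken as bytes again and decoded with the right encoding.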

The general problem is probably that the VM or image level does not know what encoder to use (just guessing).

Sven

 

Re: CommandLine handler and UTF8 path

HilaireFernandes
On 20/01/2015 at 16:34, Sven Van Caekenberghe wrote:
> No they are not - Strings and Characters in Pharo are using plain Unicode encoding internally.

Thanks for the update, and the reference link.

Hilaire


Re: CommandLine handler and UTF8 path

Sven Van Caekenberghe-2
Command line arguments enter the image level via VirtualMachine>>#getSystemAttribute:

At that point they are already Strings.

In your case, they must already be wrong at that point.

> On 20 Jan 2015, at 16:51, Hilaire <[hidden email]> wrote:
>
> Thanks for the update, and the reference link.



Re: [Pharo-dev] CommandLine handler and UTF8 path

Eliot Miranda-2


On Tue, Jan 20, 2015 at 8:00 AM, Sven Van Caekenberghe <[hidden email]> wrote:
> Command line arguments enter the image level via VirtualMachine>>#getSystemAttribute:
>
> At that point they are already Strings.

ByteString, according to the primitive. So if the shell supplies e.g. UTF-8 strings for command-line parameters, which the VM sees as bytes, then the ByteString instances answered by getSystemAttribute: would need decoding, right?

> In your case, they must already be wrong at that point.

Not necessarily. The getSystemAttribute: primitive doesn't do decoding. Perhaps it should.


--
best,
Eliot

Re: [Pharo-dev] CommandLine handler and UTF8 path

Sven Van Caekenberghe-2
Hi Eliot,

> On 20 Jan 2015, at 20:38, Eliot Miranda <[hidden email]> wrote:
>
> Not necessarily. The getSystemAttribute: primitive doesn't do decoding. Perhaps it should.

Yes, probably. I just tried on Mac OS X, Pharo 4:

$ export FOO=élève-Français

$ echo $FOO
élève-Français

$ ./pharo Pharo.image eval 'OSPlatform current environment at: #FOO'
'élève-Français'

$ ./pharo Pharo.image eval '(OSPlatform current environment at: #FOO) asByteArray utf8Decoded'
'élève-Français'

The question is: is this true for all platforms? Windows?


Re: [Pharo-dev] CommandLine handler and UTF8 path

Eliot Miranda-2


On Tue, Jan 20, 2015 at 12:35 PM, Sven Van Caekenberghe <[hidden email]> wrote:
> I just tried on Mac OS X, Pharo 4:
>
> $ ./pharo Pharo.image eval '(OSPlatform current environment at: #FOO) asByteArray utf8Decoded'
> 'élève-Français'
>
> The question is: is this true for all platforms? Windows?

I'm trying to test this in Pharo 3.  I get

KeyNotFound: key #FOO not found in PlatformIndependentEnvironment
PlatformIndependentEnvironment(OSEnvironment)>>at: in Block: [ KeyNotFound signalFor: aKey ]
UndefinedObject>>ifNil:
PlatformIndependentEnvironment(OSEnvironment)>>at:ifAbsent:
PlatformIndependentEnvironment(OSEnvironment)>>at:
UndefinedObject>>DoIt
OpalCompiler>>evaluate
OpalCompiler(AbstractCompiler)>>evaluate:
SmalltalkImage>>evaluate:
EvaluateCommandLineHandler>>evaluate: in Block: [ ...
BlockClosure>>on:do: 


Does the environment access depend on NativeBoost?




Re: [Pharo-dev] CommandLine handler and UTF8 path

Sven Van Caekenberghe-2

> On 21 Jan 2015, at 23:25, Eliot Miranda <[hidden email]> wrote:
>
> Does the environment access depend on NativeBoost?

Yes, I believe so.
