Smalltalk › Pharo › Pharo Smalltalk Users

Ridiculous we are

43 messages

HilaireFernandes

Ridiculous we are

Hello,

Tested on Linux: when I move the DrGeo.app folder under a directory tree with
accented characters (for example, /home/hilaire/Téléchargement/), loading
the font does not work.

However, the font path seems OK:
File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
Inspecting this path, it looks like 'Téléchargements' is an 8-bit string,
but it should be UTF-8, right?

I think there are issues on Windows too, as some users have reported to me.

Holy shit.

Hilaire

--
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu
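
A note on the "8 bits vs UTF-8" question: in Latin-1, é is the single byte 0xE9, while in UTF-8 it is the two bytes 0xC3 0xA9. The small sketch below uses Python purely as a neutral illustration (it is not the code path DrGeo or Pharo uses). It shows both byte forms of the folder name and the garbled text you get when UTF-8 bytes are read back as if they were Latin-1.

```python
# Illustration only: one folder name as Latin-1 bytes vs UTF-8 bytes.
name = "T\u00e9l\u00e9chargements"  # 'Téléchargements' with precomposed é (U+00E9)

latin1_bytes = name.encode("latin-1")  # each é becomes one byte: 0xE9
utf8_bytes = name.encode("utf-8")      # each é becomes two bytes: 0xC3 0xA9

print(len(name), len(latin1_bytes), len(utf8_bytes))  # 15 15 17

# Decoding UTF-8 bytes with the wrong (8-bit) codec yields "mojibake".
misread = utf8_bytes.decode("latin-1")
print(misread)  # TÃ©lÃ©chargements
```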

abergel

Re: Ridiculous we are

:-(

I fear I will soon face the same problem, when I start my lecture…

Alexandre

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

Juraj Kubelka-5

Re: Ridiculous we are

In reply to this post by HilaireFernandes

CONTENTS DELETED

The author has deleted this message.

HilaireFernandes

Re: Ridiculous we are

In reply to this post by abergel

You can use a screenshot.

But back to the issue: in other parts of DrGeo, when saving/loading
sketches, paths or filenames with accents or spaces are OK.
So I am not sure what's going on.

Hilaire

On 22/09/2014 22:15, Alexandre Bergel wrote:
> :-(
>
> I will soon face the same problem I fear, when I will start my lecture…
>
> Alexandre
>

stepharo

Re: Ridiculous we are

In reply to this post by HilaireFernandes

Hilaire

For two days now, since upgrading my iPhone, the recovery process has been
crashing. After two days of trying I finally managed to upload the recovery
to my iPhone, and now it crashes continuously at boot time: I get a nice
sepia screen and it restarts. I will have to send my iPhone to Apple for a
proper check. Just because I did an update!

So I do not accept the title of your email. I simply cannot.

Just imagine the billions invested in the iPhone. The iPhone is probably
an order of magnitude more complex than Pharo, but the money invested in
Pharo is our collective time, and that is far more than an order of
magnitude smaller than several billions.

Stef

HilaireFernandes

Re: Ridiculous we are

In reply to this post by Juraj Kubelka-5

The issue is already filed:
https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

I tried to document it, but it is odd, because in other parts of DrGeo
I have no issue with accented paths.
But shouldn't the path be UTF-8 encoded? Or is my fresh Linux Mint box
using non-UTF-8 filenames? No, it can't be.

Hilaire

On 22/09/2014 22:20, Juraj Kubelka wrote:
> Can you create an issue? I am cleaning up the fonts, and in that context I could look at this issue. If it is a problem only on Windows, I will need someone’s assistance.
>

HilaireFernandes

Re: Ridiculous we are

In reply to this post by stepharo

On 22/09/2014 22:35, stepharo wrote:
> So I do not accept the title of your email. Simply I cannot.

Don't worry, it was a temporary cry/yell of frustration.

philippeback

Re: Ridiculous we are

In reply to this post by stepharo

Also, sometimes things do look like "Téléchargement" but are still Downloads under the hood as the OS translates the UI.

Phil

HilaireFernandes

Re: Ridiculous we are

On 22/09/2014 23:14, [hidden email] wrote:
> Also, sometimes things do look like "Téléchargement" but are still
> Downloads under the hood as the OS translates the UI.

Yes, I checked with another path of my own, like 'été': still the same issue.
What is strange is that I have no issue searching for sketch files with
accents, only when loading the font.
Hilaire

Nicolai Hess

Re: Ridiculous we are

There is a similar issue for Windows:

13127: can not (always) read permissions for directoryentries on a path with nonascii characters

2014-09-22 23:21 GMT+02:00 Hilaire <[hidden email]>:

On 22/09/2014 23:14, [hidden email] wrote:
> Also, sometimes things do look like "Téléchargement" but are still
> Downloads under the hood as the OS translates the UI.

Yes, I check within another path of my own like 'été', still same issue.
Strange is I have no issue to search for sketch file with accent. Only
when loading the font.
Hilaire

Sven Van Caekenberghe-2

Re: Ridiculous we are

In reply to this post by stepharo

I also find the way some problems are reported quite disturbing. How much testing did you do? On which platforms?

I can do this (in Pharo 3) without any problems (we're talking about arbitrary Unicode characters in path names):

('/tmp' asFileReference / 'été') ensureCreateDirectory.
'/tmp/été' asFileReference exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out |
out << 'What about Greece ?' ].
('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

$ ls /tmp/été/Ελλάδα.txt
/tmp/été/Ελλάδα.txt

$ cat !$
cat /tmp/été/Ελλάδα.txt
What about Greece ?

This is on Mac OS X.

So this part fundamentally works in the image and on one VM. There might of course be problems in how paths are used in certain places or on certain VM/platforms.

Sven

LogiqueWerks

Re: Ridiculous we are

In reply to this post by stepharo

So I stay with my 8 GB iTouch on iOS 3; with no prospect of an upgrade, I am sorta worry-free.

If only it were also a phone ...

" Don't dial ... DO ! "

;-)

[ this msg was last seen in my default font ]

Damien Cassou

Re: Ridiculous we are

In reply to this post by HilaireFernandes

On Mon, Sep 22, 2014 at 10:07 PM, Hilaire <[hidden email]> wrote:
> However font path seems ok:
> File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
> Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it
> should be utf-8, right?

I recently read documents about UTF-8 encoding. In all of them, the
authors say that pathnames should be kept as is, because you never know
which encoding the file system uses. So a filename should probably be
a ByteArray.

--
Damien Cassou
http://damiencassou.seasidehosting.st

"Success is the ability to go from one failure to another without
losing enthusiasm."
Winston Churchill

HilaireFernandes

Re: Ridiculous we are

On 23/09/2014 14:09, Damien Cassou wrote:
> I recently read documents about utf-8 encoding. In all of them, the
> author says that pathnames should be kept as is because you never know
> which encoding the filesystem uses. So, a filename should probably be
> a bytearray.

Yes, but an #é should be encoded in two bytes.
Although it looks strange, I am not sure it is the actual problem,
because I can use accented file names for sketches; the problem arises
only when loading a font. So maybe it is the font-loading code (cf. my bug report).

Hilaire

Benjamin Pollack-2

Re: Ridiculous we are

In reply to this post by Sven Van Caekenberghe-2

On Mon, 22 Sep 2014 17:58:41 -0400, Sven Van Caekenberghe <[hidden email]>
wrote:

> I also find the way some problems are reported quite disturbing. How
> much testing did you do ? On which platforms ?
>
> I can do this (in Pharo 3) without any problems (we're talking about
> arbitrary Unicode characters in path names):
>
> ('/tmp' asFileReference / 'été') ensureCreateDirectory.
> '/tmp/été' asFileReference exists.
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out |
> out << 'What about Greece ?' ].
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.
>
> And in a terminal, I get:
>
> $ ls /tmp/été/Ελλάδα.txt
> /tmp/été/Ελλάδα.txt
>
> $ cat !$
> cat /tmp/été/Ελλάδα.txt
> What about Greece ?
>
> This is on Mac OS X.
>
> So this part fundamentally works in the image and on one VM. There might
> of course be problems in how paths are used in certain places or on
> certain VM/platforms.
>

Focusing purely on Unicode itself (not the encoding systems), a letter
like é can be represented as U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or
as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (combining acute
accent). These will appear identical to the user, but are emphatically
*not* identical for most software. The way you're testing here, you will
not hit any error relating to this concept, ever, because you're using
Pharo for both generating and consuming the strings. At the very least,
we'd need to generate a file named "été" with both forms explicitly and
see what happens.
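
The experiment suggested above can be sketched in a few lines of Python (used here only for illustration): the same visible name built from precomposed and from decomposed code points, compared before and after normalization with the standard `unicodedata` module.

```python
import unicodedata

nfc = "\u00e9t\u00e9"    # 'été' using U+00E9 LATIN SMALL LETTER E WITH ACUTE
nfd = "e\u0301te\u0301"  # 'été' using U+0065 + U+0301 COMBINING ACUTE ACCENT

print(nfc == nfd)          # False: different code point sequences
print(len(nfc), len(nfd))  # 3 5

# Normalizing both sides to the same form before comparing makes them equal.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
print(unicodedata.normalize("NFD", nfc) == nfd)  # True
```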

Things get even more exciting, though, because Unix says that file names
are simply arbitrary byte patterns that do not contain the null byte.*
Thus, you can trivially create a file named "été" using Latin-1 encoding,
and again using UTF-8 encoding, and again using UTF-7 encoding, and these
might all be shown to the user as "identically" named, but I guarantee you
that Pharo will not act sanely with all four of these. Even on Windows,
where things are a bit saner (NTFS mandates UTF-16), and where an explicit
normalization form is preferred (NFC), I just explicitly verified that I
can trivially inject other normalization forms into the file system.
Thus, you can still have two files named "été" that nevertheless have
different names as far as the OS is concerned.
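
To see how one visible name can map to several distinct file names, here is a Python sketch (illustration only) that produces the raw bytes of "été" as Latin-1, as UTF-8 in NFC form, as UTF-8 in NFD form, and as UTF-7. On a byte-oriented file system such as ext4 on Linux, each of these byte sequences would be a separate, valid file name.

```python
import unicodedata

name = "\u00e9t\u00e9"  # 'été', precomposed (NFC)

candidates = {
    "latin-1": name.encode("latin-1"),
    "utf-8 NFC": name.encode("utf-8"),
    "utf-8 NFD": unicodedata.normalize("NFD", name).encode("utf-8"),
    "utf-7": name.encode("utf-7"),
}

for label, raw in candidates.items():
    print(f"{label:10} {raw!r}")

# Four different byte sequences: four different names to a byte-oriented file system.
print(len(set(candidates.values())))  # 4
```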

In this case, as far as I can tell, Pharo assumes that all path names are
Unicode, and does not do any work to convert strings to or from the
various normalization schemes (looking in Path
class>>canonicalizeElements:, Path class>>from:delimiter:, and
FileSystemStore>>pathFromString: here).

There's therefore a pretty straightforward fix that Pharo could make:

1. Path would use ByteArrays as the actual canonical store, and
provide convenience methods to see what the array decodes to
in various encodings. The developer and application can make
decisions about what encoding system they want to use.
2. The VM likely needs to be modified to handle this (didn't check)
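
A minimal Python sketch of point 1, purely to make the idea concrete: raw bytes are the canonical store, and decoding happens only on request. `BytePath`, `segments` and `decode_as` are hypothetical names, not part of Pharo's FileSystem classes or of any Python library. The sketch assumes a POSIX-style path in an ASCII-compatible encoding, so that splitting on the "/" byte is safe.

```python
class BytePath:
    """Hypothetical path type whose canonical value is the raw byte string."""

    def __init__(self, raw: bytes):
        if b"\x00" in raw:
            raise ValueError("POSIX file names cannot contain NUL bytes")
        self.raw = raw

    def segments(self):
        # Split on the '/' byte; safe for ASCII-compatible encodings
        # such as UTF-8 and Latin-1, where '/' is always the byte 0x2F.
        return [part for part in self.raw.split(b"/") if part]

    def decode_as(self, encoding, errors="strict"):
        # A convenience *view*: the stored bytes are never modified.
        return self.raw.decode(encoding, errors)

    def __eq__(self, other):
        # Equality is byte equality: no normalization, no guessing.
        return isinstance(other, BytePath) and self.raw == other.raw

    def __hash__(self):
        return hash(self.raw)


p = BytePath(b"/home/hilaire/T\xc3\xa9l\xc3\xa9chargements")
print(p.segments())
print(p.decode_as("utf-8"))    # /home/hilaire/Téléchargements
print(p.decode_as("latin-1"))  # /home/hilaire/TÃ©lÃ©chargements
```

The trade-off shows up immediately: every file name round-trips, but the application has to choose an encoding each time it wants to display or compare names as text.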

As much as I wish Hilaire provided more details in his bug report, it's
worth keeping in mind that not all users, or even all programmers,
understand the full implications of things like how various Unicode
normalization and encoding schemes interact in practice with Unix's very
vague concept of what a file name actually is, so I usually try to
approach these bug reports carefully and with an open mind.

--Benjamin

* On OS X, HFS+ uses UTF-16 with an Apple-specific variant of NFD, whereas
I do not believe this holds for e.g. UFS or FUSE-backed file systems, so
things are a bit subtler there, but the general rule holds.

Benjamin Pollack-2

Re: Ridiculous we are

In reply to this post by HilaireFernandes

On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <[hidden email]> wrote:

> On 23/09/2014 14:09, Damien Cassou wrote:
>> I recently read documents about utf-8 encoding. In all of them, the
>> author says that pathnames should be kept as is because you never know
>> which encoding the filesystem uses. So, a filename should probably be
>> a bytearray.
>
>
> yes, but a #é should be encoded in two bytes.

As noted in my previous message, "é" could be represented as either one or
two Unicode code points, and these in turn could validly be either two or
three bytes in UTF-8. My gut says that $é should be U+00E9, because
otherwise you would have to use two Characters ($e and $´), but you could
legitimately argue otherwise as well, and at any rate, #é could definitely
be either. This is likely the core of the issue you're hitting.
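
The byte counts mentioned above can be checked with a short Python sketch (illustration only), printing the code points and UTF-8 bytes of both forms of é:

```python
import unicodedata

precomposed = "\u00e9"                                  # U+00E9
decomposed = unicodedata.normalize("NFD", precomposed)  # U+0065 U+0301

for label, s in (("NFC", precomposed), ("NFD", decomposed)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    utf8 = s.encode("utf-8")
    print(f"{label}: {code_points} -> {utf8.hex(' ')} ({len(utf8)} bytes)")

# NFC: U+00E9 -> c3 a9 (2 bytes)
# NFD: U+0065 U+0301 -> 65 cc 81 (3 bytes)
```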

Sven Van Caekenberghe-2

Re: Ridiculous we are

On 24 Sep 2014, at 18:48, Benjamin Pollack <[hidden email]> wrote:

> On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <[hidden email]> wrote:
>
>> Le 23/09/2014 14:09, Damien Cassou a écrit :
>>> I recently read documents about utf-8 encoding. In all of them, the
>>> author says that pathnames should be kept as is because you never know
>>> which encoding the filesystem uses. So, a filename should probably be
>>> a bytearray.
>>
>>
>> yes, but a #é should be encoded in two bytes.
>
> As noted in my previous message, "é" could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8. My gut says that $é should be U+00E9, because otherwise you should have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting.

Did you read the actual conversation in the issue?

https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding.

Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue.

Regarding the pathnames encoding: if the OS itself does not know it, how can we ? I think that the current approach (assuming UTF-8) makes (the most) sense for a system that runs on multiple platforms.

Sven

Benjamin Pollack-2

Re: Ridiculous we are

On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe <[hidden email]>
wrote:

>
> Did you read the actual conversation in the issue ?
>
> https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters
>
> It has been renamed and there is a fix (as a change set, not as a slice,
> yet). Basically, there was a primitive call into a plugin that failed to
> do encoding.
>

No, I apologize; I missed the bug link. Thanks for reposting it.

> Now regarding the issues you raised. Pharo does not do Unicode
> canonicalisation or any of that other fancy stuff (like categorisation,
> proper ordering and so on). This is another orthogonal and way more
> general issue.
>
> Regarding the pathnames encoding: if the OS itself does not know it, how
> can we ?

That's actually the argument *against* using UTF-8 as the standard Pharo
way to represent filenames--at least on Unix systems. If Pharo used
ByteArrays to represent paths, with convenience methods for working with
UTF-8 (since I do agree that's the most likely thing for a user/dev to
want), then you'd be able to work with all files no matter what, *and*
have a convenient way of doing so for the common case.

This is an old discussion, and I do see both sides of it. In terms of
SCMs, Mercurial and Git both just say "it's a collection of bytes",
whereas Subversion says "it's Unicode code points." This has some
uncomfortable implications for both systems when working on multiple
platforms.

--Benjamin

Sven Van Caekenberghe-2

Re: Ridiculous we are

On 24 Sep 2014, at 19:09, Benjamin Pollack <[hidden email]> wrote:

>> Regarding the pathnames encoding: if the OS itself does not know it, how can we ?
>
> That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case.
>
> This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say "it's a collection of bytes", whereas Subversion says "it's Unicode code points." This has some uncomfortable implications for both systems when working on multiple platforms.

Benjamin,

I think I understand the concern / situation that you describe. But I fail to see how not-interpreting it and interpreting it in different encodings can work in practice, especially since your point seems to be that there is no meta information that gives a definitive answer.

I would guess that other languages, say Java or Python, have some approach to handle this problem?
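
For Python at least, there is a documented answer: PEP 383. On POSIX systems, Python 3 decodes file names with the file-system encoding and the `surrogateescape` error handler, so bytes that are not valid in that encoding pass through as lone surrogate code points (U+DC80 to U+DCFF) and are restored exactly when the name is encoded again. The sketch below calls the codec directly with an explicit "utf-8" so that its result does not depend on the machine's locale; `os.fsdecode` and `os.fsencode` apply the same idea using the configured file-system encoding.

```python
# 'Téléchargements' encoded as Latin-1: NOT valid UTF-8.
raw = b"T\xe9l\xe9chargements"

# Strict UTF-8 decoding rejects these bytes...
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# ...while 'surrogateescape' maps each undecodable byte 0xNN to U+DCNN,
# so the resulting string still carries the original information.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'T\udce9l\udce9chargements'

# Encoding with the same handler gives back the exact original bytes.
assert text.encode("utf-8", "surrogateescape") == raw
```

This sits between the two positions discussed here: names are ordinary text in the common UTF-8 case, yet nothing is lost when they are not.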

Also, since we are living with the current approach without many problems, I think the issue is not terribly pressing.

Sven

Alain Rastoul-2

Re: Ridiculous we are

In reply to this post by Benjamin Pollack-2

On 24/09/2014 19:09, Benjamin Pollack wrote:

> If Pharo used ByteArrays to represent paths, with convenience methods for working with
> UTF-8 (since I do agree that's the most likely thing for a user/dev to
> want), then you'd be able to work with all files no matter what, *and*
> have a convenient way of doing so for the common case.
Hi Ben,
I strongly disagree with you on this point: using byte arrays (or byte
strings) is a pain in an international context.
The OS knows its encoding: the locale on Unix, the code page on Windows.
Windows code pages depend on the country: Windows-1252 for English
(similar to ISO-8859-1), other variants of ISO-8859-xx for other European
countries... (welcome to the ISO soup), and the same goes for Unix.
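
To illustrate the code-page point with a Python sketch (illustration only): a single byte from a legacy 8-bit file name means different characters depending on which code page is assumed, and it is not valid UTF-8 on its own.

```python
raw = b"\xe9"  # one byte from a hypothetical legacy 8-bit file name

print(raw.decode("cp1252"))      # é  (Windows Western European)
print(raw.decode("iso-8859-1"))  # é
print(raw.decode("cp1251"))      # й  (Windows Cyrillic)
print(raw.decode("iso-8859-7"))  # ι  (Greek)

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8 on its own")
```

Converting such names correctly therefore requires knowing the locale or code page in effect when they were written.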

Java and dotNet both use UTF-16 strings internally (don't know about
Python), where chars are not bytes, and strings are used not as byte
arrays but as Character arrays.
Both do conversions from the OS character set encoding to the internal
encoding for strings (paths and whatever).

There is already UTF-8 and UTF-16 encoding support in Pharo, but the
standard String class uses bytes, and a lot of file, directory and
system methods use the ByteString class; that is the problem here.
UTF-8 encoding in Pharo produces a variable-length ByteString, which is
not the same as a (hypothetical) Utf8String where all (variable-length)
chars would be UTF-8 encoded.
Introducing a new UTF-8 or UTF-16 string class could be a major rework,
but a decision about the internal string encoding is needed.
As Sven says, there is no emergency and you have a workaround, but
perhaps using the existing WideString encoded as UTF-16 (or UTF-32?) in
some well-defined classes/methods could be a good start for this rework?
IMHO the workaround of using UTF-8 encoded byte strings is not a good way
to deal with this problem and should not be taken as "the solution".
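
To make the variable-length point concrete, here is a Python sketch (illustration only; `char_at_utf8` is a hypothetical helper, not a Pharo or Python API). In UTF-8 the i-th character does not start at byte i, so a string class that stores UTF-8 internally has to scan from the start to find a given character.

```python
s = "\u00e9t\u00e9"    # 'été': 3 characters
b = s.encode("utf-8")  # 5 bytes: c3 a9 74 c3 a9

print(len(s), len(b))  # 3 5
print(s[1])            # t
print(b[1:2])          # b'\xa9' (the second byte of the first 'é')


def char_at_utf8(data: bytes, index: int) -> str:
    """Return character number `index` of UTF-8 `data` by a linear scan."""
    count = -1
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: a character starts here
            count += 1
            if count == index:
                end = pos + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)


print(char_at_utf8(b, 2))  # é
```

This O(n) lookup is the price of a compact UTF-8 representation; fixed-width storage such as Pharo's WideString (32 bits per character) keeps indexing O(1) at the cost of more memory per character.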
