Ridiculous we are

HilaireFernandes
Hello,

Tested on Linux: when I move the DrGeo.app folder into a directory tree whose
path contains accented characters (for example, /home/hilaire/Téléchargements/),
loading the fonts does not work.

However, the font path seems OK:
 File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
Inspecting this path, it looks like 'Téléchargements' is stored as 8-bit characters, but it
should be UTF-8, right?

I think there are also issues on Windows, as some users have reported to me.

Holy shit.

Hilaire

--
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu



Re: Ridiculous we are

abergel
:-(

I fear I will soon face the same problem when I start my lecture…

Alexandre
-- 
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.




Re: Ridiculous we are

Juraj Kubelka-5
In reply to this post by HilaireFernandes
CONTENTS DELETED
The author has deleted this message.

Re: Ridiculous we are

HilaireFernandes
In reply to this post by abergel
You can use screenshots.

But back to the issue: in other parts of DrGeo, when saving or loading a
sketch, paths and file names with accents or spaces are OK.
So I am not sure what's going on.

Hilaire


Re: Ridiculous we are

stepharo
In reply to this post by HilaireFernandes
Hilaire

For the past two days, ever since I upgraded my iPhone, the recovery process
has been crashing.
After two days of trying, I finally managed to upload my backup to the
iPhone, and
now it crashes continuously at boot time. I get a nice sepia
screen and
it restarts. I will have to send my iPhone to Apple for a real check.
Just because I did an update!

So I do not accept the title of your email. Simply I cannot.

Can you imagine the billions injected into the iPhone? The iPhone is probably
one order of magnitude
more complex than Pharo, but what is injected into Pharo is our
collective time, and
that is far more than one order of magnitude smaller than several billion.

Stef



Re: Ridiculous we are

HilaireFernandes
In reply to this post by Juraj Kubelka-5
The issue is already reported here:
https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

I tried to document it, but it is odd, because in other parts of DrGeo
I have no issue with accented paths.
But shouldn't the path be UTF-8 encoded? Or is my fresh Linux Mint box
using non-UTF-8 file names? No, that can't be.

Hilaire

On 22/09/2014 22:20, Juraj Kubelka wrote:
> Can you create an issue? I am cleaning up the fonts and in some cases I could take this issue into account. If it is a problem only on Windows, I will need someone’s assistance.
>



Re: Ridiculous we are

HilaireFernandes
In reply to this post by stepharo
On 22/09/2014 22:35, stepharo wrote:
> So I do not accept the title of your email. Simply I cannot.

Don't worry, it is just a temporary cry/yell of frustration.


Re: Ridiculous we are

philippeback
In reply to this post by stepharo
Also, sometimes things do look like "Téléchargement" but are still Downloads under the hood as the OS translates the UI.

Phil


 



Re: Ridiculous we are

HilaireFernandes
On 22/09/2014 23:14, [hidden email] wrote:
> Also, sometimes things do look like "Téléchargement" but are still
> Downloads under the hood as the OS translates the UI.

Yes, I checked with another path of my own, such as 'été': still the same issue.
The strange thing is that I have no problem finding sketch files with accents
in their names, only when loading the font.
Hilaire



Re: Ridiculous we are

Nicolai Hess
There is a similar issue on Windows:

Cannot (always) read permissions for directory entries on a path with non-ASCII characters



Re: Ridiculous we are

Sven Van Caekenberghe-2
In reply to this post by stepharo
I also find the way some problems are reported quite disturbing. How much testing did you do? On which platforms?

I can do this (in Pharo 3) without any problems (we're talking about arbitrary Unicode characters in path names):

('/tmp' asFileReference / 'été') ensureCreateDirectory.
'/tmp/été' asFileReference exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out |
  out << 'What about Greece ?' ].
('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

$ ls /tmp/été/Ελλάδα.txt
/tmp/été/Ελλάδα.txt

$ cat !$
cat /tmp/été/Ελλάδα.txt
What about Greece ?

This is on Mac OS X.

So this part fundamentally works in the image and on one VM. There might of course be problems in how paths are used in certain places or on certain VM/platforms.

Sven


Re: Ridiculous we are

LogiqueWerks
In reply to this post by stepharo
So I stay with my 8 GB iTouch on iOS 3; with no prospect of an upgrade, I am sorta worry-free.

If only it were also a phone ...

" Don't dial ... DO ! "

;-)

[ this msg was last seen in my default font ] 



Re: Ridiculous we are

Damien Cassou
In reply to this post by HilaireFernandes
On Mon, Sep 22, 2014 at 10:07 PM, Hilaire <[hidden email]> wrote:
> However font path seems ok:
>  File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
> Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it
> should be utf-8, right?


I recently read documents about utf-8 encoding. In all of them, the
author says that pathnames should be kept as is because you never know
which encoding the filesystem uses. So, a filename should probably be
a bytearray.

--
Damien Cassou
http://damiencassou.seasidehosting.st

"Success is the ability to go from one failure to another without
losing enthusiasm."
Winston Churchill


Re: Ridiculous we are

HilaireFernandes
On 23/09/2014 14:09, Damien Cassou wrote:
> I recently read documents about utf-8 encoding. In all of them, the
> author says that pathnames should be kept as is because you never know
> which encoding the filesystem uses. So, a filename should probably be
> a bytearray.


yes, but a #é should be encoded in two bytes.
Although that looks strange, I am not sure it is the actual problem,
because I can use accented file names for sketches; the problem only arises
when loading a font. So it may be the font-loading code (cf. my bug report).

Hilaire


Re: Ridiculous we are

Benjamin Pollack-2
In reply to this post by Sven Van Caekenberghe-2
On Mon, 22 Sep 2014 17:58:41 -0400, Sven Van Caekenberghe <[hidden email]>  
wrote:

> I also find the way some problems are reported quite disturbing. How  
> much testing did you do ? On which platforms ?
>
> I can do this (in Pharo 3) without any problems (we're talking about  
> arbitrary Unicode characters in path names):
>
> ('/tmp' asFileReference / 'été') ensureCreateDirectory.
> '/tmp/été' asFileReference exists.
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out |
>   out << 'What about Greece ?' ].
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.
>
> And in a terminal, I get:
>
> $ ls /tmp/été/Ελλάδα.txt
> /tmp/été/Ελλάδα.txt
>
> $ cat !$
> cat /tmp/été/Ελλάδα.txt
> What about Greece ?
>
> This is on Mac OS X.
>
> So this part fundamentally works in the image and on one VM. There might  
> of course be problems in how paths are used in certain places or on  
> certain VM/platforms.
>

Focusing purely on Unicode itself (not the encoding systems), a letter  
like é can be represented as U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or  
as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (combining acute  
accent).  These will appear identical to the user, but are emphatically  
*not* identical for most software.  The way you're testing here, you will  
not hit any error relating to this concept, ever, because you're using  
Pharo for both generating and consuming the strings.  At the very least,  
we'd need to generate a file named "été" with both forms explicitly and  
see what happens.
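
The two spellings of "é" described above can be made concrete with a short sketch. It is written in Python rather than Pharo, because Python's standard library ships Unicode normalization in the unicodedata module; the variable names and the same_name helper are illustrative assumptions, not anything taken from Pharo or DrGeo.

```python
import unicodedata

# "été" spelled with precomposed U+00E9 (the NFC form)
composed = "\u00e9t\u00e9"
# "été" spelled as "e" followed by U+0301 COMBINING ACUTE ACCENT (the NFD form)
decomposed = "e\u0301te\u0301"

# Both render the same, but they are different sequences of code points.
print(len(composed), len(decomposed))
print(composed == decomposed)


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_name(composed, decomposed))
```

A comparison that normalizes both sides first, as same_name does, treats the two spellings as one name; a plain string comparison does not.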

Things get even more exciting, though, because Unix says that file names
are simply arbitrary byte patterns that do not contain the null byte.*
Thus, you can trivially create a file named "été" using Latin-1 encoding,
and again using UTF-8 encoding, and again using UTF-7 encoding, and these
might all be shown to the user as "identically" named, but I guarantee you
that Pharo will not act sanely with all three of these.  Even on Windows,
where things are a bit saner (NTFS mandates UTF-16), and where an explicit
normalization form is preferred (NFC), I just explicitly verified that I
can trivially inject other normalization forms into the file system.
Thus, you can still have two files named "été" that nevertheless have
different names as far as the OS is concerned.
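
A rough Python sketch of the byte-level point, limited to the byte sequences themselves rather than touching a real file system. The names encoded and decodes_as_utf8 are illustrative assumptions; the three codecs are the ones named in the paragraph above.

```python
# "été" in precomposed form
name = "\u00e9t\u00e9"

# One visible name, three encodings, three different byte sequences.
encoded = {enc: name.encode(enc) for enc in ("latin-1", "utf-8", "utf-7")}

for enc, raw in encoded.items():
    print(f"{enc:8} {len(raw)} bytes  {raw!r}")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A reader that assumes UTF-8 rejects the Latin-1 bytes outright, and
# accepts the (pure ASCII) UTF-7 bytes but gets different text back.
print(decodes_as_utf8(encoded["latin-1"]))
print(encoded["utf-7"].decode("utf-8") == name)
```

On a Unix file system each of these byte sequences is a distinct, legal file name, which is why code that assumes a single encoding can only interpret some of them correctly.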

In this case, as far as I can tell, Pharo assumes that all path names are
Unicode, and does not do any work to convert strings to or from the
various normalization schemes (looking in Path
class>>canonicalizeElements:, Path class>>from:delimiter:, and
FileSystemStore>>pathFromString: here).

There's therefore a pretty straightforward fix that Pharo could do:

   1. Path would use ByteArrays as the actual canonical store, and
      provide convenience methods to see what the array decodes to
      in various encodings.  The developer and application can make
      decisions about what encoding system they want to use.
   2. The VM likely needs to be modified to handle this (didn't check)

As much as I wish Hilaire provided more details in his bug report, it's  
worth keeping in mind that not all users, or even all programmers,  
understand the full implications of things like how various Unicode  
normalization and encoding schemes interact in practice with Unix's very  
vague concept of what a file name actually is, so I usually try to  
approach these bug reports carefully and with an open mind.

--Benjamin

* On OS X, HFS+ uses UTF-16 with an Apple-specific variant of NFD, whereas  
I do not believe this holds for e.g. UFS or FUSE-backed file systems, so  
things are a bit subtler there, but the general rule holds.


Re: Ridiculous we are

Benjamin Pollack-2
In reply to this post by HilaireFernandes
On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <[hidden email]> wrote:

> On 23/09/2014 14:09, Damien Cassou wrote:
>> I recently read documents about utf-8 encoding. In all of them, the
>> author says that pathnames should be kept as is because you never know
>> which encoding the filesystem uses. So, a filename should probably be
>> a bytearray.
>
>
> yes, but a #é should be encoded in two bytes.

As noted in my previous message, "é" could be represented as either one or
two Unicode code points, and these in turn could validly be either two or
three bytes in UTF-8.  My gut says that $é should be U+00E9, because
otherwise you would have to use two Characters ($e and $´), but you could
legitimately argue otherwise as well, and at any rate, #é could definitely
be either.  This is likely the core of the issue you're hitting.
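
The byte counts can be checked with a few lines of Python; this is only a sketch, and the variable names are illustrative.

```python
precomposed = "\u00e9"   # one code point: LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # two code points: "e" + COMBINING ACUTE ACCENT

utf8_precomposed = precomposed.encode("utf-8")
utf8_combining = combining.encode("utf-8")

# One code point becomes two bytes; two code points become three bytes.
print(len(precomposed), len(utf8_precomposed), utf8_precomposed)
print(len(combining), len(utf8_combining), utf8_combining)
```

So "é is two bytes in UTF-8" holds only for the precomposed spelling; the combining spelling is equally valid UTF-8 and takes three.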

Reply | Threaded
Open this post in threaded view
|

Re: Ridiculous we are

Sven Van Caekenberghe-2

On 24 Sep 2014, at 18:48, Benjamin Pollack <[hidden email]> wrote:

> On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <[hidden email]> wrote:
>
>> On 23/09/2014 14:09, Damien Cassou wrote:
>>> I recently read documents about utf-8 encoding. In all of them, the
>>> author says that pathnames should be kept as is because you never know
>>> which encoding the filesystem uses. So, a filename should probably be
>>> a bytearray.
>>
>>
>> yes, but a #é should be encoded in two bytes.
>
> As noted in my previous message, "é" could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8.  My gut says that $é should be U+00E9, because otherwise you should have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either.  This is likely the core of the issue you're hitting.

Did you read the actual conversation in the issue?

 https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding.

Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue.

Regarding the pathnames encoding: if the OS itself does not know it, how can we? I think that the current approach (assuming UTF-8) makes (the most) sense for a system that runs on multiple platforms.

Sven



Re: Ridiculous we are

Benjamin Pollack-2
On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe <[hidden email]>  
wrote:

>
> Did you read the actual conversation in the issue ?
>
>  https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters
>
> It has been renamed and there is a fix (as a change set, not as a slice,  
> yet). Basically, there was a primitive call into a plugin that failed to  
> do encoding.
>

No, I apologize; I missed the bug link.  Thanks for reposting it.

> Now regarding the issues you raised. Pharo does not do Unicode  
> canonicalisation or any of that other fancy stuff (like categorisation,  
> proper ordering and so on). This is another orthogonal and way more  
> general issue.
>
> Regarding the pathnames encoding: if the OS itself does not know it, how  
> can we ?

That's actually the argument *against* using UTF-8 as the standard Pharo  
way to represent filenames--at least on Unix systems.  If Pharo used  
ByteArrays to represent paths, with convenience methods for working with  
UTF-8 (since I do agree that's the most likely thing for a user/dev to  
want), then you'd be able to work with all files no matter what, *and*  
have a convenient way of doing so for the common case.

This is an old discussion, and I do see both sides of it.  In terms of  
SCMs, Mercurial and Git both just say "it's a collection of bytes",  
whereas Subversion says "it's Unicode code points."  This has some  
uncomfortable implications for both systems when working on multiple  
platforms.

--Benjamin


Re: Ridiculous we are

Sven Van Caekenberghe-2

On 24 Sep 2014, at 19:09, Benjamin Pollack <[hidden email]> wrote:

> On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe <[hidden email]> wrote:
>
>>
>> Did you read the actual conversation in the issue ?
>>
>> https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters
>>
>> It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding.
>>
>
> No, I apologize; I missed the bug link.  Thanks for reposting it.
>
>> Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue.
>>
>> Regarding the pathnames encoding: if the OS itself does not know it, how can we ?
>
> That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems.  If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case.
>
> This is an old discussion, and I do see both sides of it.  In terms of SCMs, Mercurial and Git both just say "it's a collection of bytes", whereas Subversion says "it's Unicode code points."  This has some uncomfortable implications for both systems when working on multiple platforms.

Benjamin,

I think I understand the concern / situation that you describe. But I fail to see how not-interpreting it and interpreting it in different encodings can work in practice, especially since your point seems to be that there is no meta information that gives a definitive answer.

I would guess that other languages, say Java or Python, have some approach to handle this problem?

Also, since we are living with the current approach without many problems, I think the issue is not terribly pressing.
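
On the question of how other languages cope: Python, for one, decodes Unix file names with the file-system encoding plus the surrogateescape error handler (PEP 383), so bytes that are not valid in that encoding survive a round trip as lone surrogate code points. os.fsdecode and os.fsencode apply this automatically. The sketch below spells out the same mechanism with an explicit UTF-8 codec so that it does not depend on the local locale; the variable names are illustrative.

```python
# "été" encoded as Latin-1: legal as a Unix file name, but not valid UTF-8.
raw_name = "\u00e9t\u00e9".encode("latin-1")

# Decode as UTF-8, smuggling each undecodable byte through as a lone
# surrogate (byte 0xE9 becomes code point U+DCE9).
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
round_trip = as_text.encode("utf-8", errors="surrogateescape")

print(ascii(as_text))
print(round_trip == raw_name)
```

The design keeps paths as ordinary strings for the common case while still being able to name every file, at the cost that such a string cannot be printed or encoded strictly without first handling the escaped bytes.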

Sven



Re: Ridiculous we are

Alain Rastoul-2
In reply to this post by Benjamin Pollack-2


On 24/09/2014 19:09, Benjamin Pollack wrote:

> If Pharo used ByteArrays to represent paths, with convenience methods for working with
> UTF-8 (since I do agree that's the most likely thing for a user/dev to
> want), then you'd be able to work with all files no matter what, *and*
> have a convenient way of doing so for the common case.
Hi Ben,
I strongly disagree with you on this point: using byte arrays (or byte
strings) is a pain in an international context.
The OS knows its encoding: the locale on Unix, the code page on Windows.
Windows code pages depend on the country: Windows-1252 (similar
to ISO-8859-1) for English, other variations of ISO-8859-xx for other
European countries (welcome to the ISO soup), and the same goes for Unix.

Java and .NET both use UTF-16 strings (I don't know about
Python), where chars are not bytes, and they are handled as Character
arrays, not as byte arrays.
Both convert from the OS character set encoding to the internal encoding
for strings (paths and everything else).

There is already UTF-8 and UTF-16 encoding support in Pharo, but the
standard String class uses bytes, and a lot of file, directory and
system methods use the ByteString class, and that is the problem here.
UTF-8 encoding in Pharo encodes to a variable-length ByteString, which is
not the same as a (hypothetical) Utf8String in which all (variable-length)
chars would be UTF-8 encoded.
Using a new UTF-8 or UTF-16 string class could be a major rework,
but a decision about the internal string encoding is needed.
As Sven says, there is no emergency and you have a workaround, but
perhaps using the existing WideString, encoded as UTF-16 (or UTF-32?), in
some well-defined classes and methods could be a good start for this rework?
IMHO the workaround of using UTF-8 encoded byte strings is not a good way
to deal with this problem and should not be taken for granted as "the solution".

