Hello,
I tested on Linux: when I move the DrGeo.app folder into a directory tree whose path contains accented characters (for example, /home/hilaire/Téléchargements/), loading the font does not work.

However, the font path seems OK:
File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
Inspecting this path, it looks like 'Téléchargements' is stored as 8-bit characters, but it should be UTF-8, right?

I think there is also an issue on Windows, as some users have reported to me.

Holy shit.

Hilaire
--
Dr. Geo - http://drgeo.eu
iStoa - http://istao.drgeo.eu |
:-(
I fear I will soon face the same problem when I start my lecture… Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

On Sep 22, 2014, at 5:07 PM, Hilaire <[hidden email]> wrote:
> Hello, |
In reply to this post by abergel
You can use screenshots.
But back to the issue: in other parts of DrGeo, when saving or loading a sketch, paths and file names with accents or spaces are OK. So I am not sure what is going on.

Hilaire

On 22/09/2014 22:15, Alexandre Bergel wrote:
> :-(
>
> I will soon face the same problem I fear, when I will start my lecture…
>
> Alexandre

--
Dr. Geo - http://drgeo.eu
iStoa - http://istao.drgeo.eu |
In reply to this post by HilaireFernandes
Hilaire
It has now been two days since I upgraded my iPhone and the recovery process started crashing. After two days of trying, I finally succeeded in uploading my recovery to my iPhone, and now it crashes continuously at boot time: I get a nice sepia screenshot and it restarts. I will have to send my iPhone to Apple for a real check. Just because I did an update!

So I do not accept the title of your email. I simply cannot.

Can you imagine the billions injected into the iPhone? The iPhone is probably one order of magnitude more complex than Pharo, but what is injected into Pharo is our collective time, and that is far more than one order of magnitude smaller than several billions.

Stef

On 22/9/14 22:07, Hilaire wrote:
> Hello,
>
> Tested on Linux, when I move DrGeo.app folder under hierarchy tree with
> accent characters (For example, /home/hilaire/Téléchargement/), loading
> font does not work
>
> However font path seems ok:
> File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
> Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it
> should be utf-8, right?
>
> I think there are issue on Windows, as some user reported to me.
>
> Holy shit.
>
> Hilaire |
In reply to this post by Juraj Kubelka-5
The issue is already there:
https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

I am trying to document it, but it is odd, because in some other parts of DrGeo I have no issue with accented paths. But shouldn't the path be UTF-8 encoded? Or is my fresh Linux Mint box using non-UTF-8 file names? No, it can't be.

Hilaire

On 22/09/2014 22:20, Juraj Kubelka wrote:
> Can you create an issue? I am cleaning the fonts and in some case I could consider this issue. If it is problem only on Windows, I will need someone’s assistance.

--
Dr. Geo - http://drgeo.eu
iStoa - http://istao.drgeo.eu |
In reply to this post by stepharo
On 22/09/2014 22:35, stepharo wrote:
> So I do not accept the title of your email. Simply I cannot.

Don't worry, it is a temporary cry/yell of frustration.

--
Dr. Geo - http://drgeo.eu
iStoa - http://istao.drgeo.eu |
In reply to this post by stepharo
Also, sometimes a folder is displayed as "Téléchargement" but is still named Downloads under the hood, because the OS translates the name in the UI.

Phil

On Mon, Sep 22, 2014 at 10:35 PM, stepharo <[hidden email]> wrote:
> Hilaire |
On 22/09/2014 23:14, [hidden email] wrote:
> Also, sometimes things do look like "Téléchargement" but are still
> Downloads under the hood as the OS translates the UI.

Yes, I checked with another path of my own, like 'été': still the same issue. What is strange is that I have no issue searching for sketch files with accents. It happens only when loading the font.

Hilaire
--
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu |
There is a similar issue for Windows: "can not (always) read permissions for directoryentries on a path with nonascii characters".

2014-09-22 23:21 GMT+02:00 Hilaire <[hidden email]>:
> On 22/09/2014 23:14, [hidden email] wrote: |
In reply to this post by stepharo
I also find the way some problems are reported quite disturbing. How much testing did you do? On which platforms?

I can do this (in Pharo 3) without any problems (we're talking about arbitrary Unicode characters in path names):

"Create a directory with an accented name and check that it exists"
('/tmp' asFileReference / 'été') ensureCreateDirectory.
'/tmp/été' asFileReference exists.
"Write a file with a Greek name inside it, then check it and read it back"
('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out | out << 'What about Greece ?' ].
('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

$ ls /tmp/été/Ελλάδα.txt
/tmp/été/Ελλάδα.txt

$ cat !$
cat /tmp/été/Ελλάδα.txt
What about Greece ?

This is on Mac OS X.

So this part fundamentally works in the image and on one VM. There might of course be problems in how paths are used in certain places or on certain VMs/platforms.

Sven

On 22 Sep 2014, at 22:35, stepharo <[hidden email]> wrote:
> Hilaire
>
> These are two days that after upgrading my iPhone, the recovery process crash.
> After two days trying I finally succeeded to upload my recovery to my iPhone and
> now my iPhone crashes continously at boot time. I get a nice sepia screenshot and
> it restarts. I will have to send my iPhone to Apple for real check.
> Just because I did an update!
>
> So I do not accept the title of your email. Simply I cannot.
>
> Do you imagine the billions injected into iPhone. So probably iPhone is one order of magnitude
> more complex than Pharo but the money injected into Pharo is our collective time and
> it is far from being an order of magnitude smaller than several billions.
>
> Stef |
In reply to this post by stepharo
So I stay with my 8 GB iTouch on iOS 3; with no prospect of an upgrade, I am sorta worry-free. If only it were also a phone...

"Don't dial ... DO!" ;-)

[ this msg was last seen in my default font ]

On 22 September 2014 17:35, stepharo <[hidden email]> wrote:
> Hilaire |
In reply to this post by HilaireFernandes
On Mon, Sep 22, 2014 at 10:07 PM, Hilaire <[hidden email]> wrote:
> However font path seems ok:
> File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources.
> Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it
> should be utf-8, right?

I recently read several documents about UTF-8 encoding. In all of them, the author says that path names should be kept as is, because you never know which encoding the file system uses. So a file name should probably be a byte array.

--
Damien Cassou
http://damiencassou.seasidehosting.st

"Success is the ability to go from one failure to another without losing enthusiasm."
Winston Churchill |
On 23/09/2014 14:09, Damien Cassou wrote:
> I recently read documents about utf-8 encoding. In all of them, the
> author says that pathnames should be kept as is because you never know
> which encoding the filesystem uses. So, a filename should probably be
> a bytearray.

Yes, but a #é should then be encoded in two bytes. Although that looks strange, I am not sure it is the actual problem, because I can use accented file names for sketches; the problem arises only when loading a font. So it may be the font-loading code (cf. my bug report).

Hilaire
--
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu |
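A minimal Python sketch of the "8 bits versus UTF-8" observation discussed here. Python is used only for illustration, and `looks_like_utf8` is a hypothetical helper, not part of Pharo or DrGeo. It shows that the same folder name takes one byte per character in Latin-1 but two bytes per 'é' in UTF-8, and that the Latin-1 bytes are not valid UTF-8.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "Téléchargements"                # 15 characters, two of them 'é'
latin1_bytes = name.encode("latin-1")   # one byte per character
utf8_bytes = name.encode("utf-8")       # each 'é' takes two bytes

print(len(name), len(latin1_bytes), len(utf8_bytes))   # 15 15 17
print(looks_like_utf8(latin1_bytes))                    # False
print(looks_like_utf8(utf8_bytes))                      # True
```

So a path that shows exactly one byte per accented character when inspected is probably not UTF-8 encoded at that point, whatever it looks like on screen.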
In reply to this post by Sven Van Caekenberghe-2
On Mon, 22 Sep 2014 17:58:41 -0400, Sven Van Caekenberghe <[hidden email]> wrote:
> I also find the way some problems are reported quite disturbing. How
> much testing did you do ? On which platforms ?
>
> I can do this (in Pharo 3) without any problems (we're talking about
> arbitrary Unicode characters in path names):
>
> ('/tmp' asFileReference / 'été') ensureCreateDirectory.
> '/tmp/été' asFileReference exists.
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out |
> out << 'What about Greece ?' ].
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
> ('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.
>
> And in a terminal, I get:
>
> $ ls /tmp/été/Ελλάδα.txt
> /tmp/été/Ελλάδα.txt
>
> $ cat !$
> cat /tmp/été/Ελλάδα.txt
> What about Greece ?
>
> This is on Mac OS X.
>
> So this part fundamentally works in the image and on one VM. There might
> of course be problems in how paths are used in certain places or on
> certain VM/platforms.

Focusing purely on Unicode itself (not the encoding systems), a letter like é can be represented as U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (COMBINING ACUTE ACCENT). These will appear identical to the user, but are emphatically *not* identical for most software. The way you're testing here, you will not hit any error relating to this concept, ever, because you're using Pharo for both generating and consuming the strings. At the very least, we'd need to generate a file named "été" with both forms explicitly and see what happens.

Things get even more exciting, though, because Unix says that file names are simply arbitrary byte patterns that do not contain the null byte.* Thus, you can trivially create a file named "été" using Latin-1 encoding, and again using UTF-8 encoding, and again using UTF-7 encoding, and these might all be shown to the user as "identically" named, but I guarantee you that Pharo will not act sanely with all of these.

Even on Windows, where things are a bit saner (NTFS mandates UTF-16), and where an explicit normalization form is preferred (NFC), I just explicitly verified that I can trivially inject other normalization forms into the file system. Thus, you can still have two files named "été" that nevertheless have different names as far as the OS is concerned.

In this case, as far as I can tell, Pharo assumes that all path names are Unicode, and does not do any work to convert strings to or from the various normalization schemes (looking in Path class>>canonicalizeElements:, Path class>>from:delimiter:, and FileSystemStore>>pathFromString: here). There's therefore a pretty straightforward fix that Pharo could do:

1. Path would use ByteArrays as the actual canonical store, and provide convenience methods to see what the array decodes to in various encodings. The developer and application can make decisions about what encoding system they want to use.
2. The VM likely needs to be modified to handle this (I didn't check).

As much as I wish Hilaire had provided more details in his bug report, it's worth keeping in mind that not all users, or even all programmers, understand the full implications of things like how various Unicode normalization and encoding schemes interact in practice with Unix's very vague concept of what a file name actually is, so I usually try to approach these bug reports carefully and with an open mind.

--Benjamin

* On OS X, HFS+ uses UTF-16 with an Apple-specific variant of NFD, whereas I do not believe this holds for e.g. UFS or FUSE-backed file systems, so things are a bit subtler there, but the general rule holds. |
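Point 1 of the proposed fix can be sketched roughly as follows, in Python rather than Pharo and under stated assumptions: `BytePath`, `from_text` and `as_text` are hypothetical names invented for this illustration, not an existing Pharo or Python API. The idea is only that the raw bytes are the identity of the path, and decoding is a convenience chosen by the caller.

```python
class BytePath:
    """Hypothetical path type whose canonical store is raw bytes."""

    def __init__(self, raw: bytes):
        # Unix file names may hold any byte pattern except the null byte.
        if b"\x00" in raw:
            raise ValueError("path contains a null byte")
        self.raw = raw

    @classmethod
    def from_text(cls, text: str, encoding: str = "utf-8") -> "BytePath":
        """Convenience constructor: the caller decides the encoding."""
        return cls(text.encode(encoding))

    def as_text(self, encoding: str = "utf-8") -> str:
        """Decode for display; undecodable bytes become U+FFFD."""
        return self.raw.decode(encoding, errors="replace")

    def __eq__(self, other):
        return isinstance(other, BytePath) and self.raw == other.raw

    def __hash__(self):
        return hash(self.raw)


utf8_path = BytePath.from_text("/tmp/été")
latin1_path = BytePath.from_text("/tmp/été", encoding="latin-1")

# Same visible name, different bytes: two distinct paths.
print(utf8_path == latin1_path)                                  # False
print(utf8_path.as_text() == latin1_path.as_text("latin-1"))     # True
```

The trade-off raised later in the thread still applies: code that only wants text has to pick an encoding somewhere, so this design moves the decision to the caller rather than removing it.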
In reply to this post by HilaireFernandes
On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <[hidden email]> wrote:
> yes, but a #é should be encoded in two bytes.

As noted in my previous message, "é" could be represented as either one or two Unicode code points, and these in turn would validly be two or three bytes in UTF-8, respectively. My gut says that $é should be U+00E9, because otherwise you would have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting. |
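The one-versus-two code point claim can be checked with a short Python sketch (an illustration only; it uses the standard `unicodedata` module, and `same_visible_name` is a hypothetical helper). It compares the composed and decomposed spellings of "é" and their UTF-8 byte lengths.

```python
import unicodedata

composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Different code point sequences, so plain string comparison fails.
print(composed == decomposed)                    # False
print(len(composed), len(decomposed))            # 1 2

# Two bytes versus three bytes once encoded as UTF-8.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))   # 2 3


def same_visible_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_visible_name(composed, decomposed))   # True
```

A file system or library that does not normalize will treat these two spellings as different names, which is the behaviour described for Unix file names earlier in the thread.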
On 24 Sep 2014, at 18:48, Benjamin Pollack <[hidden email]> wrote:
> As noted in my previous message, "é" could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8. My gut says that $é should be U+00E9, because otherwise you should have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting.

Did you read the actual conversation in the issue?

https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding.

Now, regarding the issues you raised: Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another, orthogonal and much more general issue.

Regarding path name encoding: if the OS itself does not know it, how can we? I think that the current approach (assuming UTF-8) makes (the most) sense for a system that runs on multiple platforms.

Sven |
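The thread does not show the change set itself, so the following Python sketch is only a simulation of the general failure mode described (a native call that receives a path without the encoding step), not the actual Pharo fix. Everything in it is hypothetical: `fake_fs` stands in for the file system, `plugin_open` for the plugin primitive, and `font.ttf` is an invented file name.

```python
path = "/home/hilaire/Téléchargements/font.ttf"   # hypothetical font file

# Simulated file system keyed by raw bytes, stored here as UTF-8.
fake_fs = {path.encode("utf-8"): b"font data"}


def plugin_open(raw_path: bytes) -> bytes:
    """Stand-in for a native plugin that looks a path up by its bytes."""
    return fake_fs[raw_path]


def load_without_encoding(p: str) -> bytes:
    # One byte per character: what happens if no UTF-8 encoding is applied.
    return plugin_open(p.encode("latin-1"))


def load_with_encoding(p: str) -> bytes:
    # The path is UTF-8 encoded before crossing into the plugin.
    return plugin_open(p.encode("utf-8"))


def try_load(loader) -> bool:
    """Return True if the loader finds the file, False otherwise."""
    try:
        loader(path)
    except KeyError:
        return False
    return True


print(try_load(load_with_encoding))      # True
print(try_load(load_without_encoding))   # False
```

This would also be consistent with the symptom reported above: code paths that encode correctly (saving and loading sketches) work, while one that skips the step (font loading) fails only for non-ASCII paths.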
On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe <[hidden email]> wrote:
> Did you read the actual conversation in the issue ?
>
> https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters
>
> It has been renamed and there is a fix (as a change set, not as a slice,
> yet). Basically, there was a primitive call into a plugin that failed to
> do encoding.

No, I apologize; I missed the bug link. Thanks for reposting it.

> Now regarding the issues you raised. Pharo does not do Unicode
> canonicalisation or any of that other fancy stuff (like categorisation,
> proper ordering and so on). This is another orthogonal and way more
> general issue.
>
> Regarding the pathnames encoding: if the OS itself does not know it, how
> can we ?

That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case.

This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say "it's a collection of bytes", whereas Subversion says "it's Unicode code points." This has some uncomfortable implications for both systems when working on multiple platforms.

--Benjamin |
On 24 Sep 2014, at 19:09, Benjamin Pollack <[hidden email]> wrote:
> No, I apologize; I missed the bug link. Thanks for reposting it.
>
> That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case.
>
> This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say "it's a collection of bytes", whereas Subversion says "it's Unicode code points." This has some uncomfortable implications for both systems when working on multiple platforms.

Benjamin,

I think I understand the concern / situation that you describe. But I fail to see how leaving the bytes uninterpreted, or interpreting them in different encodings, can work in practice, especially since your point seems to be that there is no meta information that gives a definitive answer.

I would guess that other languages, say Java or Python, have some approach to handle this problem?

Also, since we are living with the current approach without many problems, I think the issue is not terribly pressing.

Sven |
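As one data point for the question about other languages: on POSIX systems Python 3 decodes file names with the file system encoding plus the `surrogateescape` error handler (PEP 383), so bytes that are not valid in that encoding survive a round trip through a string; it also accepts `bytes` paths directly. The sketch below uses the UTF-8 codec explicitly so that it behaves the same everywhere; `raw_name` is an invented example name.

```python
# "été.txt" encoded as Latin-1: a legal Unix file name, but not valid UTF-8.
raw_name = b"\xe9t\xe9.txt"

# Undecodable bytes are smuggled into the string as lone surrogates
# (byte 0xE9 becomes U+DCE9) instead of raising an error or being lost.
text = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(text))                   # '\udce9t\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")
print(round_tripped == raw_name)     # True
```

This is a middle position between the two views in the thread: the program works with strings for the common case, yet any file can still be opened because the original bytes are recoverable.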
In reply to this post by Benjamin Pollack-2
On 24/09/2014 19:09, Benjamin Pollack wrote:
> If Pharo used
> ByteArrays to represent paths, with convenience methods for working with
> UTF-8 (since I do agree that's the most likely thing for a user/dev to
> want), then you'd be able to work with all files no matter what, *and*
> have a convenient way of doing so for the common case.

Hi Ben,

I strongly disagree with you on this point: using byte arrays (or byte strings) is a pain in an international context.

The OS knows about its encoding: the locale on Unix, the code page on Windows. Windows code pages depend on the country: Windows-1252 (similar to ISO-8859-1) for English, other variations of 8859-xx for other European countries... (welcome to the ISO soup). The same goes for Unix.

Java and .NET both use UTF-16 strings (I don't know about Python), where chars are not bytes and strings are used not as byte arrays but as Character arrays. Both convert from the OS character set encoding to the internal encoding for strings (paths and whatever).

There is already UTF-8 and UTF-16 encoding support in Pharo, but the standard String class uses bytes, and a lot of file, directory and system methods use the ByteString class, and that is the problem here. UTF-8 encoding in Pharo encodes to a variable-length ByteString, which is not the same as a (hypothetical) Utf8String in which all (variable-length) chars would be UTF-8 encoded.

Using a new UTF-8 or UTF-16 string class could be a major rework, but a decision about the internal string encoding is needed. As Sven says, there is no emergency and you have a workaround, but perhaps using the existing WideString encoded as UTF-16 (or UTF-32?) in some well-defined classes/methods could be a good start for this rework?

IMHO the workaround of using UTF-8 encoded byte strings is not a good way to deal with this problem and should not be taken for granted as "the solution". |
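Two of the points above can be illustrated with a small Python sketch (illustration only; the byte values are chosen for the example). First, the same OS byte means different characters under different code pages, so decoding needs to know the locale or code page. Second, a UTF-8 encoded byte string and a character string have different lengths for the same text.

```python
# One byte, two meanings depending on the assumed 8-bit encoding.
os_bytes = b"\xa4"
print(ascii(os_bytes.decode("cp1252")))        # '\xa4'   (currency sign)
print(ascii(os_bytes.decode("iso-8859-15")))   # '\u20ac' (euro sign)

# A UTF-8 encoded byte string is not a string of characters.
utf8_bytes = "été".encode("utf-8")
as_chars = utf8_bytes.decode("utf-8")
print(len(utf8_bytes), len(as_chars))          # 5 3

# Indexing the bytes can land in the middle of a character.
print(utf8_bytes[:1])                          # b'\xc3'
```

Code that indexes, truncates or compares the UTF-8 bytes as if they were characters will misbehave on non-ASCII names, which is one way to read the objection to treating UTF-8 byte strings as "the solution".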