Hi,
on the command line, this works; my file is copied:

    cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test.pdf

In Pharo (4 and 5) this does not work (the file is not copied, and there is no error message):

    OSProcess command: 'cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test-a.pdf'.

This works (the file is copied):

    OSProcess command: 'cp /Library/WebServer/Documents/reports/bar.pdf /Library/WebServer/Documents/reports/test-b.pdf'.

It seems that German umlauts don't work.

Is there something I can do? I don't want to replace the umlauts in filenames...

Regards
Sabine
|
Hello Sabine,
Just a suggestion.

If your string is not pure ASCII, you may need to convert it to UTF-8 first, as this is likely what your host OS expects.

Indeed, Pharo strings are not internally encoded as UTF-8.

Check the UTF8TextConverter class to do so.

Hilaire

--
Dr. Geo
http://drgeo.eu
|
> On 05 Jun 2016, at 21:07, Hilaire <[hidden email]> wrote:
>
> If your string is not pure ASCII, you may need to convert it to UTF-8
> first, as this is likely what your host OS expects.
>
> Indeed, Pharo strings are not internally encoded as UTF-8.

If that is the case, that the OS expects a different encoding, then the OS(Sub)Process implementation should deal with it, not the user of the API.

> Check the UTF8TextConverter class to do so.

No, these converters are conceptually wrong, since they encode String to String, while it should be String to ByteArray.

The ZnCharacterEncoder hierarchy should be used instead.

String>>#utf8Encoded is a convenience method that can be used to do a quick conversion.
|
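As a minimal sketch of the conversion Sven describes, the following can be evaluated in a Pharo playground. It assumes an image that ships the Zinc character-encoding classes (ZnUTF8Encoder, String>>#utf8Encoded, ByteArray>>#utf8Decoded); the sample file name is only an illustration.

```smalltalk
| text bytes |
text := 'bär.pdf'.

"Convenience method: String -> ByteArray"
bytes := text utf8Encoded.

"The same conversion through the ZnCharacterEncoder hierarchy"
(ZnUTF8Encoder new encodeString: text) = bytes.   "true"

"Decoding the bytes gives the original String back"
bytes utf8Decoded = text.                         "true"
```

The point of going through a ByteArray is that encoding and decoding stay explicit: the String holds characters, the ByteArray holds what is actually sent to the outside world.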
Hi Dave,

I get the German ä with:

    (Character value: 228) asString

Do you want me to look into it and suggest a solution, or do you want to try to fix it and have me test it?

Thanks for helping!

Regards
Sabine
|
Hi Sven,

why ByteArray?

does not work (Improper store into indexable object):

    OSProcess command: ('cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test-a.pdf' utf8Encoded asString).

works:

    OSProcess command: ('cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test-a.pdf' utf8Encoded)

Perhaps David can add this here:

    command: aCommandString
        "Run a command in a shell process. Similar to the system(3) call in the
        standard C library, except that aCommandString runs asynchronously in a
        child process. The command is run by a ConnectedUnixProcess in order to
        facilitate command pipelines within Squeak."

        "UnixProcess thisOSProcess command: 'ls -l /etc'"

        | proc |
        pid isNil
            ifTrue: [self class noAccessorAvailable. ^nil]
            ifFalse: [
                proc := self
                    forkJob: ExternalUnixOSProcess defaultShellPath
                    arguments: (Array with: '-c' with: aCommandString utf8Encoded asString)    "<<<=== proposed change"
                    environment: nil
                    descriptors: nil.
                proc ifNil: [self class noAccessorAvailable].
                ^ proc]

regards
Sabine
|
Hi Sabine,
That's great that #utf8Encoded is working, thanks for confirming.

I'll look and see if I can add that to OSProcess (I'm traveling and cannot look at it right now).

Mariano - this thread probably applies to OSSubProcess also.

Dave
|
Dave,
> On 06.06.2016 at 18:13, David T. Lewis <[hidden email]> wrote:
>
> That's great that #utf8Encoded is working, thanks for confirming.
>
> I'll look and see if I can add that to OSProcess (I'm traveling and cannot
> look at it right now).

That would only work if the system locale is UTF-8, right? Wouldn't it be better to make that a setting?

Norbert
|
Norbert,
You are probably right. I'm not sure of the best way to handle it.

Dave

> That would only work if the system locale is UTF-8, right? Wouldn't it be
> better to make that a setting?
>
> Norbert
|
Hi Dave, Sabine, Norbert et al.,

A few weeks (months?) ago I was also reviewing this topic of encoding in OS(Sub)Process. After searching the web a bit, I found that the simplest and most accurate answer/solution was indeed to set the correct locale and/or text encoding on the computer in question. Anyway... more answers below.

Now... what I don't understand from Sabine is this. She said this one works:

    OSProcess command: ('cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test-a.pdf' utf8Encoded)

But then my question is: does that work only because her computer's locale is UTF-8? Or does Unix automatically decode it and know that it is UTF-8? If not, should I adapt the #utf8Encoded to the encoding defined by the terminal? Hmm.

On my OS X box I do have UTF-8 set:

    ❯ locale
    LANG="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_ALL=
In reply to this post by Sabine Manaa
Sorry, I made a mistake: I had the two cases reversed. asString is needed.

does work:

    OSProcess command: ('cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test-a.pdf' utf8Encoded asString).

does not work (Improper store into indexable object):

    OSProcess command: ('cp /Library/WebServer/Documents/reports/bär.pdf /Library/WebServer/Documents/reports/test-a.pdf' utf8Encoded)

I can try on Windows tomorrow if you want.
|
In reply to this post by Sabine Manaa
> On 06 Jun 2016, at 17:22, Sabine Manaa <[hidden email]> wrote:
>
> why ByteArray?

http://www.unicode.org/faq/utf_bom.html

A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence.

https://en.wikipedia.org/wiki/UTF-8

UTF-8 encodes each of the 1,112,064 valid code points in the Unicode code space (1,114,112 code points minus 2,048 surrogate code points) using one to four 8-bit bytes (a group of 8 bits is known as an octet in the Unicode Standard).

In Pharo:

https://ci.inria.fr/pharo-contribution/job/EnterprisePharoBook/lastSuccessfulBuild/artifact/book-result/Zinc-Encoding-Meta/Zinc-Encoding-Meta.html

Of course, given a ByteArray, whose values are all between 0 and 255 by definition, you can convert it to a ByteString. That String is not a correct (Pharo) String anymore. It is like converting a PNG or JPEG to a String: you can do it, it is just wrong.

When talking to the outside world, be it over a network connection or via primitive calls, anything but pure ASCII strings needs an encoding. This has to be agreed upon by both parties. If the receiving party wants UTF-8 forced into a (kind of) String, that is (still) possible.

Your initial solution seems to indicate that this is expected. This (ugly) conversion should be done at as low a level as possible, IMHO.

Sven
|
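The remark about a ByteArray forced into a String can be illustrated with a small playground sketch. It assumes a Pharo image with String>>#utf8Encoded available, and builds the umlaut from its code point as shown earlier in the thread.

```smalltalk
| umlaut bytes forced |
umlaut := (Character value: 228) asString.   "one character: ä"
umlaut size.                                 "1"

bytes := umlaut utf8Encoded.                 "two bytes: 195 and 164"
bytes size.                                  "2"

"Forcing the encoded bytes into a String gives two characters, not one"
forced := bytes asString.
forced size.                                 "2"
forced = umlaut.                             "false"
```

The forced String is only a container for the UTF-8 bytes. It is useful for handing the bytes to an API that insists on a String, but it no longer represents the text 'ä' inside the image.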
Hi Sven,

thank you very much for your explanation. I will read the Pharo book chapter again tomorrow morning. Each time I have to deal with encoding, I have to start reading all over again... ;-(

I was not asking about the reason for encoding, but because OSProcess command: needs a String and not a ByteArray. But yes, sure: first encode it and then convert it back to a String.

Regards and a nice evening
Sabine
|
In reply to this post by Sven Van Caekenberghe-2
On Mon, Jun 06, 2016 at 08:34:40PM +0200, Sven Van Caekenberghe wrote:
>
> When talking to the outside world, be it over a network connection, or via
> primitive calls, anything but pure ASCII strings need an encoding. This has
> to be agreed upon by both parties. If the receiving party wants UTF-8 forced
> into a (kind of) String, that is (still) possible.
>
> Your initial solution seems to indicate that this is expected. This (ugly)
> conversion should be done at an as low level as possible, IMHO.
>

Hi Sven,

Thanks for this concise summary.

I think perhaps what is conceptually a problem in my OSProcess implementation is that I allow command arguments to be given in the form of Strings, then pass the byte array contents of those Squeak/Pharo Strings to a Unix shell or to an exec() system call. This is convenient from my point of view, because strings are very easy to use, but it does not account for the differences in mapping from a String to a byte array.

It is the byte array that is actually used in the calls to the operating system such as:

    UnixOSProcessAccessor>>primForkExec: executableFile
        stdIn: inputFileHandle
        stdOut: outputFileHandle
        stdErr: errorFileHandle
        argBuf: argVec
        argOffsets: argOffsets
        envBuf: envVec
        envOffsets: envOffsets
        workingDir: pathString

At this point, the argVec is composed of "strings" in the C sense of the word, which really means that it contains byte array data from the Strings. And of course, if the string encodings in the Squeak/Pharo strings do not happen to match the string encodings of the operating system, then indeed the byte arrays do not match and we get a "file not found" kind of problem.

My hope is that Mariano's assessment is correct, and that we can treat this as the right way to handle the encoding match issues:

On Mon, Jun 06, 2016 at 01:59:21PM -0300, Mariano Martinez Peck wrote:
> Hi Dave, Sabine, Norbert et all,
>
> Few weeks (months?) ago I was also reviewing this topic of encoding a
> OS(Sub)Process. After surfing a bit the web, I found out the most simple
> and accurate answer/solution was indeed to set the correct locale and/or
> text encoding in the computer in question. Anyway...more answers below.

This certainly sounds like the Right Thing To Do if only it works :-)

Dave
|
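The mismatch Dave describes can be seen by comparing the raw byte contents of a String with its UTF-8 encoding. This is only a playground sketch, assuming a Pharo image where the sample is a ByteString and String>>#utf8Encoded is available; it does not show what OSProcess does internally.

```smalltalk
| name rawBytes utf8Bytes |
name := 'b', (Character value: 228) asString, 'r'.   "bär"

"Raw contents of the ByteString: one byte per character (ä is 228)"
rawBytes := name asByteArray.
rawBytes = #[98 228 114].            "true"

"What a UTF-8 file system or shell expects: ä becomes 195 164"
utf8Bytes := name utf8Encoded.
utf8Bytes = #[98 195 164 114].       "true"

rawBytes = utf8Bytes.                "false"
```

A pure ASCII name such as 'bar' produces identical bytes either way, which fits the observation at the start of the thread that only the name with the umlaut failed.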
In reply to this post by Sven Van Caekenberghe-2
Sven, could you update the class comments of those classes? We should finish getting rid of them.
Your solution is much nicer.

Stef

On 5/6/16 at 21:40, Sven Van Caekenberghe wrote:
>> Check the UTF8TextConverter class to do so.
> No, these converters are conceptually wrong, since they encode String to String, while it should be String to ByteArray.
>
> The ZnCharacterEncoder hierarchy should be used instead.
>
> String>>#utf8Encoded is a convenience method that can be used to do a quick conversion.
|