Hi Folks -

Since I just went through all of this, can someone explain to me what string encoding the Unix and Mac VMs use for interfacing the file, directory and clipboard functions? If these are all UTF-8 based (which I suspect) then should we just define that *all* strings passed to the VM are to be interpreted as UTF-8 and any VM or function that doesn't deal with UTF-8 correctly is considered broken and needs fixing? It strikes me as a nice, elegant solution to solve this problem once and forever.

Comments, anyone?

Cheers,
  - Andreas
OK, the Mac Carbon VM, and I believe the Unix OS X VM as well, let you specify what format the file/directory/drag-drop information is in.

By default the OS X Carbon VM uses MacRoman because of issues with the file list dialog and how it assumes it knows how file/directory names should be translated in various versions of Squeak.

For Sophie we use UTF-8; Plopp, I think, uses UTF-8; Scratch, I believe, is MacRoman.

I'll note from http://en.wikipedia.org/wiki/UTF-8:

"The Mac OS X Operating System uses canonically decomposed Unicode, encoded using UTF-8 for file names in the filesystem."

So saying it's UTF-8 is, well, not quite the whole picture.

In early May I applied some fixes to the Mac Carbon VM to address issues with precomposed versus canonically decomposed Unicode UTF-8 translation, based on suggestions from Tetsuya Hayashi and further testing.

> sqMacUnixFileInterface.c Tetsuya HAYASHI, [hidden email], [hidden email]
> I've found the latest mac vm (or recent version) fails to normalize UTF file name.
> It seems to be the function convertChars() of sqMacUnixFileInterface.c, which normalizes only decompose when converting squeak string to unix,
> but I think it needs pre-combined when unix string to squeak, and I noticed normalization form should be canonical
> (exactly should be kCFStringNormalizationFormC) for pre-combined.

I cannot say if this is also an issue with the Unix VM.

As for the clipboard, the old primitives assume MacRoman. The extended OS X clipboard plugin lets you pass any character format you wish, based on MIME type. Should that be text, UTF-8, UTF-32, UTF-16 or RTF? Mmm, no, perhaps TIFF/PNG or JPEG.

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
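To make the "not quite the whole picture" point concrete, here is a minimal Python sketch. It is not VM code, and the file name is made up for illustration. It shows that a precomposed and a canonically decomposed spelling of the same name are both valid UTF-8 yet differ byte for byte, using only the standard `unicodedata` module.

```python
import unicodedata

# The same visible name in two canonical forms:
# NFC keeps "é" as one precomposed code point (U+00E9),
# NFD splits it into "e" plus a combining acute accent (U+0301).
name = "caf\u00e9.txt"  # hypothetical file name
composed = unicodedata.normalize("NFC", name)
decomposed = unicodedata.normalize("NFD", name)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Both byte strings are valid UTF-8, but a naive comparison fails.
same_bytes = composed_bytes == decomposed_bytes

# Normalizing both sides to one agreed form makes them compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(composed_bytes.hex(" "))
print(decomposed_bytes.hex(" "))
print(same_bytes, same_after_nfc)
```

So a rule that says only "all strings are UTF-8" would still leave open which normalization form each side of the VM interface expects.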
Oh, how interesting. I had no idea that there is UTF-8 and UTF-8. So much for my proposal, I guess ;-)

Cheers,
  - Andreas
On Jun 3, 2007, at 6:34, Andreas Raab wrote:

> Since I just went through all of this, can someone explain to me what string encoding the Unix and Mac VMs use for interfacing the file, directory and clipboard functions?

As John mentioned, the Unix VM has command-line options to choose the encoding that is presented to Unix. The default is still MacRoman, to be compatible with older images.

Unfortunately, there is no primitive to tell the VM which encoding to use, or a way to ask which one the VM is using (VM attributes 1005-1007 were proposed some time ago for the latter purpose).

I have a changeset (*) that makes accented filenames work slightly more reliably under Unix, but it has to resort to second-guessing the command-line parameters: it assumes MacRoman if it does not find "latin1" as an option. Not pretty.

- Bert -

(*) http://lists.squeakfoundation.org/pipermail/vm-dev/2007-March/001046.html
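As a rough illustration of why the image needs to know which encoding the VM is using, the Python sketch below encodes one accented name in MacRoman, Latin-1 and UTF-8 and then decodes it with the wrong table. The helper `guess_path_encoding` is hypothetical: it only mimics the fallback rule described above (MacRoman unless "latin1" shows up among the options). It is neither the changeset's code nor a parser for the real VM options.

```python
name = "r\u00e9sum\u00e9"  # "résumé", a made-up file name

# The same characters become different bytes in each encoding.
encoded = {enc: name.encode(enc) for enc in ("mac_roman", "latin-1", "utf-8")}
for enc, raw in encoded.items():
    print(f"{enc:10} {raw.hex(' ')}")

# Decoding with the wrong table does not fail, it silently yields another name.
wrong = encoded["mac_roman"].decode("latin-1")
print(wrong == name)


def guess_path_encoding(vm_args):
    """Hypothetical fallback: Latin-1 if 'latin1' appears in the arguments,
    otherwise assume MacRoman."""
    if any("latin1" in arg.lower() for arg in vm_args):
        return "latin-1"
    return "mac_roman"


print(guess_path_encoding(["squeak", "latin1"]))
print(guess_path_encoding(["squeak"]))
```

A primitive or VM attribute that reports the active encoding would replace the guess with a direct query, which is the gap noted above.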
Well, the Mac Carbon VM should take precomposed Unicode values (Normalization Form Canonical Composition, NFC) and convert them to canonically decomposed Unicode (Normalization Form Canonical Decomposition, NFD), and go the other way for compatibility. Windows and Linux work with Normalization Form Canonical Composition (http://en.wikipedia.org/wiki/Unicode_normalization). Hopefully it does that now.

Well, of course, reading all this again, it points out that NFS volumes are special...

On Jun 2, 2007, at 10:48 PM, Andreas Raab wrote:

> Oh, how interesting. I had no idea that there is UTF-8 and UTF-8.
> So much for my proposal, I guess ;-)
>
> Cheers,
>   - Andreas

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
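The direction-dependent conversion described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the actual VM code is C in sqMacUnixFileInterface.c, and plain NFD stands in here for whatever decomposition the Mac filesystem applies exactly.

```python
import unicodedata


def image_to_filesystem(name: str) -> bytes:
    """Image -> Mac filesystem: decompose (NFD), then encode as UTF-8."""
    return unicodedata.normalize("NFD", name).encode("utf-8")


def filesystem_to_image(raw: bytes) -> str:
    """Mac filesystem -> image: decode UTF-8, then recompose (NFC)."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))


original = "\u00fcber.txt"  # "über.txt" with a precomposed u-umlaut
on_disk = image_to_filesystem(original)
back = filesystem_to_image(on_disk)

print(on_disk.hex(" "))
print(back == original)
```

Applying only the decomposing half, as in the report quoted earlier in the thread, would hand decomposed names back to the image, where they no longer compare equal to the precomposed strings that Windows and Linux users typically produce.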