Mac and Unix file/directory/clipboard interface?

Andreas.Raab
 
Hi Folks -

Since I just went through all of this, can someone explain to me what
string encoding the Unix and Mac VMs use for interfacing the file,
directory and clipboard functions? If these are all UTF-8 based (which I
suspect) then should we just define that *all* strings passed to the VM
are to be interpreted as UTF-8 and any VM or function that doesn't deal
with UTF-8 correctly is considered broken and needs fixing? It strikes
me as a nice, elegant solution to solve this problem once and forever.

Comments, anyone?

Cheers,
   - Andreas

Re: Mac and Unix file/directory/clipboard interface?

johnmci
 
OK, the Mac Carbon VM, and I believe the Unix OS X VM as well, let you
specify what encoding the file/directory/drag-and-drop information is in.

By default the OS X Carbon VM uses MacRoman, because of issues with the
file list dialog and its assumptions about how file/directory names
should be translated in various versions of Squeak.

For Sophie we use UTF-8; Plopp, I think, uses UTF-8; Scratch, I
believe, uses MacRoman.

I'll note this from http://en.wikipedia.org/wiki/UTF-8:

"The Mac OS X Operating System uses canonically decomposed Unicode,
encoded using UTF-8 for file names in the filesystem."

So saying "it's UTF-8" is, well, not quite the whole picture.
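
To illustrate the composed versus decomposed distinction, here is a
minimal Python sketch (not taken from any VM source). It shows that the
same visible file name yields two different UTF-8 byte sequences
depending on the normalization form. The file name is made up for the
example.

```python
import unicodedata

# Hypothetical file name containing a precomposed e-acute (U+00E9).
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # precomposed: U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: 'e' followed by U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)   # b'caf\xc3\xa9.txt'
print(nfd_bytes)   # b'cafe\xcc\x81.txt'
print(nfc == nfd)  # False: the strings differ unless normalized first
```

Both byte strings are valid UTF-8, which is why "the VM speaks UTF-8"
does not by itself say which form a file name will arrive in.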

In early May I applied some fixes to the Mac Carbon VM to address
issues with precomposed versus canonically decomposed Unicode in the
UTF-8 translation, based on suggestions from Tetsuya Hayashi and
further testing.

> sqMacUnixFileInterface.c  Tetsuya HAYASHI, [hidden email], [hidden email]
> I've found that the latest Mac VM (or a recent version) fails to
> normalize UTF file names. The cause seems to be the function
> convertChars() in sqMacUnixFileInterface.c, which only decomposes
> when converting a Squeak string to Unix. I think it also needs to
> precompose when converting a Unix string to Squeak, and the
> normalization form for precomposing should be canonical
> (specifically, kCFStringNormalizationFormC).


I cannot say if this is also an issue with the unix VM.


As for the clipboard, the old primitives assume MacRoman. The extended
OS X clipboard plugin lets you pass any character format you wish,
based on MIME type. Should that be text, UTF-8, UTF-32, UTF-16 or RTF?
Hmm, no, perhaps TIFF/PNG or JPEG.



--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: Mac and Unix file/directory/clipboard interface?

Andreas.Raab
 
Oh, how interesting. I had no idea that there is UTF-8 and UTF-8. So
much for my proposal, I guess ;-)

Cheers,
   - Andreas


Re: Mac and Unix file/directory/clipboard interface?

Bert Freudenberg
In reply to this post by Andreas.Raab
 
On Jun 3, 2007, at 6:34 , Andreas Raab wrote:

> Hi Folks -
>
> Since I just went through all of this, can someone explain to me  
> what string encoding the Unix and Mac VMs use for interfacing the  
> file, directory and clipboard functions? If these are all UTF-8  
> based (which I suspect) then should we just define that *all*  
> strings passed to the VM are to be interpreted as UTF-8 and any VM  
> or function that doesn't deal with UTF-8 correctly is considered  
> broken and needs fixing? It strikes me as a nice, elegant solution  
> to solve this problem once and forever.

As John mentioned, the Unix VM has command-line options to choose the
encoding that is presented to Unix. The default is still MacRoman, to
be compatible with older images.
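
To show why the choice of encoding matters for file names, here is a
small Python sketch (an illustration only, not VM code). It encodes one
made-up accented name in MacRoman, Latin-1 and UTF-8, and then decodes
the UTF-8 bytes as if they were MacRoman.

```python
# Hypothetical accented file name: "ete" with two e-acutes.
name = "\u00e9t\u00e9"

mac_roman_bytes = name.encode("mac_roman")
latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(mac_roman_bytes)  # b'\x8et\x8e'
print(latin1_bytes)     # b'\xe9t\xe9'
print(utf8_bytes)       # b'\xc3\xa9t\xc3\xa9'

# Interpreting UTF-8 bytes with the MacRoman default garbles the name:
# each two-byte e-acute turns into a square-root sign plus a copyright sign.
garbled = utf8_bytes.decode("mac_roman")
print(garbled == "\u221a\u00a9t\u221a\u00a9")  # True
```

The three byte strings are all different, so an image and a VM that
disagree about the encoding will not agree on accented names.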

Unfortunately, there is no primitive to tell the VM which encoding to
use, nor a way to ask which one the VM is using (VM attributes
1005-1007 were proposed some time ago for the latter purpose). I have
a changeset (*) that makes accented filenames work slightly more
reliably under Unix, but it has to resort to second-guessing the
command-line parameters: it assumes MacRoman if it does not find
"latin1" as an option. Not pretty.
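
The second-guessing described above can be sketched roughly as follows.
This is a Python illustration of the heuristic only, not the changeset
itself (which is Squeak code). The function name guess_vm_encoding and
the sample argument lists are hypothetical, and the option spelling in
the example is an assumption.

```python
def guess_vm_encoding(vm_args):
    """Guess the VM's path encoding from its command-line arguments.

    Heuristic only: returns "latin-1" if any argument mentions "latin1",
    otherwise falls back to "mac_roman" (the compatibility default).
    """
    for arg in vm_args:
        if "latin1" in arg.lower():
            return "latin-1"
    return "mac_roman"


# Hypothetical argument lists; the "-encoding" spelling is an assumption.
print(guess_vm_encoding(["squeak", "-encoding", "latin1", "my.image"]))  # latin-1
print(guess_vm_encoding(["squeak", "my.image"]))                         # mac_roman
```

The weakness is visible in the sketch: any other encoding passed on the
command line is silently treated as MacRoman.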

- Bert -

(*) http://lists.squeakfoundation.org/pipermail/vm-dev/2007-March/001046.html


Re: Mac and Unix file/directory/clipboard interface?

johnmci
In reply to this post by Andreas.Raab
 
Well, the Mac Carbon VM should take precomposed Unicode values
(Normalization Form Canonical Composition, NFC) and convert them to
canonically decomposed Unicode (Normalization Form Canonical
Decomposition, NFD), and go the other way for compatibility.
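
The two directions described here can be sketched in Python as below.
This is only an illustration of the idea, not the VM's C code. The
function names to_filesystem and from_filesystem are hypothetical, and
standard NFD is used as an approximation, since Apple's file systems
apply their own variant of decomposition.

```python
import unicodedata


def to_filesystem(name: str) -> bytes:
    """Image -> file system: decompose (NFD), then encode as UTF-8."""
    return unicodedata.normalize("NFD", name).encode("utf-8")


def from_filesystem(raw: bytes) -> str:
    """File system -> image: decode UTF-8, then precompose (NFC)."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))


original = "caf\u00e9"             # precomposed, as the image would hold it
on_disk = to_filesystem(original)  # decomposed bytes, as OS X would store it
restored = from_filesystem(on_disk)

print(on_disk)               # b'cafe\xcc\x81'
print(restored == original)  # True: the round trip restores the NFC form
```

Normalizing in only one direction is the kind of asymmetry the
convertChars() report describes: names written by the image would come
back from a directory listing in decomposed form and fail to compare
equal.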

Windows and Linux work with Normalization Form Canonical Composition;
see http://en.wikipedia.org/wiki/Unicode_normalization


Hopefully it does that now. Of course, reading all this again, it
points out that NFS volumes are special...

