Hi Folks -
Thanks to some dedicated OLPC-related work done in Greece by Chris
Petsos[1], we now have a Windows VM with Unicode support enabled. This
VM will both generate UTF input from characters as well as support
clipboard, file and directory names in UTF-8. The VM is available here:

http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.1-bin.zip
http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.1-src.zip

You are invited to test the new work but be advised that this may
require some manual adjustments - for an understanding of what needs to
be done, please see [2].

I'm interested in reports, both good and bad, about whether the
clipboard, file, directory and input support behaves as expected.

[1]http://lists.squeakfoundation.org/pipermail/vm-dev/2007-May/001194.html
[2]http://lists.squeakfoundation.org/pipermail/vm-dev/2007-June/001306.html

Cheers,
  - Andreas

I just did a quick test out of curiosity. I did an "update code from
server" on a rather old image (Squeak3.9gamma [latest update: #7066])
and it seems there is an error while creating a cache directory/file.
The primitive "primOpen: fileName writable: writableFlag" fails on the
filename [1] and writableFlag true.
This works with a 3.7.1 VM.

Alex

[1] 'C:\Dokumente und Einstellungen\laza\Desktop\SqueakVM-Win32-3.10.1-bin\package-cache\ScriptLoader-sd.324.mcz'

In reply to this post by Andreas.Raab
> I'm interested in reports, both good and bad about whether the
> clipboard, file, directory and input support behaves as expected.

After installing the various fixes and picking fonts, I was able to
enter Cyrillic text through the keyboard, using a Greek environment,
on a German Windows XP.

What I couldn't get working is the listing of file names in the
file browser. To reproduce, create file names using Greek, Cyrillic,
and Chinese letters, and then do "open/file list". With the wrong font,
I get question marks. When I select a font that ought to be able to
represent it correctly, I still get a mix of Latin letters and
square boxes.

What I don't understand is: Why do I have to set the language
environment (*) to make it work? It's Unicode, so Squeak
shouldn't care what the language is. If it needs to know, it
should get the language from the system.

Regards,
Martin

(*) As instructed, I did

    Locale currentPlatform: (Locale localeID: (LocaleID isoString: 'el')).

In reply to this post by Alexander Lazarevic'
Yes, I just fixed that. A left-over call to CreateDirectoryA() made
directory creation impossible, and later file creation attempts failed
as well. Will be fixed in the next version.

Cheers,
  - Andreas

In reply to this post by "Martin v. Löwis"
Martin v. Löwis wrote:
> What I couldn't get working is the listing of file names in the
> file browser. To reproduce, create file names using Greek, Cyrillic,
> and Chinese letters, and then do "open/file list". With the wrong font,
> I get question marks. When I select a font that ought to be able to
> represent it correctly, I still get a mix of Latin letters and
> square boxes.

Hm ... there isn't any easy way to test this I guess? The code hasn't
changed that much so I would expect this to be working (in particular
considering that it seems to work fine for ASCII file names).

> What I don't understand is: Why do I have to set the language
> environment (*) to make it work? It's Unicode, so Squeak
> shouldn't care what the language is. If it needs to know, it
> should get the language from the system.

I don't know. In particular considering that we now have the locale
plugin which can detect these settings easily.

Cheers,
  - Andreas

In reply to this post by "Martin v. Löwis"
Martin v. Löwis wrote:
> What I couldn't get working is the listing of file names in the
> file browser. To reproduce, create file names using Greek, Cyrillic,
> and Chinese letters, and then do "open/file list". With the wrong font,
> I get question marks. When I select a font that ought to be able to
> represent it correctly, I still get a mix of Latin letters and
> square boxes.

Digging in the code it seems that the conversion of file names is
broken (or at least it seems that way). I can't seem to find the place
where a UTF8TextConverter would ever be used (which of course is a
requirement for this to work). It seems that the code still assumes
that the VMs present file names encoded in the corresponding code
pages (which also explains why you'd need to set the language
environment etc).

The thing to try would be to go into LanguageEnvironment class and
change defaultFileNameConverter to include:

    "Windows VMs always use UTF8-encoded file names now"
    Smalltalk platformName = 'Win32'
        ifTrue:[^UTF8TextConverter new].

Cheers,
  - Andreas

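The mismatch described above, where the VM hands over UTF-8 bytes while the image still decodes them with a legacy code page, can be reproduced outside Squeak. The following is a minimal Python sketch, not Squeak code. It assumes the Greek code page cp1253 stands in for whatever converter the language environment would pick, and the file name is made up for illustration.

```python
# Hypothetical file name; any non-ASCII name shows the same effect.
name = "αρχείο.txt"

# What a UTF-8 VM would hand to the image: the name as UTF-8 bytes.
utf8_bytes = name.encode("utf-8")

# Image side, old assumption: the bytes are in the locale's code page.
# errors="replace" keeps the sketch from raising on undefined bytes.
as_code_page = utf8_bytes.decode("cp1253", errors="replace")

# Image side, with a UTF-8 converter: the name survives intact.
as_utf8 = utf8_bytes.decode("utf-8")

print(repr(as_code_page))  # one unrelated character per byte
print(repr(as_utf8))       # the original name
```

Only the ASCII part of the name comes through the code-page path unchanged, which would be consistent with ASCII file names appearing to work.
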
In reply to this post by Andreas.Raab
I have fixed the problem with directory creation and updated the VM to
3.10.2 which is up in the usual places:

http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.2-bin.zip
http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.2-src.zip

Cheers,
  - Andreas

> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.2-bin.zip
> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.2-src.zip

The file path interpretation works on Japanese Windows (see the file
name pane in the attached picture) by changing the
fileNameConverterClass of JapaneseEnvironment.

However, I can't quite figure out how to make keyboard input work... It
doesn't seem that I get meaningful values when the input is done via
an IME. I'll take a look at it later...

-- Yoshiki

[Attachment: filelist.png (16K)]

In reply to this post by "Martin v. Löwis"
Martin,
> What I couldn't get working is the listing of file names in the
> file browser. To reproduce, create file names using Greek, Cyrillic,
> and Chinese letters, and then do "open/file list". With the wrong font,
> I get question marks. When I select a font that ought to be able to
> represent it correctly, I still get a mix of Latin letters and
> square boxes.

As Andreas wrote, defaultFileNameConverter has to be modified (and the
class var in the LanguageEnvironment has to be cleared).

> What I don't understand is: Why do I have to set the language
> environment (*) to make it work? It's Unicode, so Squeak
> shouldn't care what the language is. If it needs to know, it
> should get the language from the system.

Read the Unicode standard.

Because it is Unicode, a mechanism out of scope of Unicode has to
supply language information to do sensible stuff.

-- Yoshiki

In reply to this post by Yoshiki Ohshima
> The file path interpretation works on Japanese Windows (see the file
> name pane in the attached picture) by changing the
> fileNameConverterClass of JapaneseEnvironment. However, I can't quite
> figure out how to make keyboard input work... It doesn't seem that I get
> meaningful values when the input is done via an IME.

Yoshiki,

Don't forget that the VM is sending Unicode chars as the sixth data
member of the event buffer. So advise your input interpreter to use

    evtbuf sixth.

instead of

    evtbuf third.

I also tried to copy/paste some Japanese text with the new VM and I
couldn't do it...

A Japanese-locale-specific question... does a Unicode Japanese
character fit inside a WCHAR struct? I ask because the code uses
Windows functions that convert WCHAR streams:

MultiByteToWideChar
WideCharToMultiByte

Could the problem be there?

Christos.

Chris,
> Don't forget that the VM is sending unicode chars as the sixth data member
> of the event buffer.

Sure. I looked at these values yet didn't see it. (If I remember
correctly, it got changed from third to sixth. Take a look at
MacUnicodeInputInterpreter>>initialize. It tells you the history ^^;)

> Japanese locale specific question... does a unicode Japanese character fit
> inside a WCHAR struct?

It does. UTF-16LE without surrogate pairs is almost ok for daily use.

> Cause, there have been used Windows functions that convert WCHAR streams
>
> MultiByteToWideChar
> WideCharToMultiByte
>
> Could the problem be there?

I can't quite trace the detail (and my memory), but when the macro
UNICODE is defined, the latter should be just used?

-- Yoshiki

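The answer about WCHAR can be checked by counting UTF-16 code units. This is a small Python sketch, assuming WCHAR means a 16-bit UTF-16 code unit as it does on Windows; utf16_units is a helper made up for this example, and the sample characters are arbitrary.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store ch as UTF-16."""
    return len(ch.encode("utf-16-le")) // 2

# Common kana and kanji sit in the Basic Multilingual Plane: one unit each.
print(utf16_units("あ"))          # U+3042
print(utf16_units("日"))          # U+65E5

# A rarer kanji outside the BMP needs a surrogate pair: two units.
print(utf16_units("\U00020BB7"))  # U+20BB7
```

So a single WCHAR is enough for everyday Japanese text, and only characters beyond U+FFFF need the surrogate-pair handling that the reply calls "almost ok" to ignore.
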
Yoshiki Ohshima wrote:
>> Cause, there have been used Windows functions that convert WCHAR streams
>>
>> MultiByteToWideChar
>> WideCharToMultiByte
>>
>> Could the problem be there?
>
> I can't quite trace the detail (and my memory), but when the macro
> UNICODE is defined, the latter should be just used?

The VM is not generally compiled with -DUNICODE; the places where we
utilize WCHAR are explicit and we use the explicit *W variants of the
Windows functions.

Cheers,
  - Andreas

In reply to this post by Yoshiki Ohshima
>> Japanese locale specific question... does a unicode Japanese character
>> fit inside a WCHAR struct?
>
> It does. UTF-16LE without surrogate pairs is almost ok for daily
> use.

Right, but take a look at the codepage parameter of the functions

http://msdn2.microsoft.com/en-us/library/ms776413.aspx
http://msdn2.microsoft.com/en-us/library/ms776420.aspx

It only goes up to UTF-8... so, can it work for Japanese chars?

>> Cause, there have been used Windows functions that convert WCHAR streams
>>
>> MultiByteToWideChar
>> WideCharToMultiByte

Christos.

In reply to this post by Andreas.Raab
>> What I couldn't get working is the listing of file names in the
>> file browser. To reproduce, create file names using Greek, Cyrillic,
>> and Chinese letters, and then do "open/file list". With the wrong font,
>> I get question marks. When I select a font that ought to be able to
>> represent it correctly, I still get a mix of Latin letters and
>> square boxes.
>
> Hm ... there isn't any easy way to test this I guess?

It's actually very easy.

1. Create a text file.
2. Rename it to some Cyrillic (or Japanese, or whatever) name.
3. Open it in the listing.

To rename it, the easiest way is to start up charmap.exe,
select a few "funny" characters, copy them to the clipboard,
and paste them in explorer into the file name.

> The code hasn't
> changed that much so I would expect this to be working (in particular
> considering that it seems to work fine for ascii file names).

That might be the problem. If the code is still using the *A
functions (FindFirstFileA), then this cannot work - but I would
expect to see question marks in that case. If it was changed to use
the *W functions, then the question is how these strings are
communicated to the VM.

Regards,
Martin

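The question marks expected from the *A functions can be modelled without calling the Windows API. This is a hedged Python sketch: it assumes the ANSI code page is cp1252 (the Western European default, as on a German Windows), and ansi_round_trip is a hypothetical helper. The substitution Windows actually performs may differ in detail, for example through best-fit mappings.

```python
def ansi_round_trip(name: str, code_page: str = "cp1252") -> str:
    """Squeeze a Unicode file name through a single-byte code page.

    Characters the code page cannot represent are replaced by '?',
    which models what an ANSI-only API can report for such a name.
    """
    return name.encode(code_page, errors="replace").decode(code_page)

print(ansi_round_trip("Файл.txt"))            # Cyrillic name, Western code page
print(ansi_round_trip("readme.txt"))          # ASCII is unaffected
print(ansi_round_trip("Файл.txt", "cp1251"))  # Cyrillic code page keeps it
```

Question marks therefore point at a lossy code-page step, while a mix of Latin letters and boxes points at bytes being decoded with the wrong converter.
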
In reply to this post by Yoshiki Ohshima
>> What I don't understand is: Why do I have to set the language
>> environment (*) to make it work? It's Unicode, so Squeak
>> shouldn't care what the language is. If it needs to know, it
>> should get the language from the system.
>
> Read the Unicode standard.

I did. What section are you specifically referring to?

> Because it is Unicode, a mechanism out of scope of Unicode has to
> supply language information to do sensible stuff.

What is the sensible stuff it needs to do?

Regards,
Martin

In reply to this post by "Martin v. Löwis"
Martin v. Löwis wrote:
>> Hm ... there isn't any easy way to test this I guess?
>
> It's actually very easy.
>
> 1. Create a text file.
> 2. rename it to some Cyrillic (or Japanese, or whatever) name
> 3. Open it in the listing
>
> To rename it, the easiest way is to start up charmap.exe,
> select a few "funny" characters, copy them to the clipboard,
> and paste them in explorer into the file name.

[...] work:

1) You need to fix LanguageEnvironment class defaultFileNameConverter
   (make it return UTF8TextConverter new)
2) You need to load a TTF font with the glyphs. For this you need to:
   * Load the TTF loading fixes that Christos posted
   * Drag and drop a TTF font with the right glyphs on Squeak
     (Arial works fine)
3) Make this font the default font for text and lists.

Once you've got all of this, the file list shows the correct names.

Cheers,
  - Andreas

[Attachment: Filelist.gif (39K)]

In reply to this post by "Martin v. Löwis"
Martin,
> >> What I don't understand is: Why do I have to set the language
> >> environment (*) to make it work? It's Unicode, so Squeak
> >> shouldn't care what the language is. If it needs to know, it
> >> should get the language from the system.
> >
> > Read the Unicode standard.
>
> I did. What section are you specifically referring to?

For example, take a look at this FAQ entry:

http://www.unicode.org/faq/han_cjk.html#3

(and the ones before and after it).

> > Because it is Unicode, a mechanism out of scope of Unicode has to
> > supply language information to do sensible stuff.
>
> What is the sensible stuff it needs to do?

To display strings in an ok way.

http://www.unicode.org/faq/han_cjk.html#2

says that you should select a proper font based on the language you
would like to treat the character in.

Although the current Squeak implementation is not there yet, you
would like to do different sorting or uppercase/lowercase
conversions based on the language (even within Latin-1 regions). A
segment of text generally should carry more information than the
bare Unicode code points.

-- Yoshiki

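The point that case conversion depends on the language, not only on the code points, can be illustrated with the well-known Turkish dotted and dotless i. This is a sketch in Python, not a description of what Squeak does: upper_for and its lang parameter are hypothetical, and real code would normally rely on a locale-aware library instead of a hand-written rule.

```python
def upper_for(text: str, lang: str) -> str:
    """Uppercase text, applying the Turkish/Azerbaijani dotted-i rule."""
    if lang in ("tr", "az"):
        # In these languages 'i' uppercases to dotted 'İ' (U+0130),
        # while dotless 'ı' (U+0131) uppercases to plain 'I'.
        text = text.replace("i", "\u0130")
    return text.upper()

print(upper_for("istanbul", "en"))  # language-neutral rule
print(upper_for("istanbul", "tr"))  # Turkish rule, same code points in
```

The input string is identical in both calls; only the language tag supplied from outside the string changes the result.
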
In reply to this post by Andreas.Raab
On Monday 04 June 2007 12:08 am, Andreas Raab wrote:
> "Windows VMs always use UTF8-encoded file names now"
> Smalltalk platformName = 'Win32'
>     ifTrue:[^UTF8TextConverter new].

Andreas,

The conditional is incorrect and unnecessary. Filename encoding depends
on the filesystem, not the VM platform. For instance, I could have a
UTF-8 file on a USB flash drive and use it across different VMs.

AFAIK, FAT fs does not support UTF-8. NTFS, HFS+ (Mac) and all current
Linux filesystems support UTF-8 in filenames.

Regards .. Subbu

On Jun 4, 2007, at 12:49 , subbukk wrote:

> On Monday 04 June 2007 12:08 am, Andreas Raab wrote:
>> "Windows VMs always use UTF8-encoded file names now"
>> Smalltalk platformName = 'Win32'
>>     ifTrue:[^UTF8TextConverter new].
> Andreas,
>
> The conditional is incorrect and unnecessary. filename encoding
> depends on filesystem, not the VM platform. For instance, I could
> have a UTF-8 file on a USB flash and use it across different VMs.
>
> AFAIK, FAT fs does not support UTF-8. NTFS, HFS+ (Mac) and all
> current Linux filesystems support UTF-8 in filenames.

Wrong. This solely defines what encoding is used to communicate
between the image and the VM. The VM then translates this to whatever
encoding the file system uses.

- Bert -

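The two layers in this reply, one encoding between image and VM and another between VM and operating system, can be sketched as two separate steps. This is an illustrative Python model only: image_to_vm and vm_to_os are hypothetical names, and the choice of UTF-16LE for the Windows side is an assumption based on the *W functions mentioned earlier in the thread.

```python
def image_to_vm(name: str) -> bytes:
    """Image side: always hand the VM UTF-8, whatever the platform."""
    return name.encode("utf-8")

def vm_to_os(utf8_name: bytes, os_encoding: str) -> bytes:
    """VM side: re-encode the UTF-8 name for the host API."""
    return utf8_name.decode("utf-8").encode(os_encoding)

name = "日本語.txt"  # made-up file name
wire = image_to_vm(name)

# Windows *W functions take UTF-16; a UTF-8 based host needs no change.
for_windows = vm_to_os(wire, "utf-16-le")
for_utf8_host = vm_to_os(wire, "utf-8")

print(len(wire), len(for_windows), for_utf8_host == wire)
```

In this model the image only ever deals with the first step, so what the file system stores does not have to be known on the image side.
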
On Monday 04 June 2007 4:28 pm, Bert Freudenberg wrote:
> > .. filename encoding depends on
> > filesystem, not the VM platform. For instance, I could have a UTF-8
> > file on a USB flash and use it across different VMs...
> Wrong. This solely defines what encoding is used to communicate
> between the image and the VM. The VM then translates this to whatever
> encoding the file system uses.

I presume this double conversion is transparent to code in the image.
The code assumes that if the VM is win32, then UTF-8 is supported in
filenames instead of querying the VM for a UTF-8 capability. This
breaks encapsulation. In Tim's words:

On Tuesday 15 May 2007 12:46 am, tim Rowledge wrote:
> .. Allowing #fileNamed: to
> attempt to parse mangled platform related strings was a serious
> error. Platform fiddle-faddle for filenames is just horrific.

Regards .. Subbu