Ah! Indeed this doesn't seem to work correctly. I'll have a look at it.
Cheers,
  - Andreas

Yoshiki Ohshima wrote:
> I think what Takashi meant was to put an image file on the "desktop",
> which is translated to Katakana characters on Japanese Windows, and
> then try to launch the image with the new VM. I did get the same
> error in this way.
>
> -- Yoshiki
>
> At Tue, 05 Jun 2007 09:58:20 -0700,
> Andreas Raab wrote:
>> I just did the same without any problems. Can you check to see whether
>> that was a one-time problem or if a different image file works. And if
>> so, can you try to download the VM again (perhaps there was something
>> corrupted?). Oh, and finally, check your virus, spyware etc. checker -
>> they might think to take a closer look on an application that you just
>> put there and dropped a file on.
>>
>> Cheers,
>> - Andreas
>>
>> Takashi Yamamiya wrote:
>>> Hi Andreas,
>>>
>>> When I started Squeakland image with new vm on the desktop (I
>>> extracted SqueakVM-Win32-3.10.3-bin.zip, and dragged
>>> SqueakPlugin.image icon to Squeak.exe), I got this error.
>>>
>>> Image file read problem (0 out of 4 bytes read)
>>>
>>> Cheers,
>>> - Takashi
>>>
>>> Andreas Raab wrote:
>>>> After a few more rounds of fixes and debugging (incl. the
>>>> unicodification of the drag and drop and async file primitives) we
>>>> have a shiny new 3.10.3 VM which should be usable for a more general
>>>> audience:
>>>>
>>>> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.3-bin.zip
>>>> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.3-src.zip
In reply to this post by Takashi Yamamiya
Hi Takashi -
The latest version (3.10.4) fixes this and many related problems. It turns out that there were still plenty of places in the whole VM/image path conversion that were a little unclean, to say the least ;-) With 3.10.4 I can have both the VM and the images sitting in internationalized directories without any problems. Give it a try.

Cheers,
  - Andreas
Hi Andreas,
It works with the new VM:
http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.4-bin.zip

Good. I still get a "primitive failed" error in the Squeakland image, but that is probably not a VM issue. I'll check another image.

Thanks,
- Takashi
In reply to this post by K. K. Subramaniam
subbukk <[hidden email]> writes:
> On Tuesday 05 June 2007 10:25 am, Martin v. Löwis wrote:
> > It would actually be good if the VM would guarantee UTF-8 file
> > names on all systems
> Yes, indeed. The image could query the VM on startup to see if it supports
> UTF-8 in filenames.

Yes, it would seem to simplify matters to use UTF-8 consistently for interfacing between the image and the VM. Instead of the VM picking an encoding and telling the image which one it picked, it could go ahead and convert it to UTF-8.

This applies not just to filenames, but every place where text is exchanged between the Smalltalk world and the VM, for example keyboard events and the clipboard.

If the Windows VM is going in this direction, that's just great.

Lex
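The boundary conversion described above can be sketched roughly as follows. This is only an illustration in Python, not VM code: the function names `to_image_utf8` and `from_image_utf8` are hypothetical, and Shift-JIS is assumed as the platform encoding purely because the thread mentions Japanese Windows.

```python
def to_image_utf8(raw: bytes, platform_encoding: str) -> bytes:
    """Hypothetical VM-side helper: platform bytes in, UTF-8 bytes out."""
    return raw.decode(platform_encoding).encode("utf-8")


def from_image_utf8(utf8: bytes, platform_encoding: str) -> bytes:
    """Hypothetical inverse: UTF-8 from the image back to platform bytes."""
    return utf8.decode("utf-8").encode(platform_encoding)


# "Desktop" written in Katakana, as it might appear in a path on Japanese Windows.
desktop = "デスクトップ"
native = desktop.encode("shift_jis")          # what the platform would hand over
as_utf8 = to_image_utf8(native, "shift_jis")  # what the image would always see
```

With this arrangement the image never needs to know which encoding the platform picked; only the two helpers do.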
On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> Yes, it would seem to simplify matters to use UTF-8 consistently for
> interfacing between the image and the VM. Instead of the VM picking
> an encoding and telling the image which one it picked, it could go
> ahead and convert it to UTF-8.
>
> This applies not just to filenames, but every place where text is
> exchanged between the Smalltalk world and the VM, for example keyboard
> events and the clipboard.

This is not an easy job, as the assumption of ASCII pervades Squeak. The only system that I am aware of that bit the bullet and went the whole hog is Plan 9. The team got the kernel, library and utilities to work with UTF-8 as the basic character unit and wrote about the experience:
http://plan9.bell-labs.com/sys/doc/utf.html

Is there a kernel image that just contains basic Squeak and VMMaker where one could try building a UTF-8 Squeak? The smaller the better.

Regards .. Subbu
I'd start with Pavel's kernel image:
http://www.comtalk.net/Squeak/98

If you google "Pavel kernel image" you can find the discussion.
In reply to this post by K. K. Subramaniam
subbukk wrote:
> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
>> Yes, it would seem to simplify matters to use UTF-8 consistently for
>> interfacing between the image and the VM. Instead of the VM picking
>> an encoding and telling the image which one it picked, it could go
>> ahead and convert it to UTF-8.
>>
>> This applies not just to filenames, but every place where text is
>> exchanged between the Smalltalk world and the VM, for example keyboard
>> events and the clipboard.
> This is not an easy job as the assumption of ASCII pervades Squeak.

The Windows VM does exactly that now, and it was pretty straightforward, and it works fine. I don't know what you base your comment(s) on; certainly not exhaustive experience with Squeak.

Cheers,
  - Andreas
In reply to this post by K. K. Subramaniam
I don't know the details, but I hope that UTF8 Squeak means full Unicode in the image and UTF-8 just on the "borders": to the OS, to files, etc.?

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
On Thursday 07 June 2007 10:00 pm, Janko Mivšek wrote:
> I don't know details but I hope that UTF8 Squeak means full Unicode in
> image and UTF-8 just on the "borders", to OS, to files etc?

Well, UTF-8 is just an encoding of Unicode code points, so Squeak will have to support Unicode. Its language and tools will need to handle Unicode code points and UTF-8 streams. Internally, whether code points or the UTF-8 encoding is used would depend on the context.

Regards .. Subbu
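To make the distinction between code points and their UTF-8 encoding concrete, here is a small hand-written encoder in Python. It is a sketch for illustration only (the function name `utf8_encode` is made up here); real code would use a library codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:                      # 7 bits: plain ASCII
        return bytes([code_point])
    if code_point < 0x800:                     # 11 bits: two bytes
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # 16 bits: three bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 21 bits: four bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# The "š" in "Mivšek" is code point U+0161 and takes two bytes in UTF-8.
s_caron = utf8_encode(0x161)
```

The code point is the abstract number; the byte sequence is just one way of writing it down, which is why the two can be chosen independently inside the image and at its borders.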
Each String object should specify its encoding scheme. UTF-8 should be the default, but all commonly-encountered encodings should be supported, and should all be usable at once (in different String instances). When a Character is reified from a String, it should use the Unicode code point values (full 32-bit value). Ideally, the encoding of a String should be a function of an associated Strategy object, and not be based on having different subclasses of String.
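A rough sketch of that proposal, in Python rather than Smalltalk. All class and method names here (`EncodingStrategy`, `EncodedString`, `code_points`) are hypothetical, invented for illustration; nothing like this is claimed to exist in Squeak. Decoding on every access is deliberately naive.

```python
class EncodingStrategy:
    """Hypothetical strategy object: turns stored bytes into code points."""

    def __init__(self, codec_name):
        self.codec_name = codec_name

    def code_points(self, data):
        return [ord(ch) for ch in data.decode(self.codec_name)]


UTF8 = EncodingStrategy("utf-8")
LATIN2 = EncodingStrategy("iso8859_2")


class EncodedString:
    """One string class; the encoding lives in the strategy, not a subclass."""

    def __init__(self, data, strategy=UTF8):
        self.data = data
        self.strategy = strategy

    def code_points(self):
        return self.strategy.code_points(self.data)

    def __getitem__(self, index):
        # A "reified Character" is represented here as its code point value.
        return self.code_points()[index]

    def __len__(self):
        return len(self.code_points())

    def __eq__(self, other):
        if not isinstance(other, EncodedString):
            return NotImplemented
        # Compare by code points, so the stored encoding does not matter.
        return self.code_points() == other.code_points()

    def __hash__(self):
        return hash(tuple(self.code_points()))


a = EncodedString("Mivšek".encode("utf-8"))
b = EncodedString("Mivšek".encode("iso8859_2"), LATIN2)
```

Here `a` and `b` hold different bytes but compare equal, because every operation goes through code points. The cost is a decode on each access, which is one reason a design like this might be slow in practice.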
In reply to this post by K. K. Subramaniam
Subbu,
At Thu, 7 Jun 2007 19:26:14 +0530,
subbukk wrote:
> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> > Yes, it would seem to simplify matters to use UTF-8 consistently for
> > interfacing between the image and the VM. Instead of the VM picking
> > an encoding and telling the image which one it picked, it could go
> > ahead and convert it to UTF-8.
> >
> > This applies not just to filenames, but every place where text is
> > exchanged between the Smalltalk world and the VM, for example keyboard
> > events and the clipboard.
> This is not an easy job as the assumption of ASCII pervades Squeak. The only
> system that I am aware of that bit the bullet and went the whole hog is Plan
> 9. The team got the kernel, library and utilities to work with UTF8 as basic
> character unit and wrote about experience:
> http://plan9.bell-labs.com/sys/doc/utf.html

If "this" is the interface between the Smalltalk world and the VM, it is not that hard a thing. There are only three paths for such interfacing, and you just convert there.

It might be just a matter of self-defence, but I still think that the way we did it (i.e., not changing the VM first, and relying on image-level conversion) was the right thing. Back in 1999:

- We were more concerned about small devices such as the MI-series Zaurus. On that, adding the conversion table from/to Shift-JIS to Unicode was significant. We seem to care less about obscure platforms these days, and less about flavors of Unix: if you provide the Linux version, it more or less works everywhere. And Windows, Mac and Linux (alright, only if Tim pretends, Acorn) are the only platforms people care about.

- Releasing an image that requires a single version of the VM would have been a mistake. Not all Squeak users were tech savvy. Some users have restrictions on what they can change on their computers (at schools and such). Providing working installers for all major platforms was (and still is) a large task.

> Is there a kernel image that just contains basic Squeak and VMMaker where one
> could try building a UTF-8 Squeak? Smaller the better.

Ian might put his vmm-n.n-n image on squeakvm.org sometime soon.

-- Yoshiki
Yoshiki Ohshima wrote:
> It might be just a matter of self-defence, but I still think that
> the way we did it (i.e., not change the VM first, and rely on the
> image level conversion) was the right thing.

Completely agree. With 20/20 hindsight it's easy to say that this should use UTF-8; back then things weren't quite as clearly cut (for a time, going "all out UTF-16" in the VM was definitely an option, as seen in the 2.x WCE VMs). Having these conversions in the image was a very useful strategy to cope with the reality of encodings out there. OTOH, it's about time we tie up a few of the loose ends and make them a little more consistent.

Cheers,
  - Andreas
In reply to this post by K. K. Subramaniam
> Each String object should specify its encoding scheme. UTF-8 should be the
> default, but all commonly-encountered encodings should be supported, and
> should all be useable at once (in different String instances.) When a
> Character is reified from a String, it should use the Unicode code point
> values (full 32-bit value.) Ideally, the encoding of a String should be a
> function of an associated Strategy object, and not be based on having
> different subclasses of String.

Is this better than using UTF-32 throughout the image for all Strings? One reason would be that for some characters in domestic encodings, round-trip conversion is not exactly guaranteed, so you can avoid that problem this way. But other than that, encodings only matter when the system is interfacing with the outside world. So the internal representation can be uniform, I think.

Would you write all comparison methods for all combinations of different encodings?

-- Yoshiki
In reply to this post by K. K. Subramaniam
Subbu,
> > I don't know details but I hope that UTF8 Squeak means full Unicode in
> > image and UTF-8 just on the "borders", to OS, to files etc?
> Well, UTF8 is just an encoding of Unicode code points, So, Squeak will have to
> support Unicode. Its language and tools will need to handle Unicode code
> points and UTF8 streams. Internally, whether code points or UTF8 encoding is
> used would depend on the context.

Why do you get the impression that Squeak doesn't support it?

Using UTF-8 internally throughout the system would be a challenge, especially considering that overloaded methods like at: and at:put: would all have to be disambiguated as to what they mean.

-- Yoshiki
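The ambiguity of indexed access on UTF-8 data can be shown with a short Python sketch. Note that Python indexes from 0 while Smalltalk's at: indexes from 1; the helper name `char_at` is made up for this example.

```python
text = "Mivšek"
utf8 = text.encode("utf-8")   # 7 bytes for 6 characters: "š" takes two


def char_at(data: bytes, index: int) -> str:
    """Return the index-th character (0-based) of UTF-8 data.

    This needs a scan from the start, because characters have variable
    width; continuation bytes look like 0b10xxxxxx and are skipped.
    """
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    start = starts[index]
    end = starts[index + 1] if index + 1 < len(starts) else len(data)
    return data[start:end].decode("utf-8")


byte_at_3 = utf8[3]            # a lead byte, not a character
char_at_3 = char_at(utf8, 3)   # the character "š"
```

So "the element at index 3" means either a byte or a character, and the character version is no longer a constant-time operation. The same question arises for at:put:, where replacing a character may change the byte length of the whole string.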
In reply to this post by K. K. Subramaniam
Wouldn't that be a pretty big speed impact given how much strings are used?
>From: "Alan Lovejoy" <[hidden email]>
>Reply-To: The general-purpose Squeak developers list <[hidden email]>
>To: "'The general-purpose Squeak developers list'" <[hidden email]>
>Subject: RE: UTF8 Squeak
>Date: Thu, 7 Jun 2007 11:55:02 -0700
>
>Each String object should specify its encoding scheme. UTF-8 should be the
>default, but all commonly-encounterd encodings should be supported, and
>should all be useable at once (in different String instances.) When a
>Character is reified from a String, it should use the Unicode code point
>values (full 32-bit value.) Ideally, the encoding of a String should be a
>function of an associated Strategy object, and not be based on having
>different subclasses of String.
In reply to this post by K. K. Subramaniam
Because I'm coming from the VisualWorks world, let me explain a bit how Unicode support is solved there:

1. Internally everything is in 16-bit Unicode, without any additional encoding info attached to strings.

2. There is a class ByteString for pure ASCII (1) and TwoByteString for Unicode strings. Conversion from ByteString to TwoByteString is automatic when you concatenate two mixed-width strings.

3. Streams: external streams (2) always deal with encodings, internal streams never.

(1) Strings actually have subclasses for 8-bit encodings, like ISO8859L1String etc., but these do not seem to be used much recently.

(2) With the help of an EncodedStream as a wrapper around the original stream. It is helped by StreamEncoders, which actually do the en/decoding. There are quite a number of them, from Base64StreamEncoder to the, for us, more interesting UTF8StreamEncoder.

I find the VW approach very simple and elegant, and I think Squeak can solve Unicode easily by following VW as an example a bit.

Best regards
Janko
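The split in point 3 (encodings only on external streams, wrapped by an encoder) has a close analogue in Python's standard `io` module, which is used below purely as a stand-in. This is not the VisualWorks API; `io.TextIOWrapper` merely plays the role that EncodedStream plus a UTF-8 encoder is described as playing.

```python
import io

# "External" stream: raw bytes, as a file or socket would provide.
external = io.BytesIO()

# The wrapper does the encoding; callers only ever write text to it.
writer = io.TextIOWrapper(external, encoding="utf-8", newline="")
writer.write("Mivšek")
writer.flush()
raw = external.getvalue()

# Reading back goes through the same kind of wrapper.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
round_trip = reader.read()

# "Internal" stream: text in, text out, no encoding involved at all.
internal = io.StringIO()
internal.write("Mivšek")
```

The appeal of this layering is that encoding knowledge is confined to one wrapper object at the border, and everything behind it works on characters.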
It is so true that I should've looked at the class names in VW before doing everything...

> 1. internally everything is in 16bit Unicode, without any additionally
> encoding info attached to strings

If they use 16 bits per char, how do they deal with surrogate pairs?

> 2. there is a class ByteString for pure ASCII(1) and TwoByteString for
> Unicode strings. Conversion from Byte to TwoByteString is automatic
> when you concatenate two mixed-width strings.

This is what Squeak does with ByteString and WideString.

> 3. streams: external streams(2) are always dealing with
> encodings, internal streams never

In Squeak, to do conversion from/to a file, use MultiByteFileStream. For memory-based strings, use MultiByteBinaryOrTextStream. Or you can manually create an instance of TextConverter and write some logic to pass chars from/to streams.

> (1) Strings have actually subclasses for 8 bit encodings like
> ISO8859L1String etc. but this seems not used much recently

So, as in Squeak, having only ByteString and WideString (with a common abstract superclass) is better ^^;

> (2) with help of an EncodedStream as a wrapper of original stream. And
> it is helped by StreamEncoders, which actually do en/decoding.
> There is quite a number of them, from Base64StreamEncoder to for us
> more interesting UTF8StreamEncoder.

As I wrote, you can write these variations of Streams yourself quite easily. I admit that there is no framework for it.

> I find VW approach very simple and elegant and I think Squeak can solve
> Unicode easily by following VW as an example a bit.

Thank you for summarizing it!

-- Yoshiki
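For readers unfamiliar with the surrogate pair question raised above: code points beyond U+FFFF do not fit in one 16-bit unit, and UTF-16 represents them as two units. A minimal Python sketch of that split (the function name `utf16_units` is made up for this example; it does not validate its input):

```python
def utf16_units(code_point: int) -> list:
    """Return the 16-bit code units UTF-16 uses for one code point."""
    if code_point < 0x10000:
        return [code_point]                 # fits in a single unit
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 | (offset >> 10)          # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)         # bottom 10 bits
    return [high, low]


hiragana_a = utf16_units(0x3042)    # inside the 16-bit range: one unit
g_clef = utf16_units(0x1D11E)       # outside it: a surrogate pair
```

A string class with fixed 16-bit elements therefore either cannot hold such characters or has to treat one character as two elements, which is the concern behind the question.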
Hi Yoshiki,
Yoshiki Ohshima wrote:
>> 1. internally everything is in 16bit Unicode, without any additionally
>> encoding info attached to strings
>
> If they use 16-bit per char, how do they deal with surrogated pairs?

I looked once again and there is actually a FourByteString too. This probably answers your question.

VW also supports the Japanese locale well.

Best regards
Janko
Hi, Janko,
> >> 1. internally everything is in 16bit Unicode, without any additionally
> >> encoding info attached to strings
> >
> > If they use 16-bit per char, how do they deal with surrogated pairs?
>
> I looked once again and there is actually a FourByteString too. This
> probably answer your question.

Probably, yes.

So, the question to you is: if you have a system with an 8-bit ByteString and a 32-bit WideString in the year 2007, would you add a class to represent 16-bit strings to that system?

> VW also support Japanese locale well.

Oh, yes. I know it. In fact, the internationalization of VisualWorks was done by a company that is my former employer. (The work was done way before I joined, though.) I have seen some apps and developers of the system.

However, there is a reason to call our stuff m17n instead of i18n. It might still be an aspiration, but supporting one language at a time (a sort of locale-based idea) is not enough for "real" multilingualization, where you would like to mix strings from different languages freely.

-- Yoshiki
Hi Yoshiki,
Yoshiki Ohshima wrote:
> So, the question to you is that if you have a system with 8-bit
> ByteString and 32-bit WideString in year 2007, would you add a class
> to represent 16-bit string to that system?

I would say yes, because for most countries 16 bits is enough and 32 bits is then just a waste of memory. And I just noticed that WideString is actually fixed to 4 bytes. I would therefore think about renaming it to FourByteString and adding a TwoByteString (or similar names). For the user these are always Strings anyway, just as SmallIntegers and LargeIntegers are always Integers.

> However, there is a reason to call our stuff m17n, instead of i18n.
> It might be still an aspiration to it, but supporting one language at
> a time "sort of localed based idea" is not enough for "real"
> multilingualization, where you would like to mix strings from
> different languages freely.

I strongly agree, and therefore a well thought-out effort to solve i18n well in Squeak is a must. For me too, because I still need to find out how to port the Aida/Web i18n support to Squeak ...

Best regards
Janko
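The memory argument for a 16-bit class can be made concrete with a small Python sketch that picks the narrowest fixed element width for a string, in the spirit of the SmallInteger/LargeInteger analogy. The function names are hypothetical, and the 0x100 threshold for the one-byte case is an assumption (it treats the byte class as holding any 8-bit value rather than only ASCII).

```python
def narrowest_width(text: str) -> int:
    """Bytes per character needed to store text with fixed-width elements."""
    top = max((ord(ch) for ch in text), default=0)
    if top < 0x100:       # assumed limit of a one-byte string class
        return 1
    if top < 0x10000:     # fits a two-byte string class
        return 2
    return 4              # needs the full four-byte class


def storage_bytes(text: str) -> int:
    """Total element storage if the narrowest class is chosen."""
    return narrowest_width(text) * len(text)


katakana = "デスクトップ"
two_byte_cost = storage_bytes(katakana)   # 2 bytes per character
four_byte_cost = 4 * len(katakana)        # what a fixed 32-bit class needs
```

For text like this, the two-byte representation halves the storage compared with a fixed 32-bit class, at the price of one more class and one more widening step to maintain.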