
New Win32 VM [m17n testers needed]

Andreas.Raab

Re: [Vm-dev] New Win32 VM (3.10.3)

Ah! Indeed this doesn't seem to work correctly. I'll have a look at it.

Cheers,
- Andreas

Yoshiki Ohshima wrote:

>
> I think what Takashi meant was to put an image file on the "desktop",
> which is translated to Katakana characters on Japanese Windows, and
> then try to launch the image with the new VM. I did get the same
> error in this way.
>
> -- Yoshiki
>
> At Tue, 05 Jun 2007 09:58:20 -0700,
> Andreas Raab wrote:
>>
>> I just did the same without any problems. Can you check to see whether
>> that was a one-time problem or if a different image file works. And if
>> so, can you try to download the VM again (perhaps there was something
>> corrupted?). Oh, and finally, check your virus, spyware etc. checker -
>> they might think to take a closer look on an application that you just
>> put there and dropped a file on.
>>
>> Cheers,
>> - Andreas
>>
>> Takashi Yamamiya wrote:
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Hi Andreas,
>>>
>>> When I started Squeakland image with new vm on the desktop (I
>>> extracted SqueakVM-Win32-3.10.3-bin.zip, and dragged
>>> SqueakPlugin.image icon to Squeak.exe), I got this error.
>>>
>>> Image file read problem (0 out of 4 bytes read)
>>> Cheers,
>>> - Takashi
>>>
>>> Andreas Raab wrote:
>>>> After a few more rounds of fixes and debugging (incl. the
>>>> unicodification of the drag and drop and async file primitives) we
>>>> have a shiny new 3.10.3 VM which should be usable for a more general
>>>> audience:
>>>>
>>>> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.3-bin.zip
>>>> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.3-src.zip
>>> ------------------------------------------------------------------------
>>>
>

Andreas.Raab

Re: [Vm-dev] New Win32 VM (3.10.3)

In reply to this post by Takashi Yamamiya

Hi Takashi -

The latest version (3.10.4) fixes this and many related problems. It turns
out that there were still plenty of places in the whole VM/image path
conversion that were a little unclean, to say the least ;-) 3.10.4 lets
me keep both the VM and the images in internationalized
directories without any problems. Give it a try.
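
To illustrate the kind of failure being fixed here (a sketch only, not the VM's actual C code): if a path containing Katakana is squeezed through a single-byte code page on its way to the file-open call, every character the code page lacks gets replaced, and the resulting name no longer refers to the file. The path and the helper names `to_narrow_path` and `to_utf8_path` are hypothetical.

```python
# Sketch: why a lossy path conversion makes an image file unreadable.
# The path below is a made-up example; the real VM is written in C.

def to_narrow_path(path: str, codepage: str = "latin-1") -> bytes:
    """Convert a path the lossy way: unencodable characters become '?'."""
    return path.encode(codepage, errors="replace")

def to_utf8_path(path: str) -> bytes:
    """Convert a path losslessly by encoding it as UTF-8."""
    return path.encode("utf-8")

desktop = "C:\\Users\\user\\デスクトップ\\SqueakPlugin.image"

narrow = to_narrow_path(desktop)
utf8 = to_utf8_path(desktop)

print(narrow)                           # the Katakana folder name is now '??????'
print(utf8.decode("utf-8") == desktop)  # UTF-8 round-trips the original path
```

A file opened under the `narrow` name would not be found, which is consistent with a read of zero bytes.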

Cheers,
- Andreas

Takashi Yamamiya wrote:

>
>
>
> ------------------------------------------------------------------------
>
> Hi Andreas,
>
> When I started Squeakland image with new vm on the desktop (I
> extracted SqueakVM-Win32-3.10.3-bin.zip, and dragged
> SqueakPlugin.image icon to Squeak.exe), I got this error.
>
> Image file read problem (0 out of 4 bytes read)
> Cheers,
> - Takashi
>
> Andreas Raab wrote:
>>
>> After a few more rounds of fixes and debugging (incl. the
>> unicodification of the drag and drop and async file primitives) we
>> have a shiny new 3.10.3 VM which should be usable for a more general
>> audience:
>>
>> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.3-bin.zip
>> http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.3-src.zip
>
> ------------------------------------------------------------------------
>

Takashi Yamamiya

Re: [Vm-dev] New Win32 VM (3.10.3)

Hi Andreas,

It works with the new VM,
http://www.squeakvm.org/win32/release/SqueakVM-Win32-3.10.4-bin.zip
Good. I still get a "primitive failed" error in the Squeakland image, but
that is probably not a VM issue. I'll check another image.

Thanks,
- Takashi

Andreas Raab wrote:

>
> Hi Takashi -
>
> The latest version (3.10.4) fixes this and many related problems. Turns
> out that there were still plenty of places in the whole vm/image path
> conversion that were a little unclean to say the least ;-) 3.10.4 allows
> me to have both, the VM as well as images sitting in internationalized
> directories without any problems. Give it a try.
>
>> Hi Andreas,
>>
>> When I started Squeakland image with new vm on the desktop (I
>> extracted SqueakVM-Win32-3.10.3-bin.zip, and dragged
>> SqueakPlugin.image icon to Squeak.exe), I got this error.
>>
>> Image file read problem (0 out of 4 bytes read)
>> Cheers,

Lex Spoon-3

Re: New Win32 VM [m17n testers needed]

In reply to this post by K. K. Subramaniam

subbukk <[hidden email]> writes:
> On Tuesday 05 June 2007 10:25 am, Martin v. Löwis wrote:
> >It would actually be good if the VM would guarantee UTF-8 file
> > names on all systems
> Yes, indeed. The image could query the VM on startup to see if it supports
> UTF-8 in filenames.

Yes, it would seem to simplify matters to use UTF-8 consistently for
interfacing between the image and the VM. Instead of the VM picking
an encoding and telling the image which one it picked, it could go
ahead and convert it to UTF-8.

This applies not just to filenames, but every place where text is
exchanged between the Smalltalk world and the VM, for example keyboard
events and the clipboard.

If the Windows VM is going in this direction, that's just great.
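
A minimal sketch of that idea, assuming a hypothetical pair of boundary functions (`vm_to_image` and `image_to_vm` are illustrative names, not Squeak primitives): the VM side knows the platform's native encoding and always hands UTF-8 to the image, so the image never needs to ask which encoding was picked.

```python
# Sketch: convert at the VM/image boundary so the image only ever sees UTF-8.
# Function names and the choice of native encodings are assumptions.

def vm_to_image(native_bytes: bytes, native_encoding: str) -> bytes:
    """Platform text (file name, key event, clipboard) -> UTF-8 for the image."""
    return native_bytes.decode(native_encoding).encode("utf-8")

def image_to_vm(utf8_bytes: bytes, native_encoding: str) -> bytes:
    """UTF-8 from the image -> whatever the platform API expects."""
    return utf8_bytes.decode("utf-8").encode(native_encoding)

# The same file name as delivered by two different platforms.
name = "Mivšek.image"
from_windows = name.encode("utf-16-le")    # wide-character API
from_latin2 = name.encode("iso-8859-2")    # legacy 8-bit locale

# Both arrive in the image as identical UTF-8 bytes.
a = vm_to_image(from_windows, "utf-16-le")
b = vm_to_image(from_latin2, "iso-8859-2")
print(a == b)
```

The same two functions would sit in front of keyboard events and the clipboard; only the native encoding passed in differs per platform.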

Lex

K. K. Subramaniam

UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> Yes, it would seem to simplify matters to use UTF-8 consistently for
> interfacing between the image and the VM. Instead of the VM picking
> an encoding and telling the image which one it picked, it could go
> ahead and convert it to UTF-8.
>
> This applies not just to filenames, but every place where text is
> exchanged between the Smalltalk world and the VM, for example keyboard
> events and the clipboard.
This is not an easy job, as the assumption of ASCII pervades Squeak. The only
system that I am aware of that bit the bullet and went the whole hog is Plan
9. The team got the kernel, library and utilities to work with UTF8 as the
basic character unit and wrote about the experience:
http://plan9.bell-labs.com/sys/doc/utf.html

Is there a kernel image that contains just basic Squeak and VMMaker, where one
could try building a UTF-8 Squeak? The smaller the better.

Regards .. Subbu

David Mitchell-10

Re: UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

I'd start with Pavel's kernel image:
http://www.comtalk.net/Squeak/98

If you google "Pavel kernel image" you can find the discussion.

On 6/7/07, subbukk <[hidden email]> wrote:

> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> > Yes, it would seem to simplify matters to use UTF-8 consistently for
> > interfacing between the image and the VM. Instead of the VM picking
> > an encoding and telling the image which one it picked, it could go
> > ahead and convert it to UTF-8.
> >
> > This applies not just to filenames, but every place where text is
> > exchanged between the Smalltalk world and the VM, for example keyboard
> > events and the clipboard.
> This is not an easy job as the assumption of ASCII pervades Squeak. The only
> system that I am aware of that bit the bullet and went the whole hog is Plan
> 9. The team got the kernel, library and utilities to work with UTF8 as basic
> character unit and wrote about experience:
> http://plan9.bell-labs.com/sys/doc/utf.html
>
> Is there a kernel image that just contains basic Squeak and VMMaker where one
> could try building a UTF-8 Squeak? Smaller the better.
>
> Regards .. Subbu
>
>

Andreas.Raab

Re: UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

In reply to this post by K. K. Subramaniam

subbukk wrote:
> This is not an easy job as the assumption of ASCII pervades Squeak.

The Windows VM does exactly that now, and it was pretty straightforward,
and it works fine. I don't know what you base your comment(s) on;
certainly not exhaustive experience with Squeak.

Cheers,
- Andreas

> The only
> system that I am aware of that bit the bullet and went the whole hog is Plan
> 9. The team got the kernel, library and utilities to work with UTF8 as basic
> character unit and wrote about experience:
> http://plan9.bell-labs.com/sys/doc/utf.html
>
> Is there a kernel image that just contains basic Squeak and VMMaker where one
> could try building a UTF-8 Squeak? Smaller the better.
>
> Regards .. Subbu
>
>

Janko Mivšek

Re: UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

In reply to this post by K. K. Subramaniam

I don't know the details, but I hope that UTF8 Squeak means full Unicode in
the image and UTF-8 only at the "borders": to the OS, to files, etc.?

Best regards
Janko

subbukk wrote:

> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
>> Yes, it would seem to simplify matters to use UTF-8 consistently for
>> interfacing between the image and the VM. Instead of the VM picking
>> an encoding and telling the image which one it picked, it could go
>> ahead and convert it to UTF-8.
>>
>> This applies not just to filenames, but every place where text is
>> exchanged between the Smalltalk world and the VM, for example keyboard
>> events and the clipboard.
> This is not an easy job as the assumption of ASCII pervades Squeak. The only
> system that I am aware of that bit the bullet and went the whole hog is Plan
> 9. The team got the kernel, library and utilities to work with UTF8 as basic
> character unit and wrote about experience:
> http://plan9.bell-labs.com/sys/doc/utf.html
>
> Is there a kernel image that just contains basic Squeak and VMMaker where one
> could try building a UTF-8 Squeak? Smaller the better.
>
> Regards .. Subbu
>
>

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si

K. K. Subramaniam

Re: UTF8 Squeak

On Thursday 07 June 2007 10:00 pm, Janko Mivšek wrote:
> I don't know details but I hope that UTF8 Squeak means full Unicode in
> image and UTF-8 just on the "borders", to OS, to files etc?
Well, UTF8 is just an encoding of Unicode code points, so Squeak will have to
support Unicode. Its language and tools will need to handle Unicode code
points and UTF8 streams. Internally, whether code points or UTF8 encoding is
used would depend on the context.
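
The distinction can be sketched in a few lines of Python (used here only for illustration): the same text is a sequence of code points on one hand and a variable-length byte stream on the other.

```python
# Sketch: one string seen as Unicode code points and as a UTF-8 byte stream.
text = "Mivšek"

code_points = [ord(ch) for ch in text]   # one integer per character
utf8_bytes = text.encode("utf-8")        # variable-length byte encoding

print([hex(cp) for cp in code_points])
print(utf8_bytes.hex(" "))
print(len(code_points), len(utf8_bytes))  # 6 characters, 7 bytes
```

The character "š" (U+0161) is a single code point but occupies two bytes in UTF-8, which is why the two lengths differ.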

Regards .. Subbu

Alan L. Lovejoy

RE: UTF8 Squeak

Each String object should specify its encoding scheme. UTF-8 should be the
default, but all commonly encountered encodings should be supported, and
should all be usable at once (in different String instances). When a
Character is reified from a String, it should use the Unicode code point
value (full 32-bit value). Ideally, the encoding of a String should be a
function of an associated Strategy object, and not be based on having
different subclasses of String.
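
A rough sketch of what such a design could look like, written in Python for illustration. `EncodingStrategy` and `EncodedString` are hypothetical names, and nothing here reflects how Squeak's String classes are actually implemented.

```python
# Sketch of the proposal: a string holds raw bytes plus a strategy object
# that knows how to turn them into Unicode code points.
from typing import List


class EncodingStrategy:
    def __init__(self, codec: str):
        self.codec = codec  # a Python codec name such as "utf-8"

    def decode(self, data: bytes) -> List[int]:
        """Return the code points represented by data."""
        return [ord(ch) for ch in data.decode(self.codec)]


UTF8 = EncodingStrategy("utf-8")  # the default strategy


class EncodedString:
    def __init__(self, data: bytes, strategy: EncodingStrategy = UTF8):
        self.data = data
        self.strategy = strategy

    def code_points(self) -> List[int]:
        return self.strategy.decode(self.data)

    def char_at(self, index: int) -> int:
        """Reify one character as a full Unicode code point."""
        return self.code_points()[index]

    def __eq__(self, other: object) -> bool:
        # Strings in different encodings compare by code points.
        if not isinstance(other, EncodedString):
            return NotImplemented
        return self.code_points() == other.code_points()

    def __hash__(self) -> int:
        return hash(tuple(self.code_points()))


a = EncodedString("日本語".encode("utf-8"))
b = EncodedString("日本語".encode("shift_jis"), EncodingStrategy("shift_jis"))
print(a == b, hex(a.char_at(0)))
```

Note that every comparison and every indexed access in this sketch decodes the bytes first, which is where the cost of the design shows up.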

Yoshiki Ohshima

Re: UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

In reply to this post by K. K. Subramaniam

Subbu,

At Thu, 7 Jun 2007 19:26:14 +0530,
subbukk wrote:

>
> On Wednesday 06 June 2007 5:54 pm, Lex Spoon wrote:
> > Yes, it would seem to simplify matters to use UTF-8 consistently for
> > interfacing between the image and the VM. Instead of the VM picking
> > an encoding and telling the image which one it picked, it could go
> > ahead and convert it to UTF-8.
> >
> > This applies not just to filenames, but every place where text is
> > exchanged between the Smalltalk world and the VM, for example keyboard
> > events and the clipboard.
> This is not an easy job as the assumption of ASCII pervades Squeak. The only
> system that I am aware of that bit the bullet and went the whole hog is Plan
> 9. The team got the kernel, library and utilities to work with UTF8 as basic
> character unit and wrote about experience:
> http://plan9.bell-labs.com/sys/doc/utf.html

If "this" is the interface between the Smalltalk world and the VM,
it is not that hard. There are only three paths for such
interfacing, and you just convert there.

It might be just a matter of self-defence, but I still think that
the way we did it (i.e., not change the VM first, and rely on the
image level conversion) was the right thing.

Back in 1999:
- We were more concerned about small devices such as the MI-series
  Zaurus. On those, adding the conversion tables between Shift-JIS and
  Unicode was a significant cost. We seem to care less about obscure
  platforms these days, and less about flavors of Unix: if you
  provide the Linux version, it more or less works everywhere. And
  Windows, Mac and Linux (alright, only if Tim pretends, Acorn) are
  the only platforms people care about.
- Releasing an image that requires a single version of the VM would have
  been a mistake. Not all Squeak users were tech savvy. Some users
  have restrictions on what they can change on their
  computers (at schools and such). Providing working installers for
  all major platforms was (and still is) a large task.

> Is there a kernel image that just contains basic Squeak and VMMaker where one
> could try building a UTF-8 Squeak? Smaller the better.

Ian might put his vmm-n.n-n image on squeakvm.org sometime
soon.

-- Yoshiki

Andreas.Raab

Re: UTF8 Squeak (was Re: New Win32 VM [m17n testers needed])

Yoshiki Ohshima wrote:
> It might be just a matter of self-defence, but I still think that
> the way we did it (i.e., not change the VM first, and rely on the
> image level conversion) was the right thing.

Completely agree. With 20/20 hindsight it's easy to say that this should
use UTF-8; back then things weren't quite as clear-cut (for a time,
going "all out UTF-16" in the VM was definitely an option, as seen in
the 2.x WCE VMs). Having these conversions in the image was a very
useful strategy to cope with the reality of encodings out there. OTOH,
it's about time we tie up a few of the loose ends and make them a little
more consistent.

Cheers,
- Andreas

Yoshiki Ohshima

Re: UTF8 Squeak

In reply to this post by K. K. Subramaniam

> Each String object should specify its encoding scheme. UTF-8 should be the
> default, but all commonly-encounterd encodings should be supported, and
> should all be useable at once (in different String instances.) When a
> Character is reified from a String, it should use the Unicode code point
> values (full 32-bit value.) Ideally, the encoding of a String should be a
> function of an associated Strategy object, and not be based on having
> different subclasses of String.

Is this better than using UTF32 throughout the image for all Strings?
One reason would be that for some chars in domestic encodings, the
round-trip conversion is not exactly guaranteed, so you can avoid that
problem in this way. But other than that, encodings only matter when
the system is interfacing with the outside world. So the internal
representation can be uniform, I think.

Would you write comparison methods for all combinations of
different encodings?
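
Both points can be sketched with codecs from Python's standard library (an illustration only; `same_text` is a hypothetical helper). Two closely related Japanese conversion tables disagree about the same two bytes, and a uniform internal representation reduces comparison to a single code path.

```python
# Sketch: the same legacy bytes map to different Unicode code points
# depending on which conversion table is used, so a round trip through
# Unicode is not guaranteed to give back the original bytes.
raw = b"\x81\x60"  # the "wave dash" position in Shift-JIS

via_jis = raw.decode("shift_jis")   # JIS X 0208 based table
via_ms = raw.decode("cp932")        # Microsoft's variant of the table

print(f"U+{ord(via_jis):04X}", f"U+{ord(via_ms):04X}")

# With one internal representation, comparison needs no per-encoding
# method pairs: decode at the boundary, then compare code points.
def same_text(a: bytes, enc_a: str, b: bytes, enc_b: str) -> bool:
    return a.decode(enc_a) == b.decode(enc_b)

print(same_text("日本語".encode("utf-8"), "utf-8",
                "日本語".encode("euc_jp"), "euc_jp"))
```

With N encodings kept alive inside the image, pairwise comparison would need on the order of N squared cases; decoding first keeps it at one.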

-- Yoshiki

Yoshiki Ohshima

Re: UTF8 Squeak

In reply to this post by K. K. Subramaniam

Subbu,

> > I don't know details but I hope that UTF8 Squeak means full Unicode in
> > image and UTF-8 just on the "borders", to OS, to files etc?
> Well, UTF8 is just an encoding of Unicode code points, So, Squeak will have to
> support Unicode. Its language and tools will need to handle Unicode code
> points and UTF8 streams. Internally, whether code points or UTF8 encoding is
> used would depend on the context.

Why do you get the impression that Squeak doesn't support it?

Using UTF-8 internally throughout the system would be a challenge,
especially considering that overloaded methods like at: and
at:put: would all have to be disambiguated as to what they mean.
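
The ambiguity can be sketched as follows (Python, zero-based indexes, hypothetical helper names): with UTF-8 storage, an index can count bytes or characters, and the two give different answers at different costs.

```python
# Sketch: two meanings of "the element at index n" for UTF-8 storage.
data = "Mivšek".encode("utf-8")

def byte_at(utf8: bytes, index: int) -> int:
    """Interpretation 1: index counts bytes (constant time)."""
    return utf8[index]

def char_at(utf8: bytes, index: int) -> str:
    """Interpretation 2: index counts characters (needs a linear scan)."""
    count = -1
    for start, byte in enumerate(utf8):
        if byte & 0xC0 != 0x80:          # not a continuation byte
            count += 1
            if count == index:
                end = start + 1
                while end < len(utf8) and utf8[end] & 0xC0 == 0x80:
                    end += 1
                return utf8[start:end].decode("utf-8")
    raise IndexError(index)

print(hex(byte_at(data, 3)))  # first byte of the two-byte sequence for 'š'
print(char_at(data, 3))       # the character 'š' itself
print(char_at(data, 4))       # 'e', which lives at byte offset 5
```

A store operation has the same problem in a worse form, since replacing a one-byte character with a two-byte one changes the length of the underlying buffer.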

-- Yoshiki

J J-6

RE: UTF8 Squeak

In reply to this post by K. K. Subramaniam

Wouldn't that be a pretty big speed impact given how much strings are used?

>On Thu, 7 Jun 2007 11:55:02 -0700, Alan Lovejoy wrote:
>
>Each String object should specify its encoding scheme. UTF-8 should be the
>default, but all commonly-encounterd encodings should be supported, and
>should all be useable at once (in different String instances.) When a
>Character is reified from a String, it should use the Unicode code point
>values (full 32-bit value.) Ideally, the encoding of a String should be a
>function of an associated Strategy object, and not be based on having
>different subclasses of String.


Janko Mivšek

Re: UTF8 Squeak

In reply to this post by K. K. Subramaniam

Because I'm coming from the VisualWorks world, let me explain a bit how
Unicode support is solved there:

1. Internally everything is in 16-bit Unicode, without any additional
   encoding info attached to strings.
2. There is a class ByteString for pure ASCII (1) and TwoByteString for
   Unicode strings. Conversion from ByteString to TwoByteString is
   automatic when you concatenate two mixed-width strings.
3. Streams: external streams (2) always deal with encodings, internal
   streams never do.

(1) Strings actually have subclasses for 8-bit encodings, like
    ISO8859L1String etc., but these seem not to be used much recently.
(2) With the help of an EncodedStream as a wrapper around the original
    stream. It is helped by StreamEncoders, which actually do the
    en/decoding. There are quite a number of them, from
    Base64StreamEncoder to the, for us, more interesting UTF8StreamEncoder.

I find the VW approach very simple and elegant, and I think Squeak can
solve Unicode easily by following VW's example a bit.
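
The automatic widening described in point 2 can be sketched like this. It is an illustration in Python, not VisualWorks code: `FixedWidthString` is a hypothetical class, and the thresholds (0xFF for one byte, 0xFFFF for two) are assumptions.

```python
# Sketch: strings stored at 1, 2 or 4 bytes per character, widening
# automatically on concatenation.
class FixedWidthString:
    def __init__(self, text: str):
        self.code_points = [ord(ch) for ch in text]
        self.width = self._width_for(self.code_points)

    @staticmethod
    def _width_for(code_points) -> int:
        """Pick the narrowest fixed width that fits every character."""
        highest = max(code_points, default=0)
        if highest <= 0xFF:
            return 1
        if highest <= 0xFFFF:
            return 2
        return 4

    def __add__(self, other: "FixedWidthString") -> "FixedWidthString":
        joined = "".join(chr(cp) for cp in self.code_points + other.code_points)
        return FixedWidthString(joined)


narrow = FixedWidthString("abc")
wide = FixedWidthString("日本")
mixed = narrow + wide
print(narrow.width, wide.width, mixed.width)
```

The caller only ever sees one string type; the width is an internal storage decision made when the result is built.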

Best regards
Janko

Alan Lovejoy wrote:
> Each String object should specify its encoding scheme. UTF-8 should be the
> default, but all commonly-encounterd encodings should be supported, and
> should all be useable at once (in different String instances.) When a
> Character is reified from a String, it should use the Unicode code point
> values (full 32-bit value.) Ideally, the encoding of a String should be a
> function of an associated Strategy object, and not be based on having
> different subclasses of String.

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si

Yoshiki Ohshima

Re: UTF8 Squeak

It is so true that I should've looked at the class names in VW
before doing everything...

> 1. internally everything is in 16bit Unicode, without any additionally
> encoding info attached to strings

If they use 16 bits per char, how do they deal with surrogate pairs?
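
For reference, the issue can be sketched in Python (`utf16_units` is a hypothetical helper): a code point above U+FFFF does not fit in one 16-bit unit and is split into two surrogates in UTF-16.

```python
# Sketch: a code point above U+FFFF takes two 16-bit code units in UTF-16.
def utf16_units(ch: str) -> list:
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp = utf16_units("語")              # inside the Basic Multilingual Plane
astral = utf16_units("\U0001D11E")   # MUSICAL SYMBOL G CLEF, outside it

print([hex(u) for u in bmp])      # one code unit
print([hex(u) for u in astral])   # a high and a low surrogate
```

A string class with exactly 16 bits per element therefore either excludes such characters or stops having one element per character.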

> 2. there is a class ByteString for pure ASCII(1) and TwoByteString for
> Unicode strings. Conversion from Byte to TwoByteString is automatic
> when you concatenate two mixed-width strings.

This is what Squeak does with ByteString and WideString.

> 3. streams: external streams(2) are always dealing with
> encodings, internal streams never

In Squeak, to do conversion from/to a file, use MultiByteFileStream. For
memory-based strings, use MultiByteBinaryOrTextStream. Or, you can
manually create an instance of TextConverter and write some logic to
pass chars from/to streams.
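
As a language-neutral illustration of converting at the stream boundary, here is a sketch using Python's standard `io` module as a stand-in. It does not use or model the Squeak classes named above.

```python
# Sketch: text goes in and out as characters; the wrapper handles the bytes.
import io

raw = io.BytesIO()                                   # stands in for a file
writer = io.TextIOWrapper(raw, encoding="utf-8", newline="")
writer.write("Mivšek\n")
writer.flush()

encoded = raw.getvalue()                             # bytes as stored

reader = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8", newline="")
decoded = reader.read()

print(encoded)
print(decoded == "Mivšek\n")
```

The code that writes and reads text never touches the encoding; swapping the `encoding` argument changes the external format without changing the callers.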

> (1) Strings have actually subclasses for 8 bit encodings like
> ISO8859L1String etc. but this seems not used much recently

So, as in Squeak, having only ByteString and WideString (with a
common abstract superclass) is better^^;

> (2) with help of an EncodedStream as a wrapper of original stream. And
> it is helped by StreamEncoders, which actually do en/decoding.
> There is quite a number of them, from Base64StreamEncoder to for us
> more interesting UTF8StreamEncoder.

As I wrote, you can write these variations of Streams yourself
quite easily. I admit that there is no framework for it.

> I find VW approach very simple and elegant and I think Squeak can solve
> Unicode easily by following VW as an example a bit.

Thank you for summarizing it!

-- Yoshiki

Janko Mivšek

Re: UTF8 Squeak

Hi Yoshiki,

Yoshiki Ohshima wrote:
> It is so true that I should've looked at the class names in VW
> before doing everything...
>
>> 1. internally everything is in 16bit Unicode, without any additionally
>> encoding info attached to strings
>
> If they use 16-bit per char, how do they deal with surrogated pairs?

I looked once again and there is actually a FourByteString too. This
probably answers your question. VW also supports the Japanese locale well.

Best regards
Janko

>
>> 2. there is a class ByteString for pure ASCII(1) and TwoByteString for
>> Unicode strings. Conversion from Byte to TwoByteString is automatic
>> when you concatenate two mixed-width strings.
>
> This is what Squeak does with ByteString and WideString.
>
>> 3. streams: external streams(2) are always dealing with
>> encodings, internal streams never
>
> In Squeak to do conversion from/to file useMultiByteFileStream. For
> memory based strings, use MultiByteBinaryOrTextStream. Or, you can
> manually create an instance of TextConverter and write some logic to
> pass chars from/to streams.
>
>> (1) Strings have actually subclasses for 8 bit encodings like
>> ISO8859L1String etc. but this seems not used much recently
>
> So, as in Squeak, having only ByteString and WideString (with a
> common abstract superclass) is better^^;
>
>> (2) with help of an EncodedStream as a wrapper of original stream. And
>> it is helped by StreamEncoders, which actually do en/decoding.
>> There is quite a number of them, from Base64StreamEncoder to for us
>> more interesting UTF8StreamEncoder.
>
> As I wrote, you can write these variation of Streams by youself
> quite easily. I admit that there is no framework for it.
>
>> I find VW approach very simple and elegant and I think Squeak can solve
>> Unicode easily by following VW as an example a bit.
>
> Thank you for summarizing it!
>
> -- Yoshiki
>
>

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si

Yoshiki Ohshima

Re: UTF8 Squeak

Hi, Janko,

> >> 1. internally everything is in 16bit Unicode, without any additionally
> >> encoding info attached to strings
> >
> > If they use 16-bit per char, how do they deal with surrogated pairs?
>
> I looked once again and there is actually a FourByteString too. This
> probably answer your question.

Probably, yes.

So, my question to you is: if you have a system with an 8-bit
ByteString and a 32-bit WideString in the year 2007, would you add a
class to represent 16-bit strings to that system?

> VW also support Japanese locale well.

Oh, yes. I know it. In fact, the internationalization of
VisualWorks was done by a company that is my former employer. (The
work was done way before I joined, though.) I have seen some apps and
developers of the system.

However, there is a reason we call our stuff m17n instead of i18n.
It might still be an aspiration, but supporting one language at
a time (a sort of locale-based idea) is not enough for "real"
multilingualization, where you would like to mix strings from
different languages freely.

-- Yoshiki

Janko Mivšek

Re: UTF8 Squeak

Hi Yoshiki,

Yoshiki Ohshima wrote:

>>>> 1. internally everything is in 16bit Unicode, without any additionally
>>>> encoding info attached to strings
>>> If they use 16-bit per char, how do they deal with surrogated pairs?
>> I looked once again and there is actually a FourByteString too. This
>> probably answer your question.
>
> Probably, yes.
>
> So, the question to you is that if you have a system with 8-bit
> ByteString and 32-bit WideString in year 2007, would you add a class
> to represent 16-bit string to that system?

I would say yes, because for most countries 16 bits is enough and 32 bits
is then just a waste of memory. And I just noticed that WideString is
actually fixed at 4 bytes. I would therefore think about renaming it to
FourByteString and adding a TwoByteString (or similar names). For the user
these are always Strings anyway, just as SmallIntegers and LargeIntegers are
always Integers.
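
The memory argument can be checked with a small calculation (a sketch in Python; the sample text is arbitrary and chosen so that every character fits in 16 bits):

```python
# Sketch: storage for the same text at 2 and at 4 bytes per character.
text = "日本語のテキスト"   # 8 characters, all within the 16-bit range

two_byte = len(text.encode("utf-16-le"))   # 2 bytes per character here
four_byte = len(text.encode("utf-32-le"))  # 4 bytes per character

print(len(text), two_byte, four_byte)
```

For text like this, a 32-bit representation uses twice the character storage of a 16-bit one, not counting per-object overhead.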

>
>> VW also support Japanese locale well.
>
> Oh, yes. I know it. In fact, the internationalization of
> VisualWorks was done by a company that is my former employee. (The
> work was done way before I joined, though). I have seen some apps and
> developers of the system.
>
> However, there is a reason to call our stuff m17n, instead of i18n.
> It might be still an aspiration to it, but supporting one language at
> a time "sort of localed based idea" is not enough for "real"
> multilingualization, where you would like to mix strings from
> different languages freely.

I strongly agree and therefore a well thought-out effort to solve i18n
well in Squeak is a must. For me also, because I still need to find out
how to port Aida/Web i18n support to Squeak ...

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
