accents in unix using non-english languages

José "L. Redrejo" Rodríguez

accents in unix using non-english languages

Hi,
I've been trying to use accents and non-English characters (such as the
Spanish ñ) with the VM, and so far the behaviour is totally erratic,
with some regressions. Maybe development is only done in English and
is not tested in other languages, maybe the documentation is not clear
enough, or maybe I'm just useless...

In brief:
- Using VM 3.7 up to 3.9-9, accents and other characters
worked when setting the environment variable LC_ALL to 'es_ES' (with
UTF-8 it didn't work). But with this image, drag & drop of external files
with accents in their names, or using the file list to open them, didn't
work (tested only with GIF & JPEG files).
- Using VM 3.9-12, I can open files with accents in their names using the
file list, but drag & drop doesn't work. And now the keyboard doesn't work
with accents or other non-English characters. After loading Bert's
changeset
(http://lists.squeakfoundation.org/pipermail/vm-dev/2007-March/001046.html)
I can type ñ and accents, but the accent is printed before the vowel
instead of over it, i.e. 'o instead of ó.

I've wasted a lot of time playing with the -encoding, -pathenc and
-textenc options without any success.

So my questions are:
- is this totally broken?
- is anybody else aware of this behaviour?
- have I simply failed to find the right documentation, and it should
actually work fine?

Regards.
José L.

Bert Freudenberg

Re: accents in unix using non-english languages

On Sep 28, 2007, at 19:36 , José L. Redrejo Rodríguez wrote:

> Hi,
> I've been trying to use accents and non-English characters (such as the
> Spanish ñ) with the VM, and so far the behaviour is totally erratic,
> with some regressions. Maybe development is only done in English and
> is not tested in other languages, maybe the documentation is not clear
> enough, or maybe I'm just useless...
>
> In brief:
> - Using VM 3.7 up to 3.9-9, accents and other characters
> worked when setting the environment variable LC_ALL to 'es_ES' (with
> UTF-8 it didn't work). But with this image, drag & drop of external
> files with accents in their names, or using the file list to open
> them, didn't work (tested only with GIF & JPEG files).
> - Using VM 3.9-12, I can open files with accents in their names using
> the file list, but drag & drop doesn't work.

File name encoding in DnD might not yet work, this is pretty new code.

> And now the keyboard doesn't work
> with accents or other non-English characters. After loading Bert's
> changeset
> (http://lists.squeakfoundation.org/pipermail/vm-dev/2007-March/001046.html)
> I can type ñ and accents, but the accent is printed before
> the vowel instead of over it, i.e. 'o instead of ó.

Strange - I have never seen this.

> I've wasted a lot of time playing with the -encoding, -pathenc and
> -textenc options without any success.
>
> So my questions are:
> - is this totally broken?

Given an arbitrary VM/image combo, yes.

> - is anybody else aware of this behaviour?

Yes. We intend to fix it for the OLPC version
(https://dev.laptop.org/ticket/3343), but have not yet gotten around to it.

> - have I simply failed to find the right documentation, and it should
> actually work fine?

I believe the VM side should actually be fine - what is lacking is a
way to ensure that VM and image agree on a common encoding. There
were some proposals in the last couple of years but none got adopted.
So it's the user who must ensure it works, and this indeed is
undocumented.

Perhaps a simple step to clean up the mess is adding three new system
attributes to allow the image to know the current VM encoding for
keyboard, clipboard, and file names. Then fix the image to set
converters using these attributes, instead of second-guessing from
the platform string as it is done now.

- Bert -
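
To illustrate the mismatch Bert describes, here is a minimal Python sketch (not Squeak code). The functions vm_sends and image_receives are hypothetical stand-ins for the VM side and the image side; the only point is to show what a user sees when the two ends assume different encodings for the same bytes.

```python
# Sketch only: vm_sends / image_receives are hypothetical names,
# standing in for "the VM" and "the image". Nothing here is Squeak API.

def vm_sends(text: str, vm_encoding: str) -> bytes:
    """Encode text the way the producing side believes is correct."""
    return text.encode(vm_encoding)


def image_receives(data: bytes, image_encoding: str) -> str:
    """Decode bytes the way the consuming side believes is correct."""
    return data.decode(image_encoding, errors="replace")


word = "señor"

# Both sides agree on UTF-8: the text survives the round trip.
agreed = image_receives(vm_sends(word, "utf-8"), "utf-8")

# Producer uses UTF-8, consumer assumes Latin-1:
# the two bytes of "ñ" are shown as two separate characters.
garbled = image_receives(vm_sends(word, "utf-8"), "latin-1")

# Producer uses Latin-1, consumer assumes UTF-8:
# the single byte 0xF1 is not valid UTF-8 and gets replaced.
replaced = image_receives(vm_sends(word, "latin-1"), "utf-8")

print(agreed)
print(garbled)
print(replaced)
```

Only the first case, where both ends use the same declared encoding, preserves the text; that agreement is what the proposed system attributes would let the image establish instead of guessing.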

Andreas.Raab

Re: accents in unix using non-english languages

Bert Freudenberg wrote:
> I believe the VM side should actually be fine - what is lacking is a way
> to ensure that VM and image agree on a common encoding. There were some
> proposals in the last couple of years but none got adopted. So it's the
> user who must ensure it works, and this indeed is undocumented.

What's wrong with universally adopting UTF-8? The Windows VM does that
now and it seems far superior to having to deal with tons of different
encodings.

> Perhaps a simple step to clean up the mess is adding three new system
> attributes to allow the image to know the current VM encoding for
> keyboard, clipboard, and file names. Then fix the image to set
> converters using these attributes, instead of second-guessing from the
> platform string as it is done now.

Perhaps a simpler step is to assume everything is UTF-8 and give people
a little CS which turns all these converters under all circumstances to
UTF-8?

Cheers,
- Andreas

José "L. Redrejo" Rodríguez

Re: accents in unix using non-english languages

In reply to this post by Bert Freudenberg

El vie, 28-09-2007 a las 20:11 +0200, Bert Freudenberg escribió:

> On Sep 28, 2007, at 19:36 , José L. Redrejo Rodríguez wrote:
>
> > Hi,
> > I've been trying to use accents and non-English characters (such as the
> > Spanish ñ) with the VM, and so far the behaviour is totally erratic,
> > with some regressions. Maybe development is only done in English and
> > is not tested in other languages, maybe the documentation is not clear
> > enough, or maybe I'm just useless...
> >
> > In brief:
> > - Using VM 3.7 up to 3.9-9, accents and other characters
> > worked when setting the environment variable LC_ALL to 'es_ES' (with
> > UTF-8 it didn't work). But with this image, drag & drop of external
> > files with accents in their names, or using the file list to open
> > them, didn't work (tested only with GIF & JPEG files).
> > - Using VM 3.9-12, I can open files with accents in their names
> > using the file list, but drag & drop doesn't work.
>
> File name encoding in DnD might not yet work, this is pretty new code.

In fact, it works... but only if typing accents does not work, and it
doesn't work if typing accents works. Depending on how the environment
variables are set, one of the two works and the other doesn't; I haven't
been able to find a setup that makes both work.

>
> > And now the keyboard doesn't work
> > with accents or other non-English characters. After loading Bert's
> > changeset
> > (http://lists.squeakfoundation.org/pipermail/vm-dev/2007-March/001046.html)
> > I can type ñ and accents, but the accent is printed before
> > the vowel instead of over it, i.e. 'o instead of ó.
>
> Strange - I have never seen this.
>

This is the environment where I can reproduce this behaviour:
- GNOME with the variables LANG=es_ES.UTF-8, LC_TYPE=LC_ALL=""
- Spanish keyboard
- default locale=es_ES.UTF-8 (es_ES ISO-8859-1 is also generated but not
set as default)
- etoys-dev-2.1-1575.image
- command line: /usr/lib/squeak/3.9-12/squeak etoys-dev-2.1-1575.image

Tell me if you need more data. I suppose you can reproduce it with an
English keyboard too, but I don't know where the accents are on it.
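
One way to check which variable actually decides the character encoding in a setup like this is sketched below in Python. It follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, ignoring empty values). The helper names are made up for illustration, and the fallback to "C" when nothing is set is an assumption, since that default is implementation-defined.

```python
import os


def effective_ctype_locale(env=None):
    """Return (variable, value) for the setting that governs character type.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values count as unset.
    Falls back to (None, "C") as an assumed default.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return name, value
    return None, "C"


def codeset_of(locale_name):
    """Return the codeset in a name like 'es_ES.UTF-8', or None if absent."""
    base = locale_name.split("@", 1)[0]  # drop a modifier such as '@euro'
    if "." in base:
        return base.split(".", 1)[1]
    return None


# The variables reported above: LANG set, LC_ALL empty.
jose_env = {"LANG": "es_ES.UTF-8", "LC_ALL": ""}
name, value = effective_ctype_locale(jose_env)
print(name, value, codeset_of(value))

# The older workaround: LC_ALL=es_ES overrides LANG and names no codeset.
old_env = {"LANG": "es_ES.UTF-8", "LC_ALL": "es_ES"}
print(effective_ctype_locale(old_env))
```

With the variables listed above, LANG is the deciding variable and the codeset is UTF-8; with LC_ALL=es_ES, LC_ALL wins and the locale name carries no explicit codeset, so the system's definition of es_ES applies.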

> > I've wasted a lot of time playing with the -encoding, -pathenc and
> > -textenc options without any success.
> >
> > So my questions are:
> > - is this totally broken?
>
> Given an arbitrary VM/image combo, yes.
>
> > - is anybody else aware of this behaviour?
>
> Yes. We intend to fix it for the OLPC version
> (https://dev.laptop.org/ticket/3343), but have not yet gotten around to it.
>
> > - have I simply failed to find the right documentation, and it should
> > actually work fine?
>
> I believe the VM side should actually be fine - what is lacking is a
> way to ensure that VM and image agree on a common encoding. There
> were some proposals in the last couple of years but none got adopted.
> So it's the user who must ensure it works, and this indeed is
> undocumented.
>
> Perhaps a simple step to clean up the mess is adding three new system
> attributes to allow the image to know the current VM encoding for
> keyboard, clipboard, and file names. Then fix the image to set
> converters using these attributes, instead of second-guessing from
> the platform string as it is done now.
>

I fully agree with Andreas's comments: nowadays the logical approach is
to assume everything is UTF-8.

Regards.
José L.

Bert Freudenberg

Re: accents in unix using non-english languages

In reply to this post by Andreas.Raab

On Sep 28, 2007, at 20:25 , Andreas Raab wrote:

> Bert Freudenberg wrote:
>> I believe the VM side should actually be fine - what is lacking is
>> a way to ensure that VM and image agree on a common encoding.
>> There were some proposals in the last couple of years but none got
>> adopted. So it's the user who must ensure it works, and this
>> indeed is undocumented.
>
> What's wrong with universally adopting UTF-8? The Windows VM does
> that now and it seems far superior to having to deal with tons of
> different encodings.
>
>> Perhaps a simple step to clean up the mess is adding three new
>> system attributes to allow the image to know the current VM
>> encoding for keyboard, clipboard, and file names. Then fix the
>> image to set converters using these attributes, instead of second-
>> guessing from the platform string as it is done now.
>
> Perhaps a simpler step is to assume everything is UTF-8 and give
> people a little CS which turns all these converters under all
> circumstances to UTF-8?

This would break compatibility with older images - until now we have
tried to preserve backwards compatibility.

For my specific purposes, switching to UTF-8 in general would be fine
- what about others?

- Bert -

johnmci

Re: accents in unix using non-english languages

Sophie uses UTF-8.
Plopp uses UTF-8.
Scratch uses MacRoman.

As mentioned, Squeak tools (the file browser) are then broken if you use
UTF-8, because the UTF-8 -> Latin-1 translation in the tool doesn't work
as expected.

As noted for Windows, drag and drop file names aren't handled correctly
on the Mac either (this I should fix someday).
Sophie actually does the proper translation via FFI calls to fix that
issue as a workaround.
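
As a rough illustration of why these encodings cannot be mixed freely, the Python sketch below encodes the same character in the three encodings named above and then tries a UTF-8 to Latin-1 transcoding. The message does not say why the translation in the Squeak tool fails; the failure shown here (a character that Latin-1 cannot represent) is just one generic way such a conversion can go wrong, and utf8_to_latin1 is a hypothetical helper.

```python
# The same character yields different bytes in each encoding.
text = "ñ"
encodings = ("utf-8", "latin-1", "mac_roman")
as_bytes = {name: text.encode(name) for name in encodings}
for name, data in as_bytes.items():
    print(f"{name:10} {data.hex()}")


def utf8_to_latin1(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to Latin-1.

    Raises UnicodeEncodeError if a character has no Latin-1 equivalent.
    """
    return data.decode("utf-8").encode("latin-1")


# "ó" exists in Latin-1, so this conversion succeeds.
ok = utf8_to_latin1("ó".encode("utf-8"))

# "€" does not exist in Latin-1, so the conversion cannot be lossless.
try:
    utf8_to_latin1("€".encode("utf-8"))
    lossless = True
except UnicodeEncodeError:
    lossless = False

print(ok, lossless)
```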

On Sep 28, 2007, at 2:07 PM, Bert Freudenberg wrote:

>
> On Sep 28, 2007, at 20:25 , Andreas Raab wrote:
>
>> Bert Freudenberg wrote:
>>> I believe the VM side should actually be fine - what is lacking
>>> is a way to ensure that VM and image agree on a common encoding.
>>> There were some proposals in the last couple of years but none
>>> got adopted. So it's the user who must ensure it works, and this
>>> indeed is undocumented.
>>
>> What's wrong with universally adopting UTF-8? The Windows VM does
>> that now and it seems far superior to having to deal with tons of
>> different encodings.
>>
>>> Perhaps a simple step to clean up the mess is adding three new
>>> system attributes to allow the image to know the current VM
>>> encoding for keyboard, clipboard, and file names. Then fix the
>>> image to set converters using these attributes, instead of second-
>>> guessing from the platform string as it is done now.
>>
>> Perhaps a simpler step is to assume everything is UTF-8 and give
>> people a little CS which turns all these converters under all
>> circumstances to UTF-8?
>
> This would break compatibility with older images - until now we have
> tried to preserve backwards compatibility.
>
> For my specific purposes, switching to UTF-8 in general would be
> fine - what about others?
>
> - Bert -
>
>

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Andreas.Raab

Re: accents in unix using non-english languages

In reply to this post by Bert Freudenberg

Bert Freudenberg wrote:
>> Perhaps a simpler step is to assume everything is UTF-8 and give
>> people a little CS which turns all these converters under all
>> circumstances to UTF-8?
>
> This would break compatibility with older images - until now we have
> tried to preserve backwards compatibility.

I thought about this for roughly two seconds before changing the Windows
VM and decided to break compatibility for the following reasons:
* There are older VMs available, so if someone really needs the old
encodings they can use one of those (or recompile from that code base).
* This has never worked reliably to begin with. Neither the VMs nor the
images were fully encoding-aware. In addition, the default encoding in
the image was changed in 3.8. If *anyone* out there had actually
used it, we'd be getting complaints about this left and right.
* The changes are trivial to fold back into older images if that is
desirable.
* It hugely simplifies the VM code that deals with stuff coming from
Squeak - there is only one path to take, and if a function in the VM
doesn't take its input as UTF-8 then it is broken and needs fixing.

The above were reason enough for me to break compatibility.

Cheers,
- Andreas
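
The "only one path" argument can be pictured with a small Python sketch. This is not the Windows VM code: from_image and to_platform are hypothetical names, and the choice of UTF-16 as the platform-side encoding is an assumption made for the example. The idea shown is that every string crossing the boundary is required to be UTF-8, and anything else is rejected rather than guessed at.

```python
def from_image(data: bytes) -> str:
    """Single entry point: input must be valid UTF-8.

    Strict decoding raises UnicodeDecodeError for anything else.
    """
    return data.decode("utf-8")


def to_platform(text: str, platform_encoding: str = "utf-16-le") -> bytes:
    """Convert once, at the edge, to what the platform call expects.

    The default encoding here is an assumption for illustration.
    """
    return text.encode(platform_encoding)


# A UTF-8 file name passes through the single path.
name = from_image("año.gif".encode("utf-8"))
wide = to_platform(name)

# The same name in Latin-1 is not valid UTF-8 and is rejected outright.
try:
    from_image("año.gif".encode("latin-1"))
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(name, len(wide), rejected)
```

A caller that still produces another encoding fails loudly at the boundary, which matches the stance in the last bullet: such a caller is treated as broken and in need of fixing.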