Re: [Pharo-dev] Better management of encoding of environment variables

Eliot Miranda-2

Hi Sven,

On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <[hidden email]> wrote:

Still, one of the conclusions of previous discussions about the encoding of environment variables was/is that there is no single correct solution. OSes are not consistent in how the encoding is done across all (historical) contexts (for example, sometimes one env var defines the encoding to use for others, different applications do different things, and other such nice stuff), and certainly not across platforms.

So this is really complex.

Do we want to hide this in some obscure VM C code that very few people can see or read, let alone help with?

The image side is perfectly capable of dealing with platform differences in a clean/clear way, and at least we can then use the full power of our language and our tools.

Agreed. At the same time, I think it is very important that we don't rely on the FFI for environment variable access. This is a basic cross-platform facility. So I would like to see the environment accessed through primitives, with the primitive(s) answering a raw result, just a sequence of uninterpreted bytes, and the image placing an interpretation on that result.

VisualWorks takes this approach and provides a class UninterpretedBytes that the VM is aware of. That's always seemed like an ugly name and overkill to me. I would just use ByteArray and provide image level conversion from ByteArray to String, which is what I believe we have anyway.
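A minimal sketch of that split, written in Python purely for illustration (the thread itself concerns Smalltalk). All names here are hypothetical: `RAW_ENVIRONMENT` and `primitive_getenv` stand in for whatever the VM primitive would expose, and `getenv` plays the role of the image-level conversion from ByteArray to String.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for the process environment as a primitive would
# expose it: names and values are raw bytes with no encoding applied.
RAW_ENVIRONMENT: Mapping[bytes, bytes] = {
    b"GREETING": "café".encode("utf-8"),
    b"LEGACY_GREETING": "café".encode("latin-1"),
}


def primitive_getenv(name: bytes) -> Optional[bytes]:
    """'VM side': answer the uninterpreted bytes, or None if unset."""
    return RAW_ENVIRONMENT.get(name)


def getenv(name: str, encoding: str = "utf-8") -> Optional[str]:
    """'Image side': choose an encoding and interpret the raw bytes."""
    raw = primitive_getenv(name.encode("ascii"))
    if raw is None:
        return None
    # Strict decoding: a wrong encoding choice raises UnicodeDecodeError
    # instead of silently producing garbage.
    return raw.decode(encoding)
```

The primitive never guesses an encoding, so a wrong guess can be diagnosed and corrected entirely in the upper layer, for example by retrying `getenv("LEGACY_GREETING", "latin-1")`.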

> On 16 Jan 2019, at 10:59, Guillermo Polito <[hidden email]> wrote:
>
> Hi Nicolas,
>
> On Wed, Jan 16, 2019 at 10:25 AM Nicolas Cellier <[hidden email]> wrote:
> IMO, the Windows VM (and plugins) should do the UCS2 -> UTF8 conversion because the purpose of a VM is to provide an OS-independent façade.
> I made progress recently in this area, but we should finish the job/test/consolidate.
>
> I'm following your changes for windows from the shadows and I think they are awesome :).
>
> If someone bypasses the VM and uses the Windows API directly through FFI, then they take responsibility for it, but uniformity doesn't hurt.
>
> So far we are using FFI for this: as you say, we first create Win32WideStrings from UTF-8 strings and then use FFI calls to the *W functions.
> I don't think we can make it for Pharo7.0.0. The cycle to build, do some acceptance tests, and then bless a new VM as stable is far too long for our imminent release :).
>
> But this could be for a 7.1.0, and if you like I can surely give a hand on this.
>
> Guille

_,,,^..^,,,_

best, Eliot

Nicolas Cellier

On Wed, Jan 16, 2019 at 11:23 PM, Eliot Miranda <[hidden email]> wrote:

Hi Sven,

On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <[hidden email]> wrote:
Still, one of the conclusions of previous discussions about the encoding of environment variables was/is that there is no single correct solution. OSes are not consistent in how the encoding is done across all (historical) contexts (for example, sometimes one env var defines the encoding to use for others, different applications do different things, and other such nice stuff), and certainly not across platforms.

So this is really complex.

Do we want to hide this in some obscure VM C code that very few people can see or read, let alone help with?

The image side is perfectly capable of dealing with platform differences in a clean/clear way, and at least we can then use the full power of our language and our tools.

Agreed. At the same time, I think it is very important that we don't rely on the FFI for environment variable access. This is a basic cross-platform facility. So I would like to see the environment accessed through primitives, with the primitive(s) answering a raw result, just a sequence of uninterpreted bytes, and the image placing an interpretation on that result.

VisualWorks takes this approach and provides a class UninterpretedBytes that the VM is aware of. That's always seemed like an ugly name and overkill to me. I would just use ByteArray and provide image level conversion from ByteArray to String, which is what I believe we have anyway.

What's important is to create abstraction layers that confine unneeded complexity to the lowest layers possible.

The VM excels at insulating of course.

On the image side, we have to take responsibility ourselves for not leaking too much.

As Eliot said, right now the VM (and FFI) just take sequences of uninterpreted bytes (ByteArray) and pass them to the API.

The conversion ByteString/WideString <-> specifically-encoded ByteArray is performed on the image side.

With FFI, we could eventually make this conversion platform specific instead of always UTF8.

The purpose would be to reduce back and forth conversions in chained API calls for example.

For sanity, we had better follow these rules:

- the image does not attempt direct interaction with this opaque data (other than through the OS API)

- nor does it preserve the data across snapshots.
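To make the idea concrete, here is a rough sketch in Python (used only for illustration, not as a proposal for the actual image code). The function name `encode_for_os_api` is hypothetical, and the platform rule is an assumption: Windows callers target the *W functions with UTF-16LE, and everything else takes UTF-8.

```python
import sys


def encode_for_os_api(text: str, platform: str = sys.platform) -> bytes:
    """Encode text as a NUL-terminated byte sequence for an OS API call.

    Assumption for illustration only: on Windows the target is a *W
    function expecting UTF-16LE; every other platform expects UTF-8.
    """
    if platform.startswith("win"):
        # Wide strings end with a two-byte NUL terminator.
        return text.encode("utf-16-le") + b"\x00\x00"
    return text.encode("utf-8") + b"\x00"
```

In keeping with the two rules, the bytes produced here would be handed straight to the API and then dropped: nothing else inspects them, and they are not kept across snapshots.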

Beware: the conversion is not only platform-specific, it can also be library-specific (some libraries on Windows take UTF-8).

So we could either reify the library and always double-dispatch to it, or create higher-level abstract messages that chain several low-level OS API calls.

We would thus let complexity creep up one more level, but only if we have a good reason to do so.

We don't want to trade uniformity for small gains.
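A simplified sketch of the "reify the library" option, again in Python for illustration only. The class names are hypothetical, and the sketch dispatches on the library object alone rather than showing full double dispatch: each library object knows which encoding its functions expect, so callers never choose one themselves.

```python
class ForeignLibrary:
    """Hypothetical reified library that owns its string encoding."""

    encoding = "utf-8"

    def encode_text(self, text: str) -> bytes:
        # Convert an image-side string into the bytes this library expects.
        return text.encode(self.encoding)

    def decode_bytes(self, raw: bytes) -> str:
        # Convert bytes answered by this library back into a string.
        return raw.decode(self.encoding)


class Win32WideLibrary(ForeignLibrary):
    """Stands for a Windows library reached through its *W entry points."""

    encoding = "utf-16-le"


class Utf8Library(ForeignLibrary):
    """Stands for a library that takes UTF-8, even on Windows."""

    encoding = "utf-8"
```

With this shape, the encoding decision lives in one place per library, which is the uniformity argument above: code that calls `library.encode_text(name)` does not change when the library behind it does.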

BTW, note that the xxxW API is already a huge step toward uniformity compared to the code-page-specific xxxA API!

Another strategy is to create more complex abstractions (i.e. parameterized) that can deal with a zoo of different underlying conventions.

For example, this would be the EncodedString of VW.

This strategy could be tempting, because it enables dealing with lower level platform-specific-encoded objects and still interact with them in the image transparently.

But I strongly advise thinking twice (or more) before introducing such complexity:

- it breaks former invariants (and thus potentially a lot of code)

- complexity tends to spread in many places

I don't recommend it.

PS: oops, sorry for the out-of-band message. I meant to send it earlier, but it seems I did not press the button properly...
