about Grease codecs

about Grease codecs

Paolo Bonzini
How are Codecs meant to be implemented exactly?  The tests are not
clear, and it looks like they are supposed to convert only to and from
ISO-8859-1.  If so, that's easy, but it also shows that Seaside has no
Slavic or Far-East developers(1)... Can we add a simple factory that
takes two encodings and returns the appropriate codec or codec stream?

Also, a method to obtain a list of supported encodings is in general
not a good idea.  GNU Smalltalk uses iconv and, including all the
aliases, has more than a thousand valid encodings.
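Rather than enumerating every encoding, a library can ask the platform about one name at a time. A minimal sketch in Python, whose codec registry stands in here for iconv; this only illustrates the idea and is not Grease or GNU Smalltalk API, and the helper name `canonical_encoding` is hypothetical:

```python
import codecs
from typing import Optional


def canonical_encoding(name: str) -> Optional[str]:
    """Return the registry's canonical name for `name`, or None if unsupported.

    Aliases collapse to a single canonical name, so no full list of
    encodings (aliases included) ever has to be built.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None
```

For example, `canonical_encoding("latin-1")` and `canonical_encoding("ISO-8859-1")` both return `"iso8859-1"`, while an unknown name returns `None`.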

Paolo

(1) Compare this to Ruby, where they wrote their own regex library to
better support the mess of Japanese encodings!
_______________________________________________
seaside-dev mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/seaside-dev

Re: about Grease codecs

Lukas Renggli
> How are Codecs meant to be implemented exactly?  The tests are not
> clear, and it looks like they are supposed to convert only to and from
> ISO-8859-1.

Codecs are implemented like this:

   decoder: something --> internal encoding
   encoder: internal encoding --> something

One of the two encodings is always the internal encoding of the
Smalltalk you are using. In case of Pharo/Squeak this is obviously
ISO-8859-1.

As far as I know, Pharo and Squeak have no way to convert between two
arbitrary codecs. You always have to go through the internal encoding
first.
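The decoder/encoder split above can be sketched as a small class. This is a Python illustration that assumes the internal representation is a Unicode string (Python's `str`); the class name `Codec` and its methods are hypothetical and are not the Grease API:

```python
class Codec:
    """Hypothetical codec bound to one external encoding.

    The internal side is always Python's str (Unicode), mirroring
    "one of the two encodings is always the internal encoding".
    """

    def __init__(self, external: str) -> None:
        self.external = external

    def decode(self, data: bytes) -> str:
        # decoder: something --> internal encoding
        return data.decode(self.external)

    def encode(self, text: str) -> bytes:
        # encoder: internal encoding --> something
        return text.encode(self.external)
```

For example, `Codec("utf-8").decode(b"caf\xc3\xa9")` yields `"café"`, and `Codec("iso-8859-1").encode("café")` yields `b"caf\xe9"`.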

Philippe might have more insight into this.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

Re: about Grease codecs

Paolo Bonzini
On Tue, Mar 30, 2010 at 08:29, Lukas Renggli <[hidden email]> wrote:

>> How are Codecs meant to be implemented exactly?  The tests are not
>> clear, and it looks like they are supposed to convert only to and from
>> ISO-8859-1.
>
> Codecs are implemented like this:
>
>   decoder: something --> internal encoding
>   encoder: internal encoding --> something
>
> One of the two encodings is always the internal encoding of the
> Smalltalk you are using. In case of Pharo/Squeak this is obviously
> ISO-8859-1.

Okay, we will use UnicodeStrings in gst except for the null codec.

> As far as I know, Pharo and Squeak have no way to convert between two
> arbitrary codecs. You always have to go through the internal encoding
> first.

This doesn't mean Grease couldn't provide a method to do the
triangulation.  It can be twice as fast in other dialects.
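Triangulation means chaining one codec's decoder with another codec's encoder, so that the caller sees a single source-to-target conversion. A minimal Python sketch, assuming Unicode `str` as the internal representation; the function name `transcode` is hypothetical. This version still builds the intermediate string, so it does not demonstrate the speed-up mentioned above, which would presumably come from a dialect converting directly:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert `data` from `source` to `target` via the internal representation."""
    text = data.decode(source)   # step 1: source --> internal (Unicode str)
    return text.encode(target)   # step 2: internal --> target
```

For example, the UTF-8 bytes of the Euro sign, `b"\xe2\x82\xac"`, come out of `transcode(..., "utf-8", "cp1252")` as the single Windows-1252 byte `b"\x80"`.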

Paolo

Re: about Grease codecs

Lukas Renggli
>> As far as I know, Pharo and Squeak have no way to convert between two
>> arbitrary codecs. You always have to go through the internal encoding
>> first.
>
> This doesn't mean Grease couldn't provide a method to do the
> triangulation.  It can be twice as fast in other dialects.

Yeah, we had long discussions about that at the last ESUG. AFAIR, we came to
the conclusion that in practice this is rarely useful, so we won't
support it for now. I do see, however, that adding a factory method for
arbitrary codecs could be cool; if you propose a change, we can
certainly integrate it.

Lukas


Re: about Grease codecs

Andreas.Raab
In reply to this post by Lukas Renggli
On 3/29/2010 11:29 PM, Lukas Renggli wrote:

>> How are Codecs meant to be implemented exactly?  The tests are not
>> clear, and it looks like they are supposed to convert only to and from
>> ISO-8859-1.
>
> Codecs are implemented like this:
>
>     decoder: something -->  internal encoding
>     encoder: internal encoding -->  something
>
> One of the two encodings is always the internal encoding of the
> Smalltalk you are using. In case of Pharo/Squeak this is obviously
> ISO-8859-1.

I'm probably missing something here but shouldn't this be Unicode?
Squeak (and as a result Pharo) uses Unicode internally, so why would the
internal encoding be "obviously" ISO-8859-1? Doesn't that also imply
that most actual conversions (Mac Roman, UTF-8) are lossy in general?
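The lossiness concern can be made concrete. If the internal representation were limited to ISO-8859-1, any character outside that range would have to be dropped or replaced on the way in. A Python sketch of such a hypothetical decoder (`decode_into_latin1` is an illustrative name, and replacing with `?` is just one possible policy):

```python
def decode_into_latin1(data: bytes, source: str) -> str:
    """Decode `data`, then squeeze the result into the ISO-8859-1 repertoire.

    Characters that ISO-8859-1 cannot represent become "?", so the
    conversion is lossy whenever the input uses them.
    """
    text = data.decode(source)
    return text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
```

Accented Latin text such as UTF-8 `"café"` survives, but the UTF-8 Euro sign comes back as `"?"`.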

Cheers,
   - Andreas

Re: about Grease codecs

Lukas Renggli
>> One of the two encodings is always the internal encoding of the
>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>> ISO-8859-1.
>
> I'm probably missing something here but shouldn't this be Unicode? Squeak
> (and as a result Pharo) uses Unicode internally, so why would the internal
> encoding be "obviously" ISO-8859-1?

I am not an expert on encodings; I just assumed that Pharo used
ISO-8859-1 (or a superset thereof) because Paolo said so.

Personally I mostly use the NullCodec, because it avoids the
performance penalty of the transformations and I do not care how the
data is represented within the image.
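A null codec simply passes data through untouched. The trade-off described above shows up in a small Python sketch (the class name mirrors the NullCodec mentioned, but this is a hypothetical illustration, not the Grease class): nothing is converted, so no conversion cost is paid, but the program then works with encoded bytes rather than characters.

```python
class NullCodec:
    """Hypothetical pass-through codec: no conversion in either direction."""

    def decode(self, data: bytes) -> bytes:
        return data  # bytes stay bytes; no decoding cost

    def encode(self, data: bytes) -> bytes:
        return data  # written back exactly as received


codec = NullCodec()
euro_utf8 = b"\xe2\x82\xac"                  # one character, three bytes in UTF-8
assert codec.decode(euro_utf8) == euro_utf8  # round trip is lossless...
assert len(codec.decode(euro_utf8)) == 3     # ...but length counts bytes, not characters
```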

Lukas


Re: about Grease codecs

Paolo Bonzini
In reply to this post by Andreas.Raab
On Tue, Mar 30, 2010 at 09:35, Andreas Raab <[hidden email]> wrote:

> On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>>>
>>> How are Codecs meant to be implemented exactly?  The tests are not
>>> clear, and it looks like they are supposed to convert only to and from
>>> ISO-8859-1.
>>
>> Codecs are implemented like this:
>>
>>    decoder: something -->  internal encoding
>>    encoder: internal encoding -->  something
>>
>> One of the two encodings is always the internal encoding of the
>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>> ISO-8859-1.
>
> I'm probably missing something here but shouldn't this be Unicode? Squeak
> (and as a result Pharo) uses Unicode internally, so why would the internal
> encoding be "obviously" ISO-8859-1? Doesn't that also imply that most actual
> conversions (Mac Roman, UTF-8) are lossy in general?

I don't know; I said ISO-8859-1 because that's what I got from
SqueakSource when I downloaded Seaside.

Paolo

Re: about Grease codecs

Julian Fitzell
In reply to this post by Lukas Renggli
Yeah, we decided to leave a real solution to the encoding issues for 3.1.
But I agree that you should be able to deal with encodings in terms of an
internal/external encoding pair (even if the internal one is "native" in
most cases).

The idea was that a "null" encoding would still require specifying an
encoding; it would just be the same on both sides. That way the
external encoding of the adaptor could be used as the default when
setting headers and so on in the request (rather than having to
specify it again as an application setting).
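The internal/external pair described here could be modeled roughly as follows. This is a Python sketch; `EncodingPair`, `is_null` and `content_type` are hypothetical names, and producing a `Content-Type` value is only one example of where the external encoding might serve as a default:

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingPair:
    """Hypothetical internal/external encoding pair for an adaptor."""

    internal: str
    external: str

    @property
    def is_null(self) -> bool:
        # A "null" codec still names an encoding; it is just the same on both sides.
        return codecs.lookup(self.internal).name == codecs.lookup(self.external).name

    def content_type(self, mime: str = "text/html") -> str:
        # The external encoding doubles as the default charset, so it need not
        # be repeated as a separate application setting.
        return f"{mime}; charset={self.external}"
```

For example, `EncodingPair("utf-8", "UTF8").is_null` is true because both names resolve to the same codec, and `EncodingPair("iso-8859-1", "utf-8").content_type()` returns `"text/html; charset=utf-8"`.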

Julian

On Tue, Mar 30, 2010 at 8:07 AM, Lukas Renggli <[hidden email]> wrote:

>>> As far as I know, Pharo and Squeak have no way to convert between two
>>> arbitrary codecs. You always have to go through the internal encoding
>>> first.
>>
>> This doesn't mean Grease couldn't provide a method to do the
>> triangulation.  It can be twice as fast in other dialects.
>
> Yeah, we had long discussions about that last ESUG. AFAI, we came to
> the conclusion that in practice this is rarely useful so we won't
> support it for now. I see however that adding a factory method for
> arbitrary codecs could be cool, if you propose a change we can
> certainly integrate that.
>
> Lukas
>
> --
> Lukas Renggli
> http://www.lukas-renggli.ch

Re: about Grease codecs

Julian Fitzell
In reply to this post by Paolo Bonzini
On Tue, Mar 30, 2010 at 9:01 AM, Paolo Bonzini <[hidden email]> wrote:

> On Tue, Mar 30, 2010 at 09:35, Andreas Raab <[hidden email]> wrote:
>> On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>>>>
>>>> How are Codecs meant to be implemented exactly?  The tests are not
>>>> clear, and it looks like they are supposed to convert only to and from
>>>> ISO-8859-1.
>>>
>>> Codecs are implemented like this:
>>>
>>>    decoder: something -->  internal encoding
>>>    encoder: internal encoding -->  something
>>>
>>> One of the two encodings is always the internal encoding of the
>>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>>> ISO-8859-1.
>>
>> I'm probably missing something here but shouldn't this be Unicode? Squeak
>> (and as a result Pharo) uses Unicode internally, so why would the internal
>> encoding be "obviously" ISO-8859-1? Doesn't that also imply that most actual
>> conversions (Mac Roman, UTF-8) are lossy in general?
>
> I don't know; I said ISO-8859-1 because that's what I got from
> SqueakSource when I downloaded Seaside.

I think the default encoders on Squeak may be ISO-8859-1 (I believe
because Philippe keeps saying there are big problems with
WideStrings). But he's the expert in this area...

Julian

Re: about Grease codecs

Andreas.Raab
In reply to this post by Paolo Bonzini
On 3/30/2010 1:01 AM, Paolo Bonzini wrote:

> On Tue, Mar 30, 2010 at 09:35, Andreas Raab<[hidden email]>  wrote:
>> On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>>>>
>>>> How are Codecs meant to be implemented exactly?  The tests are not
>>>> clear, and it looks like they are supposed to convert only to and from
>>>> ISO-8859-1.
>>>
>>> Codecs are implemented like this:
>>>
>>>     decoder: something -->    internal encoding
>>>     encoder: internal encoding -->    something
>>>
>>> One of the two encodings is always the internal encoding of the
>>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>>> ISO-8859-1.
>>
>> I'm probably missing something here but shouldn't this be Unicode? Squeak
>> (and as a result Pharo) uses Unicode internally, so why would the internal
>> encoding be "obviously" ISO-8859-1? Doesn't that also imply that most actual
>> conversions (Mac Roman, UTF-8) are lossy in general?
>
> I don't know; I said ISO-8859-1 because that's what I got from
> SqueakSource when I downloaded Seaside.

Easy to find out. Just put a Euro sign into the UTF8Converter and see
what the result is. The Euro sign is U+20AC, so try converting from/to
(Character value: 8364) and #[226 130 172] and see if that works.
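For comparison, the same check written in Python, where strings are Unicode. This only shows what a Unicode-capable conversion should produce; it does not exercise Squeak's UTF8Converter, and the helper name `utf8_round_trips` is hypothetical:

```python
def utf8_round_trips(code_point: int, expected: bytes) -> bool:
    """True if the code point encodes to `expected` and decodes back to itself."""
    ch = chr(code_point)
    return ch.encode("utf-8") == expected and expected.decode("utf-8") == ch


# Euro sign: U+20AC == 8364, i.e. (Character value: 8364) in Smalltalk,
# and #[226 130 172] as a Smalltalk byte array.
print(utf8_round_trips(0x20AC, bytes([226, 130, 172])))  # prints True
```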

Cheers,
   - Andreas

Re: about Grease codecs

Andreas.Raab
In reply to this post by Julian Fitzell
On 3/30/2010 1:05 AM, Julian Fitzell wrote:
> I think the default encoders on squeak may be ISO-8859-1 (I think
> because Philippe is always saying there are big problems with
> WideStrings). But he's the expert in this area...

He's probably referring to the leading char, which is an annoyance but
not a big problem. What you need to do is "fix" the leading char in
any image that gets shipped around and not allow it to be picked up
dynamically. We've been doing this in our 3.8/Croquet-based images and
it works fine.

Cheers,
   - Andreas