How are Codecs meant to be implemented exactly? The tests are not
clear, and it looks like they are supposed to convert only to and from
ISO-8859-1. If so, that's easy but it also shows that Seaside has no
Slavic or Far-East developer (1)...

Can we add a simple factory that takes two encodings and returns the
appropriate codec or codec stream?

Also, a method to obtain a list of supported encodings is in general
not a good idea. GNU Smalltalk uses iconv and, including all the
aliases, has more than a thousand valid encodings.

Paolo

(1) Compare this to Ruby, where they wrote their own regex library to
better support the mess of Japanese encodings!

_______________________________________________
seaside-dev mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/seaside-dev
> How are Codecs meant to be implemented exactly? The tests are not
> clear, and it looks like they are supposed to convert only to and from
> ISO-8859-1.

Codecs are implemented like this:

  decoder: something --> internal encoding
  encoder: internal encoding --> something

One of the two encodings is always the internal encoding of the
Smalltalk you are using. In case of Pharo/Squeak this is obviously
ISO-8859-1.

As far as I know, Pharo and Squeak have no way to convert between two
arbitrary codecs. You always have to go through the internal encoding
first. Philippe might have more insight into this.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
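A minimal sketch of the decoder/encoder pairing described above, written in Python purely for illustration. The `Codec` class and its method names are hypothetical and are not the Grease or Seaside API; Python's `str` stands in for the "internal encoding".

```python
class Codec:
    """Hypothetical codec: one side is always the internal string type."""

    def __init__(self, external_encoding):
        self.external_encoding = external_encoding

    def decode(self, data: bytes) -> str:
        # decoder: something --> internal encoding
        return data.decode(self.external_encoding)

    def encode(self, text: str) -> bytes:
        # encoder: internal encoding --> something
        return text.encode(self.external_encoding)


utf8 = Codec("utf-8")
internal = utf8.decode(b"caf\xc3\xa9")   # external bytes -> internal string
external = utf8.encode(internal)         # internal string -> external bytes
```

Converting between two external encodings with such a codec always means two steps: decode with one codec, then encode with another.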
On Tue, Mar 30, 2010 at 08:29, Lukas Renggli <[hidden email]> wrote:
>> How are Codecs meant to be implemented exactly? The tests are not
>> clear, and it looks like they are supposed to convert only to and from
>> ISO-8859-1.
>
> Codecs are implemented like this:
>
>   decoder: something --> internal encoding
>   encoder: internal encoding --> something
>
> One of the two encodings is always the internal encoding of the
> Smalltalk you are using. In case of Pharo/Squeak this is obviously
> ISO-8859-1.

Okay, we will use UnicodeStrings in gst except for the null codec.

> As far as I know, Pharo and Squeak have no way to convert between two
> arbitrary codecs. You always have to go through the internal encoding
> first.

This doesn't mean Grease couldn't provide a method to do the
triangulation. It can be twice as fast in other dialects.

Paolo
>> As far as I know, Pharo and Squeak have no way to convert between two
>> arbitrary codecs. You always have to go through the internal encoding
>> first.
>
> This doesn't mean Grease couldn't provide a method to do the
> triangulation. It can be twice as fast in other dialects.

Yeah, we had long discussions about that at the last ESUG. AFAI, we came
to the conclusion that in practice this is rarely useful, so we won't
support it for now. I see, however, that adding a factory method for
arbitrary codecs could be cool; if you propose a change we can
certainly integrate that.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
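The factory for arbitrary codecs discussed above could look roughly like the following sketch. It is Python, not Smalltalk, and `codec_between` is a hypothetical name; it triangulates through the internal string type, which is exactly the two-step route a dialect with direct conversion (such as iconv in GNU Smalltalk) could shortcut.

```python
import codecs


def codec_between(source_encoding, target_encoding):
    """Hypothetical factory: return a bytes-to-bytes converter for two encodings."""
    # codecs.lookup raises LookupError for an unknown name, so callers can
    # probe for support without needing a list of every valid encoding.
    source = codecs.lookup(source_encoding)
    target = codecs.lookup(target_encoding)

    def convert(data: bytes) -> bytes:
        text, _ = source.decode(data)       # external A -> internal
        converted, _ = target.encode(text)  # internal -> external B
        return converted

    return convert


latin1_to_utf8 = codec_between("iso-8859-1", "utf-8")
result = latin1_to_utf8(b"caf\xe9")
```

Asking the factory for a codec and handling the failure also sidesteps the problem of enumerating more than a thousand encoding names and aliases.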
In reply to this post by Lukas Renggli
On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>> How are Codecs meant to be implemented exactly? The tests are not
>> clear, and it looks like they are supposed to convert only to and from
>> ISO-8859-1.
>
> Codecs are implemented like this:
>
>   decoder: something --> internal encoding
>   encoder: internal encoding --> something
>
> One of the two encodings is always the internal encoding of the
> Smalltalk you are using. In case of Pharo/Squeak this is obviously
> ISO-8859-1.

I'm probably missing something here but shouldn't this be Unicode?
Squeak (and as a result Pharo) uses Unicode internally, so why would
the internal encoding be "obviously" ISO-8859-1? Doesn't that also
imply that most actual conversions (Mac Roman, UTF-8) are lossy in
general?

Cheers,
  - Andreas
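The point about lossy conversions can be illustrated with a short Python sketch (an illustration only; it says nothing about what Squeak or Pharo actually do internally). ISO-8859-1 covers only U+0000 to U+00FF, so a character such as Ž (U+017D) survives a UTF-8 round trip but cannot be represented if the intermediate form is ISO-8859-1. The helper `fits` is a hypothetical name.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


z_caron = "\u017d"                         # LATIN CAPITAL LETTER Z WITH CARON
utf8_bytes = z_caron.encode("utf-8")       # two bytes: 0xC5 0xBD
round_trip = utf8_bytes.decode("utf-8")    # lossless via a Unicode string

fits_latin1 = fits(z_caron, "iso-8859-1")  # outside the Latin-1 range
lossy = z_caron.encode("iso-8859-1", errors="replace")  # substitutes "?"
```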
>> One of the two encodings is always the internal encoding of the
>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>> ISO-8859-1.
>
> I'm probably missing something here but shouldn't this be Unicode? Squeak
> (and as a result Pharo) uses Unicode internally, so why would the internal
> encoding be "obviously" ISO-8859-1?

I am not an expert with encodings, I just assumed that Pharo used
ISO-8859-1 (or a superset thereof) because Paolo said so.

Personally I mostly use the NullCodec, because it avoids the
performance penalty of the transformations and I do not care how the
data is represented within the image.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
In reply to this post by Andreas.Raab
On Tue, Mar 30, 2010 at 09:35, Andreas Raab <[hidden email]> wrote:
> On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>>> How are Codecs meant to be implemented exactly? The tests are not
>>> clear, and it looks like they are supposed to convert only to and from
>>> ISO-8859-1.
>>
>> Codecs are implemented like this:
>>
>>   decoder: something --> internal encoding
>>   encoder: internal encoding --> something
>>
>> One of the two encodings is always the internal encoding of the
>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>> ISO-8859-1.
>
> I'm probably missing something here but shouldn't this be Unicode? Squeak
> (and as a result Pharo) uses Unicode internally, so why would the internal
> encoding be "obviously" ISO-8859-1? Doesn't that also imply that most actual
> conversions (Mac Roman, UTF-8) are lossy in general?

I don't know; I said ISO-8859-1 because that's what I got from
SqueakSource when I downloaded Seaside.

Paolo
In reply to this post by Lukas Renggli
Yeah, we decided to leave really solving encoding issues for 3.1. But
I agree that you should be able to deal with encodings in terms of an
internal/external encoding pair (even if the internal is "native" in
most cases).

The idea was that a "null" encoding would still require you to specify
an encoding; it would just be the same on both sides. That way the
external encoding of the adaptor could be used as the default when
setting headers and so on in the request (rather than having to
specify it also as an application setting).

Julian

On Tue, Mar 30, 2010 at 8:07 AM, Lukas Renggli <[hidden email]> wrote:
>>> As far as I know, Pharo and Squeak have no way to convert between two
>>> arbitrary codecs. You always have to go through the internal encoding
>>> first.
>>
>> This doesn't mean Grease couldn't provide a method to do the
>> triangulation. It can be twice as fast in other dialects.
>
> Yeah, we had long discussions about that last ESUG. AFAI, we came to
> the conclusion that in practice this is rarely useful so we won't
> support it for now. I see however that adding a factory method for
> arbitrary codecs could be cool, if you propose a change we can
> certainly integrate that.
>
> Lukas
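One way to read the "null" encoding idea above, sketched in Python under loudly stated assumptions: `NullCodec` here is a hypothetical class (not Seaside's), it passes data through untouched, and it still records an encoding that is the same on both sides so that a header default can be derived from it.

```python
class NullCodec:
    """Hypothetical pass-through codec that still names its encoding."""

    def __init__(self, encoding):
        # Same encoding on both sides, so no transformation is needed.
        self.internal_encoding = encoding
        self.external_encoding = encoding

    def decode(self, data):
        return data  # no conversion, no performance penalty

    def encode(self, data):
        return data

    def content_type(self, media_type="text/html"):
        # The external encoding doubles as the default charset for headers.
        return f"{media_type}; charset={self.external_encoding}"


codec = NullCodec("utf-8")
header = codec.content_type()
```

With this shape the charset does not have to be repeated as a separate application setting.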
In reply to this post by Paolo Bonzini-3
On Tue, Mar 30, 2010 at 9:01 AM, Paolo Bonzini <[hidden email]> wrote:
> On Tue, Mar 30, 2010 at 09:35, Andreas Raab <[hidden email]> wrote:
>> On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>>>> How are Codecs meant to be implemented exactly? The tests are not
>>>> clear, and it looks like they are supposed to convert only to and from
>>>> ISO-8859-1.
>>>
>>> Codecs are implemented like this:
>>>
>>>   decoder: something --> internal encoding
>>>   encoder: internal encoding --> something
>>>
>>> One of the two encodings is always the internal encoding of the
>>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>>> ISO-8859-1.
>>
>> I'm probably missing something here but shouldn't this be Unicode? Squeak
>> (and as a result Pharo) uses Unicode internally, so why would the internal
>> encoding be "obviously" ISO-8859-1? Doesn't that also imply that most actual
>> conversions (Mac Roman, UTF-8) are lossy in general?
>
> I don't know; I said ISO-8859-1 because that's what I got from
> SqueakSource when I downloaded Seaside.

I think the default encoders on squeak may be ISO-8859-1 (I think
because Philippe is always saying there are big problems with
WideStrings). But he's the expert in this area...

Julian
In reply to this post by Paolo Bonzini-3
On 3/30/2010 1:01 AM, Paolo Bonzini wrote:
> On Tue, Mar 30, 2010 at 09:35, Andreas Raab <[hidden email]> wrote:
>> On 3/29/2010 11:29 PM, Lukas Renggli wrote:
>>>> How are Codecs meant to be implemented exactly? The tests are not
>>>> clear, and it looks like they are supposed to convert only to and from
>>>> ISO-8859-1.
>>>
>>> Codecs are implemented like this:
>>>
>>>   decoder: something --> internal encoding
>>>   encoder: internal encoding --> something
>>>
>>> One of the two encodings is always the internal encoding of the
>>> Smalltalk you are using. In case of Pharo/Squeak this is obviously
>>> ISO-8859-1.
>>
>> I'm probably missing something here but shouldn't this be Unicode? Squeak
>> (and as a result Pharo) uses Unicode internally, so why would the internal
>> encoding be "obviously" ISO-8859-1? Doesn't that also imply that most actual
>> conversions (Mac Roman, UTF-8) are lossy in general?
>
> I don't know; I said ISO-8859-1 because that's what I got from
> SqueakSource when I downloaded Seaside.

Easy to find out. Just put a Euro sign into the UTF8Converter and see
what the result is. The Euro sign is U+20AC so try converting from/to
(Character value: 8364) and #[226 130 172] and see if that works.

Cheers,
  - Andreas
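The byte values in that test can be checked independently of any Smalltalk image. The Python sketch below only confirms that code point 8364 (U+20AC) and the bytes 226 130 172 are the same character in UTF-8; it does not exercise the Squeak converter itself, which still has to be tried in the image.

```python
euro = chr(8364)                    # the Euro sign, U+20AC
expected = bytes([226, 130, 172])   # 0xE2 0x82 0xAC

encoded = euro.encode("utf-8")      # code point -> UTF-8 bytes
decoded = expected.decode("utf-8")  # UTF-8 bytes -> code point

print(encoded == expected, decoded == euro)
```

If the converter in the image yields the same pair in both directions, its internal representation can hold code points beyond ISO-8859-1.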
In reply to this post by Julian Fitzell-2
On 3/30/2010 1:05 AM, Julian Fitzell wrote:
> I think the default encoders on squeak may be ISO-8859-1 (I think
> because Philippe is always saying there are big problems with
> WideStrings). But he's the expert in this area...

He's probably referring to the leading char which is an annoyance but
not a big problem. What you need to do is to "fix" the leading char in
any image that gets shipped around and not allow it to be picked up
dynamically. We've been doing this in our 3.8/Croquet based images and
it works fine.

Cheers,
  - Andreas