Hi,
I ran into an encoding problem. I'm using Seaside together with Glorp. For the web server I use WAKomEncoded39. WAKomEncoded39 converts the output to the browser to utf-8, but on incoming requests the URL-escaped characters are translated to something different. To me it appears to be latin-1, but I have no clue why it should be that way. I detected it because my PostgreSQL session has client encoding utf-8 turned on and I get an error when trying to store strings containing characters like ö.

I read on the net that this has something to do with 3.9. Is this still true? Is there a way to make it run, or is the only way to go back to 3.8?

thanks in advance,

Norbert
_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
2007/2/28, Norbert Hartl <[hidden email]>:
> I ran into an encoding problem. I'm using Seaside together
> with Glorp. For the web server I use WAKomEncoded39.
> WAKomEncoded39 converts the output to the browser to utf-8.
> But on incoming requests the url escaped characters are
> translated to something different. For me it appears to
> be latin-1 but I've no clue why it should be that way.
> I detected it because my postgresql session has client
> encoding utf-8 turned on and I get an error trying to
> store strings containing characters like ö.

If you run WAKomEncoded39 on Squeak 3.9 you will have strings with (new) Squeak encoding in your image, which is basically non-unified Unicode. For latin-1 characters this will be indistinguishable from latin-1. If your database is utf-8 you need to encode your strings to utf-8 when writing them to your database and decode them from utf-8 when reading from the database (only to convert them back to utf-8 when generating HTML). You can configure the Postgres database driver to do this automatically for you.

Another option is to have utf-8 strings in your image. On Squeak 3.9 this requires WAKom and a modified version of KomHttpServer that is not publicly available. This has the advantage that you don't need to do any encoding conversion. It has, however, the disadvantage that it won't work with the debugger, #size doesn't work, and directly indexing into the string (creating substrings) won't work either. Additionally you need to convert your string literals to utf-8 (unless they're ASCII).

Cheers
Philippe
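To make the database boundary concrete, here is a minimal sketch in Python rather than Squeak. It is an illustration only: `accepted_by_utf8_session`, `write_to_db` and `read_from_db` are hypothetical names, not part of Glorp or the Postgres driver. It shows why latin-1 bytes for "ö" are rejected by a session that expects utf-8, and what "encode on write, decode on read" amounts to.

```python
# Illustration only: Python stands in for the Squeak image and the driver.
text = "Köln"                           # the string as the application sees it

latin1_bytes = text.encode("latin-1")   # ö becomes the single byte 0xF6
utf8_bytes = text.encode("utf-8")       # ö becomes the two bytes 0xC3 0xB6


def accepted_by_utf8_session(payload: bytes) -> bool:
    """Mimic a session with client encoding utf-8: invalid sequences are rejected."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def write_to_db(value: str) -> bytes:
    """Hypothetical boundary helper: encode just before sending to the database."""
    return value.encode("utf-8")


def read_from_db(payload: bytes) -> str:
    """Hypothetical boundary helper: decode just after reading from the database."""
    return payload.decode("utf-8")


print(accepted_by_utf8_session(latin1_bytes))  # the latin-1 bytes are refused
print(accepted_by_utf8_session(utf8_bytes))    # the utf-8 bytes pass
print(read_from_db(write_to_db(text)) == text)
```

The point of keeping both helpers at the boundary is that everything inside the image can treat strings as text and never has to know which encoding the database session uses.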
On Wed, 2007-02-28 at 00:26 +0100, Philippe Marschall wrote:
> If your database is utf-8 you need to encode your strings to
> utf-8 when writing them to your database and decode your strings from
> utf-8 when reading from the database (only to convert it back to utf-8
> when generating html). You can configure the Postgres database driver
> to do this automatically for you.

Oh, this seems quite easy. But I didn't find anything to configure in the Postgres driver. Do you have any hint?

Norbert
In reply to this post by Philippe Marschall
On 2/27/07, Philippe Marschall <[hidden email]> wrote:
> Another option is to have utf-8 strings in your image. On Squeak 3.9
> this requires WAKom and a modified version of KomHttpServer not
> publicly available.

What changes were needed? Can you post them?

> This has the advantage that you don't need to do
> encoding conversion it has however the disadvantage that it won't work
> with the debugger, #size doesn't work and directly indexing into the
> string (creating substrings) won't work too. Additionally you need to
> convert your string literals to utf-8 (unless they're ascii).

... exactly as in Squeak 3.7, right?

Avi
In reply to this post by Philippe Marschall
I took a quick look at the request processing and I don't see where utf-8 stuff gets decoded. AFAICS, it just doesn't do it - thus producing a one byte to a character transformation, but maybe I'm missing something.
I have done a LOT of this stuff (formerly chief architect at a web I18N company). There are a few things that are not so intuitive when dealing with encodings and HTTP requests.

Escape sequences escape bytes, not characters.

On pass 1, you assume you have latin-1, parse the header and get the content-type and associated charset. Remember this for later translation.

Build a byte array from the string by putting ASCII characters in as bytes. Decode escape sequences into single bytes as you go.

Convert the byte array to a string by reading bytes and composing them into code points according to the encoding specified as the charset in the content-type. For utf-8 this means reading a byte, checking the high-order bits to find out the length of the byte sequence, then reading the rest of the sequence, composing the code point, etc...

Now you have text - start over and parse as normal.

Some of these steps can be folded, but conceptually this is how it works.

So I don't think WAKomEncoded39 is doing the right thing with respect to request processing, AFAICS.

-Todd Blanchard

On Feb 27, 2007, at 3:26 PM, Philippe Marschall wrote:

> If you run WAKomEncoded39 on Squeak 3.9 you will have strings with
> (new) Squeak encoding in your image which is basically non-unified
> unicode. For latin-1 characters this will be indistinguishable from
> latin-1.
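The steps Todd describes can be sketched roughly as follows, in Python rather than Squeak. This is a minimal sketch, not what Kom or Seaside actually do: the function names are hypothetical, falling back to utf-8 when no charset is given is an assumption, and form-specific details such as "+" meaning a space are left out.

```python
def unescape_to_bytes(escaped: str) -> bytes:
    """Turn a URL-escaped string into raw bytes: each %XX escape yields ONE byte."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "%" and len(escaped) - i >= 3:
            out.append(int(escaped[i + 1:i + 3], 16))  # escape sequence -> single byte
            i += 3
        else:
            out.extend(escaped[i].encode("latin-1"))   # plain character -> its byte
            i += 1
    return bytes(out)


def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Pull the charset parameter out of a Content-Type value (default is an assumption)."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip('"')
    return default


def decode_field(escaped: str, content_type: str) -> str:
    """Bytes first, characters second: unescape, then decode with the declared charset."""
    return unescape_to_bytes(escaped).decode(charset_of(content_type))


print(decode_field("K%C3%B6ln", "application/x-www-form-urlencoded; charset=utf-8"))
print(decode_field("K%F6ln", "text/plain; charset=iso-8859-1"))
# Skipping the second step gives the one-byte-to-one-character result:
print(unescape_to_bytes("K%C3%B6ln").decode("latin-1"))
```

The last line reproduces the symptom from the start of the thread: utf-8 bytes read as if each byte were a latin-1 character.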
In reply to this post by Avi Bryant-2
2007/2/28, Avi Bryant <[hidden email]>:
> > Another option is to have utf-8 strings in your image. On Squeak 3.9
> > this requires WAKom and a modified version of KomHttpServer not
> > publicly available.
>
> What changes were needed? Can you post them?

The basic problem is that #unescapePercents changed semantics from Squeak 3.8 to 3.9. To work around that you need to change the sends from
#unescapePercents
to
#unescapePercentsWithTextEncoding: nil
in
HttpRequest >> #initStatusString:
and
HttpRequest class >> #decodeUrlEncodedForm:multipleValues:

> > This has the advantage that you don't need to do
> > encoding conversion it has however the disadvantage that it won't work
> > with the debugger, #size doesn't work and directly indexing into the
> > string (creating substrings) won't work too. Additionally you need to
> > convert your string literals to utf-8 (unless they're ascii).
>
> ... exactly as in Squeak 3.7, right?

Exactly
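The drawbacks of keeping utf-8 bytes inside ordinary strings (the size is off and indexing can cut a character in half) can be reproduced in a few lines. This is a Python illustration under the assumption that "utf-8 strings in your image" means one string element per utf-8 byte; `utf8_bytes_as_string` and `is_complete_utf8` are hypothetical helper names.

```python
def utf8_bytes_as_string(text: str) -> str:
    """Model a string that holds raw utf-8 bytes, one element per byte."""
    return text.encode("utf-8").decode("latin-1")


def is_complete_utf8(fragment: str) -> bool:
    """Check whether a piece of such a string is still valid utf-8."""
    try:
        fragment.encode("latin-1").decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "Köln"
in_image = utf8_bytes_as_string(word)

print(len(word), len(in_image))        # the byte-holding string is one element longer
print(is_complete_utf8(in_image[:2]))  # slicing after 2 elements splits the ö
print(is_complete_utf8(in_image[:3]))  # slicing after 3 elements keeps it whole
print(utf8_bytes_as_string("abc") == "abc")  # pure ASCII is unaffected
```

This also shows why ASCII-only string literals need no conversion in that setup: their utf-8 bytes are identical to their characters.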
In reply to this post by tblanchard
2007/2/28, Todd Blanchard <[hidden email]>:
> I took a quick look at the request processing and I don't see where utf-8
> stuff gets decoded. AFAICS, it just doesn't do it - thus producing a one
> byte to a character transformation, but maybe I'm missing something.

#unescapePercents does utf-8 decoding.

> I have done a LOT of this stuff (formerly chief architect at a web I18N
> company). There are a few things that are not so intuitive when dealing
> with encodings and http requests.
>
> Escape sequences escape bytes, not characters.
>
> On pass 1, you assume you have latin-1, parse the header and get the
> content-type and associated charset. Remember this for later translation.

We don't do that. We assume either you are running utf-8 or you don't want any translation taking place.
In reply to this post by NorbertHartl
2007/2/28, Norbert Hartl <[hidden email]>:
> Oh, this seems quite easy. But I didn't found anything to configure
> in the Postgres driver. Do you have any hint?

PGConnection class >> #buildDefaultFieldConverters
TestPGConnection >> #testFieldConverter

You need to register a field converter for your string types that does
#convertFromEncoding: #utf8

Sorry, that only does the decoding and not the encoding. I guess in your case Glorp does the encoding. I don't know how you can customize the SQL generation there, but if everything else fails you can change
PGConnection >> #execute (yes, this is a hack)

sql := sqlString.
to
sql := sqlString convertToEncoding: #utf8.

Philippe

P.S.:
PGConnection class >> #buildDefaultFieldConverters
has given us a lot of pain because Squeak doesn't have full block closures
On Wed, 2007-02-28 at 10:03 +0100, Philippe Marschall wrote:
> PGConnection >> class #buildDefaultFieldConverters
> TestPGConnection >> #testFieldConverter
>
> You need to register a field converter for your string types that does
> #convertFromEncoding: #utf8

This way it is working already. I think as long as no one is touching the string it comes as utf-8 from the database and gets encoded a second time by WAKomEncoded39, which has no effect.

> Sorry that does only do the decoding and not the encoding. I guess in
> your case Glorp does the encoding. I don't know how you can customize
> the Sql generation there but it everything else fails you can change
> PGConnection >> #execute (yes, this is a hack)

I don't think Glorp does encoding, and I think it shouldn't. Glorp should be happy with strings. If there is conversion happening, it should happen in the Postgres driver (it is the only one that could know which encoding is needed for the database).

My strings are carried by ByteString. It seems that a ByteString (as I get it from WAKomEncoded39) contains a bunch of bytes in any encoding (OK, it is the non-unified Unicode, you said, and I don't know what that means :) ). I can convert it with convertToEncoding: to another encoding, still using ByteString. But there is no information about the encoding in the object. I think this is really dangerous. I have to look at WideString. I'm curious how those deal with the encodings they are created from.

I think there are only two possibilities. Handle it like Java and Lisp, and convert every encoding to the internal one (UCS-2) on string creation. The other option, which would be easier (I think), is to add the character encoding information to the string.

What do you think?

> sql := sqlString.
> to
> sql := sqlString convertToEncoding: #utf8.

The hack is actually adding the conversion to SqueakDatabaseAccessor>>basicExecuteSQLString:

I understand a lot more now. Thanks very much.

Norbert

> P.S.:
> PGConnection >> class #buildDefaultFieldConverters
> has given us a lot of pain because Squeak doesn't have full block closures

Oh, wow, another day of hearing a lot of basic things I don't have any idea about :) What are "full" block closures?
In reply to this post by NorbertHartl
On Wed, 28 Feb 2007, Norbert Hartl wrote:

>> You can configure the PostgreS database driver
>> to do this automatically for you.
>>
> Oh, this seems quite easy. But I didn't found anything to configure
> in the Postgres driver. Do you have any hint?

Postgres supports communication in various encodings. You can tell it which encoding you are using with the SQL command

SET CLIENT_ENCODING TO "encoding here";

Look in the Postgres docs for all supported encodings.

rado
On Wed, 2007-02-28 at 12:25 +0100, radoslav hodnicak wrote:
> Postgres supports communication with in various encodings, you can tell it
> what encoding are you using with the sql command
>
> SET CLIENT_ENCODING TO "encoding here";
>
> look in postgres docs for all supported encodings

which the squeak strings are made of :)

Btw. I really like to have utf-8 encoding. It is a good way to have a "common" way to do these things. So this way round I need a way to convert it that way. I think the Postgres driver should be capable of:

- If the driver is requested to use a specific format, the driver should try to negotiate that with the database.
- If the driver is not requested to use a specific format, the database driver should be capable of converting data in the encoding the database is using.
- Or alternatively, if no encoding is requested, the driver may use a "default" encoding which is known to be supported on the database side as well as on the Squeak side, so that conversion can take place.

But I don't know which effects client encoding has after all. To me it appears to be only the communication encoding between client and database. I assume I can have client encoding latin-1 and send a latin-1 string. And if I send the same string as utf-8 with client encoding utf-8, it will be the same string on the database side. Is this correct?

Norbert
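Norbert's question about the client encoding can be modelled in a few lines of Python. This is a toy model, not PostgreSQL's implementation: it assumes, as the PostgreSQL documentation describes, that the server converts incoming data from the declared client encoding to the database encoding. `server_store` and the utf-8 database encoding are assumptions made for the illustration.

```python
def server_store(payload: bytes, client_encoding: str,
                 server_encoding: str = "utf-8") -> bytes:
    """Toy model: interpret the bytes in the client encoding, store in the server's."""
    return payload.decode(client_encoding).encode(server_encoding)


text = "Köln"

via_latin1 = server_store(text.encode("latin-1"), client_encoding="latin-1")
via_utf8 = server_store(text.encode("utf-8"), client_encoding="utf-8")
print(via_latin1 == via_utf8)   # same stored value either way

# Declaring one encoding and sending another stores different (garbled) data:
mismatch = server_store(text.encode("utf-8"), client_encoding="latin-1")
print(mismatch == via_utf8)
```

Under that model the answer is yes, provided the bytes sent really are in the encoding the session declares; the failure case is declaring one encoding and sending another.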
In reply to this post by Philippe Marschall
Hi,
I thought I had 'corrupted' 3.9 images, but it seems to be a general issue for 3.9. I made the hack as Philippe said. Actually I changed the Dialect class>>basicIsSqueak test so that it passes during installation (to get the SmalltalkImage things and not Smalltalk); I added a Dialect class>>basicIsSqueak39 (true for SystemVersion number > 7010); and I subclassed SqueakDatabaseAccessor as Squeak39DatabaseAccessor to modify the instance method #basicExecuteSQLString: to say:

result := connection execute: (aString asWideString convertToEncoding: 'utf-8').

It works well. I did it to test Ramon Leon's Seaside Blog, but I didn't post this because I wasn't sure it was a common problem and because of the ugliness of the string conversion.

I attach my two mods.

--
Martial

Philippe Marschall wrote:
| Sorry that does only do the decoding and not the encoding. I guess in
| your case Glorp does the encoding. I don't know how you can customize
| the Sql generation there but it everything else fails you can change
| PGConnection >> #execute (yes, this is a hack)
|
| sql := sqlString.
| to
| sql := sqlString convertToEncoding: #utf8.
2007/2/28, Martial Boniou <[hidden email]>:
> I subclass SqueakDatabaseAccessor to Squeak39DatabaseAccessor to
> modify the instance method #basicExecuteSQLString to say:
>
> result := connection execute: (aString asWideString convertToEncoding:
> 'utf-8').

Do you really need to send #asWideString? It doesn't look like it would do anything if you already have a String.

Philippe
In reply to this post by NorbertHartl
2007/2/28, Norbert Hartl <[hidden email]>:
> My strings are carried by ByteString. It seems that ByteString (got
> from WAKomEncoded39) contains a bunch of bytes with any encoding (
> ok, it is the non-unified unicode, you said, and i don't know what
> that means :) ).
> I can convert it with convertToEncoding: to another encoding still
> using ByteString. But there is no information about encoding in the
> object. I think this is really dangerous. I have to look at WideString.
> I'm curious how those deal with encodings they are created from.
>
> I think there are only two possibilities. Handle it like Java, Lisp
> and convert every encoding to the internal (UCS-2) on string creation.
> The other option which would be easier (i think) is to add the
> character encoding information into the string.
>
> What do you think?

fuck up in this area considering this is a 'basic' data type. Having more information about a String (what encoding, what escaping, ..) would definitely help. UCS-2 is not "the solution" since it handles only characters in the BMP. Additionally we don't want to do Han unification.

> Oh, wow, another day hearing a lot of basic things I don't have any idea
> about :) What are "full" block closures?

same temporary variable. If multiple of these are activated at the same time, you have a problem. See:
http://bugs.impara.de/view.php?id=4636

Philippe
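As a loose analogy only, and not a description of how Squeak 3.9 implements blocks, Python shows a related pitfall when several closures end up sharing one variable. The converter tables and the type codes 25 and 1043 below are purely illustrative.

```python
def build_shared():
    """Every lambda refers to the SAME loop variable `oid`."""
    converters = {}
    for oid in (25, 1043):
        converters[oid] = lambda s: f"{oid}:{s}"
    return converters


def build_private():
    """Each lambda gets its own copy of `oid` through a default argument."""
    converters = {}
    for oid in (25, 1043):
        converters[oid] = lambda s, oid=oid: f"{oid}:{s}"
    return converters


shared = build_shared()
private = build_private()

print(shared[25]("x"))    # reports 1043, the last value the shared variable held
print(private[25]("x"))   # reports 25, as intended
```

A language with full closures gives each block activation its own variables, which is the property the P.S. says Squeak was missing at the time.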
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
| >result := connection execute: (aString asWideString convertToEncoding:
| >'utf-8').
|
| Do you really need to send #asWideString? It doesn't look like it
| would do anything if you already have a String.

Of course! It does nothing. My brain was short of oxygen at that moment ;-)

--
Martial
In reply to this post by Philippe Marschall
On Wed, 2007-02-28 at 10:03 +0100, Philippe Marschall wrote:
> 2007/2/28, Norbert Hartl <[hidden email]>:
> > On Wed, 2007-02-28 at 00:26 +0100, Philippe Marschall wrote:
> > > If you run WAKomEncoded39 on Squeak 3.9 you will have strings with
> > > (new) Squeak encoding in your image which is basically non-unified
> > > unicode. For latin-1 characters this will be indistinguishable from
> > > latin-1. If your database is utf-8 you need to encode your strings to
> > > utf-8 when writing them to your database and decode your strings from
> > > utf-8 when reading from the database (only to convert it back to utf-8
> > > when generating html). You can configure the PostgreS database driver
> > > to do this automatically for you.
> > >
> > Oh, this seems quite easy. But I didn't find anything to configure
> > in the Postgres driver. Do you have any hint?
>
> PGConnection class >> #buildDefaultFieldConverters
> TestPGConnection >> #testFieldConverter
>
> You need to register a field converter for your string types that does
> #convertFromEncoding: #utf8
>
Yes, you are right. For everybody who wants to know: you can fix it by
adding

#(1043) do: [:each | converters at: each put: [:s | s convertFromEncoding: #utf8]].

to PGConnection class>>buildDefaultFieldConverters.

Hmmm, I guess I'll talk to Yanni about this.

regards,

Norbert
In reply to this post by Philippe Marschall
Just a question.
We introduced this change in 3.9 because apparently it was important;
it was certainly suggested by an eminent Seasider.
Now I would like to know if this was correct (i.e. it fixed a problem),
and I would like to avoid giving the impression that "they broke
something again in 3.9", because I can tell you that we ***really***
paid attention. It is also because of that kind of atmosphere
and regular bashing that we lost Marcus.

Stef
Well it kinda depends. It is really two things, bug fixes and
semantics changes, and depending on your point of view the semantics
changes are a bug fix. The bug fixes surely help, but the different
semantics in 3.8 vs. 3.9 make it hard to support Squeak 3.8 and 3.9 at
the same time.

I wasn't bashing anyone, just pointing out that the different
semantics require different client code in 3.8 vs. 3.9.

Cheers
Philippe

2007/3/1, stephane ducasse <[hidden email]>:
> Just a question.
> We introduced this change in 3.9 because apparently it was important.
> Certainly suggested by an eminent Seasider.
> Now I would like to know if this was correct (i.e. it fixed a problem)
> and I would like to avoid giving the impression that "they broke
> something again in 3.9" because I can tell you that we ***really***
> paid attention. It is also because of that kind of atmosphere
> and regular bashing that we lost Marcus.
>
> Stef
> Well it kinda depends. It is really two things, bug fixes and
> semantics changes and depending on point of view the semantics changes
> are a bugfix. The bug fixes surely help but the different semantics
> in 3.8 vs. 3.9 make it hard to support Squeak 3.8 and 3.9 at the same
> time.
>
> I wasn't bashing anyone, just pointing out that the different
> semantics require different client code in 3.8 vs. 3.9.

I know. ;) But I wanted to know whether we introduced a bug or not.
String encoding is a mess. Maybe one day we should have strings that know
their business (encoding and all the rest).
In reply to this post by NorbertHartl
On Wed, 2007-02-28 at 23:41 +0100, Norbert Hartl wrote:
> On Wed, 2007-02-28 at 10:03 +0100, Philippe Marschall wrote:
> > PGConnection class >> #buildDefaultFieldConverters
> > TestPGConnection >> #testFieldConverter
> >
> > You need to register a field converter for your string types that does
> > #convertFromEncoding: #utf8
> >
> Yes, you are right. For everybody who wants to know: you can fix it by
> adding
>
> #(1043) do: [:each | converters at: each put: [:s | s convertFromEncoding: #utf8]].
>
> to PGConnection class>>buildDefaultFieldConverters
>
To match both varchar and text columns, add

#(1043 25) do: [:each | converters at: each put: [:s | s convertFromEncoding: #utf8]].

regards,

Norbert
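To see what such a field converter does on the read path, the following workspace sketch applies the same kind of block to a simulated database value. 1043 and 25 are the PostgreSQL type OIDs for varchar and text. The `converters` dictionary here is only a stand-in for the one built inside PGConnection class>>buildDefaultFieldConverters, and the encoding name is written as the string 'utf-8' (as in Martial's message) rather than the symbol #utf8; both choices are assumptions made for the example.

```smalltalk
"Workspace sketch: a stand-in dictionary, not the driver's own converter table."
| converters fromDatabase decoded |
converters := Dictionary new.
"Register one decoding block per string type OID (varchar and text)."
#(1043 25) do: [:each |
	converters at: each put: [:s | s convertFromEncoding: 'utf-8']].
"Simulate what the driver receives for ö: the UTF-8 bytes 195 182."
fromDatabase := String
	with: (Character value: 195)
	with: (Character value: 182).
"Look up the varchar converter and decode the raw value."
decoded := (converters at: 1043) value: fromDatabase.
Transcript
	show: 'size before: ', fromDatabase size printString; cr;
	show: 'size after: ', decoded size printString; cr;
	show: 'code point: ', decoded first asInteger printString; cr.
```

If the conversion behaves as described in the thread, the two raw bytes should collapse into a single character with code point 246, which is what the rest of the image expects for ö.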