character encoding / was: (Postgres / Glorp / Kom)


Ramiro Diaz Trepat
Summarizing

Using Squeak 3.9, for example:

1.
- Start Seaside with WAKom.
- Go to the SushiStore, search for 'Ñandú' (a kind of Argentinean ostrich).
- The method WAStoreFillCart>>search: receives a properly formed ByteString that reads 'Ñandú'
- Seaside then displays the corrupt String: No items match '?amd?'

2.
- Start Seaside with WAKomEncoded39.
- Go to the SushiStore, search for 'Ñandú' .
- The method WAStoreFillCart>>search: receives a properly formed ByteString (not a WideString or a UTF8 formatted ByteString) that reads 'Ñandú'
- Seaside then displays the correct String: No items match Ñandú

Although example 2 displays the string properly, methods like #search: never seem to receive a UTF8 or WideString instance.  What you get, with either WAKom or WAKomEncoded39, are always indistinguishable instances of ByteString.
WAKomEncoded39 converts strings to UTF8 before sending and from UTF8 after receiving, but you don't get to "see" these UTF8 Strings. Are they always converted to Squeak's default encoding (which I don't know yet) by the time they get to you?
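
To make that round trip concrete, here is a minimal sketch in Python (not Smalltalk, and not Seaside or Kom code). It assumes the adaptor simply decodes incoming UTF-8 bytes and encodes outgoing text; receive and send are hypothetical names used only for illustration. Under that assumption, application code in between only ever sees the decoded string, which would be consistent with #search: never receiving anything that looks like UTF8.

```python
# Hypothetical stand-ins for what an encoding-aware server adaptor does;
# these are not Seaside or Kom APIs.
def receive(wire_bytes: bytes) -> str:
    """Decode UTF-8 request bytes into an ordinary in-memory string."""
    return wire_bytes.decode("utf-8")


def send(text: str) -> bytes:
    """Encode an in-memory string as UTF-8 for the response."""
    return text.encode("utf-8")


wire = b"\xc3\x91and\xc3\xba"   # 'Ñandú' as UTF-8 on the wire: 7 bytes
term = receive(wire)            # what a search handler would be handed

assert term == "Ñandú"          # an ordinary 5-character string
assert len(term) == 5
assert send("No items match " + term) == b"No items match \xc3\x91and\xc3\xba"
```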




Summarizing some of the answers I got.

Philippe
Basically informs us that the handling of UTF8 strings in KomHttpServer / Squeak 3.9 is really broken, and that sadly a fix does not seem to be on the way anytime soon.  But he also says that everything works in 3.8.
In spite of this statement, I got no concrete answers from anyone using Seaside in production (and using special characters) about which platform they are using.  In particular, I didn't hear from the rest of the Seaside core developers either "We are all using Squeak 3.8" or "I have the fix for KomHttpServer for 3.9 but I will not share it" :)

Norbert
Being in a situation very similar to mine, that is, having to use a Postgres DB encoded in UTF8, he was also unable to make it work out of the box (confirming Philippe's statements) and coded a very smart workaround, which he kindly shared with us.

Sebastián
Everything works for him using WAKomEncoded39.  But probably in the same way as in SushiStore example 2 above, that is, without receiving UTF8 or WideStrings.


_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: character encoding / was: (Postgres / Glorp / Kom)

Ramiro Diaz Trepat
Just for the record, I have been trying Norbert's Glorp hack for UTF8,
and in conjunction with WAKomEncoded39 it works really well.  It all
passes through without you having to do anything special in your object
model.

That was really a cool hack Norbert.
I don't know who is maintaining Glorp in Squeak, but we could probably
add the functionality to make it work more elegantly with whatever the
database encoding is.
Upon connection, we could set a default TextConverter in
SqueakDatabaseAccessor based on the database encoding, which we can
easily read (at least from Postgres) with the following statement:

select pg_encoding_to_char(encoding) from pg_database where datname='XXXXX';

Then if Squeak has the appropriate TextConverter, as it does for
UTF8, we can set it up automatically, and the following line:

SqueakDatabaseAccessor>>basicExecuteSQLString: aString
     ....
     result := connection execute: (aString convertToEncoding: #utf8).
     ....

could be something like

     result := connection execute: (self encode: aString).

or something like it.
I suppose the other way around, converting strings from the database,
would be a bit harder to implement.  At least, more codes would have to
be added in
PGConnection class>>buildDefaultFieldConverters
(I really don't have a clue where those codes come from) :)
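
The idea above, choosing a converter from the database encoding and using it in both directions, can be sketched roughly as follows. This is Python rather than Smalltalk, purely for illustration; EncodingAwareAccessor and PG_TO_CODEC are hypothetical names, not part of Glorp or the Squeak Postgres driver, and the mapping covers only a few of the names that pg_encoding_to_char can return.

```python
# Hypothetical mapping from PostgreSQL encoding names (as reported by
# pg_encoding_to_char) to Python codec names. Deliberately incomplete.
PG_TO_CODEC = {
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
    "WIN1252": "cp1252",
}


class EncodingAwareAccessor:
    """Illustrative stand-in for a database accessor; not Glorp's API."""

    def __init__(self, pg_encoding: str):
        # Fail loudly when no converter is known for the database encoding.
        if pg_encoding not in PG_TO_CODEC:
            raise ValueError("no converter for database encoding %r" % pg_encoding)
        self.codec = PG_TO_CODEC[pg_encoding]

    def encode(self, sql: str) -> bytes:
        """Convert an in-image string to the database encoding."""
        return sql.encode(self.codec)

    def decode(self, raw: bytes) -> str:
        """Convert bytes coming back from the database to an in-image string."""
        return raw.decode(self.codec)


accessor = EncodingAwareAccessor("UTF8")
statement = "select * from items where name = 'Ñandú'"
assert accessor.decode(accessor.encode(statement)) == statement
```

Keeping the lookup in one table means that supporting another database encoding is a one-line change, provided a matching converter exists.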

Thanks again Norbert.


r.



Re: character encoding / was: (Postgres / Glorp / Kom)

Lukas Renggli
In reply to this post by Ramiro Diaz Trepat
> Philippe
> Basically informs us that the handling of UTF8 strings in KomHttpServer /
> Squeak 3.9 got really broken, and that sadly the fix seems not to be on the
> way anytime soon.  But also says that everything works in 3.8.
> In spite of this affirmation, I got no concrete answers from anyone using
> Seaside in production (and using special characters) about which platform
> are they using.  In particular, I didn't hear from the rest of the Seaside
> core developers "We are all using Squeak 3.8" nor I have the fix for
> KomHttpServer for 3.9 but I will not share it :)

No, all the Seaside applications I am working on run on top of 3.9 and
most of them include quite a bunch of strange (German, French, Korean,
Japanese) characters. To get it working I follow these principles:

1. In Kom Philippe patched: #decodeUrlEncodedFrom:multipleValues: and
#initialStatusString: to send unescapePercentsWithTextEncoding:
'latin-1' instead of just unescapePercents.

2. All XHTML pages are sent using UTF-8, avoiding a slow and
error-prone transformation.

3. All the data inside the model is UTF-8, this means some string
operations do not work (e.g. #size might return something wrong) but
that doesn't matter much in my case.

4. All literal strings have to be converted to UTF-8 by sending
#latinToUtf8. For existing applications this can be easily patched
using the RewriteEditor.
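
One way to read principles 1, 3 and 4 is that UTF-8 bytes are kept one per character inside ordinary byte strings. A rough Python sketch of that reading follows; it is an assumption about the effect, not the actual Kom patch, and latin_to_utf8 is a hypothetical stand-in for #latinToUtf8.

```python
from urllib.parse import unquote


def latin_to_utf8(text: str) -> str:
    """Hypothetical analogue of #latinToUtf8: re-express text as its UTF-8
    bytes, stored one byte per character."""
    return text.encode("utf-8").decode("latin-1")


# A browser submitting 'Ñandú' from a UTF-8 page percent-encodes its UTF-8 bytes.
escaped = "%C3%91and%C3%BA"

# Unescaping with latin-1 maps each byte to exactly one character, so the
# UTF-8 byte sequence passes through untouched.
in_model = unquote(escaped, encoding="latin-1")

assert in_model == latin_to_utf8("Ñandú")  # converted literals match the data
assert in_model != "Ñandú"                 # unconverted literals do not
assert len(in_model) == 7                  # size counts bytes, not 5 characters

# Sending the page as UTF-8 then only needs the same bytes written back out.
assert in_model.encode("latin-1").decode("utf-8") == "Ñandú"
```

The trade-off is the one described in principle 3: nothing is transcoded on the way in or out, but anything that counts or indexes characters sees bytes instead.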

All in all it is quite ugly, but it works fast and reliably. I don't
have enough knowledge of character encodings to provide a better
solution. Unfortunately Kom is a dead project and nobody is willing to
fix it.

Hope this helps,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

Re: character encoding / was: (Postgres / Glorp / Kom)

Philippe Marschall
In reply to this post by Ramiro Diaz Trepat
2007/5/15, Ramiro Diaz Trepat <[hidden email]>:

> Summarizing
>
> Using Squeak 3.9, for example:
>
> 1.
> - Start Seaside with WAKom.
> - Go to the SushiStore, search for 'Ñandú' (a kind of Argentinean ostrich).
> - The method WAStoreFillCart>>search: receives a properly formed ByteString
> that reads 'Ñandú'
> - Seaside then displays the corrupt String: No items match '?amd?'
>
> 2.
> - Start Seaside with WAKomEncoded39.
> - Go to the SushiStore, search for 'Ñandú' .
> - The method WAStoreFillCart search: receives a properly formed ByteString
> (not a WideString or an UTF8 formatted ByteString) that reads 'Ñandú'
Because the character codes are smaller than 256. Sorry I wasn't
explicit enough about this: you get a WideString as soon as you have
a character with a code point of 256 or bigger, for example Korean
from the utf8 sampler [1]. This is, btw, all explained in the comments
of (Wide)Character and (Wide)String. What you received is correct
Squeak encoding: it displays more or less correctly in the inspector
(it would probably be much nicer with FreeType) and #size answers 5.
None of this would be the case if you had utf8 strings. They do not
display correctly (for non-ascii strings) and their #size is too big
(for non-ascii strings).

So this test works. It was just my fault for not explaining the Squeak
encoding of Strings in full detail.
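
The 256 threshold and the size difference can be checked with a few lines of Python (used here only as an illustration, since its strings are also sequences of code points; needs_wide_string is a hypothetical helper, not a Squeak method):

```python
def needs_wide_string(text: str) -> bool:
    """True if any character has a code point of 256 or bigger."""
    return any(ord(ch) >= 256 for ch in text)


nandu = "Ñandú"
korean = "한국어"

# Every code point in 'Ñandú' fits in one byte, so a byte string suffices.
assert [ord(ch) for ch in nandu] == [209, 97, 110, 100, 250]
assert not needs_wide_string(nandu)
assert len(nandu) == 5

# Korean characters are far above 255 and need a wide representation.
assert needs_wide_string(korean)

# The same word as raw UTF-8 bytes held one per character: the size is too
# big and the text no longer reads correctly.
utf8_as_chars = nandu.encode("utf-8").decode("latin-1")
assert len(utf8_as_chars) == 7
assert utf8_as_chars != nandu
```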

[1] http://www.columbia.edu/kermit/utf8.html

Cheers
Philippe



Re: character encoding / was: (Postgres / Glorp / Kom)

Jason Johnson-3
In reply to this post by Lukas Renggli
Lukas Renggli wrote:
> Unfortunately Kom is a dead project and nobody is willing to
> fix it.

Kom is dead?  Is it part of Comanche (which I thought was not dead), or
how does that work?

Re: character encoding / was: (Postgres / Glorp / Kom)

Philippe Marschall
Read the SqueakMap entry:
This package has been deprecated.  The latest version of Comanche is
registered as a SqueakMap package called KomHttpServer.

Cheers
Philippe


Re: character encoding / was: (Postgres / Glorp / Kom)

Jason Johnson-3
Philippe Marschall wrote:
> Read the SqueakMap entry:
> This package has been deprecated.  The latest version of Comanche is
> registered as a SqueakMap package called KomHttpServer.
>
> Cheers
> Philippe
>

Ok thanks.  That project is still active right?  Kom that Seaside uses
is related to that package isn't it?

Re: character encoding / was: (Postgres / Glorp / Kom)

Philippe Marschall
2007/5/20, Jason Johnson <[hidden email]>:

> Philippe Marschall wrote:
> > Read the SqueakMap entry:
> > This package has been deprecated.  The latest version of Comanche is
> > registered as a SqueakMap package called KomHttpServer.
> >
> > Cheers
> > Philippe
> >
>
> Ok thanks.  That project is still active right?

If you can't integrate a one-method patch in three months then the
project is de facto dead.

> Kom that Seaside uses
> is related to that package isn't it?

Yes, it's the http server Seaside uses on Squeak.

Cheers
Philippe
