[CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9

Udo Schneider
All,

I'm currently working on a Seaside app which will go "live" next Monday.
Currently I'm using Damien's 3.10-based sq3.10.2-7179web09.02.1 image.

Yesterday I had a horrible experience when a Mac beta tester told me that
the whole app got screwed up when he entered a euro sign (€) in the app.
All following responses were total garbage. Other characters like
German umlauts (or even the euro sign from a Windows client!) were fine,
and I had tested them before, although they appear as (UTF-8?) gibberish
in the image.

This jogged my memory, and I switched from WAKom to WAKomEncoded,
which evidently "solved" the problem.

However this morning (after reading the class comments) I'm a bit
confused about the whole issue and I have two questions:

1) What's the difference between WAKom, WAKomEncoded and WAKomEncoded39?
I read the class comments but do not really understand the issue. In
addition, they deal with 3.8 vs. 3.9 and I'm on 3.10... If
somebody advises me to deploy on 3.9, I'll do so; I just need a working
configuration.

2) I assume that the browser side always uses UTF-8. But in which format
does the image "see" the input?

CU,

Udo

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9

Lukas Renggli
> 1) What's the difference between WAKom, WAKomEncoded and WAKomEncoded39? I
> read the class comments but do not really understand the issue. In addition
> they are dealing with 3.8 vs. 3.9 and I'm on 3.10... If somebody advises me
> to deploy on 3.9 I'll do - I just need a working configuration.

WAKom does no conversion at all. So if you assume that the browser
side uses UTF-8 then you end up with UTF-8 strings inside your image.
However since the internal encoding of Squeak is *not UTF-8* many
strings will appear scrambled when looking at them using an inspector.
It works well, though, as long as you do not perform heavy string
manipulation, because the strings are sent back as is. If you have
string literals with foreign characters in your application code you
need to make sure that these are valid UTF-8 as well. This is very
efficient, but you need to be aware of the implications.
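To make the point about literals concrete, here is a minimal Squeak workspace sketch (plain String and Character code, not Seaside code; the variable names are made up) of what WAKom hands the application when a browser submits "ä" as UTF-8, compared with an "ä" literal written in a Latin-1 image.

```smalltalk
| fromBrowser literal |
"Under WAKom the browser's UTF-8 bytes for ä (U+00E4) arrive unchanged: 195 164."
fromBrowser := String with: (Character value: 195) with: (Character value: 164).
"An 'ä' literal typed into a Latin-1 image is one character with code 228."
literal := String with: (Character value: 228).
"Same text, different encodings: the comparison fails and the sizes differ."
fromBrowser = literal.   "false"
fromBrowser size.        "2"
literal size.            "1"
```

This is why literals in application code have to be UTF-8 too when WAKom is used: otherwise they never match what comes in from the browser.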

WAKomEncoded converts incoming data from UTF-8 to the internal
encoding of Squeak, and it also converts outgoing data from the
internal encoding to UTF-8. This way, all strings are valid from
within the image, and common string operations like #=, #size and
#copyFrom:to: work like you would expect. If you use an external
database that expects UTF-8 you need to convert again.
Since all incoming and outgoing data needs to be converted, this
approach is slightly less efficient.
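The conversion step can be sketched in a workspace. The two blocks below are hypothetical, hand-written helpers, not WAKomEncoded's actual implementation: decode turns UTF-8 bytes into one Character per code point, and encode does the reverse. To stay short, they assume every character is below 256 (Latin-1, Squeak's default internal encoding), so they only handle one- and two-byte UTF-8 sequences; a three-byte character such as the euro sign is out of scope.

```smalltalk
| decode encode fromBrowser internal |
"Hypothetical helper: UTF-8 bytes -> one Character per code point (below 256 only)."
decode := [:bytes | | in out byte |
    in := ReadStream on: bytes.
    out := WriteStream on: String new.
    [in atEnd] whileFalse: [
        byte := in next asInteger.
        byte < 128
            ifTrue: [out nextPut: (Character value: byte)]
            ifFalse: [
                "Lead byte 110xxxxx carries 5 bits, continuation byte 10xxxxxx carries 6."
                out nextPut: (Character value:
                    ((byte bitAnd: 16r1F) bitShift: 6) + (in next asInteger bitAnd: 16r3F))]].
    out contents].
"Hypothetical helper: the reverse direction, with the same below-256 assumption."
encode := [:string | | sink code |
    sink := WriteStream on: String new.
    string do: [:char |
        code := char asInteger.
        code < 128
            ifTrue: [sink nextPut: char]
            ifFalse: [
                sink nextPut: (Character value: 16rC0 + (code bitShift: -6)).
                sink nextPut: (Character value: 16r80 + (code bitAnd: 16r3F))]].
    sink contents].
fromBrowser := String with: (Character value: 195) with: (Character value: 164).
internal := decode value: fromBrowser.
internal size.                            "1"
internal first asInteger.                 "228"
(encode value: internal) = fromBrowser.   "true"
```

Every request and response has to go through a pass like this, which is where the extra cost of WAKomEncoded comes from.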

WAKomEncoded39 is for compatibility with strange versions of Kom and
Squeak. You should not need to use it.

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

Re: [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9

Udo Schneider
Lukas,

thanks for your detailed explanation.

> WAKom does no conversion at all. So if you assume that the browser
> side uses UTF-8 then you end up with UTF-8 strings inside your image.
[...]
> need to make sure that these are valid UTF-8 as well. This is very
> efficient, but you need to be aware of the implications.
This was the effect I was seeing before. I didn't care about it because
I thought that as long as what goes in comes back out unchanged, everything is fine.
However, I learned my lesson with Safari/Mac/euro sign, which screwed up
everything.

> WAKomEncoded converts incoming data from UTF-8 to the internal
> encoding of Squeak, and it also converts outgoing data from the
> internal encoding to UTF-8. This way, all strings are valid from
> within the image, and common string operations like #=, #size and
> #copyFrom:to: work like you would expect. If you use an external
> database that expects UTF-8 you need to convert again.
The current DB backend is Magma - so I assume I can ignore UTF-8
conversion for the DB for quite some time :-)

> WAKomEncoded39 is for compatibility with strange versions of Kom and
> Squeak. You should not need to use it.
Good to hear. I'm currently running the tests against WAKomEncoded and
everything looks great.

Thank you very much for your help.

CU,

Udo

Re: [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9 - utf8 internal encoding?

michal-list
In reply to this post by Lukas Renggli

Lukas wrote:
> However since the internal encoding of Squeak is *not UTF-8* many
> strings will appear scrambled when looking at them using an inspector.
> It works well, though, as long as you do not perform heavy string
> manipulation, because the strings are sent back as is. If you have
> string literals with foreign characters in your application code you
> need to make sure that these are valid UTF-8 as well. This is very
> efficient, but you need to be aware of the implications.

What happens if squeak is made to use UTF-8 internally? I.e., the unix
man page and various postings on squeak-dev/newbies suggest that a
recent squeak VM/image combo started with '-encoding utf8' should work
well as a utf8 image (provided the correct font is supplied, etc.).

In such a case, should plain WAKom be used? With no issues with
string operations like #=, #size and #copyFrom:to:? Or is there still
a need to convert between the incoming utf-8 and squeak's WideString (and
vice versa)?

> WAKomEncoded converts incoming data from UTF-8 to the internal
> encoding of Squeak, and it also converts outgoing data from the
> internal encoding to UTF-8.

The code and comments in #utf8ToSqueak: suggest that this is only true
if squeak uses latin-1 internally (which it does by default), right?

> Since all incoming and outgoing data needs to be converted,
> this approach is slightly less efficient.

Has anybody quantified the inefficiency? I'm starting a clean slate
seaside server, so I'd like to pick the optimal configuration...

Michal
_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
Reply | Threaded
Open this post in threaded view
|

Re: [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9 - utf8 internal encoding?

Philippe Marschall
2009/2/24 Michal <[hidden email]>:

>
> Lukas wrote:
>> However since the internal encoding of Squeak is *not UTF-8* many
>> strings will appear scrambled when looking at them using an inspector.
>> It works well, though, as long as you do not perform heavy string
>> manipulation, because the strings are sent back as is. If you have
>> string literals with foreign characters in your application code you
>> need to make sure that these are valid UTF-8 as well. This is very
>> efficient, but you need to be aware of the implications.
>
> What happens if squeak is made to use UTF-8 internally?

String and Character lose all semantics. For example #size will
answer the number of bytes, not the number of characters. #at: will
answer the byte at the given index, not the Character at the given
index. For example ä will be represented as (String with: (Character
value: 195) with: (Character value: 164)), which prints as 'Ã¤'.
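That example can be evaluated directly in a workspace. This short sketch (the variable name is made up) shows #size and #at: answering bytes rather than characters for a UTF-8 encoded "ä".

```smalltalk
| utf8 |
"ä (U+00E4) stored as its UTF-8 bytes, as it would be in a 'UTF-8 internal' image."
utf8 := String with: (Character value: 195) with: (Character value: 164).
utf8 size.                 "2: bytes, although the user typed one character"
(utf8 at: 1) asInteger.    "195: the UTF-8 lead byte, not ä (228)"
(utf8 at: 2) asInteger.    "164: the UTF-8 continuation byte"
```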

> I.e., the unix
> man page and various postings on squeak-dev/newbies suggest that a
> recent squeak VM/image combo started with '-encoding utf8' should work
> well as a utf8 image (provided the correct font is supplied, etc.).

That's unrelated.

> In such a case, should plain WAKom be used?

If you're cool with the behavior described above, then use WAKom.

> With no issues with
> string operations like #=, #size and #copyFrom:to:?

#= has limited usability due to missing Unicode normalization. It's
actually a bit more useful because for WideStrings it would take the
leadingChar into account, which is more or less random. #size and
#copyFrom:to: answer "random" data unless you know the ins and outs of
utf-8 and Unicode.
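To see why #copyFrom:to: can produce "random" data on UTF-8 bytes, here is a small workspace sketch (names are illustrative): asking for the first two "characters" of 'aä' actually copies two bytes and cuts the ä in half.

```smalltalk
| utf8 prefix |
"'aä' as UTF-8 bytes: 97 for a, then 195 164 for ä."
utf8 := String with: $a with: (Character value: 195) with: (Character value: 164).
"Meant as 'the first two characters', but it copies bytes 1 and 2."
prefix := utf8 copyFrom: 1 to: 2.
prefix size.                 "2"
(prefix at: 2) asInteger.    "195: a lone lead byte, which is not valid UTF-8 on its own"
```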

> Or is there still
> a need to convert between the incoming utf-8 and squeak's WideString (and
> vice versa)?

Yes, utf-8 conversion won't happen automatically. If you want it, you
need to do it yourself.

>> WAKomEncoded converts incoming data from UTF-8 to the internal
>> encoding of Squeak, and it also converts outgoing data from the
>> internal encoding to UTF-8.
>
> The code and comments in #utf8ToSqueak: suggest that this is only true
> if squeak uses latin-1 internally (which is does by the default), right?

Nope, it's required for non-ASCII input.

>> Since all incoming and outgoing data needs to be converted,
>> this approach is slightly less efficient.
>
> Has anybody quantified the inefficiency?

Not that I'm aware of.

> I'm starting a clean slate
> seaside server, so I'd like to pick the optimal configuration...

What do you want to optimize for?

Cheers
Philippe

Re: [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9 - utf8 internal encoding?

michal-list


>> What happens if squeak is made to use UTF-8 internally?
 
> String and Character lose all semantics.

That's disappointing! But thanks Philippe for the quick and helpful
answer.

>> I'm starting a clean slate seaside server, so I'd like to pick the
>> optimal configuration...
 
> What do you want to optimize for?

I was hoping for a clean utf-8 image, and hence to be able to get rid
of "historical cruft" (anything related to macroman and iso-8859-1)
and at the same time gain some speed (no conversion needed on input /
output while preserving #findString: , #copyFrom:to: and friends).

Michal

Re: [CONFUSED]: WAKom, WAKomEncoded or WAKomEncoded 3.9 - utf8 internal encoding?

Philippe Marschall
2009/3/6 Michal <[hidden email]>:

>
>
>>> What happens if squeak is made to use UTF-8 internally?
>
>> String and Character lose all semantics.
>
> That's disappointing! But thanks Philippe for the quick and helpful
> answer.
>
>>> I'm starting a clean slate seaside server, so I'd like to pick the
>>> optimal configuration...
>
>> What do you want to optimize for?
>
> I was hoping for a clean utf-8 image, and hence to be able to get rid
> of "historical cruft" (anything related to macroman and iso-8859-1)
> and at the same time gain some speed (no conversion needed on input /
> output while preserving #findString: , #copyFrom:to: and friends).

In this case I would go for utf-8 as an external encoding and use
WAKomEncoded. That will give you at least better semantics.
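Switching a running image over could look roughly like the fragment below. It is a sketch that assumes a Seaside 2.8 image with Kom loaded and the class-side #stop and #startOn: messages of the Kom adaptor; port 8080 is only an example.

```smalltalk
"Assumes Seaside 2.8 with Kom loaded; 8080 is just an example port."
"Stop the plain, non-converting adaptor if it is running..."
WAKom stop.
"...and serve the same applications through the UTF-8 converting one."
WAKomEncoded startOn: 8080.
```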

You'll likely lose some speed, but if you're lucky it won't be a
bottleneck and you won't notice it. You'll gain other historical cruft
(leadingChar). You might run into some WideString bugs. Some of them
have been fixed in Squeak 3.10 and likely will be fixed in Pharo as
well [1]. Should you choose to run Squeak 3.10, be aware that Seaside on
Squeak 3.10 doesn't receive the same developer attention and testing
as Seaside on Squeak 3.9 and Pharo so there might be hidden Seaside
bugs there.

Wow that was quite a reassuring post ;-)

 [1] http://code.google.com/p/pharo/issues/detail?id=524

Cheers
Philippe