Umlaut problems in 1.0-beta.3

Tobias Pape
Hi,

I'm experiencing strange behaviour in my Topaz gems
(typical Apache/FastCGI/Seaside setup).
When I use my Squeaksource2 and enter an umlaut (äöü)
somewhere, the queried gem goes nuts: 100% CPU usage.
It does not output any information (i.e., no logs).

Has anybody experienced this?

So Long,
        -Tobias

Re: Umlaut problems in 1.0-beta.3

SeanTAllen
Where do you enter an umlaut and experience this?

Can you give an example?


On Fri, May 7, 2010 at 8:27 AM, Tobias Pape <[hidden email]> wrote:

> When using my Squeaksource2, and entering an umlaut (äöü)
> somewhere, the Queried gem goes nuts, 100% CPU usage.

Re: Umlaut problems in 1.0-beta.3

Tobias Pape
OK.
Go to our Squeaksource (I'll mail the URL privately on request),
click on Users, enter something with an umlaut into the search bar, and hit Enter;
the site does not return.

So Long,
        -Tobias

On 2010-05-07 at 15:23, Sean Allen wrote:

> Where do you enter an umlaut and experience this?
>
> Can you give an example?

Re: Umlaut problems in 1.0-beta.3

SeanTAllen
Send me that URL.

On Fri, May 7, 2010 at 9:59 AM, Tobias Pape <[hidden email]> wrote:

> Ok.
> Go to our Squeaksource (ill mail the url privately on request)
> click on users. enter something into the search bar, hit enter
> and the Site does not return.


Re: Umlaut problems in 1.0-beta.3

Dale
In reply to this post by Tobias Pape
Tobias,

You've probably run into a bug in decodeUTF8. Norbert reported this problem here:

  http://forum.world.st/Enabling-non-7bit-characters-in-URL-path-info-tp2019071p2019626.html

The decode is done in a primitive, so we'd probably need to ship a new version of the server with a fix, and that would most likely be piggy-backed on the imminent :) 1.0-beta.8 release for 2.4.x.

Besides the primitive implementation of decodeUTF8, there is a Smalltalk implementation of a UTF8 decoder, but I get an error from that as well:

  UTF8Encoding decode: ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)

So there is something fishy going on, and at this point I'm not sure what.
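
For readers who do not read Smalltalk, here is roughly what the expression above does, sketched in Python rather than GemStone code. The helper name `char_from_percent_escape` is made up for illustration. It turns the escape `%FC` into the single character with code point 252 (ü) and then checks whether the corresponding lone byte 0xFC is acceptable to a strict UTF-8 decoder.

```python
def char_from_percent_escape(escape: str) -> str:
    """Turn an escape such as '%FC' into the character with that code point."""
    hex_digits = escape[1:3]           # like: '%FC' copyFrom: 2 to: 3
    return chr(int(hex_digits, 16))    # like: Character codePoint: ... asNumber


ch = char_from_percent_escape("%FC")
raw = ch.encode("latin-1")             # the single byte 0xFC, not UTF-8 encoded

try:
    raw.decode("utf-8")                # a strict decoder should reject this byte
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(repr(ch), raw, decoded_ok)
```

In this sketch the byte is rejected, which fits the suspicion that the input handed to the decoder is not UTF-8 in the first place.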

Dale


Re: Umlaut problems in 1.0-beta.3

Dale
Norbert and Tobias,

I've created Issue 109 for this issue: http://code.google.com/p/glassdb/issues/detail?id=109

If either one of you happens to come up with a Smalltalk-based conversion solution, let me know, since the resolution will probably also apply to the primitive.

Dale


Re: Umlaut problems in 1.0-beta.3

Dale
Norbert and Tobias,

Okay, the problem is that we are not erroring out on invalid UTF8. The expression:

  ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)

produces a one-character string whose character value (16rFC = 252) is > 127, but the string is _not_ encoded in UTF8 at this point. The following expression produces a correctly encoded UTF8 string:

  ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString) encodeAsUTF8

And the following expression correctly decodes the UTF8:

  (((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString) encodeAsUTF8) decodeFromUTF8

So the bug is that we go into an infinite loop trying to decode an invalid UTF8 string.

Given this info, we should be able to fix the logic in FastCGI (and elsewhere) to correctly handle the characters.
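
The property that matters here (a decoder must either consume input or signal an error, never spin) can be sketched in Python. This is not the GemStone primitive or the Smalltalk UTF8Encoding class; `decode_utf8_strict` is a hypothetical name, and the code is only a minimal illustration of a decoder whose loop index advances on every pass or raises.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, raising ValueError on malformed input instead of looping."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            length, cp = 1, b0
        elif 0xC2 <= b0 <= 0xDF:
            length, cp = 2, b0 & 0x1F
        elif 0xE0 <= b0 <= 0xEF:
            length, cp = 3, b0 & 0x0F
        elif 0xF0 <= b0 <= 0xF4:
            length, cp = 4, b0 & 0x07
        else:
            # Covers stray continuation bytes and bytes such as 0xFC.
            raise ValueError(f"invalid UTF-8 lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated UTF-8 sequence at offset {i}")
        for j in range(i + 1, i + length):
            bj = data[j]
            if (bj & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {j}")
            cp = (cp << 6) | (bj & 0x3F)
        # Reject overlong forms, surrogates, and values beyond Unicode.
        if (length == 3 and cp < 0x800) or (length == 4 and cp < 0x10000):
            raise ValueError(f"overlong UTF-8 sequence at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        out.append(chr(cp))
        i += length  # every successful pass moves forward by at least one byte
    return "".join(out)


print(decode_utf8_strict(b"\xc3\xbc"))  # the two-byte UTF-8 form of ü
```

Because every branch either raises or increases `i`, the loop terminates for any input, including the lone byte 0xFC discussed above.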

Dale


Re: Umlaut problems in 1.0-beta.3

Dale
Here's the latest info on Issue 109. I mention failing tests below, so I'm going to spend a little more time validating my assumption that they are incorrect before committing my changes. I also haven't made the corresponding changes to the Hyper code, nor have I checked with the Seaside guys. However, given that these changes were _required_ to get Seaside2.8 and SqueakSource to handle ü in the URLs and other fields for SqueakSource, I assume that at a minimum I'm on the right track :)

----

There are actually several problems involved here. To this point I have
used SqueakSource on GLASS for my investigation and testing. By creating a project with
an ü in its name, text and title, you seed ü in most (if not all) of the right
places.

It turns out that there are problems in SqueakSource, Seaside and FastCGI. Here are
my findings:

  1. SSSession>>charSet is wired to use the iso-8859-1 charSet, which explains the
     first-order problems using an ü anywhere in SqueakSource.
  2. Norbert's suggestion to decode from UTF8 in FSSeasidehandler>>decodeString:
     is a good one, but one must also:
       - add #decodeString calls to FSSeasidehandler>>unwrapHeaders:
       - remove the #decodeUrl: call in FSSeasidehandler>>unwrapFields:
       - encode stdin from the responder in FSSeasidehandler>>fieldsFromBody:
  3. Finally, I found that Seaside was incorrectly handling redirect URLs (in
     WAResponse) and was incorrectly encoding URLs in WAUrlEncoder. In both cases
     the input string needs to be encoded into UTF8 _before_ going through the
     URL (percent) encoding.
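
Finding 1 may be easier to see with a concrete request body. Browsers generally submit form fields in the charset the page declares, so a page served as iso-8859-1 sends ü as a single escaped byte, which a UTF-8 decoding step on the server then cannot interpret. The Python sketch below uses the standard `urllib.parse.parse_qs`; the field name `q` and the sample bodies are made up for illustration, and none of this is the FastCGI handler code.

```python
from urllib.parse import parse_qs

latin1_body = "q=%FC"       # ü submitted from a page declared as iso-8859-1
utf8_body = "q=%C3%BC"      # ü submitted from a page declared as UTF-8

# Each body decodes cleanly when the server assumes the charset the page declared.
as_latin1 = parse_qs(latin1_body, encoding="iso-8859-1")
as_utf8 = parse_qs(utf8_body, encoding="utf-8")

# Decoding the iso-8859-1 body as UTF-8 is the mismatch: a strict decoder refuses.
try:
    parse_qs(latin1_body, encoding="utf-8", errors="strict")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

print(as_latin1, as_utf8, mismatch_detected)
```

Declaring one charset for the page and decoding with a different one on the server is the kind of mismatch this sketch is meant to show.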

If you see a URL that contains ü encoded as %FC, then you know that the URL was
percent-encoded _before_ the string was encoded in UTF8. The correct URL encoding for ü when
the string is converted to UTF8 beforehand is %C3%BC.
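
The two outcomes can be reproduced outside Seaside with a few lines of Python, using the standard `urllib.parse.quote` and `unquote`. This is only an illustration of the ordering described above, not WAUrlEncoder itself, and the sample name is arbitrary.

```python
from urllib.parse import quote, unquote

name = "gewürztraminer"

# Percent-encoding the Latin-1 bytes gives the %FC form.
latin1_escaped = quote(name, encoding="latin-1")

# Encoding to UTF-8 first gives the %C3%BC form.
utf8_escaped = quote(name, encoding="utf-8")

# A receiver that assumes UTF-8 recovers the name only from the second form;
# with the default errors="replace" the first form yields U+FFFD instead of ü.
round_trip = unquote(utf8_escaped, encoding="utf-8")
damaged = unquote(latin1_escaped, encoding="utf-8")

print(latin1_escaped, utf8_escaped, round_trip, damaged)
```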

With the changes that I've made, 4 of the WAEncoderTests are failing, but each of them
is encoding into a URL without first encoding into UTF8.

Dale

Re: Umlaut problems in 1.0-beta.3

Dale
Here are the packages you can load to test out the fixes. The fix for Issue 109 will be part of 1.0-beta.8.

Dale

Name: SqueakSource.gemstone-DaleHenrichs.1103
Author: DaleHenrichs
Time: 05/10/10, 15:49:49
UUID: 20d0151a-7228-4959-9eff-2f60b5c81f9e
Ancestors: SqueakSource.gemstone-DaleHenrichs.1102

Name: Seaside2.8g1-DaleHenrichs.631
Author: DaleHenrichs
Time: 05/10/10, 15:43:09
UUID: f03f29e7-3c26-4a6f-9522-a0a63d785528
Ancestors: Seaside2.8g1-jgf.630

Name: HyperSeaside-DaleHenrichs.6
Author: DaleHenrichs
Time: 05/10/10, 15:53:22
UUID: 72c674e1-70bb-4f7c-846b-2f40b459302e
Ancestors: HyperSeaside-dkh.5

Name: FastCGISeaside-DaleHenrichs.51
Author: DaleHenrichs
Time: 05/10/10, 15:55:16
UUID: 18d5ef73-7637-4713-9efa-c3a93cc61ffb
Ancestors: FastCGISeaside-jgf.50


Re: Umlaut problems in 1.0-beta.3

Tobias Pape
Hey Dale,

Thank you very much for your effort.
I'll give it a shot today.

So Long,
        -Tobias

On 2010-05-11 at 01:08, Dale Henrichs wrote:

> Here are the packages you can load to test out the fixes ... The fix for Issue 109 will be part of 1.0-beta.8 ....
>
> Dale

Re: Umlaut problems in 1.0-beta.3

NorbertHartl
In reply to this post by Dale
Dale,

I loaded the packages. It pulled in Gemstone-Exceptions..35 again and I got the error that SafelyPerformBlockRequiringAbort is missing. Can you tell me the way to go? The comment in the commit says that SafelyPerformBlockRequiringAbort was moved to a Monticello package, but to which package? Then I could load the newest exceptions package and the package containing SafelyPerformBlockRequiringAbort.

Thanks,

Norbert

On 11.05.2010, at 01:08, Dale Henrichs wrote:

> Here are the packages you can load to test out the fixes ... The fix for Issue 109 will be part of 1.0-beta.8 ....
>
> Dale
>
> Name: SqueakSource.gemstone-DaleHenrichs.1103
> Author: DaleHenrichs
> Time: 05/10/10, 15:49:49
> UUID: 20d0151a-7228-4959-9eff-2f60b5c81f9e
> Ancestors: SqueakSource.gemstone-DaleHenrichs.1102
>
> Name: Seaside2.8g1-DaleHenrichs.631
> Author: DaleHenrichs
> Time: 05/10/10, 15:43:09
> UUID: f03f29e7-3c26-4a6f-9522-a0a63d785528
> Ancestors: Seaside2.8g1-jgf.630
>
> Name: HyperSeaside-DaleHenrichs.6
> Author: DaleHenrichs
> Time: 05/10/10, 15:53:22
> UUID: 72c674e1-70bb-4f7c-846b-2f40b459302e
> Ancestors: HyperSeaside-dkh.5
>
> Name: FastCGISeaside-DaleHenrichs.51
> Author: DaleHenrichs
> Time: 05/10/10, 15:55:16
> UUID: 18d5ef73-7637-4713-9efa-c3a93cc61ffb
> Ancestors: FastCGISeaside-jgf.50
>
> ----- "Dale Henrichs" <[hidden email]> wrote:
>
> | Here's the latest info on Issue 109...I mention failing tests below,
> | so I'm going to spend a little more time validating my assumption that
> | they are incorrect before committing my changes ... I also haven't
> | made the corresponding changes to the Hyper code, nor have I checked
> | with the Seaside guys ... however, Given that these changes were
> | _required_ to get Seaside2.8 and SqueakSource to handle ü  in the URLS
> | and other fields for SqueakSource, I assume that at a minimum I'm on
> | the right track:)
> |
> | ----
> |
> | There are actually several problems that are involved here ... To this
> | point I have
> | used SqueakSource on GLASS to do my investigation testing. by creating
> | a project with
> | an ü in it's name, text and title, you seed ü in most (if not all) of
> | the right
> | places.
> |
> | It turns out that there are problems in SqueakSource, Seaside and
> | FastCGI. Here are
> | my findings:
> |
> |   1. SSSession>>charSet is wired to use the iso-8859-1 charSet which
> | explains the
> |      first order problems using an ü anywhere in SqueakSource
> |   2. Norbert's suggestion to decode from UTF8 in
> | FSSeasidehandler>>decodeString:
> |      is good one, but one must also:
> |        - add #decodeString calls to FSSeasidehandler>>unwrapHeaders:
> |        - remove the #decodeUrl: call in
> | FSSeasidehandler>>unwrapFields:
> |        - encode stdin from the responder in
> | FSSeasidehandler>>fieldsFromBody:
> |   3. Finally, I found that Seaside was incorrectly handling redirect
> | urls  (in
> |      WAResponse) and was incorrectly encoding urls in WAUrlEncoder. In
> | both cases
> |      the input string needs to be encoded into UTF8 _before_ going
> | through the
> |      HTML encoding.
> |
> | If you see an url that contains ü encoded as %FC, then you know that
> | the url was
> | encoded _before_ the string was encoded in UTF8. The correct HTML
> | encoding for ü when
> | the string is converted to UTF8 beforehand is %C3%BC.
> |
> | With the changes that I've made 4 of the WAEncoderTests are failing
> | but each of them
> | is encoding into an URL with first encoding into UTF8...
> |
> | Dale
> | ----- "Dale Henrichs" <[hidden email]> wrote:
> |
> | | Norbert and Tobias,
> | |
> | | Okay, the problem is that we are not erroring out on invalid UTF8.
> | the
> | | expression:
> | |
> | |   ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3))
> | asNumber)
> | | asString)
> | |
> | | produces an ASCII string whose value is > 256, but the ASCII is
> | _not_
> | | encoded in utf8 at this point. The following expression produces a
> | | correctly encoded UTF8 string:
> | |
> | |   (Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3))
> | asNumber)
> | | asString) encodeAsUTF8
> | |
> | | And the followin expression correctly decodes the UTF8:
> | |  
> | |   (((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3))
> | | asNumber) asString) encodeAsUTF8) decodeFromUTF8
> | |
> | | So the bug is that we go into an infinite loop trying to decode an
> | | invalid UTF8 string.
> | |
> | |
> | | Given this info, we should be able to fix the logic in FastCGI (and
> | | elsewhere) to correctly handle the characters...
> | |
> | | Dale
> | |
> | | ----- "Dale Henrichs" <[hidden email]> wrote:
> | |
> | | | Norbert and Tobias,
> | | |
> | | | I've created Issue 109 for this issue:
> | | | http://code.google.com/p/glassdb/issues/detail?id=109
> | | |
> | | | If either one of you happens to come up with a smalltalk-based
> | | | conversion solution let me know, since the resolution will
> | probably
> | | | also apply to the primitive ...
> | | |
> | | | Dale
> | | |
> | | | ----- "Dale Henrichs" <[hidden email]> wrote:
> | | |
> | | | | Tobias,
> | | | |
> | | | | You've probably run into a bug in decodeUTF8. Norbert reported this
> | | | | problem here:
> | | | |
> | | | | http://forum.world.st/Enabling-non-7bit-characters-in-URL-path-info-tp2019071p2019626.html
> | | | |
> | | | | The decode is done in a primitive, so we'd probably need to ship a
> | | | | new version of the server with a fix, and that would most likely be
> | | | | piggy-backed with the imminent:) 1.0-beta.8 release for 2.4.x
> | | | |
> | | | | Besides the primitive implementation of decodeUTF8, there is a
> | | | | Smalltalk implementation of a UTF8 decoder, but I get an error from
> | | | | that as well:
> | | | |
> | | | |   UTF8Encoding decode: ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)
> | | | |
> | | | | So there is something fishy going on, and at this point I'm not
> | | | | sure.
> | | | |
> | | | | Dale
> | | | |
> | | | | ----- "Tobias Pape" <[hidden email]> wrote:
> | | | |
> | | | | | Hi,
> | | | | |
> | | | | | I'm experiencing strange behaviour of my Topaz gems
> | | | | | (typical apache/fastcgi/seaside setup).
> | | | | | When using my Squeaksource2, and entering an umlaut (äöü)
> | | | | | somewhere, the Queried gem goes nuts, 100% CPU usage.
> | | | | | Does not output any information(i.e., logs).
> | | | | |
> | | | | | Anybody experienced this?
> | | | | |
> | | | | | So Long,
> | | | | | -Tobias


Re: Umlaut problems in 1.0-beta.3

NorbertHartl
In reply to this post by Dale
Dale,

I did some tests and it looks quite good. Anyone who would like to test this with Pier can use a quick workaround: just change the last line in PRPath>>isValidName: from

                and: [ aString allSatisfy: [ :char | self validCharacters includes: char ] ] ] ] ]

to

                and: [ aString allSatisfy: [ :char | char isLetter or: [self validCharacters includes: char] ] ] ] ] ]

This way you can create pages in Pier whose names contain extended characters. In my case it works well, as you can see:

http://herr-rosso.de/traube/gewürztraminer

thanks a lot,

Norbert
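
The effect of the workaround above can be sketched outside Smalltalk. The following is a minimal Python illustration, not Pier's code: `VALID_CHARACTERS` is a hypothetical stand-in for whatever `validCharacters` returns (the thread does not show it), and only the last clause of `isValidName:` is modelled.

```python
# Hypothetical whitelist; the real contents of Pier's validCharacters
# are not shown in the thread.
VALID_CHARACTERS = set("abcdefghijklmnopqrstuvwxyz0123456789-_.")

def is_valid_name_strict(name: str) -> bool:
    """Original rule: every character must be in the whitelist."""
    return all(ch in VALID_CHARACTERS for ch in name)

def is_valid_name_relaxed(name: str) -> bool:
    """Workaround rule: any letter is accepted, otherwise fall back to the whitelist."""
    return all(ch.isalpha() or ch in VALID_CHARACTERS for ch in name)

print(is_valid_name_strict("gewürztraminer"))   # rejected: ü is not in the whitelist
print(is_valid_name_relaxed("gewürztraminer"))  # accepted: ü counts as a letter
```

The relaxed rule accepts every letter of every script, which may be broader than intended; that is a trade-off of the quick workaround, not something the thread settles.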

On 11.05.2010, at 01:08, Dale Henrichs wrote:

> Here are the packages you can load to test out the fixes ... The fix for Issue 109 will be part of 1.0-beta.8 ....
>
> Dale
>
> Name: SqueakSource.gemstone-DaleHenrichs.1103
> Author: DaleHenrichs
> Time: 05/10/10, 15:49:49
> UUID: 20d0151a-7228-4959-9eff-2f60b5c81f9e
> Ancestors: SqueakSource.gemstone-DaleHenrichs.1102
>
> Name: Seaside2.8g1-DaleHenrichs.631
> Author: DaleHenrichs
> Time: 05/10/10, 15:43:09
> UUID: f03f29e7-3c26-4a6f-9522-a0a63d785528
> Ancestors: Seaside2.8g1-jgf.630
>
> Name: HyperSeaside-DaleHenrichs.6
> Author: DaleHenrichs
> Time: 05/10/10, 15:53:22
> UUID: 72c674e1-70bb-4f7c-846b-2f40b459302e
> Ancestors: HyperSeaside-dkh.5
>
> Name: FastCGISeaside-DaleHenrichs.51
> Author: DaleHenrichs
> Time: 05/10/10, 15:55:16
> UUID: 18d5ef73-7637-4713-9efa-c3a93cc61ffb
> Ancestors: FastCGISeaside-jgf.50
>
> ----- "Dale Henrichs" <[hidden email]> wrote:
>
> | Here's the latest info on Issue 109 ... I mention failing tests below,
> | so I'm going to spend a little more time validating my assumption that
> | they are incorrect before committing my changes ... I also haven't
> | made the corresponding changes to the Hyper code, nor have I checked
> | with the Seaside guys ... however, given that these changes were
> | _required_ to get Seaside2.8 and SqueakSource to handle ü in the URLs
> | and other fields for SqueakSource, I assume that at a minimum I'm on
> | the right track:)
> |
> | ----
> |
> | There are actually several problems involved here ... To this point I
> | have used SqueakSource on GLASS to do my investigative testing. By
> | creating a project with an ü in its name, text and title, you seed ü in
> | most (if not all) of the right places.
> |
> | It turns out that there are problems in SqueakSource, Seaside and
> | FastCGI. Here are my findings:
> |
> |   1. SSSession>>charSet is wired to use the iso-8859-1 charSet, which
> |      explains the first-order problems using an ü anywhere in
> |      SqueakSource.
> |   2. Norbert's suggestion to decode from UTF8 in
> |      FSSeasidehandler>>decodeString: is a good one, but one must also:
> |        - add #decodeString: calls to FSSeasidehandler>>unwrapHeaders:
> |        - remove the #decodeUrl: call in FSSeasidehandler>>unwrapFields:
> |        - encode stdin from the responder in
> |          FSSeasidehandler>>fieldsFromBody:
> |   3. Finally, I found that Seaside was incorrectly handling redirect
> |      urls (in WAResponse) and was incorrectly encoding urls in
> |      WAUrlEncoder. In both cases the input string needs to be encoded
> |      into UTF8 _before_ going through the URL (percent) encoding.
> |
> | If you see a url that contains ü encoded as %FC, then you know that
> | the url was percent-encoded without the string first being encoded in
> | UTF8. The correct encoding for ü when the string is converted to UTF8
> | beforehand is %C3%BC.
> |
> | With the changes that I've made, 4 of the WAEncoderTests are failing,
> | but each of them is encoding into a URL without first encoding into
> | UTF8...
> |
> | Dale
> | ----- "Dale Henrichs" <[hidden email]> wrote:
> |
> | | Norbert and Tobias,
> | |
> | | Okay, the problem is that we are not erroring out on invalid UTF8.
> | | The expression:
> | |
> | |   ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)
> | |
> | | produces a one-character string whose character value (16rFC, i.e. 252)
> | | lies outside 7-bit ASCII, but the string is _not_ encoded in UTF8 at
> | | this point. The following expression produces a correctly encoded UTF8
> | | string:
> | |
> | |   ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString) encodeAsUTF8
> | |
> | | And the following expression correctly decodes the UTF8:
> | |
> | |   (((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString) encodeAsUTF8) decodeFromUTF8
> | |
> | | So the bug is that we go into an infinite loop trying to decode an
> | | invalid UTF8 string.
> | |
> | |
> | | Given this info, we should be able to fix the logic in FastCGI (and
> | | elsewhere) to correctly handle the characters...
> | |
> | | Dale
> | |
> | | ----- "Dale Henrichs" <[hidden email]> wrote:
> | |
> | | | Norbert and Tobias,
> | | |
> | | | I've created Issue 109 for this issue:
> | | | http://code.google.com/p/glassdb/issues/detail?id=109
> | | |
> | | | If either one of you happens to come up with a Smalltalk-based
> | | | conversion solution let me know, since the resolution will probably
> | | | also apply to the primitive ...
> | | |
> | | | Dale
> | | |
> | | | ----- "Dale Henrichs" <[hidden email]> wrote:
> | | |
> | | | | Tobias,
> | | | |
> | | | | You've probably run into a bug in decodeUTF8. Norbert reported this
> | | | | problem here:
> | | | |
> | | | | http://forum.world.st/Enabling-non-7bit-characters-in-URL-path-info-tp2019071p2019626.html
> | | | |
> | | | | The decode is done in a primitive, so we'd probably need to ship a
> | | | | new version of the server with a fix, and that would most likely be
> | | | | piggy-backed with the imminent:) 1.0-beta.8 release for 2.4.x
> | | | |
> | | | | Besides the primitive implementation of decodeUTF8, there is a
> | | | | Smalltalk implementation of a UTF8 decoder, but I get an error from
> | | | | that as well:
> | | | |
> | | | |   UTF8Encoding decode: ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)
> | | | |
> | | | | So there is something fishy going on, and at this point I'm not
> | | | | sure.
> | | | |
> | | | | Dale
> | | | |
> | | | | ----- "Tobias Pape" <[hidden email]> wrote:
> | | | |
> | | | | | Hi,
> | | | | |
> | | | | | I'm experiencing strange behaviour of my Topaz gems
> | | | | | (typical apache/fastcgi/seaside setup).
> | | | | | When using my Squeaksource2, and entering an umlaut (äöü)
> | | | | | somewhere, the Queried gem goes nuts, 100% CPU usage.
> | | | | | Does not output any information(i.e., logs).
> | | | | |
> | | | | | Anybody experienced this?
> | | | | |
> | | | | | So Long,
> | | | | | -Tobias
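
The diagnosis quoted above (a lone 16rFC byte is not valid UTF8, and a decoder should signal an error rather than spin) can be checked at the byte level. This is a small Python sketch for illustration only; it says nothing about how the GemStone primitive is implemented, and `try_decode` is a hypothetical helper name.

```python
# Python illustration only; the thread's own code is GemStone Smalltalk.
lone_byte = bytes([0xFC])   # what "%FC" unescapes to: a single byte, not UTF-8

def try_decode(data: bytes):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        # Strict decoding raises on malformed input instead of looping.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

utf8_bytes = "ü".encode("utf-8")   # U+00FC becomes the two bytes 0xC3 0xBC

print(try_decode(lone_byte))       # invalid input is rejected
print([hex(b) for b in utf8_bytes])
print(try_decode(utf8_bytes))      # round-trips back to the original character
```

Failing fast on malformed input is the behaviour the thread asks for: the caller can then decide whether to reject the request or fall back to another charset.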


Re: Umlaut problems in 1.0-beta.3

Dale
In reply to this post by NorbertHartl
Norbert,

SafelyPerformBlockRequiringAbort was moved into the Monticello package:

  Monticello.g-Dalehenrichs.400

Dale
----- "Norbert Hartl" <[hidden email]> wrote:

| Dale,
|
| I loaded the packages. It pulled in Gemstone-Exceptions..35 again and
| I got the error that SafelyPerformBlockRequiringAbort is missing. Can
| you tell me what is the way to go? The comment in the commit says that
| SafelyPerformBlockRequiringAbort is moved to a monticello package. But
| to which package? Then I could load the newest exceptions package and
| the package containing SafelyPerformBlockRequiringAbort.
|
| thanks,
|
| Norbert

Re: Umlaut problems in 1.0-beta.3

Dale
In reply to this post by NorbertHartl
Norbert,

I've submitted Issue 113 (http://code.google.com/p/glassdb/issues/detail?id=113) to track the fix to Pier ... I should get to it after 1.0-beta.8 goes out.

I'm glad to see that we finally got http://herr-rosso.de/traube/gewürztraminer working for you:)

Dale
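
For readers who want to check the encoding order on the gewürztraminer URL mentioned above, here is a short Python sketch using the standard library. It is an illustration only, not Seaside's WAUrlEncoder: it contrasts percent-encoding the UTF-8 bytes (giving %C3%BC, the form identified as correct earlier in the thread) with percent-encoding the Latin-1 byte (giving %FC).

```python
from urllib.parse import quote, unquote

path = "/traube/gewürztraminer"

# Encode to UTF-8 first, then percent-encode the resulting bytes.
utf8_first = quote(path, safe="/", encoding="utf-8")

# Percent-encode the single Latin-1 byte instead (the problematic form).
latin1_first = quote(path, safe="/", encoding="latin-1")

print(utf8_first)
print(latin1_first)

# A UTF-8 aware consumer recovers the original path only from the first form;
# the lone 0xFC byte in the second form is replaced by U+FFFD.
print(unquote(utf8_first) == path)
print(unquote(latin1_first) == path)
```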

Re: Umlaut problems in 1.0-beta.3

NorbertHartl

On 11.05.2010, at 20:39, Dale Henrichs wrote:

> Norbert,
>
> I've submitted Issue 113 (http://code.google.com/p/glassdb/issues/detail?id=113) to track the fix to Pier ... I should get to it after 1.0-beta.8 goes out.
>
Ok, but there are a few more things to do. I'll see if I can find some time to investigate.

> I'm glad to see that we finally got http://herr-rosso.de/traube/gewürztraminer working for you:)
>
Thanks. Isn't it nice?  Well, and this is the only reason why I wanted this. :)

Norbert


Re: Umlaut problems in 1.0-beta.3

Dale
Norbert,

If you want to update Issue 113 with your findings (or suspicions) go ahead...

Dale
----- "Norbert Hartl" <[hidden email]> wrote:

| On 11.05.2010, at 20:39, Dale Henrichs wrote:
|
| > Norbert,
| >
| > I've submitted Issue 113
| (http://code.google.com/p/glassdb/issues/detail?id=113) to track the
| fix to Pier ... I should get to it after 1.0-beta.8 goes out.
| >
| Ok, but there are a few more things to do. I'll see if I can find some
| time to investigate.
|
| > I'm glad to see that we finally got
| http://herr-rosso.de/traube/gewürztraminer working for you:)
| >
| Thanks. Isn't it nice?  Well, and this is the only reason why I wanted
| this. :)
|
| Norbert
|
| > Dale
| > ----- "Norbert Hartl" <[hidden email]> wrote:
| >
| > | Dale,
| > |
| > | I did some tests and it looks quite good. Anyone who would like to test
| > | this with Pier can apply a quick workaround. Just exchange the last
| > | line in PRPath>>isValidName: from
| > |
| > | and: [ aString allSatisfy: [ :char | self validCharacters includes: char ] ] ] ] ]
| > |
| > | to
| > |
| > | and: [ aString allSatisfy: [ :char | char isLetter or: [self validCharacters includes: char] ] ] ] ] ]
| > |
| > | This way you can create pages in Pier that contain extended
| > | characters. In my case it works well, as you can see
| > |
| > | http://herr-rosso.de/traube/gewürztraminer
| > |
| > | thanks a lot,
| > |
| > | Norbert
| > |
| > | On 11.05.2010, at 01:08, Dale Henrichs wrote:
| > |
| > | > Here are the packages you can load to test out the fixes ... The fix
| > | > for Issue 109 will be part of 1.0-beta.8 ....
| > | >
| > | > Dale
| > | >
| > | > Name: SqueakSource.gemstone-DaleHenrichs.1103
| > | > Author: DaleHenrichs
| > | > Time: 05/10/10, 15:49:49
| > | > UUID: 20d0151a-7228-4959-9eff-2f60b5c81f9e
| > | > Ancestors: SqueakSource.gemstone-DaleHenrichs.1102
| > | >
| > | > Name: Seaside2.8g1-DaleHenrichs.631
| > | > Author: DaleHenrichs
| > | > Time: 05/10/10, 15:43:09
| > | > UUID: f03f29e7-3c26-4a6f-9522-a0a63d785528
| > | > Ancestors: Seaside2.8g1-jgf.630
| > | >
| > | > Name: HyperSeaside-DaleHenrichs.6
| > | > Author: DaleHenrichs
| > | > Time: 05/10/10, 15:53:22
| > | > UUID: 72c674e1-70bb-4f7c-846b-2f40b459302e
| > | > Ancestors: HyperSeaside-dkh.5
| > | >
| > | > Name: FastCGISeaside-DaleHenrichs.51
| > | > Author: DaleHenrichs
| > | > Time: 05/10/10, 15:55:16
| > | > UUID: 18d5ef73-7637-4713-9efa-c3a93cc61ffb
| > | > Ancestors: FastCGISeaside-jgf.50
| > | >
| > | > ----- "Dale Henrichs" <[hidden email]> wrote:
| > | >
| > | > | Here's the latest info on Issue 109... I mention failing tests below,
| > | > | so I'm going to spend a little more time validating my assumption that
| > | > | they are incorrect before committing my changes ... I also haven't
| > | > | made the corresponding changes to the Hyper code, nor have I checked
| > | > | with the Seaside guys ... however, given that these changes were
| > | > | _required_ to get Seaside2.8 and SqueakSource to handle ü in the URLs
| > | > | and other fields for SqueakSource, I assume that at a minimum I'm on
| > | > | the right track:)
| > | > |
| > | > | ----
| > | > |
| > | > | There are actually several problems involved here ... To this
| > | > | point I have used SqueakSource on GLASS to do my investigation
| > | > | testing. By creating a project with an ü in its name, text and title,
| > | > | you seed ü in most (if not all) of the right places.
| > | > |
| > | > | It turns out that there are problems in SqueakSource, Seaside and
| > | > | FastCGI. Here are my findings:
| > | > |
| > | > |   1. SSSession>>charSet is wired to use the iso-8859-1 charSet, which
| > | > |      explains the first-order problems using an ü anywhere in SqueakSource
| > | > |   2. Norbert's suggestion to decode from UTF8 in
| > | > |      FSSeasidehandler>>decodeString: is a good one, but one must also:
| > | > |        - add #decodeString calls to FSSeasidehandler>>unwrapHeaders:
| > | > |        - remove the #decodeUrl: call in FSSeasidehandler>>unwrapFields:
| > | > |        - encode stdin from the responder in FSSeasidehandler>>fieldsFromBody:
| > | > |   3. Finally, I found that Seaside was incorrectly handling redirect
| > | > |      urls (in WAResponse) and was incorrectly encoding urls in
| > | > |      WAUrlEncoder. In both cases the input string needs to be encoded
| > | > |      into UTF8 _before_ going through the HTML encoding.
| > | > |
| > | > | If you see an url that contains ü encoded as %FC, then you know that
| > | > | the url was encoded _before_ the string was encoded in UTF8. The
| > | > | correct HTML encoding for ü when the string is converted to UTF8
| > | > | beforehand is %C3%BC.
| > | > |
| > | > | With the changes that I've made, 4 of the WAEncoderTests are failing,
| > | > | but each of them is encoding into a URL without first encoding into UTF8...
| > | > |
| > | > | Dale
| > | > | ----- "Dale Henrichs" <[hidden email]> wrote:
| > | > |
| > | > | | Norbert and Tobias,
| > | > | |
| > | > | | Okay, the problem is that we are not erroring out on invalid UTF8.
| > | > | | The expression:
| > | > | |
| > | > | |   ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)
| > | > | |
| > | > | | produces an ASCII string whose value is > 256, but the ASCII is _not_
| > | > | | encoded in utf8 at this point. The following expression produces a
| > | > | | correctly encoded UTF8 string:
| > | > | |
| > | > | |   ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString) encodeAsUTF8
| > | > | |
| > | > | | And the following expression correctly decodes the UTF8:
| > | > | |
| > | > | |   (((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString) encodeAsUTF8) decodeFromUTF8
| > | > | |
| > | > | | So the bug is that we go into an infinite loop trying to decode an
| > | > | | invalid UTF8 string.
| > | > | |
| > | > | |
| > | > | | Given this info, we should be able to fix the logic in FastCGI (and
| > | > | | elsewhere) to correctly handle the characters...
| > | > | |
| > | > | | Dale
| > | > | |
| > | > | | ----- "Dale Henrichs" <[hidden email]> wrote:
| > | > | |
| > | > | | | Norbert and Tobias,
| > | > | | |
| > | > | | | I've created Issue 109 for this issue:
| > | > | | | http://code.google.com/p/glassdb/issues/detail?id=109
| > | > | | |
| > | > | | | If either one of you happens to come up with a Smalltalk-based
| > | > | | | conversion solution let me know, since the resolution will probably
| > | > | | | also apply to the primitive ...
| > | > | | |
| > | > | | | Dale
| > | > | | |
| > | > | | | ----- "Dale Henrichs" <[hidden email]> wrote:
| > | > | | |
| > | > | | | | Tobias,
| > | > | | | |
| > | > | | | | You've probably run into a bug in decodeUTF8. Norbert reported this
| > | > | | | | problem here:
| > | > | | | |
| > | > | | | |   http://forum.world.st/Enabling-non-7bit-characters-in-URL-path-info-tp2019071p2019626.html
| > | > | | | |
| > | > | | | | The decode is done in a primitive, so we'd probably need to ship a new
| > | > | | | | version of the server with a fix, and that would most likely be
| > | > | | | | piggy-backed with the imminent:) 1.0-beta.8 release for 2.4.x
| > | > | | | |
| > | > | | | | Besides the primitive implementation of decodeUTF8, there is a
| > | > | | | | Smalltalk implementation of a UTF8 decoder, but I get an error from
| > | > | | | | that as well:
| > | > | | | |
| > | > | | | |   UTF8Encoding decode: ((Character codePoint: ('16r' , ('%FC' copyFrom: 2 to: 3)) asNumber) asString)
| > | > | | | |
| > | > | | | | So there is something fishy going on and at this point, I'm not sure.
| > | > | | | |
| > | > | | | | Dale
| > | > | | | |
| > | > | | | | ----- "Tobias Pape" <[hidden email]> wrote:
| > | > | | | |
| > | > | | | | | Hi,
| > | > | | | | |
| > | > | | | | | I'm experiencing strange behaviour of my Topaz gems
| > | > | | | | | (typical apache/fastcgi/seaside setup).
| > | > | | | | | When using my Squeaksource2, and entering an umlaut (äöü)
| > | > | | | | | somewhere, the Queried gem goes nuts, 100% CPU usage.
| > | > | | | | | Does not output any information(i.e., logs).
| > | > | | | | |
| > | > | | | | | Anybody experienced this?
| > | > | | | | |
| > | > | | | | | So Long,
| > | > | | | | | -Tobias
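
The %FC versus %C3%BC distinction in Dale's quoted findings above can be reproduced with a short sketch. This is Python's standard `urllib.parse`, used purely for illustration; it is not the Seaside `WAUrlEncoder` code, and the sample string is taken from the URL mentioned in the thread.

```python
from urllib.parse import quote, unquote

text = "gewürztraminer"

# UTF-8 first, then percent-encoding: the order the thread describes as correct.
utf8_first = quote(text, encoding="utf-8")

# Percent-encoding the ISO-8859-1 byte instead: the order that yields %FC.
latin1_first = quote(text, encoding="latin-1")

print(utf8_first)    # gew%C3%BCrztraminer
print(latin1_first)  # gew%FCrztraminer

# Decoding round-trips only when both sides assume the same charset.
round_trip = unquote(utf8_first, encoding="utf-8")
mismatch = unquote(latin1_first, encoding="utf-8", errors="replace")

print(round_trip == text)  # True
print(mismatch == text)    # False: the lone 0xFC byte is not valid UTF-8
```

A receiver that assumes UTF-8 therefore cannot recover the ü from %FC, which matches the symptom discussed in the thread.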

[solved] Re: Umlaut problems in 1.0-beta.3

Tobias Pape
Hi.

Just for the record.
The latest changes regarding issue 113 fixed my initial
problem. So, thanks, Dale, thank you very much.

So Long,
        -Tobias

my initial musings were:

>
> | > | > | | | |
> | > | > | | | | | Hi,
> | > | > | | | | |
> | > | > | | | | | I'm experiencing strange behaviour of my Topaz gems
> | > | > | | | | | (typical apache/fastcgi/seaside setup).
> | > | > | | | | | When using my Squeaksource2, and entering an umlaut (äöü)
> | > | > | | | | | somewhere, the Queried gem goes nuts, 100% CPU usage.
> | > | > | | | | | Does not output any information(i.e., logs).
> | > | > | | | | |
> | > | > | | | | | Anybody experienced this?
> | > | > | | | | |
> | > | > | | | | | So Long,
> | > | > | | | | | -Tobias
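
For anyone landing on this thread later: the 100% CPU symptom quoted above was traced in the discussion to a decoder that looped on invalid UTF8 instead of raising an error. A minimal sketch of the safer behaviour, in Python rather than GemStone Smalltalk, is shown here. The helper name `decode_utf8_strict` is made up for illustration and does not mirror the GemStone primitive or its fix.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, raising ValueError on malformed input."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # Fail fast with the offending offset instead of retrying the same byte.
        raise ValueError(f"invalid UTF-8 at byte offset {exc.start}") from exc


latin1_bytes = "ü".encode("latin-1")  # single byte 0xFC, not valid UTF-8 on its own
utf8_bytes = "ü".encode("utf-8")      # two bytes 0xC3 0xBC

if __name__ == "__main__":
    print(decode_utf8_strict(utf8_bytes))
    try:
        decode_utf8_strict(latin1_bytes)
    except ValueError as err:
        print("rejected:", err)
```

The point is only that a lone 0xFC byte is rejected with an error, so the caller can respond instead of hanging.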