quadbytestring / doublebytestring -> normal string


Johan Brichau-2
Hi,

I often get QuadByteString or DoubleByteString instances floating around in my image. They are values that come in through form submissions.

Since many methods in GS are not defined on these multi-byte strings, I am getting quite a few errors.

By converting the string to UTF-8, I can get regular strings out of it. However, shouldn't I be getting UTF-8-encoded strings in the Seaside callbacks? The web page declares that charset anyway.

Any ideas anyone?

Johan

Re: quadbytestring / doublebytestring -> normal string

NorbertHartl

Am 06.09.2011 um 16:03 schrieb Johan Brichau:

> Hi,
>
> I often get quadbytestring or doublebytestring instances running around in my image. They are values that come in through form submission.
>
> Since a lot of methods in GS are not defined on these multibyte strings, I am getting quite some errors.
>
> By converting the string to UTF8, I am able to get regular strings out of it. However, should I not get utf8 encoded strings in the Seaside callbacks? The webpage declares that charset anyway.
>
> Any ideas anyone?
>
I'm not sure I understand your intention. An HTTP response declares a charset so that a consumer can decode it properly. Inside the image there should only be characters, not some sort of encoded string. A string encoded in UTF-8 has to be treated as a binary string, where size, indexOf: etc. do not give the correct results. But perhaps I have misunderstood your question completely.
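To make Norbert's point concrete, here is a small Python analogy (not GemStone code; the sample text is arbitrary). Once text is encoded as UTF-8, size and index lookups count bytes rather than characters, so every non-ASCII character shifts the results:

```python
# Python analogy, not GemStone code: a decoded string counts characters,
# while its UTF-8 encoding is only a sequence of bytes.
text = "naïve café"
encoded = text.encode("utf-8")

print(len(text))             # 10 characters
print(len(encoded))          # 12 bytes: 'ï' and 'é' take two bytes each
print(text.index("c"))       # 6
print(encoded.index(b"c"))   # 7: shifted by the extra byte of 'ï'
```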

Norbert


Re: quadbytestring / doublebytestring -> normal string

Philippe Marschall
In reply to this post by Johan Brichau-2
2011/9/6 Johan Brichau <[hidden email]>:
> Hi,
>
> I often get quadbytestring or doublebytestring instances running around in my image. They are values that come in through form submission.
>
> Since a lot of methods in GS are not defined on these multibyte strings, I am getting quite some errors.

Sounds like a bug in GS. Squeak/Pharo had similar problems (and may still have).

> By converting the string to UTF8, I am able to get regular strings out of it. However, should I not get utf8 encoded strings in the Seaside callbacks? The webpage declares that charset anyway.

No. UTF-8 is how it's sent over the wire; it tells the browser how to
interpret the bytes. You're working one level of abstraction higher,
with Characters/Strings.

You can run a server adapter with a GRNullCodec, but then #size,
#indexOf:, #at: and friends don't really work anymore. The only thing
that continues to work is #,.
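A Python analogy of what Philippe describes, assuming a null codec simply leaves the raw UTF-8 bytes in the string. Concatenation keeps working because each UTF-8 character is a self-contained byte sequence, so joining two valid sequences yields a valid one; size and indexing, however, operate on bytes:

```python
# Python analogy of a "null codec": text is kept as raw UTF-8 bytes.
first = "Grüße, ".encode("utf-8")
second = "Zürich".encode("utf-8")

joined = first + second                       # byte concatenation, like #,
print(joined.decode("utf-8"))                 # Grüße, Zürich (still valid UTF-8)

# Character-level operations now see bytes, not characters.
print(len("Zürich"))    # 6
print(len(second))      # 7
print(second[1:2])      # b'\xc3': only the first byte of the encoded 'ü'
```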

> Any ideas anyone?

Cheers
Philippe

Re: quadbytestring / doublebytestring -> normal string

Johan Brichau-2
In reply to this post by NorbertHartl
Hi Norbert,


On 06 Sep 2011, at 16:17, Norbert Hartl wrote:

>> By converting the string to UTF8, I am able to get regular strings out of it. However, should I not get utf8 encoded strings in the Seaside callbacks? The webpage declares that charset anyway.
>
> I'm not sure I understand your intention. A HTTP response declares a charset so a consumer can decode it properly. Inside the image there should be only characters and not some sort of encoded string. Strings that are encoded in utf-8 are to be considered a binary string where size, indexOf: etc. do not give the correct result. But I probably misunderstood your question completely.

Converting the multi-byte string to a String via UTF-8 encoding was just a workaround to stop the crash. Everything comes out garbled afterwards, because the string is UTF-8 encoded a second time when the response is sent.
The basic problem is that user input sometimes yields a MultiByteString, which lacks many of the Seaside extension methods because they are defined on String.
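The garbling Johan describes is the classic double-encoding effect. A Python sketch, assuming the workaround stores each UTF-8 byte as a one-byte character (as Latin-1 would) and that the response is then UTF-8 encoded again on output:

```python
# Python analogy of the workaround: UTF-8 bytes are stored as if each byte
# were a one-byte character, then encoded again when the response is sent.
original = "é"
workaround = original.encode("utf-8").decode("latin-1")  # 'Ã©': 2 characters
sent = workaround.encode("utf-8")                         # encoded a second time

print(len(workaround))        # 2
print(sent.decode("utf-8"))   # Ã© (what the browser displays)
```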

Some of these methods have already been moved to CharacterCollection. I guess we should move them all there?
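Why moving the methods up helps can be sketched with a hypothetical Python model of the hierarchy discussed in this thread. The classes only mirror the GemStone names, and `seaside_extension` is an invented stand-in for a Seaside extension method. Because MultiByteString is a sibling of String rather than a subclass, a method added to String is invisible to it until it lives on the common superclass:

```python
# Hypothetical model: these Python classes only mirror the GemStone names.
class CharacterCollection:
    pass


class String(CharacterCollection):
    pass


class MultiByteString(CharacterCollection):  # a sibling of String, not a subclass
    pass


class QuadByteString(MultiByteString):
    pass


def seaside_extension(self):
    """Invented stand-in for a Seaside extension method."""
    return "extension called"


# Extension installed on String only: multi-byte strings cannot see it.
String.seaside_extension = seaside_extension
print(hasattr(QuadByteString(), "seaside_extension"))  # False

# Moving it up to the common superclass makes it reachable from both branches.
del String.seaside_extension
CharacterCollection.seaside_extension = seaside_extension
print(hasattr(String(), "seaside_extension"))   # True
print(QuadByteString().seaside_extension())     # extension called
```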

Johan

Re: quadbytestring / doublebytestring -> normal string

Johan Brichau-2
In reply to this post by Philippe Marschall

On 06 Sep 2011, at 16:39, Philippe Marschall wrote:

> 2011/9/6 Johan Brichau <[hidden email]>:
>> Hi,
>>
>> I often get quadbytestring or doublebytestring instances running around in my image. They are values that come in through form submission.
>>
>> Since a lot of methods in GS are not defined on these multibyte strings, I am getting quite some errors.
>
> Sounds like a bug in GS. Squeak/Pharo had similar problems (maybe still has).

I'm not sure. I'm not an encoding hero (to say the least :-).

In Pharo, I see that WideString is a subclass of String.
In GS, MultiByteString is not a subclass of String.

Would it be correct to assume that the GemStone port of Seaside should move all extension methods on String to CharacterCollection? I guess Seaside callbacks in Pharo can get WideString instances too?

Tell me if I'm talking rubbish… ;-)

Johan

Re: quadbytestring / doublebytestring -> normal string

Nick
Hi,

> I'm not sure. I'm not an encoding hero (to say the least :-).
>
> In Pharo, I see that WideString is a subclass of String.
> In GS, MultiByteString is not a subclass of String.
>
> Would it be correct to assume that the Gemstone port of Seaside should move all extension methods on String to CharacterCollection? I guess Seaside callbacks in Pharo can get WideString instances too?
>
> Tell me if I'm telling rubbish… ;-)

I have exactly the same problem; see for example: http://code.google.com/p/glassdb/issues/detail?id=227

I guess some extension methods have been moved, but not the ones you are relying on. Which methods are missing? Is there a well-understood standard protocol for strings?
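One way to answer "which methods are missing?" is to diff the protocols of the two classes reflectively. A Python sketch with hypothetical stand-in classes and invented method names (`trim_blanks`, `url_encoded`); the same idea would apply to comparing the selectors understood by String and MultiByteString in the image:

```python
# Hypothetical stand-ins for the two string classes (not GemStone code).
class CharacterCollection:
    def size(self): ...


class String(CharacterCollection):
    def trim_blanks(self): ...      # invented extension methods
    def url_encoded(self): ...


class MultiByteString(CharacterCollection):
    pass


def missing_protocol(reference, other):
    """Public method names understood by `reference` but not by `other`."""
    names = {n for n in dir(reference) if not n.startswith("_")}
    return sorted(n for n in names if not hasattr(other, n))


print(missing_protocol(String, MultiByteString))  # ['trim_blanks', 'url_encoded']
```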

Nick

Re: quadbytestring / doublebytestring -> normal string

Philippe Marschall
In reply to this post by Johan Brichau-2
2011/9/8 Johan Brichau <[hidden email]>:

>
> On 06 Sep 2011, at 16:39, Philippe Marschall wrote:
>
>> 2011/9/6 Johan Brichau <[hidden email]>:
>>> Hi,
>>>
>>> I often get quadbytestring or doublebytestring instances running around in my image. They are values that come in through form submission.
>>>
>>> Since a lot of methods in GS are not defined on these multibyte strings, I am getting quite some errors.
>>
>> Sounds like a bug in GS. Squeak/Pharo had similar problems (maybe still has).
>
> I'm not sure. I'm not an encoding hero (to say the least :-).
>
> In Pharo, I see that WideString is a subclass of String.
> In GS, MultiByteString is not a subclass of String.

I see.

> Would it be correct to assume that the Gemstone port of Seaside should move all extension methods on String to CharacterCollection?

Seems like it. We should certainly also have tests for this.

> I guess Seaside callbacks in Pharo can get WideString instances too?

Yes, you either get a ByteString or WideString.
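Which class a callback receives depends on the characters in the input. A Python sketch, assuming each image picks the narrowest class whose character width can hold the highest code point in the text: one byte per character up to 255, and in GemStone two bytes up to 0xFFFF and four bytes beyond. These thresholds are an assumption derived from the class widths, not taken from either implementation:

```python
def pharo_class_for(text: str) -> str:
    """Narrowest Pharo string class, assuming ByteString holds code points <= 255."""
    highest = max((ord(ch) for ch in text), default=0)
    return "ByteString" if highest <= 0xFF else "WideString"


def gemstone_class_for(text: str) -> str:
    """Narrowest GemStone string class, assuming 1-, 2- and 4-byte character widths."""
    highest = max((ord(ch) for ch in text), default=0)
    if highest <= 0xFF:
        return "String"
    if highest <= 0xFFFF:
        return "DoubleByteString"
    return "QuadByteString"


# 'é' fits in one byte, '€' (U+20AC) needs two, an emoji (U+1F600) needs four.
for sample in ["café", "€5", "\U0001F600"]:
    print(repr(sample), pharo_class_for(sample), gemstone_class_for(sample))
```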

Cheers
Philippe

Re: quadbytestring / doublebytestring -> normal string

Dale Henrichs
In reply to this post by Johan Brichau-2
Johan,

The short-term solution is to eliminate your pain by moving or reimplementing the missing methods in CharacterCollection. The longer-term solution is to take all of the tests for String and recast them so that they run against String and against the multi-byte string classes...
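Dale's longer-term suggestion, running the same tests against every string class, could look like this Python `unittest` sketch. The classes and the `trim_both` helper are hypothetical stand-ins; `subTest` reports a failure per class instead of stopping at the first one:

```python
import unittest


class CharacterCollection(str):
    """Hypothetical common superclass; the helper lives here, not on String."""

    def trim_both(self):
        return self.strip()


class String(CharacterCollection):
    pass


class DoubleByteString(CharacterCollection):
    pass


class QuadByteString(CharacterCollection):
    pass


STRING_CLASSES = (String, DoubleByteString, QuadByteString)


class StringProtocolTest(unittest.TestCase):
    def test_trim_both_on_every_class(self):
        for cls in STRING_CLASSES:
            with self.subTest(cls=cls.__name__):
                self.assertEqual(cls("  x ").trim_both(), "x")

    def test_size_counts_characters(self):
        for cls in STRING_CLASSES:
            with self.subTest(cls=cls.__name__):
                self.assertEqual(len(cls("Zürich")), 6)


if __name__ == "__main__":
    unittest.main(argv=["string_protocol_tests"], exit=False)
```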

I am in the final stages of the Seaside 3.0.6 work (ported to GemStone 2.4 and 3.0) and plan to start moving towards the GLASS 1.0-beta.8.7 release, so now is a good time for me to help get this straightened out...

Dale

----- Original Message -----
| From: "Johan Brichau" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Thursday, September 8, 2011 1:56:08 AM
| Subject: Re: [GS/SS Beta] quadbytestring / doublebytestring -> normal string
|
|
| On 06 Sep 2011, at 16:39, Philippe Marschall wrote:
|
| > 2011/9/6 Johan Brichau <[hidden email]>:
| >> Hi,
| >>
| >> I often get quadbytestring or doublebytestring instances running
| >> around in my image. They are values that come in through form
| >> submission.
| >>
| >> Since a lot of methods in GS are not defined on these multibyte
| >> strings, I am getting quite some errors.
| >
| > Sounds like a bug in GS. Squeak/Pharo had similar problems (maybe
| > still has).
|
| I'm not sure. I'm not an encoding hero (to say the least :-).
|
| In Pharo, I see that WideString is a subclass of String.
| In GS, MultiByteString is not a subclass of String.
|
| Would it be correct to assume that the Gemstone port of Seaside
| should move all extension methods on String to CharacterCollection?
| I guess Seaside callbacks in Pharo can get WideString instances too?
|
| Tell me if I'm telling rubbish… ;-)
|
| Johan

Re: quadbytestring / doublebytestring -> normal string

Johan Brichau-2
OK! I will take a look at that ASAP!

On 08 Sep 2011, at 18:51, Dale Henrichs wrote:

> Johan,
>
> The short term solution is for you to eliminate your pain by moving or reimplementing the missing methods up in CharacterCollection ... the longer term solution is to take all of the tests for String and recast them to test against String and the multi-byte string classes ...
>
> I am in the final stages of the Seaside 3.0.6 work (ported to GemStone 2.4 and 3.0) and plan to start moving towards the GLASS 1.0-beta.8.7 release so now will be a good time for me help get this straightened out...
>
> Dale