[Glass] UTF8 character encoding translation between Pharo and Gemstone

Paul DeBruicker
I've got some Unicode strings in my Pharo image that were taken as input
from a Seaside form.  I'm trying to use STON to get them into Gemstone.
STON parses those strings as DoubleByteString and sends
#decodeFromUTF8.  DoubleByteString does not implement #decodeFromUTF8.
If I implement #decodeFromUTF8 as

DoubleByteString>>decodeFromUTF8
        ^self asByteArray decodeFromUTF8

I get spurious null characters (codePoint = 0) inserted by the
primitive, and the string I'm attempting to decode doubles in size.  Then,
when that string is emitted as JSON for a web service, the null characters
are written out as escapes (e.g. \u0000) and sent along.
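
The doubling described above can be reproduced outside Smalltalk. The following is only a rough Python sketch, not the GemStone primitive: it assumes a double-byte string keeps every character in two bytes, and uses UTF-16 big-endian as a stand-in for that layout. The sample text "abc" is an illustration, not data from the original report.

```python
import json

# Assumption: a double-byte string stores each character in two bytes.
# UTF-16 big-endian stands in for that storage layout here.
text = "abc"
raw = text.encode("utf-16-be")  # bytes: 00 61 00 62 00 63

# Reading those raw bytes as UTF-8 turns every zero high byte into U+0000.
decoded = raw.decode("utf-8")

print(len(text), len(decoded))  # the decoded string is twice as long
print(json.dumps(decoded))      # each NUL is written as a \u0000 escape
```

If this assumption matches what happens in GemStone, the nulls are not added by the decoder as such; they are the high bytes of characters that had already been decoded.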


So that makes me think my implementation of #decodeFromUTF8 is probably
not the way to do it.


Is there a better way that won't result in the extra null characters?

Should I do something to the string in Pharo before sending it along to
Gemstone?

Thanks

Paul
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Re: [Glass] UTF8 character encoding translation between Pharo and Gemstone

Dale Henrichs-3
I am curious how the Unicode strings are being encoded on the Pharo side. You are transferring the strings from Pharo to GemStone, correct?

I'm not sure what Pharo does with the WideStrings when creating the STON output, but it sounds wrong to be getting DoubleByteStrings from Pharo ...

Dale




Re: [Glass] UTF8 character encoding translation between Pharo and Gemstone

Philippe Marschall
Without knowing the details of the STON implementation I would say the
same. I would expect UTF-8 decoding on a byte array or single-byte
string, not a double-byte string.

Cheers
Philippe

Re: [Glass] UTF8 character encoding translation between Pharo and Gemstone

Paul DeBruicker
Yeah, this 'bug' was a case of me not understanding what I should've been doing.

For my purposes the correct implementation of DoubleByteString>>#decodeFromUTF8 is

DoubleByteString>>decodeFromUTF8
        "Return the receiver unchanged; no byte-level decoding is attempted."
        ^self


After that change things worked fine. Thanks for the reminder to circle back to the list with the answer.
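
For readers who want the same idea outside Smalltalk, here is a small Python sketch of the "decode only if it is still bytes" pattern. The helper name decode_if_needed and the sample text are hypothetical and exist only for this illustration; this is not STON's or GemStone's API.

```python
def decode_if_needed(value):
    """Hypothetical helper: decode UTF-8 bytes, pass decoded text through."""
    if isinstance(value, (bytes, bytearray)):
        # Raw bytes from the wire still need exactly one UTF-8 decode.
        return bytes(value).decode("utf-8")
    # Already a character string: return it untouched, like ^self above.
    return value


wire_bytes = "héllo".encode("utf-8")
already_text = "héllo"

print(decode_if_needed(wire_bytes) == already_text)
print(decode_if_needed(already_text) is already_text)
```

The point of the sketch is that decoding happens once, on bytes; a value that already holds characters is handed back as it is.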


Paul


