I've got some Unicode strings in my Pharo image that were taken as input from a Seaside form. I'm trying to use STON to get them into Gemstone. STON parses those strings as DoubleByteString and sends #decodeFromUTF8. DoubleByteString does not implement #decodeFromUTF8.

If I implement #decodeFromUTF8 as

    DoubleByteString>>decodeFromUTF8
        ^self asByteArray decodeFromUTF8

I get a spurious null character (codePoint = 0) inserted by the primitive, and the string I'm attempting to decode doubles in size. Then, when emitting that string as JSON for a webservice, the null characters are escaped (e.g. \u0000) and sent along.

So that makes me think my implementation of #decodeFromUTF8 is probably not the way to do it.

Is there a better way that won't result in the extra null characters?

Should I do something to the string in Pharo before sending it along to Gemstone?

Thanks

Paul
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
I am curious how the Unicode strings are being encoded on the Pharo side. You are transferring the strings from Pharo to GemStone, correct?

I'm not sure what Pharo does with the WideStrings when creating the STON output, but it sounds wrong to be getting DoubleByteStrings from Pharo ...

Dale

----- Original Message -----
| From: "Paul DeBruicker" <[hidden email]>
| To: [hidden email]
| Sent: Tuesday, October 1, 2013 12:55:12 PM
| Subject: [Glass] UTF8 character encoding translation between Pharo and Gemstone
On Wed, Oct 2, 2013 at 12:02 AM, Dale K. Henrichs <[hidden email]> wrote:
> I am curious how the Unicode strings are being encoded on the Pharo side. You are transferring the strings from Pharo to GemStone, correct?
>
> I'm not sure what Pharo does with the WideStrings when creating the STON output, but it sounds wrong to be getting DoubleByteStrings from Pharo ...

Without knowing the details of the STON implementation I would say the same. I would expect UTF-8 decoding on a byte array or single-byte string, not a double-byte string.

Cheers
Philippe
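A minimal sketch of why decoding the raw bytes of a double-byte string as UTF-8 produces the symptoms reported at the top of the thread. It is written in Python purely as a language-neutral illustration and is not GemStone or Pharo code. It assumes the wide string stores each character as two bytes in big-endian order, and it uses an ASCII-only sample so that every byte is valid UTF-8 on its own; neither detail is confirmed by the thread.

```python
import json

# Assumption: a "double-byte" string holds two bytes per character, big-endian.
text = "abc"
wide_bytes = text.encode("utf-16-be")    # b'\x00a\x00b\x00c'

# Reinterpreting those raw bytes as UTF-8 turns every 0x00 high byte
# into its own NUL character, so the result is twice as long.
misdecoded = wide_bytes.decode("utf-8")

print(len(text), len(misdecoded))        # 3 6
print(json.dumps(misdecoded))            # "\u0000a\u0000b\u0000c"
```

The doubled length and the `\u0000` escapes in the JSON output match what Paul describes, which is consistent with the point that UTF-8 decoding belongs on a byte array or single-byte string.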
Yeah, this 'bug' was a case of me not understanding what I should've been doing.

For my purposes the correct implementation of DoubleByteString>>#decodeFromUTF8 is

    decodeFromUTF8
        ^self

After that change things worked fine.

Thanks for the reminder to circle back to the list with the answer.

Paul

On Oct 4, 2013, at 4:15 AM, Philippe Marschall <[hidden email]> wrote:
> Without knowing the details of the STON implementation I would say the
> same. I would expect utf-8 decoding on a byte array or single byte
> string, not a double byte string.
>
> Cheers
> Philippe
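One way to read the fix, sketched in Python rather than Smalltalk: if the value that arrives is already a string of characters, there is nothing left to decode, so returning it unchanged is the right behaviour, and only raw bytes need a UTF-8 decode. The helper name `decode_from_utf8` is hypothetical and only mirrors the selector discussed in the thread. Whether the strings STON produced were in fact already decoded is an assumption drawn from Paul's report that `^self` worked.

```python
def decode_from_utf8(value):
    """Hypothetical helper: decode UTF-8 bytes, pass decoded text through."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    return value  # already characters, so decoding is a no-op


original = "naïve €5"
wire = original.encode("utf-8")       # what a sender would put on the wire

received = decode_from_utf8(wire)     # bytes are decoded exactly once
again = decode_from_utf8(received)    # text is returned unchanged

print(received == original, again == original)
```

Under these assumptions, calling the helper on text that has already been decoded never introduces extra characters, which is the property the identity implementation provides.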