cross platform multibyte strings

cross platform multibyte strings

Nick
Hi,

I'm struggling with multibyte strings in GemStone. Is there a way to construct one from a ByteArray?

I'm trying something like:


QuadByteString withAll: ( #(0 0 1 146 0 0 0 97 0 0 1 144 0 0 0 98) asByteArray) 

Which generates an error. I've probably missed something simple - but can't see it at the moment.

Thanks

Nick



Re: cross platform multibyte strings

Dale Henrichs
Nick,

The trouble with using a ByteArray is that we don't know the encoding...

You can use ByteArray>>asString to convert a ByteArray to a string, but the assumption is that you've got single byte characters.

If you look at ByteArray>>asString, the code starts with a String, loops over the ByteArray, converts each byte into a Character, and adds that character to the collection.

This is probably the direction that you want to go, because GemStone will automatically mutate a String to a DoubleByteString if you add a double byte character to the collection. If you know your encoding, you can loop over the ByteArray, creating the appropriate Character instance for each code point, and you are off to the races.
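A minimal sketch of that loop, assuming the bytes in Nick's example are big-endian 32-bit code points (so 0 0 1 146 is code point 402). It relies on the GemStone behaviour described above, where a String accepts #add: and mutates itself when a wider character arrives; the variable names are illustrative only, and #add: on a String is not expected to work in Pharo.

```smalltalk
| bytes result |
bytes := #(0 0 1 146 0 0 0 97 0 0 1 144 0 0 0 98) asByteArray.
result := String new.
"Assumption: every character occupies four bytes, most significant byte first."
1 to: bytes size by: 4 do: [:i |
    | codePoint |
    codePoint := ((bytes at: i) bitShift: 24)
        + ((bytes at: i + 1) bitShift: 16)
        + ((bytes at: i + 2) bitShift: 8)
        + (bytes at: i + 3).
    "Adding a character above 255 is what triggers the mutation to a wider string class."
    result add: (Character value: codePoint)].
result
```

If the byte order or width of the data is different, only the arithmetic inside the block needs to change.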

Another alternative is to create your strings from UTF8 encoded Strings ... then you can take the String and send it #decodeFromUTF8 to create your DoubleByteString ... the decoding is done in a primitive and is pretty efficient ...
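As a rough sketch of the UTF8 route, using only the two selectors named in this message (ByteArray>>asString and #decodeFromUTF8); the byte values are the UTF-8 encoding of code points 402, 97, 400 and 98, and the exact class of the result may depend on the GemStone version.

```smalltalk
| utf8Bytes encoded decoded |
"UTF-8 bytes for the code points 402, 97, 400 and 98."
utf8Bytes := #(198 146 97 198 144 98) asByteArray.
"asString maps each byte to a single-byte Character, giving a UTF8 encoded String."
encoded := utf8Bytes asString.
"Assumption: decodeFromUTF8 is understood by String, as described above."
decoded := encoded decodeFromUTF8.
decoded
```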

Does this help?

Dale

----- Original Message -----
| From: "Nick Ager" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Friday, February 24, 2012 9:49:50 AM
| Subject: [GS/SS Beta] cross platform multibyte strings

Re: cross platform multibyte strings

Philippe Marschall
In reply to this post by Nick
2012/2/24 Nick Ager <[hidden email]>:

> Hi,
>
> I'm struggling with multibyte strings in Gemstone. Is there a way to
> construct one from a ByteArray?
>
> I'm trying something like:
>
>
> QuadByteString withAll: ( #(0 0 1 146 0 0 0 97 0 0 1 144 0 0 0 98)
> asByteArray)
>
> Which generates an error. I've probably missed something simple - but can't
> see it at the moment.

What problem are you trying to solve? Can you use the grease (GRCodec)?

Cheers
Philippe

Re: cross platform multibyte strings

Nick

 
> What problem are you trying to solve? Can you use the grease (GRCodec)?

The pier-code-exporter generates code from a Pier kernel which, when executed, regenerates the kernel, providing a mechanism for version control, upgrading, etc.

The issue was how to generate cross-platform (Pharo and GemStone) code and tests which deal with multibyte strings. The difficulty comes from the fact that GemStone's and Pharo's multibyte implementations diverge: Pharo implements WideString, which is derived from String, whereas GemStone's MultiByteString hierarchy has CharacterCollection as a common base.

Anyway Grease's utf-8 codec has provided a suitable cross-platform multi-byte string representation. The passing test is now:

PRExporterCodeTests>>testMultiByteStringAsCode
    | multibyteString |
    multibyteString := String with: (Character value: 402) with: $a with: (Character value: 400) with: $b.
    self assert: multibyteString asCode equals: '((GRCodec forEncoding: ''utf-8'') decode: (#(198 146 97 198 144 98) asByteArray))'
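
For readers checking the byte values in the expected string: code points between 128 and 2047 take two bytes in UTF-8, laid out as 110xxxxx 10xxxxxx. A small workspace sketch of that arithmetic for code point 402 (the variable names are illustrative and not part of the exporter):

```smalltalk
| codePoint lead continuation |
codePoint := 402.
"Lead byte: the marker bits 110 followed by the top five bits of the code point."
lead := 16rC0 bitOr: (codePoint bitShift: -6).
"Continuation byte: the marker bits 10 followed by the low six bits."
continuation := 16r80 bitOr: (codePoint bitAnd: 16r3F).
"lead is 198 and continuation is 146, matching the first two bytes in the test."
Array with: lead with: continuation
```

The same arithmetic with 400 gives 198 and 144, while $a and $b stay as the single bytes 97 and 98.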

Thanks for the helpful hints

Nick