Issue 201 in glassdb: UTF8 encoding should produce ByteArray from String and vice versa

glassdb
Status: Accepted
Owner: [hidden email]
Labels: Type-Enhancement Priority-Medium GLASS-Server Version-1.0-beta.8

New issue 201 by [hidden email]: UTF8 encoding should produce ByteArray  
from String and vice versa
http://code.google.com/p/glassdb/issues/detail?id=201

see http://forum.world.st/primitive-468-td3069590.html for the initial  
discussion...

Re: Issue 201 in glassdb: UTF8 encoding should produce ByteArray from String and vice versa

glassdb

Comment #1 on issue 201 by [hidden email]: UTF8 encoding should produce  
ByteArray from String and vice versa
http://code.google.com/p/glassdb/issues/detail?id=201

Philippe's take:

2010/12/2 Norbert Hartl <[hidden email]>:

> I'm asking myself at the moment if primitive 468 will also work for bytes  
> instead of characters. I've taken the method from String

> String>>_unicodePrim: opCode

> "opCode 0 - Encode receiver in UTF8 format. If all characters in receiver
>             are US-ASCII, answer the receiver. Result is a String.
>   opCode 1 - Decode receiver from UTF8 format into either a String or
>             QuadByteString depending upon the range of characters involved.
>   opCode 2 - Decode receiver from UTF8 format into either a String or
>             Double/QuadByteString depending upon the range of characters
>             involved."

> <primitive: 468>
> self _primitiveFailed: #_unicodePrim:

> As it is going native there isn't a real distinction between an 8-bit  
> character and a byte, right? But I want to have an estimate before I  
> try to bork my image :)

> The rationale behind it is again UTF-8 handling. In GemStone everything  
> is a string regardless of whether it is encoded or not. To be honest I don't  
> think this is a good idea. To get a clear understanding of the issue it  
> should be possible to separate things. An easy to follow rule might be  
> that there is

> string -> encoder -> byte array
> byte array -> decoder -> string

I agree that this is the long-term solution we want, but mostly for
historical reasons that's not how it currently works in Seaside and
Pharo. The current situation in Seaside is:

string -> encoder -> string
string/byte array -> decoder -> string

It's on our todo list to change this, but not in the short term.
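
To make the difference between the two models concrete, here is a rough
sketch in Python rather than Smalltalk, purely for illustration. The four
helper names are hypothetical and are not Seaside, Pharo, or GemStone API.
The "string -> encoder -> string" case is modelled by storing each UTF-8
byte as one 8-bit character, which is my assumption about what "encoded
string" means here.

```python
# Illustrative sketch only: Python stand-ins for the two models in this thread.
# All function names are hypothetical, not taken from Seaside, Pharo or GemStone.


def encode_to_bytes(text: str) -> bytes:
    """Proposed model: string -> encoder -> byte array."""
    return text.encode("utf-8")


def decode_from_bytes(data: bytes) -> str:
    """Proposed model: byte array -> decoder -> string."""
    return data.decode("utf-8")


def encode_to_string(text: str) -> str:
    """Current model: string -> encoder -> string.

    Assumption: every UTF-8 byte is kept as one 8-bit character, so the
    result is still a string and nothing marks it as encoded.
    """
    return text.encode("utf-8").decode("latin-1")


def decode_from_string(encoded: str) -> str:
    """Current model: string -> decoder -> string.

    Reinterprets each 8-bit character as a byte, then decodes as UTF-8.
    """
    return encoded.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "Grüße"
    print(type(encode_to_bytes(original)).__name__)   # distinct type for encoded data
    print(type(encode_to_string(original)).__name__)  # same type as decoded text
```

With the byte-array model the type tells you whether data is encoded. With
the string-only model an encoded and a decoded value look alike, so it is
easy to encode or decode twice by accident.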

Cheers
Philippe