Feature request: automatically coerce strings to wide string

Richard Sargent
Administrator
If attempting to insert a double-byte character into a String instance, it would be nice if the receiver were automatically converted to a double-byte variant.
(The same would apply to DBStrings, Unicode7, and Unicode16 strings if Characters wider than 16 bits were supported and if Unicode were supported.)

Example:
'That costs ' , (String with: (Character codePoint: 16r20AC)) , '47'.    "Euro symbol"

Example implementation (String>>#at:put: and Character>>#coerceString:with:):

at: anInteger put: aCharacter

    <primitive: VMprStringBasicAtPut>
    "The primitive failed. Decide what to do based on the error code."
    self primitiveErrorCode = PrimErrReadOnly ifTrue: [
        ^self basicAt: anInteger put: aCharacter
    ].
    "Argument 2 (the character) is out of range for this string class: widen and retry."
    (self primitiveErrorCode = PrimErrValueOutOfRange and: [self primitiveBadArgumentNumber = 2]) ifTrue: [
        ^aCharacter coerceString: self with: [:new | new at: anInteger put: aCharacter].
    ].
    ^self primitiveFailed


coerceString: aString with: aOneArgBlock
    "If we couldn't insert the character, it must be too wide for the string's class.
     Convert to a wide string and try again."

    | new |
    new := aString asTwoByteString.
    aOneArgBlock value: new.
    aString become: new.
    ^aString
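
For readers who do not have a VA Smalltalk image at hand, the widen-and-retry idea can be sketched in Python. This is only an illustration under stated assumptions: the class name WideningString and the method at_put are hypothetical, and the sketch swaps its own storage instead of using become:, so it shows the control flow rather than the Smalltalk mechanism.

```python
from array import array


class WideningString:
    """Hypothetical stand-in for a string that widens its storage on demand."""

    def __init__(self, text):
        # Start with one byte per character (typecode "B" holds 0..255).
        self._units = array("B", (ord(ch) for ch in text))

    def at_put(self, index, char):
        """Store char at a 1-based index, widening to 16-bit storage if needed."""
        code = ord(char)
        try:
            self._units[index - 1] = code
        except OverflowError:
            # The value does not fit the current element size. This sketch
            # only widens to 16 bits, so anything larger is still an error.
            if code > 0xFFFF:
                raise
            # Copy into 16-bit storage and retry the store.
            self._units = array("H", self._units.tolist())
            self._units[index - 1] = code
        return char

    @property
    def bytes_per_char(self):
        return self._units.itemsize

    def __str__(self):
        return "".join(chr(unit) for unit in self._units)


s = WideningString("That costs ?47")
print(s.bytes_per_char)   # one byte per character before the store
s.at_put(12, "\u20ac")    # Euro sign does not fit in one byte
print(s.bytes_per_char, s)
```

Callers never see the failure; they only observe that the string now uses two bytes per character, which is the behaviour the request asks for.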




--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at http://groups.google.com/group/va-smalltalk.
For more options, visit https://groups.google.com/d/optout.
Re: Feature request: automatically coerce strings to wide string

John O'Keefe-3
Richard -

This sort of coercion is a required part of our upcoming Unicode support.

John
On Friday, September 26, 2014 3:11:29 PM UTC-4, Richard Sargent wrote:
If attempting to insert a double-byte character into a String instance, it would be nice if the receiver were automatically converted to a double-byte variant.
(The same would be true for DBStrings, Unicode7, and Unicode16 strings if > 16 bit Characters were supported and if Unicode were supported.)

Re: Feature request: automatically coerce strings to wide string

Richard Sargent
Administrator
On Wednesday, October 1, 2014 10:19:25 AM UTC-7, John O'Keefe wrote:
This sort of coercion is a required part of our upcoming Unicode support.


Excellent, John!
Can you comment on whether Unicode support is targeted for your 2015 release?

Re: Feature request: automatically coerce strings to wide string

John O'Keefe-3
Richard -

Yes, Unicode is targeted for the next release after 8.6.1.

Does GemStone use DBString as a surrogate (alias) for UTF-16 in the VA Smalltalk environment or do you have a separate UTF-16 class? Is your preferred internal representation UTF-8 or UTF-16? I know there are lots of good arguments for both (for a start, UTF-8 is more natural for UNIX; UTF-16 is more natural for Windows).

John

On Wednesday, October 1, 2014 2:30:30 PM UTC-4, Richard Sargent wrote:
On Wednesday, October 1, 2014 10:19:25 AM UTC-7, John O'Keefe wrote:
This sort of coercion is a required part of our upcoming Unicode support.


Excellent, John!
Can you comment on whether Unicode support is targeted for your 2015 release?

Re: Feature request: automatically coerce strings to wide string

Richard Sargent
Administrator
On Tuesday, October 28, 2014 12:52:31 PM UTC-7, John O'Keefe wrote:
Yes, Unicode is targeted for the next release after 8.6.1.

I'm looking forward to that!


Does GemStone use DBString as a surrogate (alias) for UTF-16 in the VA Smalltalk environment or do you have a separate UTF-16 class? Is your preferred internal representation UTF-8 or UTF-16? I know there are lots of good arguments for both (for a start, UTF-8 is more natural for UNIX; UTF-16 is more natural for Windows).

I was under the impression that the UTF-n encodings were representations for external file systems (i.e., the transport of multi-byte characters) rather than internal representations. For example, our server has string classes for 1-, 2-, and 4-byte characters. We have parallel Unicode7, Unicode16, and Unicode32 classes. We read and write files using either straight byte representations (not good for non-ASCII text, in this day and age) or UTF-8 encoding.
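
To make the distinction between fixed-width internal storage and UTF-n external encodings concrete, here is a small Python sketch. It is an illustration only, not GemStone or VA Smalltalk code; the variable names are made up for the example.

```python
text = "\u20ac47"  # Euro sign followed by two ASCII digits

# Fixed-width internal storage: every character uses the same number of
# bytes, chosen from the widest code point in the string.
widest = max(ord(ch) for ch in text)
fixed_width = 1 if widest <= 0xFF else 2 if widest <= 0xFFFF else 4

# External encodings of the same three characters.
utf8 = text.encode("utf-8")       # variable width: 3 bytes + 1 + 1
utf16 = text.encode("utf-16-le")  # 2 bytes per character here
utf32 = text.encode("utf-32-le")  # always 4 bytes per character

print(fixed_width, len(utf8), len(utf16), len(utf32))  # prints: 2 5 6 12
```

The fixed-width form allows indexing by character position in constant time, while UTF-8 is more compact for mostly-ASCII text, which is one reason the two are often used for different purposes.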

So, I have been representing our server's DoubleByteString via DBString in GBS. It would be nice to see a 16-bit Unicode class that displays the correct Unicode characters, instead of the characters corresponding to the individual bytes, as DBString does. And of course, it would be welcome to see support for character code points up to the current Unicode limit of U+10FFFF.
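
Both points can be illustrated with a short Python sketch. It is a rough analogy, not a description of how DBString is implemented: the byte order and the Latin-1 interpretation are assumptions, and to_utf16_units is a hypothetical helper name.

```python
euro = "\u20ac"
raw = euro.encode("utf-16-be")        # two bytes: 0x20 0xAC

# Treating each byte as its own character shows two unrelated glyphs.
per_byte = raw.decode("latin-1")      # a space followed by a "not" sign
# Treating the two bytes as one 16-bit unit recovers the Euro sign.
as_unit = raw.decode("utf-16-be")


def to_utf16_units(code_point):
    """Split a Unicode code point into one or two 16-bit code units."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point <= 0xFFFF:
        return [code_point]
    # Code points above U+FFFF need a surrogate pair in 16-bit storage.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([hex(u) for u in to_utf16_units(0x20AC)])
print([hex(u) for u in to_utf16_units(0x10FFFF)])
```

A 16-bit string class therefore either stops at U+FFFF or has to deal with surrogate pairs, whereas a 32-bit class can hold every code point in a single element.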

[I have also defined a 32-bit string, via "EsString variableLongSubclass: #GbxNonFunctionalQBString", which holds 32-bit values but doesn't provide any string operations other than what are inherited.]


I don't know how well that answers your question.
Richard
