How many nulls terminate a UnicodeString?


How many nulls terminate a UnicodeString?

Chris Uppal-3
Are UnicodeStrings implicitly null-terminated with one null byte or two?  And
if the latter, how does the VM know that it should use two?  Can one create
byte-objects with more than 2 bytes of implicit null-padding?

TIA

    -- chris



Re: How many nulls terminate a UnicodeString?

Marc Michael
Chris Uppal wrote:

> Are UnicodeStrings implicitly null-terminated with one null byte or two?  And
> if the latter, how does the VM know that it should use two?  Can one create
> byte-objects with more than 2 bytes of implicit null-padding?

It depends on the encoding: in UTF-8 the terminating null occupies a single
byte, whereas in UTF-16 it occupies two bytes.

Take a look:

 <http://en.wikipedia.org/wiki/Utf>
 <http://en.wikipedia.org/wiki/Utf-8>
 <http://en.wikipedia.org/wiki/UTF-16>



Re: How many nulls terminate a UnicodeString?

Chris Uppal-3
Marc,

[me:]
> > Are UnicodeStrings implicitly null-terminated with one null byte or two?
> > And if the latter, how does the VM know that it should use two?  Can one
> > create byte-objects with more than 2 bytes of implicit null-padding?
>
> It depends on the encoding. If UTF-8 is used, the 0-byte has only 1
> Byte, whereas in UTF-16 it will have 2 Bytes.

Thanks for the reply.  I'm sorry, I wasn't as clear as I should have been: I
was asking about the specific implementation in Dolphin rather than the UTF-x
standards themselves.

What I want to know about is the implicit null-termination which is provided by
the Dolphin VM for instances of class String and also for instances of class
UnicodeString.  As you say, null-terminated UTF-16 /should/ have 2 null-bytes,
but I haven't yet found anything in the image which states that they do (though
I assume that's true), and more importantly (for me) I haven't found whatever
it is which arranges for that to be the case.

    -- chris



Re: How many nulls terminate a UnicodeString?

Marc Michael
Chris Uppal wrote:


> What I want to know about is the implicit null-termination which is provided by
> the Dolphin VM for instances of class String and also for instances of class
> UnicodeString.  As you say, null-terminated UTF-16 /should/ have 2 null-bytes,
> but I haven't yet found anything in the image which states that they do (though
> I assume that's true), and more importantly (for me) I haven't found whatever
> it is which arranges for that to be the case.

OK, I hadn't looked at this very deeply before, but consider the following
workspace:

'foo' asUnicodeString byteSize. "8"
'foo' byteSize. "4"
'foo' asUnicodeString size. "3"
'foo' size. "3"

Since byteSize evidently includes the implicit terminator (4 = 3 + 1 for the
narrow string), the 8 works out to 3 characters × 2 bytes plus a 2-byte null,
which suggests UnicodeStrings do get two trailing null bytes.

If I understand Petzold correctly, Windows internally works with 16-bit
strings.



Re: How many nulls terminate a UnicodeString?

uliashkevich
In reply to this post by Chris Uppal-3
Dolphin doesn't distinguish Unicode strings from ANSI strings, so all strings
end with a single null byte.  The problem doesn't surface in most cases
because the byte following a Unicode string's buffer happens to be null as
well.

To fix the problem I created a WString class as a descendant of
UnicodeString and implemented the following constructor method:

new: aLength
    ^self basicNew: 2 * aLength + 1

I also overrode String class>>unicodeClass to return WString; that proved
more convenient than overriding the behavior of UnicodeString.  My library
also has classes such as WCharacter, WStringField, FileW, FileStreamW, etc.
for more complete (though not total) Unicode support, but that is a matter
for another topic.