Smalltalk › Usenets › Dolphin Smalltalk

UnicodeString problem


Hideo Mizoguchi

UnicodeString problem

Hi,

I think UnicodeString>>asString has a bug. In its code (copied below), buf
is initialized as a multi-byte string of the same length as the Unicode
string itself. That assumption holds only when single-byte characters
are used, so the code does not work in environments such as Japanese and
Korean, where multi-byte characters can indeed take two bytes.

Hideo Mizoguchi

asString
	"Answer a byte string representation of the receiver."
	| buf size |
	size := self size.
	buf := String new: size.
	size == 0 ifTrue: [^buf].	"Avoid 'The Parameter is Incorrect' error"
	(KernelLibrary default
		wideCharToMultiByte: 0
		dwFlags: 0
		lpWideCharStr: self
		cchWideChar: size
		lpMultiByteStr: buf
		cchMultiByte: size
		lpDefaultChar: nil
		lpUsedDefaultChar: nil) == 0
			ifTrue: [KernelLibrary default systemError].
	^buf

Blair McGlashan

Re: UnicodeString problem

"Hideo Mizoguchi" <[hidden email]> wrote in message
news:9a25sh$36dh1$[hidden email]...
>
> I think UnicodeString>>asString has a bug. In its code (copied below), buf
> is initialized as a multi-byte string of the length same as the unicode
> string itself. That assumption is correct when only single byte characters
> are used, i.e., the code does not work with such environment as Japanese and
> Korean, where multi-byte characters can be indeed double-bytes.
> ... code snip...

Indeed that method is incorrect, and could perhaps be fixed as in the attached file.
However, Dolphin does not currently support multi-byte characters in general,
so I'm not sure this will help much: it is widely assumed elsewhere
in the image and VM that a character needs only a single byte to represent
its code point. In fact, the Character class has 256 fixed instances. Full
support for multi-byte character sets will not be available until a future
release (unless someone knows how to work around the limitations in the
image, but I would expect that to be pretty difficult). Sorry.

Regards

Blair

Attachment: UnicodeString_asString.st

!UnicodeString methodsFor!

asString
	"Answer a byte string representation of the receiver."

	| buf size bytes |
	size := self size.
	buf := String new: size*2.
	size == 0 ifTrue: [^buf].	"Avoid 'The Parameter is Incorrect' error"
	bytes := KernelLibrary default
		wideCharToMultiByte: 0
		dwFlags: 0
		lpWideCharStr: self
		cchWideChar: size
		lpMultiByteStr: buf
		cchMultiByte: buf size
		lpDefaultChar: nil
		lpUsedDefaultChar: nil.
	bytes == 0 ifTrue: [^KernelLibrary default systemError].
	buf resize: bytes.
	^buf! !
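The attached fix allocates two bytes per UTF-16 code unit and then trims the result to the byte count that WideCharToMultiByte reports. The Python sketch below mimics that shape; Python's `str.encode` stands in for the Win32 call, and both the helper name `to_ansi` and the cp932 code page are assumptions. Doubling is enough for double-byte code pages such as cp932, where a character never takes more than two bytes. Alternatively, calling WideCharToMultiByte with an output buffer size of 0 makes it return the number of bytes required, at the cost of a second call.

```python
def to_ansi(text: str, codepage: str = "cp932") -> bytes:
    """Sketch of 'allocate size*2, convert, then resize' (hypothetical helper).

    Assumes a double-byte code page such as cp932, where no character needs
    more than two bytes, so two bytes per UTF-16 code unit always suffice.
    """
    size = len(text.encode("utf-16-le")) // 2   # UTF-16 code units, like `self size`
    buf = bytearray(size * 2)                    # buf := String new: size*2
    if size == 0:
        return bytes(buf)                        # size == 0 ifTrue: [^buf]
    # Stand-in for WideCharToMultiByte; unmappable characters become b"?".
    encoded = text.encode(codepage, errors="replace")
    if len(encoded) > len(buf):
        # The Win32 call would fail here (return 0) instead of raising.
        raise OverflowError("output buffer too small for this code page")
    buf[: len(encoded)] = encoded
    written = len(encoded)                       # bytes := ... wideCharToMultiByte ...
    return bytes(buf[:written])                  # buf resize: bytes
```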

Takeya Suzuki

Re: UnicodeString problem

Hi,
I am Japanese and cannot write English well,
so please forgive my poor English.

This problem has existed since Dolphin 3.
In my case it appeared when I used COM.
So I modified two methods of the UnicodeString class:
one is asString, the other is UnicodeString>>replaceFrom:to:with:startingAt:.
With asString, the last byte of Japanese text is lost.
After some trial and error, I changed the latter as follows:

	^super replaceFrom: start+start-1 to: stop+stop with: aString startingAt: startAt

I have had no problems with it in my own use
(COM access, especially ADO and DTS).

However, what in fact do you do?
Is it impossible to have had it answer already?
Teach if it is good.

Regards

Takeya Suzuki
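Why would only Japanese text lose a byte? A UnicodeString holds each character as a two-byte UTF-16 code unit, stored low byte first on Windows. If a copy stops one byte short, the high byte of the last character keeps whatever the new buffer held. The Python sketch below assumes that buffer starts zero-filled (an assumption about `self new:`, not something stated in the thread) and uses a hypothetical helper name. For ASCII the missing high byte is 0 anyway, so nothing visible changes; for Japanese it is not.

```python
def copy_one_byte_short(text: str) -> str:
    """Hypothetical illustration: copy a UTF-16LE buffer but stop one byte early."""
    src = text.encode("utf-16-le")      # two bytes per BMP character, low byte first
    dest = bytearray(len(src))          # freshly allocated, assumed zero-filled
    n = max(len(src) - 1, 0)            # off by one: the final byte is never copied
    dest[:n] = src[:n]
    return dest.decode("utf-16-le")


print(copy_one_byte_short("ab"))    # ab  (the lost high byte of 'b' was 0x00 anyway)
print(copy_one_byte_short("日本"))  # 日, (U+672C loses its high byte 0x67, leaving U+002C)
```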

Blair McGlashan

Re: UnicodeString problem

Takeya Suzuki

You wrote in message news:9a3pir$3eqdm$[hidden email]...

> Hi,
> I can't write English with the Japanese well.
> Therefore, forgive it in poor English though I am sorry.
>
> There was this problem from the time of Dolphin3.
> It appeared when COM was used in the case of me.
> Then, I modified two methods of the UnicodeString class.
> One is asString.
> One more is UnicodeString>>replaceFrom:to:with:startingAt:.
> One byte of the Japanese ends is removed only with asString.
> I modified it in the end of trial and error as follows.
>
> ^super replaceFrom: start+start-1 to: stop+stop with: aString startingAt: startAt

Thank you. I think there are other methods that UnicodeString would strictly
need to override if it were to be a full String implementation; however, as
its class comment says, it is a "minimal" class.

>
> There is no problem in the range of the use of me.
> (COM access (especially, ADO and DTS)).

If I understand you correctly, you found no problem with that fix in your
own use.

>
> However, what in fact do you do?
> Is it impossible to have had it answer already?
> Teach if it is good.

I'm sorry, but I cannot understand that. Can you try again and rephrase
slightly?

Regards

Blair

Takeya Suzuki

Re: UnicodeString problem

Blair,

Thank you.

Sorry for my poor English.

A UnicodeString is created by [UnicodeString>>fromAddress:length:].

Example: the case of 'ab'

1) UnicodeString>>fromAddress:length:
	| answer |
	answer := self new: anInteger.
	^answer replaceFrom: 1 to: anInteger
		with: anAddress asExternalAddress startingAt: 1

2) UnicodeString>>replaceFrom:to:with:startingAt:
	^super replaceFrom: start+start-1 to: stop+stop-1
		with: aString startingAt: startAt

With the current code, the values actually passed are those in the first line below; with my change they would be those in the second:

	^super replaceFrom: 1 to: 3 with: 'ab'(unicode) startingAt: 1
	^super replaceFrom: 1 to: 4 with: 'ab'(unicode) startingAt: 1

1 to 3, or 1 to 4?

Which is correct?

I think 1 to 4, therefore:

	^super replaceFrom: start+start-1 to: stop+stop with: aString startingAt: startAt

Regards

Takeya Suzuki
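Takeya's arithmetic can be checked with a few lines of Python (illustrative only; the function names are hypothetical). With two bytes per character, character k occupies bytes 2k-1 and 2k (1-based), so characters start..stop map to bytes 2*start-1 through 2*stop. The original upper bound, stop+stop-1, is one byte short.

```python
from typing import Tuple


def byte_range(start: int, stop: int) -> Tuple[int, int]:
    """1-based inclusive character range -> byte range, two bytes per character."""
    return (start + start - 1, stop + stop)


def byte_range_original(start: int, stop: int) -> Tuple[int, int]:
    """The bounds computed by the original replaceFrom:to:with:startingAt:."""
    return (start + start - 1, stop + stop - 1)


# 'ab' is copied with replaceFrom: 1 to: 2, i.e. two characters = four bytes
print(byte_range_original(1, 2))  # (1, 3): misses the fourth byte
print(byte_range(1, 2))           # (1, 4)
```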

Blair McGlashan

Re: UnicodeString problem

Takeya Suzuki

You wrote in message news:9abomp$4d24k$[hidden email]...

>
> Thank you.
>
> I am sorry in poor English.
>
> An UnicodeString is made [UnicodeString>>fromAddress:length:].
>
> ex) The case of 'ab'
>
> 1)UnicodeString>>fromAddress:length:
> | answer |
> answer := self new: anInteger.
> ^answer replaceFrom: 1 to: anInteger
> with: anAddress asExternalAddress startingAt: 1
>
> 2)UnicodeString>>replaceFrom:to:with:startingAt:
> ^super replaceFrom: start+start-1 to: stop+stop-1
> with: aString startingAt: startAt
>
> The value which is actually delivered is this.
>
> ^super replaceFrom: 1 to: 3 with: 'ab'(unicode) startingAt: 1
> ^super replaceFrom: 1 to: 4 with: 'ab'(unicode) startingAt: 1
>
> 1 to 3 ?
> 1 to 4 ?
>
> Correct answer?
>
> Therefore.
>
> ^super replaceFrom: start+start-1 to: stop+stop with: aString startingAt: startAt

Now I understand, thank you. There is an off-by-one error in
UnicodeString>>replaceFrom:to:with:startingAt:, and your fix is correct.
We will incorporate it in the new patch level.

Regards

Blair