FFI wide-character type?

Ron Teitelbaum

FFI wide-character type?

All,

The code I was writing worked in C++ but didn't in FFI. It turns out that I
didn't quite understand the string L"MY".

It turns out that this is a wide-character string.

To specify a string of type wide-character (wchar_t[]), precede the opening
double quotation mark with the character L. For example:
wchar_t wszStr[] = L"1a1g";

Is it possible to send a wide-character, or to change the string so that it
looks like a wide character to FFI?

I've tried sending in WideString fromString: 'MY' but I get "can't coerce
arguments". I've tried changing the arguments to accept the wide string but
that didn't work either.

Thanks for your help!

Ron Teitelbaum

johnmci

Re: FFI wide-character type?

Likely many ways.

However the trick is converting from a WideString to or from another
format.
So for example if I have a UTF16 string I can make a WideString via

aWideString := utf16String convertFromWithConverter:
(UTF16TextConverter new).

or convert back with

utf16String := aWideString convertToWithConverter:
(UTF16TextConverter new useByteOrderMark: true).

I'll note that if you have a string (8 bits) in Squeak you must
decide what the bits mean: is that a Latin-1 string, a MacRoman
string, or something else?

converter := Smalltalk platformName = 'Mac OS'
ifTrue: [MacRomanUnicodeTextConverter new]
ifFalse: [Latin1TextConverter new].
wideStringMangled := string convertFromWithConverter: converter.

On 2-Aug-06, at 8:57 PM, Ron Teitelbaum wrote:

> Is it possible to send a wide-character, or to change the string so
> that it
> looks like a wide character to FFI?

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Ron Teitelbaum

RE: FFI wide-character type?

John,

Thanks for the suggestion.

What I have is a smalltalk string. I need to send it to the external
function as a widestring. I tried to find a smalltalk string that I could
try sending as char* by trying this:

TextConverter allSubclasses collect: [:aConverter |
'MY' convertFromWithConverter: aConverter new
]

I don't get anything different than a regular Smalltalk string.

So that didn't work.

Do you have any other suggestions?

> From: John M McIntosh
> Sent: Thursday, August 03, 2006 12:14 AM
>
> Likely many ways.
>
> However the trick is converting from a WideString to or from another
> format.
> So for example if I have a UTF16 string I can make a WideString via
>
> aWideString := utf16String convertFromWithConverter:
> (UTF16TextConverter new).

johnmci

Re: FFI wide-character type?

Ok, assuming you are using characters ascii 0-127 you could say

wideStringMangled := string convertFromWithConverter: (Latin1TextConverter new).

If you are using characters 128-255 then you do need to decide what
character set they are...

On 3-Aug-06, at 3:38 PM, Ron Teitelbaum wrote:

> Do you have any other suggestions?

Ron Teitelbaum

RE: FFI wide-character type?

I found a solution that allows me to use ascii characters.

The problem I was having is that I had a Squeak string 'MY' that I needed to
send to an FFI call which was expecting a wide-character string. In C++ I just
needed to write L"MY" to get it to work (MY names the specific store that
I want opened, in this case my personal certificates). When I tried the code

wideStringMangled := string convertFromWithConverter: (Latin1TextConverter new).

I received a string that was the same as the regular 'MY' string in Squeak,
so that didn't help any. My attempts to fake the system out didn't work
either. I tried:

'L\"MY\"'
'L"MY"'
'LMLY'
"LMLYL\0"

But nothing satisfied the dll that I was passing in a wide string. I
finally found a define in the dll that allowed me to use an ascii string.
That was a long battle.

What is really needed is a patch to FFI for wide characters. At least this
problem is solved for now (unless you want to access a store that has a
Japanese name!).

Thanks for your help!!

Ron Teitelbaum

> From: John M McIntosh
> Sent: Thursday, August 03, 2006 9:46 PM
>
> Ok, assuming you are using characters ascii 0-127 you could say
>
>
> >> wideStringMangled := string convertFromWithConverter:
> >> (Latin1TextConverter new).

johnmci

Re: FFI wide-character type?

Sigh, too much backward/forward compatibility here.

a) In Sophie most of the interesting testable conversion calls
convert from/to UTF16 for access to mac system api. That appears to
work and
of course in reviewing this I see the converter uses
nextPut:toStream: to deal with the BOM and stuff 16 bits as needed
for each character.

b) When I use foo convertToWithConverter:
(MacRomanUnicodeTextConverter new), where foo contains characters
that are unicode 32 after conversion I get a WideString

Ah, but when I say 'abc' convertToWithConverter:
(MacRomanUnicodeTextConverter new), why I get the ByteString 'abc'

I'll mutter things at this point.

Ok, it seems that in

WriteStream>>nextPut: anObject
    <primitive: 66>
    ((collection class == ByteString) and: [
        anObject isCharacter and: [anObject isOctetCharacter not]]) ifTrue: [
            collection := (WideString from: collection).
            ^self nextPut: anObject].
    "(remainder of the method not shown)"

Oh, how clever: if the primitive fails, it looks to see if the
collection we're writing to is a ByteString. If so, and the character
is > 255, why, let's convert everything to a WideString. So of course
my testing, using data that results in a character mapped > 255,
always produces a WideString.

Mmm ok, what if I make

String>>convertFromToWideStringWithConverter: converter
    | readStream writeStream c |
    readStream := self readStream.
    writeStream := WideString new writeStream.
    converter ifNil: [^ self].
    [readStream atEnd] whileFalse: [
        c := converter nextFromStream: readStream.
        c ifNotNil: [writeStream nextPut: c] ifNil: [^ writeStream contents]].
    ^ writeStream contents

Fine, let's test. Oops, it fails... I get a String back.

How curious. Let's see: writeStream contents invokes

WideString>>copyFrom: start to: stop
    | n |
    n := super copyFrom: start to: stop.
    n isOctetString ifTrue: [^ n asOctetString].
    ^ n.

which invokes isOctetString, which cheerfully scans the entire string
to see if any character values are > 255. If not, it converts the
WideString we have into a String and returns that. How clever, but it
totally breaks what I want to happen.

Mmm

Fine, I'm sure there is a reason for all this, but I'd rather keep my
WideString as a WideString not have it compressed to a String as side
effects of working on it.

So create a class UTF32String for lack of a better name.
Add this method
isOctetString
^false

Go back and change

convertFromToWideStringWithConverter:
to say
convertFromToUTF32StringWithConverter:
and alter one line
writeStream _ UTF32String new writeStream.

Then we get a UTF32 wide string, that stays as a wide string, and
sending asByteArray to my 'abcd' example gets us a 16 byte object.

I'll ask for comment. I'm sure now I'll sit up and think of the side
effects in Sophie about *thinking* I've converted things to a
WideString, yet it's silently in most cases just a String.

PS MacRomanUnicodeTextConverter is a converter we added for Sophie
that does MacRoman to Unicode, versus the ill-named
MacRomanTextConverter, which does conversion from MacRoman to
something else (Latin-1?).

On 3-Aug-06, at 7:21 PM, Ron Teitelbaum wrote:

> I found a solution that allows me to use ascii characters.

johnmci

Re: FFI wide-character type?

In reply to this post by Ron Teitelbaum

Oh sure after I spend an hour typing out a note about this.

'abc' asWideString asByteArray

will do the right thing assuming your 'abc' String doesn't need
conversion, even so

('abcd' convertFromWithConverter: (MacRomanUnicodeTextConverter new))
asWideString asByteArray

will also do the right thing, although it's a bit more expensive as
it rummages about deciding if the intermediate parts are String or
WideString

On 3-Aug-06, at 7:21 PM, Ron Teitelbaum wrote:

> I found a solution that allows me to use ascii characters.

Ron Teitelbaum

RE: FFI wide-character type?

John,

Thanks for looking some more. I had tried asWideString and received back a
WideString, but was confused by the bytes in the inspector. Since the
bytes were the same as a regular string I figured it was doing something
wrong, like the other things I'd tried.

So now that I understand that asByteArray forms bytes differently than what
is shown in the instance inspector, I thought I'd try it.

What I get is 'MY' asWideString asByteArray = a ByteArray(0 0 0 77 0 0 0 89)

Trying that didn't work.

I received a message from Torsten (Thank you!) that said he had a conversion
to wide string for COM. I tried that method to see what it returned.
COMWideString fromString: 'MY' = a ByteArray(77 0 89 0). I tried sending in
the byteArray and BINGO! It worked.

So the question is, are there multiple ways to do conversion to wide string,
or is asWideString asByteArray doing it wrong? Maybe there are ordering and
byte count multiplatform things to consider? (I'm on winXP, if it wasn't
painfully obvious)

Ron Teitelbaum

> -----Original Message-----
> From: [hidden email] [mailto:squeak-dev-
> [hidden email]] On Behalf Of John M McIntosh
> Sent: Friday, August 04, 2006 3:23 AM
> To: The general-purpose Squeak developers list
> Subject: Re: FFI wide-character type?
>
> Oh sure after I spend an hour typing out a note about this.
>
> 'abc' asWideString asByteArray

Nicolas Cellier-3

Re: FFI wide-character type?

Obviously, Squeak WideStrings have 32-bit characters while Windows (and other
OSes, I presume) has 16-bit characters, and yes, OS byte ordering does matter.

You need a method asTwoByteArray defined like this in String

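String>>asTwoByteArray
    "Answer a ByteArray holding each character as a 16-bit unsigned value.
    No terminating null is appended."
    | twoB |
    twoB := ByteArray new: self size * 2.
    1 to: self size do: [:i |
        twoB unsignedShortAt: (2 * i - 1) put: (self at: i) asInteger].
    ^twoB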
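
A variant of the same idea, as a rough sketch only: it writes the low byte first regardless of platform and appends two zero bytes as a terminator, which a C function taking a wchar_t* on Windows would normally expect. The selector asLittleEndianWideBytes is made up for illustration and is not part of Squeak; it assumes every character code fits in 16 bits.

```smalltalk
String>>asLittleEndianWideBytes
    "Hypothetical helper, not part of Squeak. Answer a ByteArray with each
    character as 16 bits, low byte first, followed by two zero bytes.
    Assumes all character codes are below 65536."
    | bytes code |
    bytes := ByteArray new: self size * 2 + 2.  "new ByteArrays are zero-filled"
    1 to: self size do: [:i |
        code := (self at: i) asInteger.
        bytes at: 2 * i - 1 put: (code bitAnd: 16rFF).
        bytes at: 2 * i put: ((code bitShift: -8) bitAnd: 16rFF)].
    ^bytes
```

With this, 'MY' asLittleEndianWideBytes answers a ByteArray(77 0 89 0 0 0), which starts with the same four bytes that Ron reported as working.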

Nicolas

On Friday 04 August 2006 15:48, Ron Teitelbaum wrote:

> So the question is, are there multiple ways to do conversion to wide
> string, or is asWideString asByteArray doing it wrong? Maybe there are
> ordering and byte count multiplatform things to consider? (I'm on winXP,
> if it wasn't painfully obvious)

johnmci

Re: FFI wide-character type?

Actually the correct question is what the target character set is for
the conversion, since I was under the assumption that one wanted
UTF-32 as a result.

So COM is UTF-16? Or some special Windows encoding?

UTF-16 can be either big-endian or little-endian, and you can supply a
Byte Order Mark that helps resolve which it is.
When building the UTF16 converter you can specify these details.
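
As a small illustration of the byte order mark (a sketch with made-up sample bytes, not code from any converter): UTF-16 data may start with the bytes FF FE when the low byte of each unit comes first, or FE FF when the high byte comes first, and a reader can check those two bytes to decide how to interpret the rest.

```smalltalk
| bytes order |
"Sample data: a little-endian byte order mark followed by 'MY'."
bytes := #[16rFF 16rFE 77 0 89 0].
order := ((bytes at: 1) = 16rFF and: [(bytes at: 2) = 16rFE])
    ifTrue: [#littleEndian]
    ifFalse: [((bytes at: 1) = 16rFE and: [(bytes at: 2) = 16rFF])
        ifTrue: [#bigEndian]
        ifFalse: [#unknown]].
order
```

Here order ends up as #littleEndian. Data with no mark at all falls into the #unknown branch, and the receiver then has to know the byte order some other way.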

On 4-Aug-06, at 2:00 PM, nicolas cellier wrote:

> Obviously, Squeak WideString have 32 bits characters while Windows
> (and other
> OS i presume) have 16bits characters, and yes, OS byte ordering
> does matter.

Philippe Marschall

Re: FFI wide-character type?

Not to mention that the UTF16TextConverter also handles characters
outside the BMP.

2006/8/5, John M McIntosh <[hidden email]>:

> Actually the correct question is what the target character set is for
> the conversion since I was under the
> assumption that one wanted UTF-32 as a result.

Ron Teitelbaum

RE: FFI wide-character type?

In reply to this post by johnmci

John,

I tried your suggestion to support the Microsoft wide string:

aStream := MultiByteBinaryOrTextStream on: String new encoding: 'utf-16'.
aStream converter useLittleEndian: true.
aStream nextPutAll: 'abcde'.
^aStream

And it appears to work great except that the implementation of next throws
it all away. I'm guessing it will work for ms wide strings -> squeak just
fine, but doesn't support squeak -> ms wide strings very well. I figured
I'd write an MSWideString class to support this properly. Do you have any
suggestions and/or am I missing something?

To answer your questions: not all calls take this form of wide string; there
are some ASCII-equivalent methods, but the cryptography code has a lot of
ASN.1 formats, so I guess this is their answer. I haven't run into other
formats of wide string yet in Microsoft's API.

Ron Teitelbaum

> -----Original Message-----
> From: [hidden email] [mailto:squeak-dev-
> [hidden email]] On Behalf Of John M McIntosh
> Sent: Friday, August 04, 2006 7:52 PM
> To: The general-purpose Squeak developers list
> Subject: Re: FFI wide-character type?
>
> Actually the correct question is what the target character set is for
> the conversion since I was under the
> assumption that one wanted UTF-32 as a result.

Yoshiki Ohshima

Re: FFI wide-character type?

Ron,

> aStream := MultiByteBinaryOrTextStream on: String new encoding: 'utf-16'.
> aStream converter useLittleEndian: true.
> aStream nextPutAll: 'abcde'.
> ^aStream
>
> And it appears to work great except that the implementation of next throws
> it all away. I'm guessing it will work for ms wide strings -> squeak just
> fine, but doesn't support squeak -> ms wide strings very well. I figured
> I'd write an MSWideString class to support this properly. Do you have any
> suggestions and/or am I missing something?

If you have a ByteString that contains utf-16 encoded chars,

ret := aString convertFromWithConverter: (UTF16TextConverter new useLittleEndian: true).

should give you a correct ByteString or WideString.

For that matter, the conversion to an external format can be:

ret := 'abcde' convertToWithConverter: (UTF16TextConverter new useLittleEndian: true).
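
To hand such a result to an FFI call, one possible follow-up is to turn the converted string into a ByteArray and add the terminating zeros yourself. This is a sketch, not tested against a particular Squeak version. It uses convertToWithConverter:, the selector from earlier in this thread, and assumes it answers a String with one encoded byte per element, that the converter emits no byte order mark by default, and that the external function wants a null-terminated UTF-16 buffer.

```smalltalk
| encoded buffer |
"Encode 'MY' as UTF-16, low byte first."
encoded := 'MY' convertToWithConverter: (UTF16TextConverter new useLittleEndian: true).
"Copy the encoded bytes and append a 16-bit zero terminator."
buffer := encoded asByteArray, #[0 0].
buffer
```

If the assumptions hold, inspecting buffer is a quick way to check that the bytes match the layout that worked for Ron, plus the two trailing zeros.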

-- Yoshiki