All,
The code I was writing worked in C++ but didn't work through FFI. It turns out that I didn't quite understand the string L"MY": it is a wide-character string.

To specify a string of type wide-character (wchar_t[]), precede the opening double quotation mark with the character L. For example:

    wchar_t wszStr[] = L"1a1g";

Is it possible to send a wide-character string, or to change the string so that it looks like a wide-character string to FFI? I've tried sending in WideString fromString: 'MY', but I get "can't coerce arguments". I've also tried changing the arguments to accept the wide string, but that didn't work either.

Thanks for your help!

Ron Teitelbaum
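It helps to look at the bytes the callee actually expects. On Windows, wchar_t is 16 bits and L"MY" is stored as UTF-16 little-endian code units. Its bytes are therefore 77 0 89 0, followed by a two-byte terminator. With GCC or Clang on Linux and macOS, wchar_t is 32 bits, so even the unit width depends on the platform.

The C sketch below builds the Windows layout explicitly from an ASCII string rather than relying on the local wchar_t. The helper name ascii_to_utf16le is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a 7-bit ASCII C string into a buffer laid out
 * like a Windows L"..." literal (UTF-16, little-endian, NUL-terminated).
 * Returns the number of bytes written, or 0 if the buffer is too small or
 * the input contains a non-ASCII byte. */
size_t ascii_to_utf16le(const char *s, uint8_t *out, size_t cap)
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    for (;; s++) {
        unsigned char c = (unsigned char)*s;
        if (c > 0x7F)
            return 0;          /* not ASCII: a character-set decision is needed */
        if (n + 2 > cap)
            return 0;          /* no room for this code unit */
        out[n++] = c;          /* low byte first (little-endian) */
        out[n++] = 0;          /* high byte is always 0 for ASCII */
        if (c == 0)
            return n;          /* terminator written: "MY" takes 6 bytes */
    }
}
```

Only 7-bit input is accepted. Anything above 127 first needs a decision about which character set the bytes are in, a point the thread returns to below.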
Likely many ways.
However, the trick is converting from a WideString to or from another format. For example, if I have a UTF-16 string, I can make a WideString via

    aWideString := utf16String convertFromWithConverter: (UTF16TextConverter new).

or convert back with

    utf16String := aWideString convertToWithConverter: (UTF16TextConverter new useByteOrderMark: true).

Note that if you have an 8-bit string in Squeak, you must decide what the bits mean: is it a Latin-1 string, a Mac Roman string, or something else?

    converter := Smalltalk platformName = 'Mac OS'
        ifTrue: [MacRomanUnicodeTextConverter new]
        ifFalse: [Latin1TextConverter new].
    wideStringMangled := string convertFromWithConverter: converter.

On 2-Aug-06, at 8:57 PM, Ron Teitelbaum wrote:
> Is it possible to send a wide-character, or to change the string so
> that it looks like a wide character to FFI?

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
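The byte order mark John mentions is what makes UTF-16 bytes unambiguous. U+FEFF stored big-endian is FE FF; stored little-endian it is FF FE.

Below is a minimal C sketch of reading a BOM. It assumes the caller supplies a default byte order for input that has no BOM. The helper name is hypothetical, and this is not how Squeak's UTF16TextConverter is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inspect the start of a UTF-16 byte buffer.
 * If a byte order mark is present, set *little_endian from it and return
 * the number of bytes to skip (2). Otherwise leave the caller's default in
 * *little_endian untouched and return 0. */
size_t utf16_read_bom(const uint8_t *buf, size_t len, int *little_endian)
{
    assert(buf != NULL && little_endian != NULL);
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *little_endian = 0;   /* U+FEFF stored big-endian */
        return 2;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *little_endian = 1;   /* U+FEFF stored little-endian */
        return 2;
    }
    return 0;                 /* no BOM: the default byte order stays in effect */
}
```

Without a BOM, nothing in the bytes themselves says which order was used. That is why the converter lets you state it explicitly.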
John,
Thanks for the suggestion.

What I have is a Smalltalk string. I need to send it to the external function as a wide string. I tried to find a Smalltalk string that I could try sending as char* by trying this:

    TextConverter allSubclasses collect: [:aConverter |
        'MY' convertFromWithConverter: aConverter new]

I don't get anything different than a regular Smalltalk string, so that didn't work.

Do you have any other suggestions?

> From: John M McIntosh
> Sent: Thursday, August 03, 2006 12:14 AM
>
> Likely many ways.
OK, assuming you are using only ASCII characters (0-127), you could say
    wideStringMangled := string convertFromWithConverter: (Latin1TextConverter new).

If you are using characters 128-255, then you do need to decide which character set they are in.

On 3-Aug-06, at 3:38 PM, Ron Teitelbaum wrote:
> What I have is a smalltalk string. I need to send it to the external
> function as a widestring.

--
John M. McIntosh
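Restricting input to 0-127 works because Latin-1 and Mac OS Roman both agree with ASCII in that range. Above 127 they diverge. For example, byte 0x8A decodes to U+008A (a C1 control character) under Latin-1, but to 'ä' (U+00E4) under Mac OS Roman.

The partial C sketch below shows that divergence. The function names are hypothetical, and the Mac Roman table deliberately covers only 0x80-0x8F.

```c
#include <assert.h>
#include <stdint.h>

/* Latin-1 (ISO 8859-1) maps every byte to the code point with the same value. */
uint32_t latin1_to_unicode(uint8_t b)
{
    return b;
}

/* Mac OS Roman agrees with ASCII below 0x80 but not above it. Only the
 * row 0x80-0x8F is tabulated here; a real converter needs all 128 upper
 * entries. Returns 0xFFFFFFFF for bytes this partial table does not cover. */
uint32_t macroman_to_unicode(uint8_t b)
{
    static const uint16_t row8[16] = {
        0x00C4, 0x00C5, 0x00C7, 0x00C9, 0x00D1, 0x00D6, 0x00DC, 0x00E1, /* Ä Å Ç É Ñ Ö Ü á */
        0x00E0, 0x00E2, 0x00E4, 0x00E3, 0x00E5, 0x00E7, 0x00E9, 0x00E8  /* à â ä ã å ç é è */
    };
    if (b < 0x80)
        return b;              /* ASCII range: identical in both charsets */
    if (b < 0x90)
        return row8[b - 0x80];
    return 0xFFFFFFFFu;        /* outside this partial table */
}
```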
I found a solution that allows me to use ASCII characters.
The problem I was having is that I had a Squeak string 'MY' that I needed to send to an FFI call expecting a wide-character string. In C++ I just needed to write L"MY" to get it to work. ('MY' names the specific certificate store that I want opened, in this case my personal certificates.) When I tried

    wideStringMangled := string convertFromWithConverter: (Latin1TextConverter new).

I received a string that was the same as the regular 'MY' string in Squeak, so that didn't help. My attempts to fake the system out didn't work either. I tried:

    'L\"MY\"'
    'L"MY"'
    'LMLY'
    "LMLYL\0"

But nothing convinced the DLL that I was passing in a wide string. I finally found a define in the DLL that allowed me to use an ASCII string. That was a long battle.

What is really needed is a patch to FFI for wide characters. At least this problem is solved for now (unless you want to access a store that has a Japanese name!).

Thanks for your help!!

Ron Teitelbaum

> From: John M McIntosh
> Sent: Thursday, August 03, 2006 9:46 PM
>
> Ok, assuming you are using characters ascii 0-127 you could say
Sigh, too much backward/forward compatibility here.
a) In Sophie, most of the interesting testable conversion calls convert from/to UTF-16 for access to the Mac system API. That appears to work. In reviewing this, I see the converter uses nextPut:toStream: to deal with the BOM and to store 16 bits as needed for each character.

b) When I use foo convertToWithConverter: (MacRomanUnicodeTextConverter new), where foo contains characters that map above 255 after conversion, I get a WideString. Ah, but when I say

    'abc' convertToWithConverter: (MacRomanUnicodeTextConverter new)

why, I get the ByteString 'abc'. I'll mutter things at this point.

OK, it seems that WriteStream>>nextPut: does this:

    <primitive: 66>
    ((collection class == ByteString) and: [
        anObject isCharacter and: [anObject isOctetCharacter not]]) ifTrue: [
            collection _ (WideString from: collection).
            ^self nextPut: anObject].

Oh, how clever. If the primitive fails, it checks whether the collection we're writing to is a ByteString. If so, and the character is > 255, it converts everything to a WideString. So of course my testing, which used data that maps to characters > 255, always produced a WideString.

Hmm, OK, what if I make

    String>>convertFromToWideStringWithConverter: converter
        | readStream writeStream c |
        readStream _ self readStream.
        writeStream _ WideString new writeStream.
        converter ifNil: [^ self].
        [readStream atEnd] whileFalse: [
            c _ converter nextFromStream: readStream.
            c ifNotNil: [writeStream nextPut: c] ifNil: [^ writeStream contents]].
        ^ writeStream contents

Fine, let's test. Oops, it fails: I get a String back. How curious. Let's see: writeStream contents invokes

    WideString>>copyFrom: start to: stop
        | n |
        n _ super copyFrom: start to: stop.
        n isOctetString ifTrue: [^ n asOctetString].
        ^ n

which invokes isOctetString. That cheerfully scans the entire string to see whether any character value is > 255. If none is, it converts the WideString we have into a String and returns that. How clever, but it totally breaks what I want to happen.
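The narrowing rule John uncovered works like this: a WideString copy becomes a byte string whenever every character fits in one byte. It can be sketched in C over arrays of 32-bit code points.

The names below are hypothetical. They only mirror the Squeak methods quoted above (isOctetString, copyFrom:to:) and are not Squeak's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Analogue of isOctetString: true when every code point fits in one byte.
 * Like the Squeak method, this has to scan the whole string. */
int is_octet_string(const uint32_t *cps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cps[i] > 0xFF)
            return 0;
    return 1;
}

/* Analogue of the narrowing in WideString>>copyFrom:to:. Copies into a byte
 * buffer only when is_octet_string() holds. Returns 1 on success and 0 when
 * the string must stay wide. out must hold n bytes. */
int narrow_copy(const uint32_t *cps, size_t n, uint8_t *out)
{
    assert(out != NULL || n == 0);
    if (!is_octet_string(cps, n))
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)cps[i];
    return 1;
}
```

The side effect John hits follows directly: for an all-ASCII string like 'MY', narrow_copy always succeeds, so the wide representation is silently lost.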
Fine, I'm sure there is a reason for all this. But I'd rather keep my WideString as a WideString, not have it compressed to a String as a side effect of working on it.

So create a class UTF32String, for lack of a better name, and add this method:

    isOctetString
        ^ false

Go back and rename convertFromToWideStringWithConverter: to convertFromToUTF32StringWithConverter:, and alter one line:

    writeStream _ UTF32String new writeStream.

Then we get a UTF-32 wide string that stays a wide string. Sending asByteArray to my 'abcd' example gets us a 16-byte object.

I'll ask for comment. Now I'm sure I'll sit up and think about the side effects in Sophie of *thinking* I've converted things to a WideString, when in most cases it's silently just a String.

PS: MacRomanUnicodeTextConverter is a converter we added for Sophie that converts Mac Roman to Unicode. The ill-named MacRomanTextConverter, by contrast, converts from Mac Roman to something else (Latin-1?).

On 3-Aug-06, at 7:21 PM, Ron Teitelbaum wrote:
> I found a solution that allows me to use ascii characters.

--
John M. McIntosh
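The 16-byte result for 'abcd' follows from storing each character in a 32-bit slot. Later in the thread, Ron reports the bytes for 'MY' asWideString asByteArray on Windows XP as 0 0 0 77 0 0 0 89. That layout puts the most significant byte first, i.e. UTF-32 big-endian.

The C sketch below reproduces that layout. The helper name is hypothetical, and it mirrors the reported bytes rather than Squeak's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write each code point as four bytes, most
 * significant byte first (UTF-32BE). out must hold 4 * n bytes.
 * Returns the number of bytes written. */
size_t utf32be_bytes(const uint32_t *cps, size_t n, uint8_t *out)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        out[4 * i + 0] = (uint8_t)(cps[i] >> 24);
        out[4 * i + 1] = (uint8_t)(cps[i] >> 16);
        out[4 * i + 2] = (uint8_t)(cps[i] >> 8);
        out[4 * i + 3] = (uint8_t)cps[i];
    }
    return 4 * n;
}
```

Compared with the 77 0 89 0 that the Windows call accepted, this layout differs in both unit width (4 bytes vs. 2) and byte order.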
Oh sure, right after I spend an hour typing out a note about this:
    'abc' asWideString asByteArray

will do the right thing, assuming your 'abc' String doesn't need conversion. Even so,

    ('abcd' convertFromWithConverter: (MacRomanUnicodeTextConverter new)) asWideString asByteArray

will also do the right thing. It's a bit more expensive, though, as it rummages about deciding whether the intermediate parts are Strings or WideStrings.

On 3-Aug-06, at 7:21 PM, Ron Teitelbaum wrote:
> I found a solution that allows me to use ascii characters.

--
John M. McIntosh
John,
Thanks for looking some more. I had tried asWideString and received back a WideString, but was confused by the bytes shown in the inspector. Since the bytes were the same as for a regular string, I figured it was doing something wrong, like the other things I'd tried.

Now that I understand that asByteArray lays out the bytes differently from what the instance inspector shows, I tried it. What I get is

    'MY' asWideString asByteArray = a ByteArray(0 0 0 77 0 0 0 89)

Sending that didn't work.

I received a message from Torsten (thank you!) saying he had a conversion to wide string for COM. I tried that method to see what it returned:

    COMWideString fromString: 'MY' = a ByteArray(77 0 89 0)

I tried sending in that ByteArray and BINGO! It worked.

So the question is: are there multiple ways to convert to a wide string, or is asWideString asByteArray doing it wrong? Maybe there are byte-order and byte-count cross-platform issues to consider? (I'm on WinXP, if it wasn't painfully obvious.)

Ron Teitelbaum

> -----Original Message-----
> From: [hidden email] [mailto:squeak-dev-[hidden email]] On Behalf Of John M McIntosh
> Sent: Friday, August 04, 2006 3:23 AM
> To: The general-purpose Squeak developers list
> Subject: Re: FFI wide-character type?
>
> Oh sure after I spend an hour typing out a note about this.
Obviously, Squeak WideStrings have 32-bit characters, while Windows (and other OSes, I presume) uses 16-bit characters. And yes, OS byte ordering does matter. You need a method asTwoByteArray, defined like this in String:

    String>>asTwoByteArray
        | twoB |
        twoB := ByteArray new: self size * 2.
        1 to: self size do: [:i |
            twoB unsignedShortAt: (2 * i - 1) put: (self at: i) asInteger].
        ^ twoB

Nicolas

On Friday 04 August 2006 15:48, Ron Teitelbaum wrote:
> So the question is, are there multiple ways to do conversion to wide
> string, or is asWideString asByteArray doing it wrong? Maybe there are
> ordering and byte count multiplatform things to consider? (I'm on winXP,
> if it wasn't painfully obvious)
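Nicolas's method writes one 16-bit unit per character through unsignedShortAt:put:. The byte order is therefore whatever that accessor uses, and characters above 16r FFFF cannot fit in a single unit.

The C sketch below pins both points down. It writes little-endian output explicitly, which matches the 77 0 89 0 bytes that worked for Ron on Windows XP. It also returns an error for characters outside the Basic Multilingual Plane. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper in the spirit of asTwoByteArray: one 16-bit unit per
 * code point, written low byte first (little-endian).
 * out must hold 2 * n bytes; no NUL terminator is appended.
 * Returns the byte count, or 0 if any code point cannot be stored in a
 * single 16-bit unit. */
size_t ucs2le_bytes(const uint32_t *cps, size_t n, uint8_t *out)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        /* Above U+FFFF needs a surrogate pair; U+D800..U+DFFF are not characters. */
        if (c > 0xFFFF || (c >= 0xD800 && c <= 0xDFFF))
            return 0;
        out[2 * i]     = (uint8_t)(c & 0xFF);   /* low byte first */
        out[2 * i + 1] = (uint8_t)(c >> 8);
    }
    return 2 * n;
}
```

A callee that expects a NUL-terminated wide string also needs two trailing zero bytes, which this sketch leaves to the caller.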
Actually, the correct question is what the target character set for the conversion is, since I was under the assumption that one wanted UTF-32 as a result.

So is COM UTF-16? Or some special Windows encoding?

UTF-16 can be either big-endian or little-endian, and you can supply a byte order mark that helps resolve which it is. When building the UTF-16 converter, you can specify these details.

On 4-Aug-06, at 2:00 PM, nicolas cellier wrote:
> Obviously, Squeak WideString have 32 bits characters while Windows
> (and other OS i presume) have 16bits characters, and yes, OS byte
> ordering does matter.

--
John M. McIntosh
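John's two settings, byte order and an optional byte order mark, are easy to see at the byte level. The same 'MY' comes out as 77 0 89 0 little-endian or 0 77 0 89 big-endian. When a BOM is requested, FF FE or FE FF is prepended, respectively.

The C sketch below works on 16-bit code units. The names are hypothetical, and this is not Squeak's UTF16TextConverter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit code unit in the requested byte order. */
static void put_unit(uint8_t *p, uint16_t u, int little_endian)
{
    p[little_endian ? 0 : 1] = (uint8_t)(u & 0xFF);
    p[little_endian ? 1 : 0] = (uint8_t)(u >> 8);
}

/* Hypothetical helper: serialize UTF-16 code units in either byte order,
 * optionally preceded by the byte order mark U+FEFF.
 * out must hold 2 * (n + 1) bytes. Returns the number of bytes written. */
size_t utf16_units_to_bytes(const uint16_t *units, size_t n,
                            int little_endian, int with_bom, uint8_t *out)
{
    size_t pos = 0;
    assert(out != NULL);
    if (with_bom) {
        put_unit(out, 0xFEFF, little_endian);
        pos = 2;
    }
    for (size_t i = 0; i < n; i++, pos += 2)
        put_unit(out + pos, units[i], little_endian);
    return pos;
}
```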
Not to mention that the UTF16TextConverter also handles characters outside the BMP (Basic Multilingual Plane).

2006/8/5, John M McIntosh <[hidden email]>:
> Actually the correct question is what the target character set is for
> the conversion since I was under the assumption that one wanted UTF-32
> as a result.
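Handling characters outside the BMP means splitting each code point above U+FFFF into a surrogate pair. A scheme with one 16-bit unit per character cannot do that.

The C sketch below shows the standard UTF-16 split. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert one code point to UTF-16 code units.
 * Returns 1 for BMP code points, 2 for supplementary ones (a surrogate
 * pair), and 0 for values that are not Unicode scalar values. */
int utf16_units(uint32_t cp, uint16_t units[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                  /* surrogate values are not characters */
    if (cp <= 0xFFFF) {
        units[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                                  /* beyond the Unicode range */
    cp -= 0x10000;                                 /* 20 bits remain */
    units[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    units[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00.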
John,
I tried your suggestion to support the Microsoft wide string:

    aStream := MultiByteBinaryOrTextStream on: String new encoding: 'utf-16'.
    aStream converter useLittleEndian: true.
    aStream nextPutAll: 'abcde'.
    ^ aStream

It appears to work great, except that the implementation of next throws it all away. I'm guessing it will work fine for MS wide strings -> Squeak, but doesn't support Squeak -> MS wide strings very well. I figured I'd write an MSWideString class to support this properly. Do you have any suggestions, and/or am I missing something?

To answer your questions: not all calls use this form of wide string. There are some ASCII-equivalent methods, but the cryptography code has a lot of ASN.1 formats, so I guess this is their answer. I haven't run into other formats of wide string yet in Microsoft's API.

Ron Teitelbaum

> -----Original Message-----
> From: [hidden email] [mailto:squeak-dev-[hidden email]] On Behalf Of John M McIntosh
> Sent: Friday, August 04, 2006 7:52 PM
> To: The general-purpose Squeak developers list
> Subject: Re: FFI wide-character type?
>
> So COM is UTF16? Or some special Windows encoding?
Ron,
> aStream := MultiByteBinaryOrTextStream on: String new encoding: 'utf-16'.
> aStream converter useLittleEndian: true.
> aStream nextPutAll: 'abcde'.
> ^aStream
>
> And it appears to work great except that the implementation of next throws
> it all away.

If you have a ByteString that contains UTF-16 encoded characters,

    ret := aString convertFromWithConverter: (UTF16TextConverter new useLittleEndian: true).

should give you a correct ByteString or WideString. For that matter, the conversion to an external format can be:

    ret := 'abcde' convertToWithConverter: (UTF16TextConverter new useLittleEndian: true).

-- Yoshiki
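Yoshiki's first line covers the MS-to-Squeak direction. At the byte level, decoding little-endian UTF-16 means reading two bytes per unit, low byte first, and recombining surrogate pairs.

The C sketch below makes some assumptions: no BOM, a hypothetical function name, and strict error handling rather than substituting replacement characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode little-endian UTF-16 bytes (no BOM) into code
 * points, combining surrogate pairs. Returns the number of code points
 * written, or (size_t)-1 on odd length, an unpaired surrogate, or if more
 * than cap code points would be produced. */
size_t utf16le_decode(const uint8_t *in, size_t len, uint32_t *out, size_t cap)
{
    size_t n = 0;
    assert(in != NULL || len == 0);
    if (len % 2 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < len; i += 2) {
        uint32_t u = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        if (u >= 0xD800 && u <= 0xDBFF) {              /* high surrogate */
            if (i + 3 >= len)
                return (size_t)-1;                     /* no room for the low half */
            uint32_t lo = (uint32_t)in[i + 2] | ((uint32_t)in[i + 3] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                                    /* consumed a second unit */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (size_t)-1;                         /* stray low surrogate */
        }
        if (n == cap)
            return (size_t)-1;
        out[n++] = u;
    }
    return n;
}
```

Feeding it the COM bytes 77 0 89 0 yields the two code points of 'MY'.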