Optimization of BSTR>>asUnicode


Optimization of BSTR>>asUnicode

Ivar Maeland

While profiling the XMLDOM interface in Smalltalk, I found that the conversion from BSTR to String is quite slow for large strings.

I traced it to BSTR>>asUnicode, which looks like this:

 

asUnicode
        "Answer a Unicode copy of the receiver.
        An invalid BSTR is mapped to nil."

    | unicodeString |
    self isValid ifFalse: [ ^nil ].
    unicodeString := UnicodeStringBuffer new: self size.
    1 to: self size do: [:i | unicodeString at: i put: ( self at: i ) ].
    ^unicodeString

 

The BSTR>>at: method is relatively slow, and UnicodeStringBuffer>>at:put: appears to do the reverse of that step.

If I replace the code with the following, it runs in about one fifth of the time. Does anyone see any obvious problems with this code?

At first I wanted to use replaceFrom: 1 to: 2*sz with: contents startingAt: 5 (to skip the size of the string, which is stored at offset 0), but when I did that, the first two characters of the string were missing.

 

asUnicode
"
    Category Name: Modified by Tec4
    Description: Answer a Unicode copy of the receiver.
        An invalid BSTR is mapped to nil.
    Result Type: UnicodeStringBuffer
"
    | unicodeString sz |

    #modifiedByTec4.

    self isValid ifFalse: [^nil].
    sz := self size.
    unicodeString := UnicodeStringBuffer new: sz.
    unicodeString contents
        replaceFrom: 1
        to: 2 * sz
        with: contents
        startingAt: 1.
    ^unicodeString

 

Ivar Maeland, B.Sc., MCAD
Product Developer
Policy Works Inc.
Commercial Management Systems

 

Toll Free: 1.800.260.3676 ext.109
Direct: 403.450.1109
www.policyworks.com

 

*** this signature added by listserv *** *** Visit http://www.listserv.dfn.de/archives/vswe-l.html *** *** for archive browsing and VSWE-L membership management ***

Re: Optimization of BSTR>>asUnicode

Henrik Høyer

startingAt: 1.

This looks wrong to me:

    (BSTR fromString: 'hello world') asUnicodeTec4 asString = 'hello world'

answers false!

startingAt: 5.

This looks correct to me, and it makes the test above pass.

Do you have an example that reproduces “when I did that I am missing the first two characters of the string”?

 

 


Henrik Høyer
Chief Software Architect
[hidden email] • (+45) 4029 2092
Tigervej 27 • 4600 Køge
www.sPeople.dk • (+45) 7023 7775


From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Ivar Maeland
Sent: 15. april 2014 15:32
To: [hidden email]
Subject: Optimization of BSTR>>asUnicode

 


Re: Optimization of BSTR>>asUnicode

Ivar Maeland

Hi Henrik!

Good point, I had not considered “internal memory” BSTRs.

If you create the BSTR instance using allocateString:, the BSTR contents will be an ExternalAddress.
If you create it using fromString:, the BSTR contents will be a ByteArray.
There is a method to check for this: isInExternalMemory.

    (BSTR allocString: 'hello world') asUnicodeTec4 asString = 'hello world'

works for me.
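
One plausible reading of both results, sketched with plain ByteArrays rather than the real VSE classes: if the ByteArray form keeps a 4-byte length in front of the characters, the text starts at byte 5, whereas an external address presumably already points at the first character, so skipping 4 more bytes drops exactly two 2-byte characters. The variable names and the layout below are assumptions for illustration; only replaceFrom:to:with:startingAt: comes from the thread.

```smalltalk
"Model only: 4-byte length prefix (byte count, low byte first) followed by
 2-byte little-endian characters. ByteArray stands in for BSTR contents."
| text raw wholeArray pastHeader |
text := 'hello'.
raw := ByteArray new: 4 + (2 * text size).
raw at: 1 put: 2 * text size.
1 to: text size do: [:i |
    "low byte of character i; the high byte stays 0 for ASCII"
    raw at: 3 + (2 * i) put: (text at: i) asInteger].

"ByteArray case: the header is part of the array, so the text starts at byte 5."
wholeArray := ByteArray new: 2 * text size.
wholeArray replaceFrom: 1 to: 2 * text size with: raw startingAt: 5.

"External case (modelled): the address already refers to byte 5 of the block.
 Applying the 4-byte skip again starts the copy at byte 9, two characters late."
pastHeader := ByteArray new: 2 * (text size - 2).
pastHeader replaceFrom: 1 to: pastHeader size with: raw startingAt: 9.
```

In this model wholeArray holds all five characters, while pastHeader begins at the third one, which matches the "missing the first two characters" symptom.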

 

From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Henrik Høyer
Sent: Tuesday, April 15, 2014 10:40 AM
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 


Re: Optimization of BSTR>>asUnicode

Henrik Høyer
Ahh, the “hidden gem” was in the BSTRMemoryAddress class. The method below should handle all situations:

asUnicodeSPPL

        self isValid ifFalse: [ ^nil ].

        (contents isKindOf: BSTRMemoryAddress) ifTrue: [
                "Since BSTRMemoryAddress handles the '4 byte size issue', we can use this shorthand."
                ^UnicodeStringBuffer fromAddress: contents length: self size
        ] ifFalse: [
                | sz unicodeString |
                sz := self size.
                unicodeString := UnicodeStringBuffer new: sz.
                unicodeString contents
                        replaceFrom: 1
                        to: 2 * sz
                        with: contents
                        startingAt: 5. "skip our 4-byte size header"
                ^unicodeString
        ].
 
 
From: Using Visual Smalltalk for Windows/Enterprise [[hidden email]] On Behalf Of Ivar Maeland
Sent: 15. april 2014 20:35
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode
 

Re: Optimization of BSTR>>asUnicode

Todor Todorov
In reply to this post by Ivar Maeland

Under normal circumstances, BSTR contents should not be held inside a byte array. If you by some chance pass such a BSTR to an external API, either directly or indirectly as part of a struct or an array, the API may decide to release that BSTR, which means releasing the address where the contents are held. There is no way it can release a Smalltalk byte array.

A byte array should only be used for temporary BSTRs.

In general, a BSTR is a pointer to a wide-character array. What is special about it is that at a negative offset you will find a header with information about the buffer, i.e. the size of the string. Also, the character buffer is allocated and deallocated by special OLE/COM functions, so you must use the dedicated APIs for that.

-Todor
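
The "header at a negative offset" can be illustrated with a small model. This is only a sketch: block stands in for memory owned by the OLE allocator and ptr for the position a BSTR pointer would refer to (both hypothetical names), and the layout assumed is the documented one, a 4-byte little-endian byte count, then the 2-byte characters, then a 2-byte terminator.

```smalltalk
"Model only: a ByteArray plays the role of the allocator's memory block."
| block ptr byteLength charCount |
block := ByteArray new: 4 + (2 * 3) + 2.     "header, 3 characters, 2-byte terminator"
block at: 1 put: 6.                           "length prefix: 6 bytes, low byte first"
block at: 5 put: 97; at: 7 put: 98; at: 9 put: 99.   "'abc' as 2-byte characters"
ptr := 5.                                     "the pointer refers to the first character"

"Read the 4-byte little-endian length stored just before the pointer,
 starting with the most significant byte at ptr - 1."
byteLength := 0.
1 to: 4 do: [:back |
    byteLength := (byteLength * 256) + (block at: ptr - back)].
charCount := byteLength // 2.
```

Here the length is found by looking backwards from ptr, never forwards, which is why code that receives such a pointer must not skip a header before reading characters.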

 


Re: Optimization of BSTR>>asUnicode

Ivar Maeland
In reply to this post by Henrik Høyer

Very good.

Thank you very much, Henrik!

 

From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Henrik Høyer
Sent: Wednesday, April 16, 2014 3:48 AM
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 


Re: Optimization of BSTR>>asUnicode

Ivar Maeland
In reply to this post by Todor Todorov

Thanks for the insight, Todor!

By dedicated API, do you mean OLEAutomationDLL>>SysAllocString: ?

There is another bug, in UnicodeStringBuffer>>stringFromUnicode:length:

    result := KernelLibrary
        wideCharToMultiByteCp: 0  "CP_ACP for ANSI code page"
        flags: 0
        lpwstr: aUnicodeBuffer
        cchwstr: nChars * 2
        lpstr: aString
        cchlpstr: aString size
        default: nil
        defaultUsed: nil.

It should pass the source length in characters, not bytes:

    result := KernelLibrary
        wideCharToMultiByteCp: 0  "CP_ACP for ANSI code page"
        flags: 0
        lpwstr: aUnicodeBuffer
        cchwstr: nChars
        lpstr: aString
        cchlpstr: aString size
        default: nil
        defaultUsed: nil.

And we should probably add a check of the return value:

    result = 0 ifTrue: [self osError].

The old code “works” in that it copies as many characters as there is room for in the buffer (aString), then returns 0 with the OS error “buffer too small”.
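
A toy model of the size contract may make the failure easier to see. It is not the real Win32 call: the convert block, its argument names, and the one-byte-per-character rule are assumptions for illustration, and unlike the behaviour described above it copies nothing when the destination is too small. It also assumes aString was sized for nChars characters.

```smalltalk
"Toy model of the size contract only: one output byte per wide character,
 and 0 when the destination cannot hold the requested count."
| convert wide narrow tooMany exact |
convert := [:src :cchWide :dst |
    cchWide > dst size
        ifTrue: [0]
        ifFalse: [
            1 to: cchWide do: [:i |
                dst at: i put: (Character value: (src at: i))].
            cchWide]].

wide := #(104 105).                 "'hi' as 16-bit code units"
narrow := String new: wide size.

"Passing a byte count asks for 4 characters to fit into 2 bytes: failure."
tooMany := convert value: wide value: wide size * 2 value: narrow.
"Passing the character count fits the destination exactly."
exact := convert value: wide value: wide size value: narrow.
```

With the doubled count the request can never fit a destination sized for nChars, so a zero result is the expected outcome rather than a rare error, which is why checking the return value matters.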

 

 

From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Todor Todorov
Sent: Wednesday, April 16, 2014 6:46 AM
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 


Re: Optimization of BSTR>>asUnicode

Todor Todorov

Yes, SysAllocString etc.

BSTR functions:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms221105(v=vs.85).aspx

BSTR definition:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms221069(v=vs.85).aspx

Interesting info about BSTR; this explains some things:
http://blogs.msdn.com/b/ericlippert/archive/2003/09/12/52976.aspx

And if it is a native BSTR (not one allocated in a byte array), I suggest using the WideCharToMultiByte and MultiByteToWideChar functions to copy data in and out of the BSTR.

-Todor

 

From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Ivar Maeland
Sent: Wednesday, 16. April, 2014 15:48
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 

Thanks for the insight Todor!

By dedicated API do you mean OLEAutomationDLL >>SysAllocString: ?

There is another bug in UnicodeStringBuffer >> stringFromUnicode:length:

 

result := KernelLibrary

                                wideCharToMultiByteCp: 0  " CP_ACP for ANSI code page "

                                flags: 0

                                lpwstr: aUnicodeBuffer

                                cchwstr: nChars * 2

                                lpstr: aString

                                cchlpstr: aString size

                                default: nil

                                defaultUsed: nil.

 

Should be

 

result := KernelLibrary

                                wideCharToMultiByteCp: 0  " CP_ACP for ANSI code page "

                                flags: 0

                                lpwstr: aUnicodeBuffer

                                cchwstr: nChars

                                lpstr: aString

                                cchlpstr: aString size

                                default: nil

                                defaultUsed: nil.

 

And we should probably add a check for return value

result  = 0 ifTrue: [self osError].

 

The old code “works” in that it copies # characters that there is room for in buffer (aString) then return 0 and OS error “buffer too small”.

 

 

From: Using Visual Smalltalk for Windows/Enterprise [[hidden email]] On Behalf Of Todor Todorov
Sent: Wednesday, April 16, 2014 6:46 AM
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 

Under normal circumstances, BSTR contents should not be held inside a byte array. If you by some chance pass such BSTR to external API, either directly or indirectly as part of a struct or an array, the API may decide to release that BSTR. This also means the address where the contents are held. No chance it can release a Smalltalk byte array.

 

A byte array should only be used for temporary BSTRs.

 

In general, a BSTR is a pointer to a wide character array. What is special about it is that at a negative offset you will find a header with information about the buffer, i.e. the size of the string. Also, the character buffer is allocated and deallocated by special OLE/COM functions, so you must use the dedicated APIs for that.
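
As an illustration of that layout, here is a small Python sketch that builds the raw block by hand and reads the header back from the negative offset. It assumes the usual BSTR layout: a 4-byte little-endian byte count, the UTF-16LE characters, then a 2-byte terminator. make_bstr_block is a hypothetical helper for this sketch only; a real BSTR must still come from the dedicated allocation API.

```python
import struct

HEADER_SIZE = 4  # the length prefix is a 32-bit byte count


def make_bstr_block(text: str) -> bytes:
    """Build the memory block behind a BSTR (illustration only)."""
    data = text.encode("utf-16-le")
    return struct.pack("<I", len(data)) + data + b"\x00\x00"


block = make_bstr_block("hello world")

# The BSTR "pointer" addresses the first character, not the start of the block.
ptr = HEADER_SIZE

# The header sits at a negative offset from that pointer.
(byte_len,) = struct.unpack_from("<I", block, ptr - HEADER_SIZE)
decoded = block[ptr : ptr + byte_len].decode("utf-16-le")

print(byte_len, byte_len // 2, decoded)
```

The header stores a byte count, so the character count is half of it.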

 

-Todor

 

From: Using Visual Smalltalk for Windows/Enterprise [[hidden email]] On Behalf Of Ivar Maeland
Sent: Tuesday, 15. April, 2014 20:35
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 

Hi Henrik!

 

Good point, I did not consider “Internal Memory” BSTRs.

 

If you create the BSTR instance using allocString: then BSTR contents will be an ExternalAddress.

If you create using fromString: then BSTR contents will be a ByteArray.

There is a method to check for this: isInExternalMemory.

 

(BSTR allocString: 'hello world') asUnicodeTec4 asString = 'hello world'

 

Works for me.
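
A Python sketch of why the right starting offset seems to depend on where the contents live. It assumes, based on this thread, that the ByteArray form keeps the 4-byte size at offset 0 while the ExternalAddress form points straight at the first character. copy_chars is a hypothetical stand-in for replaceFrom:to:with:startingAt: with a 1-based start index.

```python
import struct


def copy_chars(contents: bytes, n_chars: int, starting_at: int) -> str:
    """Copy 2 * n_chars bytes from contents (1-based start) and decode them."""
    start = starting_at - 1
    raw = contents[start : start + 2 * n_chars]
    return raw.decode("utf-16-le", errors="replace")


text = "hello world"
chars = text.encode("utf-16-le")
n = len(text)

# Assumed ByteArray layout (fromString:): size prefix, characters, terminator.
byte_array = struct.pack("<I", len(chars)) + chars + b"\x00\x00"
# Assumed ExternalAddress view (allocString:): characters, terminator.
external = chars + b"\x00\x00"

print(repr(copy_chars(byte_array, n, 5)))  # skips the prefix
print(repr(copy_chars(byte_array, n, 1)))  # prefix is copied as two bogus characters
print(repr(copy_chars(external, n, 1)))    # already at the first character
print(repr(copy_chars(external, n, 5)))    # 4 bytes skipped = 2 characters lost
```

Under these assumptions, skipping 4 bytes of an external BSTR drops exactly two 16-bit characters, which would explain the "missing the first two characters" symptom. A method that checks isInExternalMemory could pick the offset accordingly.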

 

From: Using Visual Smalltalk for Windows/Enterprise [[hidden email]] On Behalf Of Henrik Høyer
Sent: Tuesday, April 15, 2014 10:40 AM
To: [hidden email]
Subject: Re: Optimization of BSTR>>asUnicode

 

startingAt: 1.

Looks wrong to me…

 

(BSTR fromString: 'hello world') asUnicodeTec4 asString = 'hello world'

 

Answers false!

 

startingAt: 5.

Looks correct to me, and solves the test above.

 

Do you have an example of “when I did that I am missing the first two characters of the string”?

 

 

 

Henrik Høyer

Chief Software Architect

[hidden email] • (+45) 4029 2092

Tigervej 27 • 4600 Køge

www.sPeople.dk • (+45) 7023 7775

 

From: Using Visual Smalltalk for Windows/Enterprise [[hidden email]] On Behalf Of Ivar Maeland
Sent: 15. april 2014 15:32
To: [hidden email]
Subject: Optimization of BSTR>>asUnicode

 

I was doing some profiling on XMLDOM interface in Smalltalk.  And the conversion from BSTR to String is quite slow on large strings.

I traced it down to the BSTR>>asUnicode which looks like this:

 

asUnicode

        "Answer a Unicode copy of the receiver.

        An invalid BSTR is mapped to nil. "

 

    | unicodeString |

    self isValid ifFalse: [ ^nil ].

    unicodeString := UnicodeStringBuffer new: self size.

    1 to: self size do: [:i | unicodeString at: i put: ( self at: i ) ].

    ^unicodeString

 

The BSTR>>at: method is relatively slow and looking at UnicodeStringBuffer>>at:put: it appears to be doing the reverse step.

If I replace the code with the following it runs in about 1/5 of the time.  Does anyone see any obvious problems with this code?

At first I wanted to use replaceFrom: 1 to: 2*sz with: contents startingAt: 5 (skip the size of the string which is stored at offset 0) but when I did that I am missing the first two characters of the string.

 

asUnicode

"

    Category Name: Modified by Tec4

    Description: Answer a Unicode copy of the receiver.

        An invalid BSTR is mapped to nil.

    Result Type: UnicodeStringBuffer

"

    | unicodeString sz |

 

    #modifiedByTec4.

   

    self isValid ifFalse: [^nil].

    sz := self size.

    unicodeString := UnicodeStringBuffer new: sz.

    unicodeString contents

        replaceFrom: 1

        to: 2 * sz

        with: contents

        startingAt: 1.

    ^unicodeString! !

 

Ivar Maeland, B.Sc., MCAD
Product Developer
Policy Works Inc.
Commercial Management Systems

 

Toll Free: 1.800.260.3676 ext.109
Direct: 403.450.1109
www.policyworks.com

 

*** this signature added by listserv *** *** Visit http://www.listserv.dfn.de/archives/vswe-l.html *** *** for archive browsing and VSWE-L membership management ***
