Smalltalk › Cincom › Visual Smalltalk Enterprise

Optimization of BSTR>>asUnicode


Ivar Maeland

Optimization of BSTR>>asUnicode

Sent: 15 April 2014

I was doing some profiling on the XMLDOM interface in Smalltalk, and the conversion from BSTR to String is quite slow on large strings.

I traced it down to BSTR>>asUnicode, which looks like this:

asUnicode
    "Answer a Unicode copy of the receiver.
    An invalid BSTR is mapped to nil."

    | unicodeString |
    self isValid ifFalse: [^nil].
    unicodeString := UnicodeStringBuffer new: self size.
    1 to: self size do: [:i | unicodeString at: i put: (self at: i)].
    ^unicodeString

The BSTR>>at: method is relatively slow, and UnicodeStringBuffer>>at:put: appears to perform the reverse of its conversion step.

If I replace the code with the following, it runs in about 1/5 of the time. Does anyone see any obvious problems with this code?

At first I wanted to use replaceFrom: 1 to: 2 * sz with: contents startingAt: 5 (to skip the size of the string, which is stored at offset 0), but when I did that I am missing the first two characters of the string.

asUnicode

Category Name: Modified by Tec4
Description: Answer a Unicode copy of the receiver. An invalid BSTR is mapped to nil.
Result Type: UnicodeStringBuffer

    | unicodeString sz |
    #modifiedByTec4.
    self isValid ifFalse: [^nil].
    sz := self size.
    unicodeString := UnicodeStringBuffer new: sz.
    unicodeString contents
        replaceFrom: 1
        to: 2 * sz
        with: contents
        startingAt: 1.
    ^unicodeString! !

Ivar Maeland, B.Sc., MCAD
Product Developer
Policy Works Inc.
Commercial Management Systems

Toll Free: 1.800.260.3676 ext.109
Direct: 403.450.1109
www.policyworks.com

*** this signature added by listserv *** *** Visit http://www.listserv.dfn.de/archives/vswe-l.html *** *** for archive browsing and VSWE-L membership management ***

Henrik Høyer

Re: Optimization of BSTR>>asUnicode

Sent: 15 April 2014

startingAt: 1.

Looks wrong to me…

(BSTR fromString: 'hello world') asUnicodeTec4 asString = 'hello world'

Answers false!

startingAt: 5.

Looks correct to me, and makes the test above pass.

Do you have an example of “when I did that I am missing the first two characters of the string”?

	Henrik Høyer
	Chief Software Architect
	[hidden email] • (+45) 4029 2092
	Tigervej 27 • 4600 Køge
	www.sPeople.dk • (+45) 7023 7775

Ivar Maeland

Re: Optimization of BSTR>>asUnicode

Sent: 15 April 2014

Hi Henrik!

Good point, I did not consider “Internal Memory” BSTRs.

If you create the BSTR instance using allocateString: then BSTR contents will be an ExternalAddress.

If you create using fromString: then BSTR contents will be a ByteArray.

There is a method to check for this: isInExternalMemory.

(BSTR allocString: 'hello world') asUnicodeTec4 asString = 'hello world'

Works for me.
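The two storage kinds explain why the correct offset differs. A minimal model in plain Smalltalk (only ByteArray, a literal Array and replaceFrom:to:with:startingAt:, no VSE-specific classes) follows. It assumes, as described above, that a BSTR created with fromString: keeps a 4-byte size header at the start of its ByteArray, followed by UTF-16 little-endian code units; the byte values are made up for illustration. A native BSTR pointer, by contrast, points at the first character, with the length prefix in the 4 bytes just before it. So, assuming the external contents address that first character, a copy must start at offset 1 there, and startingAt: 5 skips 4 bytes, i.e. two 2-byte characters, which would match the missing first two characters.

```smalltalk
"Hypothetical model of an in-image BSTR holding 'hi': a 4-byte size header
 followed by UTF-16LE code units. The header value (4 data bytes) is assumed;
 only the header's size matters for the copy."
| rawValues bstrBytes sz chars |
rawValues := #(4 0 0 0 104 0 105 0).
bstrBytes := ByteArray new: rawValues size.
1 to: rawValues size do: [:i | bstrBytes at: i put: (rawValues at: i)].
sz := 2.  "character count, as BSTR>>size would answer"
chars := ByteArray new: 2 * sz.
"Skip the 4-byte header: character data starts at index 5 (1-based)."
chars replaceFrom: 1 to: 2 * sz with: bstrBytes startingAt: 5.
chars  "now holds 104 0 105 0, i.e. $h and $i without the header"
```

Starting at 1 instead would copy the header bytes 4 0 0 0 as if they were two characters, which is consistent with the startingAt: 1 version failing for BSTRs created with fromString:.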

Henrik Høyer

Re: Optimization of BSTR>>asUnicode

Sent: 16 April 2014

Ahh, the “hidden gem” was in the BSTRMemoryAddress class. The method below should handle all situations:

asUnicodeSPPL
    self isValid ifFalse: [^nil].
    (contents isKindOf: BSTRMemoryAddress)
        ifTrue: [
            "Since BSTRMemoryAddress handles the '4 byte size issue', we can use this shorthand."
            ^UnicodeStringBuffer fromAddress: contents length: self size]
        ifFalse: [
            | sz unicodeString |
            sz := self size.
            unicodeString := UnicodeStringBuffer new: sz.
            unicodeString contents
                replaceFrom: 1
                to: 2 * sz
                with: contents
                startingAt: 5.  "skip our 4 byte size header"
            ^unicodeString]
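A quick workspace check for asUnicodeSPPL, adapted from the tests used earlier in the thread, exercising both storage kinds. It assumes the method is installed in BSTR and that allocString: (as spelled in the earlier test; allocateString: is also mentioned) is the constructor that places the string in external memory. Since an externally allocated BSTR should eventually be released through the OLE API, and the thread does not show which BSTR method does that, this sketch leaves it unfreed.

```smalltalk
"Workspace check: both comparisons should answer true if asUnicodeSPPL
 handles the in-image (ByteArray) and the external-memory case."
| internal external internalOk externalOk |
internal := BSTR fromString: 'hello world'.   "contents held in a ByteArray"
external := BSTR allocString: 'hello world'.  "contents in external memory; not freed here"
internalOk := internal asUnicodeSPPL asString = 'hello world'.
externalOk := external asUnicodeSPPL asString = 'hello world'.
internalOk & externalOk
```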

Todor Todorov

Re: Optimization of BSTR>>asUnicode

Sent: 16 April 2014

In reply to this post by Ivar Maeland

Under normal circumstances, BSTR contents should not be held inside a byte array. If you happen to pass such a BSTR to an external API, either directly or indirectly as part of a struct or an array, the API may decide to release that BSTR, which means freeing the memory where the contents are held. It cannot possibly release a Smalltalk byte array.

Byte arrays should only be used for temporary BSTRs.

In general, a BSTR is a pointer to a wide-character array. What is special about it is that at a negative offset you will find a header with information about the buffer, i.e. the size of the string. Also, the character buffer is allocated and deallocated by a special OLE/COM function, which means you must use the dedicated APIs for that.

-Todor

Ivar Maeland

Re: Optimization of BSTR>>asUnicode

In reply to this post by Henrik Høyer

Very good.

Many thanks, Henrik!

Ivar Maeland

Re: Optimization of BSTR>>asUnicode

Sent: 16 April 2014

In reply to this post by Todor Todorov

Thanks for the insight, Todor!

By dedicated API, do you mean OLEAutomationDLL>>SysAllocString:?

There is another bug in UnicodeStringBuffer>>stringFromUnicode:length:. This code:

result := KernelLibrary
    wideCharToMultiByteCp: 0  "CP_ACP for ANSI code page"
    flags: 0
    lpwstr: aUnicodeBuffer
    cchwstr: nChars * 2
    lpstr: aString
    cchlpstr: aString size
    default: nil
    defaultUsed: nil.

Should be

result := KernelLibrary
    wideCharToMultiByteCp: 0  "CP_ACP for ANSI code page"
    flags: 0
    lpwstr: aUnicodeBuffer
    cchwstr: nChars  "a count of wide characters, not bytes"
    lpstr: aString
    cchlpstr: aString size
    default: nil
    defaultUsed: nil.

And we should probably add a check of the return value:

result = 0 ifTrue: [self osError].

The old code “works” in the sense that it copies as many characters as there is room for in the buffer (aString), then returns 0 with the OS error “buffer too small”.

Todor Todorov

Re: Optimization of BSTR>>asUnicode

Yes, SysAllocString etc.

BSTR functions:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms221105(v=vs.85).aspx

BSTR definition:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms221069(v=vs.85).aspx

Some interesting background on BSTRs, which explains a few things:

http://blogs.msdn.com/b/ericlippert/archive/2003/09/12/52976.aspx

And if it is a native BSTR (not one allocated in a byte array), I suggest using the WideCharToMultiByte and MultiByteToWideChar functions to copy data into and out of the BSTR.

-Todor
