String comparison primitive preliminary speed-up result: 20-600x

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

String comparison primitive preliminary speed-up result: 20-600x

Clément Béra
 
Hi all,

The short version:

After looking into MiscPlugin with Eliot, we decided to work on a numbered primitive to speed up String comparison especially String>>#= (which matters for parsers that are widely used by our communities) as a long-term replacement for compare:with:collated: Misc primitive. Being numbered allows the JIT to have an entry for it. We (likely Eliot) will work on other Misc primitives to compile them differently in the future (No Smalltalk level smart syntax to ease cross language management) but aside from String comparison, they will remain in a plugin, since they're not as critical.

The new primitive is as follow (optional last parameter so 2 versions):
ByteString >> compareWith: anotherString
ByteString >> compareWith: anotherString collated: order
"Return a negative Smi, 0 or a positive Smi if self is < = or  > to anotherString, with the collating order of characters given optionally by the order array."

Preliminary benchmarks show 20 to 600x speed-up in String comparison (< <= = > >= on Strings; 600x on small strings, 20x on large strings). String comparison does not use any more the optional last parameter, only other APIs do. Sophie in CC did the bulk of the work as part of a 100h student project, though various discussion in the mailing list involving many of us but especially Levente, Eliot and me helped a lot. There are still some tests to write to make sure everything is stable before integration, but my understanding is that integration should happen anytime soon. Sophie, could you please answer a time frame for integration to this mail?

The long version:

The blog version includes assembly code, Cog's RTL code & Slang code for each version as well as the exact preliminary benchmarks we ran:

Levente maybe you want to review the long version in my blog if you have comments/advises before integration. I guess you will write accurate micro-benchmarks once it's integrated anyway. 

This LLVM/GCC performance difference is very annoying though for correct evaluation of speed-ups, I wonder, have any one seen a difference there before for other primitives? PrimitiveStringReplace might have similar issues. I don't know if we could report those things as bugs to the LLVM people somehow.

Best,


--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
Reply | Threaded
Open this post in threaded view
|

Re: String comparison primitive preliminary speed-up result: 20-600x

Denis Kudriashov
 
Thank's Clement and others.
 
Really huge improvement. And we will finally have standard return values for comparison. 

2018-02-21 11:33 GMT+01:00 Clément Bera <[hidden email]>:
 
Hi all,

The short version:

After looking into MiscPlugin with Eliot, we decided to work on a numbered primitive to speed up String comparison especially String>>#= (which matters for parsers that are widely used by our communities) as a long-term replacement for compare:with:collated: Misc primitive. Being numbered allows the JIT to have an entry for it. We (likely Eliot) will work on other Misc primitives to compile them differently in the future (No Smalltalk level smart syntax to ease cross language management) but aside from String comparison, they will remain in a plugin, since they're not as critical.

The new primitive is as follow (optional last parameter so 2 versions):
ByteString >> compareWith: anotherString
ByteString >> compareWith: anotherString collated: order
"Return a negative Smi, 0 or a positive Smi if self is < = or  > to anotherString, with the collating order of characters given optionally by the order array."

Preliminary benchmarks show 20 to 600x speed-up in String comparison (< <= = > >= on Strings; 600x on small strings, 20x on large strings). String comparison does not use any more the optional last parameter, only other APIs do. Sophie in CC did the bulk of the work as part of a 100h student project, though various discussion in the mailing list involving many of us but especially Levente, Eliot and me helped a lot. There are still some tests to write to make sure everything is stable before integration, but my understanding is that integration should happen anytime soon. Sophie, could you please answer a time frame for integration to this mail?

The long version:

The blog version includes assembly code, Cog's RTL code & Slang code for each version as well as the exact preliminary benchmarks we ran:

Levente maybe you want to review the long version in my blog if you have comments/advises before integration. I guess you will write accurate micro-benchmarks once it's integrated anyway. 

This LLVM/GCC performance difference is very annoying though for correct evaluation of speed-ups, I wonder, have any one seen a difference there before for other primitives? PrimitiveStringReplace might have similar issues. I don't know if we could report those things as bugs to the LLVM people somehow.

Best,


--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq


Reply | Threaded
Open this post in threaded view
|

Re: String comparison primitive preliminary speed-up result: 20-600x

Henrik Sperre Johansen
In reply to this post by Clément Béra
 
Ref. blog post, are you sure you are measuring equivalent entities?
Usually the AsciiOrder is a constant, preallocated array, no?

Seems to me, with the code posted in blog post, you're mostly measuring
array instantation, not collate...

var := ByteString new: 3.

 [ByteString compare: var with: var collated: (0 to: 255) asByteArray]
benchFor: 5 second.
a BenchmarkResult(500,460 iterations in 5 seconds 1 millisecond. 100,072 per
second)

AsciiCollate := (0 to: 255) asByteArray.

 [ByteString compare: var with: var collated: AsciiCollate] benchFor: 5
second.

a BenchmarkResult(79,507,782 iterations in 5 seconds 2 milliseconds.
15,895,198 per second)



--
Sent from: http://forum.world.st/Squeak-VM-f104410.html
Reply | Threaded
Open this post in threaded view
|

Re: String comparison primitive preliminary speed-up result: 20-600x

Clément Béra
 
Right I knew something was off.

Let me patch this.

On Wed, Feb 21, 2018 at 3:00 PM, Henrik Sperre Johansen <[hidden email]> wrote:

Ref. blog post, are you sure you are measuring equivalent entities?
Usually the AsciiOrder is a constant, preallocated array, no?

Seems to me, with the code posted in blog post, you're mostly measuring
array instantation, not collate...

var := ByteString new: 3.

 [ByteString compare: var with: var collated: (0 to: 255) asByteArray]
benchFor: 5 second.
a BenchmarkResult(500,460 iterations in 5 seconds 1 millisecond. 100,072 per
second)

AsciiCollate := (0 to: 255) asByteArray.

 [ByteString compare: var with: var collated: AsciiCollate] benchFor: 5
second.

a BenchmarkResult(79,507,782 iterations in 5 seconds 2 milliseconds.
15,895,198 per second)



--
Sent from: http://forum.world.st/Squeak-VM-f104410.html



--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
Reply | Threaded
Open this post in threaded view
|

Re: String comparison primitive preliminary speed-up result: 20-600x

Clément Béra
 
And now on Mac (LLVM), 

I get for string of size 3 (Misc primitive, Slang primitive, Slang + JIT primitive):

#('15,700,000 per second. 63.8 nanoseconds per run.' 
'41,600,000 per second. 24 nanoseconds per run.' 
'78,600,000 per second. 12.7 nanoseconds per run.')

for a string of size 1000:

#('841,000 per second. 1.19 microseconds per run.' 
'1,500,000 per second. 666 nanoseconds per run.' 
'2,190,000 per second. 457 nanoseconds per run.')

which is much closer to what I expected in the first place. The results we got really surprised me but I could not figure out what was wrong.

I will update the post.

Thanks Henrik.

On Wed, Feb 21, 2018 at 3:13 PM, Clément Bera <[hidden email]> wrote:
Right I knew something was off.

Let me patch this.

On Wed, Feb 21, 2018 at 3:00 PM, Henrik Sperre Johansen <[hidden email]> wrote:

Ref. blog post, are you sure you are measuring equivalent entities?
Usually the AsciiOrder is a constant, preallocated array, no?

Seems to me, with the code posted in blog post, you're mostly measuring
array instantation, not collate...

var := ByteString new: 3.

 [ByteString compare: var with: var collated: (0 to: 255) asByteArray]
benchFor: 5 second.
a BenchmarkResult(500,460 iterations in 5 seconds 1 millisecond. 100,072 per
second)

AsciiCollate := (0 to: 255) asByteArray.

 [ByteString compare: var with: var collated: AsciiCollate] benchFor: 5
second.

a BenchmarkResult(79,507,782 iterations in 5 seconds 2 milliseconds.
15,895,198 per second)



--
Sent from: http://forum.world.st/Squeak-VM-f104410.html



--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq



--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq