Smalltalk › Gemtalk › GLASS

Particular code that is much slower than Pharo

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

9 messages Options

GLASS mailing list

Particular code that is much slower than Pharo

Hi guys,

I have an algorithm which iterates a really large file (600k lines) and then iterates quite some times. I found that in gemstone the process takes 4 times more than Pharo.... even if GemStone is running in a much better hardware (server).

I tried to reproduce the issue (I am not sure if I succeeded in making the below snipped to reproduce the exact same issue I have in the original code), but at least I still see the 4x difference.

Please, do not pay much attention to the algorithm itself. Getting a 50% for THIS particular problem won't help me much as I may find these kind of issues on other places. So I would love to see if there is some general speedup I am missing. In other words, I would like to understand WHY it is slower than Pharo.

For the details, I am in GemStone 3.2.9, running in Linux, 2GB SPC, etc. Native code is enabled.

If I profile the code on GemStone I see there is a slowdown in:

86.2% (256) SequenceableCollection >> doWithIndex: [Array]

| 80.1% (238) block in executed code [ExecBlock2]

| | 40.1% (119) Array >> at:

| | 16.8% (50) SmallInteger >> =

| | 0.7% (2) OrderedCollection >> add:

| | 0.3% (1) Array >> at:put:

| | 0.3% (1) SmallInteger >> <=

| 3.7% (11) Array >> at:

So...it looks to me that Array >> at: and SmallInteger >> = take like 45% of the time....so I guess I cannot improve that.

Thank you very much in advance.

-----------

The code is this:

| array1 another |

array1 := #( 1 2 3 4 5 6 7 8 9).

another := #(8 9 10 11 12 115 9 116 117 16 118).

10000 timesRepeat: [

| answer i j selfSize anotherSize skipTestsTable |

answer := OrderedCollection new.

skipTestsTable := array1 collect: [ :each | Array new: another size ]. "all nils"

selfSize := array1 size.

anotherSize := another size.

array1 doWithIndex: [:each :selfStart |

| skipTableAtStart |

skipTableAtStart := skipTestsTable at: selfStart.

1 to: anotherSize do: [ :anotherStart|

(false or: [ (skipTableAtStart at: anotherStart) isNil]) ifTrue: [

each = (another at: anotherStart)

ifTrue: [

i := selfStart + 1.

j := anotherStart + 1.

[ (i <= selfSize and: [ j <= anotherSize ]) and: [ (array1 at: i) = (another at: j)]]

whileTrue: [

(skipTestsTable at: i) at: j put: true.

i := i + 1.

j := j + 1. ].

answer add: { selfStart. anotherStart. (i - selfStart). i - 1 }

] ]

]

answer asArray

]

] timeToRun

Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

I plan look at a few other things when I get a chance ...

Dale

On 01/24/2017 11:42 AM, Mariano Martinez Peck via Glass wrote:

| array1 another |

array1 := #( 1 2 3 4 5 6 7 8 9).

another := #(8 9 10 11 12 115 9 116 117 16 118).

10000 timesRepeat: [

| answer i j selfSize anotherSize skipTestsTable |

answer := OrderedCollection new.

skipTestsTable := array1 collect: [ :each | Array new: another size ]. "all nils"

selfSize := array1 size.

anotherSize := another size.

array1 doWithIndex: [:each :selfStart |

| skipTableAtStart |

skipTableAtStart := skipTestsTable at: selfStart.

1 to: anotherSize do: [ :anotherStart|

(false or: [ (skipTableAtStart at: anotherStart) isNil]) ifTrue: [

each = (another at: anotherStart)

ifTrue: [

i := selfStart + 1.

j := anotherStart + 1.

[ (i <= selfSize and: [ j <= anotherSize ]) and: [ (array1 at: i) = (another at: j)]]

whileTrue: [

(skipTestsTable at: i) at: j put: true.

i := i + 1.

j := j + 1. ].

answer add: { selfStart. anotherStart. (i - selfStart). i - 1 }

] ]

]

].

answer asArray

]

] timeToRun

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

On Tue, Jan 24, 2017 at 5:28 PM, Dale Henrichs via Glass <[hidden email]> wrote:

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

mmmm I am getting between 22ms and 24ms in Pharo, and around 85 on GemStone 3.2.9.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

mmmm but in this case the contents of the Array are not persistent objects (not even objects as they are smallintegers)

Isn't there a particular Array subclass with an override of #at: which uses a different primitive that does not takes care of faulting objects (it assumes the objects are not persistent)?

I plan look at a few other things when I get a chance ...

Thanks! My process is taking hours on GemStone :(

Thanks for your answer.

Dale

On 01/24/2017 11:42 AM, Mariano Martinez Peck via Glass wrote:

| array1 another |

array1 := #( 1 2 3 4 5 6 7 8 9).

another := #(8 9 10 11 12 115 9 116 117 16 118).

10000 timesRepeat: [

| answer i j selfSize anotherSize skipTestsTable |

answer := OrderedCollection new.

skipTestsTable := array1 collect: [ :each | Array new: another size ]. "all nils"

selfSize := array1 size.

anotherSize := another size.

array1 doWithIndex: [:each :selfStart |

| skipTableAtStart |

skipTableAtStart := skipTestsTable at: selfStart.

1 to: anotherSize do: [ :anotherStart|

(false or: [ (skipTableAtStart at: anotherStart) isNil]) ifTrue: [

each = (another at: anotherStart)

ifTrue: [

i := selfStart + 1.

j := anotherStart + 1.

[ (i <= selfSize and: [ j <= anotherSize ]) and: [ (array1 at: i) = (another at: j)]]

whileTrue: [

(skipTestsTable at: i) at: j put: true.

i := i + 1.

j := j + 1. ].

answer add: { selfStart. anotherStart. (i - selfStart). i - 1 }

] ]

]

].

answer asArray

]

] timeToRun

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

On Tue, Jan 24, 2017 at 5:43 PM, Mariano Martinez Peck <[hidden email]> wrote:

On Tue, Jan 24, 2017 at 5:28 PM, Dale Henrichs via Glass <[hidden email]> wrote:

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

mmmm I am getting between 22ms and 24ms in Pharo, and around 85 on GemStone 3.2.9.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

mmmm but in this case the contents of the Array are not persistent objects (not even objects as they are smallintegers)

Isn't there a particular Array subclass with an override of #at: which uses a different primitive that does not takes care of faulting objects (it assumes the objects are not persistent)?

Actually...any Collection subclass would help...

I plan look at a few other things when I get a chance ...

Thanks! My process is taking hours on GemStone :(

Thanks for your answer.

Dale

On 01/24/2017 11:42 AM, Mariano Martinez Peck via Glass wrote:

| array1 another |

array1 := #( 1 2 3 4 5 6 7 8 9).

another := #(8 9 10 11 12 115 9 116 117 16 118).

10000 timesRepeat: [

| answer i j selfSize anotherSize skipTestsTable |

answer := OrderedCollection new.

skipTestsTable := array1 collect: [ :each | Array new: another size ]. "all nils"

selfSize := array1 size.

anotherSize := another size.

array1 doWithIndex: [:each :selfStart |

| skipTableAtStart |

skipTableAtStart := skipTestsTable at: selfStart.

1 to: anotherSize do: [ :anotherStart|

(false or: [ (skipTableAtStart at: anotherStart) isNil]) ifTrue: [

each = (another at: anotherStart)

ifTrue: [

i := selfStart + 1.

j := anotherStart + 1.

[ (i <= selfSize and: [ j <= anotherSize ]) and: [ (array1 at: i) = (another at: j)]]

whileTrue: [

(skipTestsTable at: i) at: j put: true.

i := i + 1.

j := j + 1. ].

answer add: { selfStart. anotherStart. (i - selfStart). i - 1 }

] ]

]

].

answer asArray

]

] timeToRun

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

--
Mariano
http://marianopeck.wordpress.com

Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

In reply to this post by GLASS mailing list

On 01/24/2017 12:43 PM, Mariano Martinez Peck wrote:

On Tue, Jan 24, 2017 at 5:28 PM, Dale Henrichs via Glass <[hidden email]> wrote:

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

mmmm I am getting between 22ms and 24ms in Pharo, and around 85 on GemStone 3.2.9.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

mmmm but in this case the contents of the Array are not persistent objects (not even objects as they are smallintegers)

Isn't there a particular Array subclass with an override of #at: which uses a different primitive that does not takes care of faulting objects (it assumes the objects are not persistent)?

No we don't:)

I plan look at a few other things when I get a chance ...

Thanks! My process is taking hours on GemStone :(

Have you thought of parallelizing the algorithm? If you can, you could load up all of the cpus on your machine and shorten the elapsed time ...

Dale

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

On Tue, Jan 24, 2017 at 6:00 PM, Dale Henrichs <[hidden email]> wrote:

On 01/24/2017 12:43 PM, Mariano Martinez Peck wrote:

On Tue, Jan 24, 2017 at 5:28 PM, Dale Henrichs via Glass <[hidden email]> wrote:

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

mmmm I am getting between 22ms and 24ms in Pharo, and around 85 on GemStone 3.2.9.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

mmmm but in this case the contents of the Array are not persistent objects (not even objects as they are smallintegers)

Isn't there a particular Array subclass with an override of #at: which uses a different primitive that does not takes care of faulting objects (it assumes the objects are not persistent)?

No we don't:)

Uhhhh :(

And do you know how hard would be to have those set of basic primitives with this flavor ? Because at image side a subclass of Array may not be that hard.

I plan look at a few other things when I get a chance ...

Thanks! My process is taking hours on GemStone :(

Have you thought of parallelizing the algorithm? If you can, you could load up all of the cpus on your machine and shorten the elapsed time ...

Well, as I said, I was trying to get a generic speed up and not a per algorithm one, as we will have more and more.

And yeah, I can do what you say, but that requires modifying my code to launch multiple gems and then the synchronization etc etc. It's not always that easy to make an algorithm paralel. In addition, I would have to go with the 4 CPUs license.

There is no way to avoid the slowdown of the #at: related to persistent objects faulting? I mean, even with "immediate objects" ?

Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

Mariano,

In today's staff meeting we discussed the possibility of optimizing #at: for temporary Arrays and we came to the conclusion that we should be able to improve the performance of simple primitives (like #at:) for small objects -- smaller than ~2028 slots.

I've submitted a feature request on your behalf, internal bug #46644 "Optimize selected simple primitives for small temporary objects (i.e., Array>>#at:)" and any improvements we make will be in 3.4.

If anyone else has seen performance degradation for a simple primitive, now would be a good time to nominate the primitive for consideration...

Dale

On 01/24/2017 05:18 PM, Mariano Martinez Peck wrote:

On Tue, Jan 24, 2017 at 6:00 PM, Dale Henrichs <[hidden email]> wrote:

On 01/24/2017 12:43 PM, Mariano Martinez Peck wrote:

On Tue, Jan 24, 2017 at 5:28 PM, Dale Henrichs via Glass <[hidden email]> wrote:

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

mmmm I am getting between 22ms and 24ms in Pharo, and around 85 on GemStone 3.2.9.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

mmmm but in this case the contents of the Array are not persistent objects (not even objects as they are smallintegers)

Isn't there a particular Array subclass with an override of #at: which uses a different primitive that does not takes care of faulting objects (it assumes the objects are not persistent)?

No we don't:)

Uhhhh :(

And do you know how hard would be to have those set of basic primitives with this flavor ? Because at image side a subclass of Array may not be that hard.

I plan look at a few other things when I get a chance ...

Thanks! My process is taking hours on GemStone :(

Have you thought of parallelizing the algorithm? If you can, you could load up all of the cpus on your machine and shorten the elapsed time ...

Well, as I said, I was trying to get a generic speed up and not a per algorithm one, as we will have more and more.

And yeah, I can do what you say, but that requires modifying my code to launch multiple gems and then the synchronization etc etc. It's not always that easy to make an algorithm paralel. In addition, I would have to go with the 4 CPUs license.

There is no way to avoid the slowdown of the #at: related to persistent objects faulting? I mean, even with "immediate objects" ?

--

Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

On Wed, Jan 25, 2017 at 4:06 PM, Dale Henrichs <[hidden email]> wrote:

Mariano,

In today's staff meeting we discussed the possibility of optimizing #at: for temporary Arrays and we came to the conclusion that we should be able to improve the performance of simple primitives (like #at:) for small objects -- smaller than ~2028 slots.

I've submitted a feature request on your behalf, internal bug #46644 "Optimize selected simple primitives for small temporary objects (i.e., Array>>#at:)" and any improvements we make will be in 3.4.

If anyone else has seen performance degradation for a simple primitive, now would be a good time to nominate the primitive for consideration...

Hi Dale,

Thanks for the feature request. I was starting to worry about performance :(

Let me ask: when you sat "temporary objects" you mean that the optimization will be possible only when all the contents of the array are NOT persistent objects? or when the Array object itself is not persistent? I guess the former, but just to confirm.

BTW, do you have an estimation of the next release 3.4 with such an improvement?

Dale

On 01/24/2017 05:18 PM, Mariano Martinez Peck wrote:

On Tue, Jan 24, 2017 at 6:00 PM, Dale Henrichs <[hidden email]> wrote:

On 01/24/2017 12:43 PM, Mariano Martinez Peck wrote:

On Tue, Jan 24, 2017 at 5:28 PM, Dale Henrichs via Glass <[hidden email]> wrote:

I ran this test using GemStone 3.4.0 and Pharo6.0 on the same hardware and came out as only 2x slower: 79ms (G/S) vs 36ms (pharo) .... Allen (our vm guy) thought that 3.3 might be a bit faster than 3.2 and sure enough, a run on 3.2 came up with 111ms (3x slower). No code gen changes have been made in 3.4.0 so the timings are similar.

mmmm I am getting between 22ms and 24ms in Pharo, and around 85 on GemStone 3.2.9.

GemStone is going to be doing a bit more work for #at: than Pharo, since we have to worry about faulting in objects and that costs a bit more ...

mmmm but in this case the contents of the Array are not persistent objects (not even objects as they are smallintegers)

Isn't there a particular Array subclass with an override of #at: which uses a different primitive that does not takes care of faulting objects (it assumes the objects are not persistent)?

No we don't:)

Uhhhh :(

And do you know how hard would be to have those set of basic primitives with this flavor ? Because at image side a subclass of Array may not be that hard.

I plan look at a few other things when I get a chance ...

Thanks! My process is taking hours on GemStone :(

Have you thought of parallelizing the algorithm? If you can, you could load up all of the cpus on your machine and shorten the elapsed time ...

Well, as I said, I was trying to get a generic speed up and not a per algorithm one, as we will have more and more.

And yeah, I can do what you say, but that requires modifying my code to launch multiple gems and then the synchronization etc etc. It's not always that easy to make an algorithm paralel. In addition, I would have to go with the 4 CPUs license.

There is no way to avoid the slowdown of the #at: related to persistent objects faulting? I mean, even with "immediate objects" ?

--

Mariano
http://marianopeck.wordpress.com

Mariano
http://marianopeck.wordpress.com

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass

GLASS mailing list

Re: Particular code that is much slower than Pharo

On 01/25/2017 11:20 AM, Mariano Martinez Peck wrote:

On Wed, Jan 25, 2017 at 4:06 PM, Dale Henrichs <[hidden email]> wrote:

Mariano,

In today's staff meeting we discussed the possibility of optimizing #at: for temporary Arrays and we came to the conclusion that we should be able to improve the performance of simple primitives (like #at:) for small objects -- smaller than ~2028 slots.

I've submitted a feature request on your behalf, internal bug #46644 "Optimize selected simple primitives for small temporary objects (i.e., Array>>#at:)" and any improvements we make will be in 3.4.

If anyone else has seen performance degradation for a simple primitive, now would be a good time to nominate the primitive for consideration...

Hi Dale,

Thanks for the feature request. I was starting to worry about performance :(

Let me ask: when you sat "temporary objects" you mean that the optimization will be possible only when all the contents of the array are NOT persistent objects? or when the Array object itself is not persistent? I guess the former, but just to confirm.

When the array object itself is not persistent ...

BTW, do you have an estimation of the next release 3.4 with such an improvement?

We do not have a target release date for 3.4 ... For 3.4 we are aiming for a late winter, early spring feature freeze with a late spring, early summer release date.

Dale

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass