Some test where Spur more slow than Cog

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Some test where Spur more slow than Cog

Denis Kudriashov
Hi.

I found case where Spur more slow than Cog. Not much but.

prototype := BorderedMorph new.

result := [prototype veryDeepCopy ] benchFor: 2 seconds.

In my computer I got:

Cog: a BenchmarkResult(16,982 iterations in 2 seconds 1 millisecond. 8,487 per second)
Spur: a BenchmarkResult(12,507 iterations in 2 seconds 33 milliseconds. 6,152 per second).

Is it expected behavior for such case?
Reply | Threaded
Open this post in threaded view
|

Re: Some test where Spur more slow than Cog

Eliot Miranda-2
Hi Denis,

On Fri, Dec 18, 2015 at 9:36 AM, Denis Kudriashov <[hidden email]> wrote:
Hi.

I found case where Spur more slow than Cog. Not much but.

Well, it might be indicative of a soluble problem.  One thing that Spur is slower at is primitive failure; it catches primitive failure in the VM as potentially caused by primitives encountering forwarders.  So many primitive failure involves a slow scan of the primitives arguments to some depth to find out if there are any forwarders which should be followed, and if so the primitive retried.  So that means some idioms that use primitive failure normally will go slower.  One of these is instVarAt: which used to be written

instVarAt: index 
"Primitive. Answer a fixed variable in an object. The numbering of the 
variables corresponds to the named instance variables. Fail if the index 
is not an Integer or is not the index of a fixed variable. Essential. See 
Object documentation whatIsAPrimitive."

<primitive: 73>
"Access beyond fixed variables."
^self basicAt: index - self class instSize

and used in some copying code to access indexable fields.  Chris Muller found this in some of his Magma code and so we added a pair of new primitives, primitiveSlotAt and primitiveSlotAtPut that access any slot within the sequence of named followed by indexable inst vars, avoiding the primitive failure. So in Squeak Spur instVarAt: now reads:

instVarAt: index
"Primitive. Answer a fixed variable in an object. The numbering of the variables
corresponds to the named instance variables, followed by the indexed instance
variables. Fail if the index is not an Integer or is not the index of a fixed variable.
Essential. See Object documentation whatIsAPrimitive."

<primitive: 173 error: ec>
self primitiveFailed

I suspect something like this is happening with the veryDeepCopy code.  You might want to use the profiler on both versions and compare.  You might find an issue.

HTH



prototype := BorderedMorph new.

result := [prototype veryDeepCopy ] benchFor: 2 seconds.

In my computer I got:

Cog: a BenchmarkResult(16,982 iterations in 2 seconds 1 millisecond. 8,487 per second)
Spur: a BenchmarkResult(12,507 iterations in 2 seconds 33 milliseconds. 6,152 per second).

Is it expected behavior for such case?



--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: Some test where Spur more slow than Cog

Clément Béra
Hello,

In the Smalltalk profiler it shows more time spent in primitive.

It could be also, as it is an older version of the spur vm currently used in Pharo, that this problem is already fixed. I can't know the problem for sure as I can't run the VM profiler in Pharo.





2015-12-19 4:18 GMT+01:00 Eliot Miranda <[hidden email]>:
Hi Denis,

On Fri, Dec 18, 2015 at 9:36 AM, Denis Kudriashov <[hidden email]> wrote:
Hi.

I found case where Spur more slow than Cog. Not much but.

Well, it might be indicative of a soluble problem.  One thing that Spur is slower at is primitive failure; it catches primitive failure in the VM as potentially caused by primitives encountering forwarders.  So many primitive failure involves a slow scan of the primitives arguments to some depth to find out if there are any forwarders which should be followed, and if so the primitive retried.  So that means some idioms that use primitive failure normally will go slower.  One of these is instVarAt: which used to be written

instVarAt: index 
"Primitive. Answer a fixed variable in an object. The numbering of the 
variables corresponds to the named instance variables. Fail if the index 
is not an Integer or is not the index of a fixed variable. Essential. See 
Object documentation whatIsAPrimitive."

<primitive: 73>
"Access beyond fixed variables."
^self basicAt: index - self class instSize

and used in some copying code to access indexable fields.  Chris Muller found this in some of his Magma code and so we added a pair of new primitives, primitiveSlotAt and primitiveSlotAtPut that access any slot within the sequence of named followed by indexable inst vars, avoiding the primitive failure. So in Squeak Spur instVarAt: now reads:

instVarAt: index
"Primitive. Answer a fixed variable in an object. The numbering of the variables
corresponds to the named instance variables, followed by the indexed instance
variables. Fail if the index is not an Integer or is not the index of a fixed variable.
Essential. See Object documentation whatIsAPrimitive."

<primitive: 173 error: ec>
self primitiveFailed

I suspect something like this is happening with the veryDeepCopy code.  You might want to use the profiler on both versions and compare.  You might find an issue.

HTH



prototype := BorderedMorph new.

result := [prototype veryDeepCopy ] benchFor: 2 seconds.

In my computer I got:

Cog: a BenchmarkResult(16,982 iterations in 2 seconds 1 millisecond. 8,487 per second)
Spur: a BenchmarkResult(12,507 iterations in 2 seconds 33 milliseconds. 6,152 per second).

Is it expected behavior for such case?



--
_,,,^..^,,,_
best, Eliot

Reply | Threaded
Open this post in threaded view
|

Re: Some test where Spur more slow than Cog

Denis Kudriashov
In reply to this post by Eliot Miranda-2
Hi

2015-12-19 4:18 GMT+01:00 Eliot Miranda <[hidden email]>:
and used in some copying code to access indexable fields.  Chris Muller found this in some of his Magma code and so we added a pair of new primitives, primitiveSlotAt and primitiveSlotAtPut that access any slot within the sequence of named followed by indexable inst vars, avoiding the primitive failure. So in Squeak Spur instVarAt: now reads:

instVarAt: index
"Primitive. Answer a fixed variable in an object. The numbering of the variables
corresponds to the named instance variables, followed by the indexed instance
variables. Fail if the index is not an Integer or is not the index of a fixed variable.
Essential. See Object documentation whatIsAPrimitive."

<primitive: 173 error: ec>
self primitiveFailed

Same code used in Pharo too.

I try same test on latest squeak images:

Cog: '10,100 per second. 98.8 microseconds per run.'
Spur:  '7,610 per second. 131 microseconds per run.'
Reply | Threaded
Open this post in threaded view
|

Re: Some test where Spur more slow than Cog

Clément Béra
I could reproduce with Squeak 5 and Squeak 4.6 so I was able to use the VM profiler.

I used:

| prototype |
prototype := BorderedMorph new.
10000
timesRepeat: [prototype veryDeepCopy]

The profiler ran for ~0.8 second. Maybe you can have more accurate result with longer bench but it takes already a while to display results.

In Spur:

1044 samples in the VM (1057 samples in the entire program)  98.77% of total

754 samples in generated vm code 72.22% of entire vm (71.33% of total)
290 samples in vanilla vm code 27.78% of entire vm (27.44% of total)

% of generated vm code (% of total) (samples) (cumulative)
37.67%    (26.87%) Dictionary>>associationsDo: (284) (37.67%)
30.37%    (21.67%) Object>>at: (229) (68.04%)
  3.18%    (  2.27%) WeakKeyDictionary>>associationsDo: (24) (71.22%)
  2.12%    (  1.51%) MethodDictionary>>scanFor: (16) (73.34%)
  1.59%    (  1.14%) SmallInteger>>\\ (12) (74.93%)

% of vanilla vm code (% of total) (samples) (cumulative)
58.62%    (16.08%) scavengeReferentsOf (170) (58.62%)
24.48%    (  6.72%) instantiateClassindexableSize (71) (83.10%)
  2.76%    (  0.76%) copyAndForward (8) (85.86%)
  2.07%    (  0.57%) doScavenge (6) (87.93%)
  2.07%    (  0.57%) lookupOrdinaryNoMNUEtcInClass (6) (90.00%)

13 samples in the rest    1.23% of total

% of rest (% of total) (samples) (cumulative)
53.85%    (  0.66%) SCDynamicStoreCreate (7) (53.85%)
46.15%    (  0.57%) _platform_memmove$VARIANT$sse42 (6) (100.0%)
46.15%    (  0.57%) OSAtomicAdd32Barrier (6) (146.15%)

**GCs**
full 0 totalling 0ms (0% elapsed time)
scavenges 258 totalling 151ms (18.97% elapsed time), avg 0.585ms

In SqueakV3:

1189 samples in the VM (1196 samples in the entire program)  99.41% of total

998 samples in generated vm code 83.94% of entire vm (83.44% of total)
191 samples in vanilla vm code 16.06% of entire vm (15.97% of total)

% of generated vm code (% of total) (samples) (cumulative)
43.19%    (36.04%) Object>>at: (431) (43.19%)
35.07%    (29.26%) Dictionary>>associationsDo: (350) (78.26%)
  1.90%    (  1.59%) WeakKeyDictionary>>associationsDo: (19) (80.16%)
  1.30%    (  1.09%) Object>>veryDeepCopyWith: (13) (81.46%)
  1.20%    (  1.00%) BlockClosure>>value (12) (82.67%)

% of vanilla vm code (% of total) (samples) (cumulative)
30.37%    (  4.85%) instantiateClassindexableSize (58) (30.37%)
13.61%    (  2.17%) incrementalGC (26) (43.98%)
  9.42%    (  1.51%) markAndTrace (18) (53.40%)
  8.38%    (  1.34%) marryFrameSP (16) (61.78%)
  6.81%    (  1.09%) eeInstantiateSmallClassnumSlots (13) (68.59%)
  5.24%    (  0.84%) createClosureNumArgsnumCopiedstartpc (10) (73.82%)
  4.71%    (  0.75%) mapPointersInObjectsFromto (9) (78.53%)

**GCs**
full 0 totalling 0ms (0% elapsed time)
incr 270 totalling 46ms (5.094% elapsed time), avg 0.17ms

Conclusion ?

It feels that the main issue is with the scavenges that take overall more time in Spur in this case. Especially, the method scavengeReferentsOf seems to be guilty and may be worth looking at.

There are other details, like now a little bit of execution time seems to be used to allocate new memory segments and associationsDo: looks slower, likely because of == and forwarders, but this is not as important.

What do you think Eliot ?


2015-12-19 11:06 GMT+01:00 Denis Kudriashov <[hidden email]>:
Hi

2015-12-19 4:18 GMT+01:00 Eliot Miranda <[hidden email]>:
and used in some copying code to access indexable fields.  Chris Muller found this in some of his Magma code and so we added a pair of new primitives, primitiveSlotAt and primitiveSlotAtPut that access any slot within the sequence of named followed by indexable inst vars, avoiding the primitive failure. So in Squeak Spur instVarAt: now reads:

instVarAt: index
"Primitive. Answer a fixed variable in an object. The numbering of the variables
corresponds to the named instance variables, followed by the indexed instance
variables. Fail if the index is not an Integer or is not the index of a fixed variable.
Essential. See Object documentation whatIsAPrimitive."

<primitive: 173 error: ec>
self primitiveFailed

Same code used in Pharo too.

I try same test on latest squeak images:

Cog: '10,100 per second. 98.8 microseconds per run.'
Spur:  '7,610 per second. 131 microseconds per run.'

Reply | Threaded
Open this post in threaded view
|

Re: Some test where Spur more slow than Cog

Denis Kudriashov
In reply to this post by Denis Kudriashov
Hi.

With latest VM these benches are similar. New compaction solves the problem I guess

2015-12-18 18:36 GMT+01:00 Denis Kudriashov <[hidden email]>:
Hi.

I found case where Spur more slow than Cog. Not much but.

prototype := BorderedMorph new.

result := [prototype veryDeepCopy ] benchFor: 2 seconds.

In my computer I got:

Cog: a BenchmarkResult(16,982 iterations in 2 seconds 1 millisecond. 8,487 per second)
Spur: a BenchmarkResult(12,507 iterations in 2 seconds 33 milliseconds. 6,152 per second).

Is it expected behavior for such case?