Smalltalk › Squeak › Squeak - Dev

performance 3.8 vs. 3.9a


Marcus Denker

performance 3.8 vs. 3.9a

Hi,

To me, 3.9a feels a little bit snappier than 3.8... (thanks to a lot of
work by a lot of people; Eddie Cottongim, I think, did many of the
big-gain changesets).

So here is the benchmark again; it opens 5*10 browsers on
#abandonSources.

This, of course, is an "apples vs. oranges" benchmark... e.g., there
are many more system categories and classes in 3.9 than in 3.8... and
earlier versions don't render TrueType fonts for the window title. But
in the end, we just want to open a Browser, and that should not take
too long.

Mean time per round (ms):

3.5:   5021.4
3.6:   5282.2
3.7:   8838.4
3.8:  12010.8
3.9a:  7648.0

(PowerBook 1.5 GHz, all at 32-bit color, same VM)

So at least the "ever slower Squeak" trend has been stopped: 3.9a is
1.57 times faster than 3.8 and faster than 3.7... but slower than 3.6.
Nevertheless, not too bad as a start.
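For reference, the speedup figures can be checked directly against the mean times in the table (here t_v is just shorthand for the time measured with version v; the ratios are simple arithmetic on the posted numbers, nothing more):

```latex
\frac{t_{3.8}}{t_{3.9a}} = \frac{12010.8}{7648.0} \approx 1.57, \qquad
\frac{t_{3.7}}{t_{3.9a}} = \frac{8838.4}{7648.0} \approx 1.16, \qquad
\frac{t_{3.9a}}{t_{3.6}} = \frac{7648.0}{5282.2} \approx 1.45
```

So 3.9a is about 16% faster than 3.7 but still roughly 45% slower than 3.6 on this benchmark.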

Marcus

Code: (Dan's 10-Browser benchmark):
----------------------------------------------------

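time _ 0.
saveMorphs _ self currentWorld submorphs.  "remember the current desktop"
5 timesRepeat: [
    self currentWorld removeAllMorphs.  "heh, heh"
    time _ time + (Time millisecondsToRun: [
        "open 10 full browsers on SystemDictionary>>abandonSources..."
        1 to: 10 do: [:i |
            Browser fullOnClass: SystemDictionary selector: #abandonSources].
        "...then delete them again, running one world cycle after each delete"
        self currentWorld submorphs do: [:m |
            m delete.
            self currentWorld doOneCycle]])].
self currentWorld addAllMorphs: saveMorphs.  "restore the desktop"
time / 5 asFloat  "mean ms per round; asFloat binds first, so this is time / 5.0. Print it."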
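Reading the code, the printed figure appears to be the mean over the five rounds of the time needed to open and then delete ten browsers, since both steps sit inside the millisecondsToRun: block. A small sketch of that reading (t_k is notation introduced here for the milliseconds measured in round k), applied to the 3.9a row of the table:

```latex
\bar{t} = \frac{1}{5}\sum_{k=1}^{5} t_k, \qquad
\frac{\bar{t}_{3.9a}}{10} = \frac{7648.0\ \text{ms}}{10} = 764.8\ \text{ms per browser open-and-close}
```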

johnmci

Re: performance 3.8 vs. 3.9a

Well, let's see. I noticed the default 3.9a image is set to 16-bit
color. Change that to 32-bit, because 16-bit requires an extra
conversion step on the way to 32-bit screens.

41%  interpreter loop
15%  garbage collection and object allocation
 5%  copyLoopNoSource
 4%  copyLoopPixMap
 4%  proceedRenderingScanline
 4%  lookupMethodInClass

Now if I alter things a bit and do

Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024.
Smalltalk setGCBiasToGrow: 1.

Then we shave 2% off the GC work, and the interpreter loop is now 43%.

Image 7006, with the 3.8.11b4U VM on a 1.5 GHz 17" PowerBook with other
things running (mean ms per round):

8536.0  16-bit
8448.0  32-bit
8349.8  32-bit plus GC tuning

So about 30% or more of the time is then tied up in rendering.

I'll note that in 3.8 a quick test shows a time of 15451.6, with the
interpreter loop at 54%.
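Putting these timings side by side (plain arithmetic on the posted numbers, and assuming the quick 3.8 run used the same machine and settings as the tuned 3.9a run, which the message does not state explicitly):

```latex
\frac{8536.0 - 8448.0}{8536.0} \approx 1.0\%\ \text{(16-bit} \to \text{32-bit)}, \qquad
\frac{8448.0 - 8349.8}{8448.0} \approx 1.2\%\ \text{(GC tuning)}, \qquad
\frac{15451.6}{8349.8} \approx 1.85\ \text{(3.8 vs. tuned 3.9a)}
```

The display-depth and GC changes each buy only about one percent; the large gap to 3.8 comes from elsewhere.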

On 20-Mar-06, at 2:56 PM, Marcus Denker wrote:

--
========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================

Marcus Denker

Re: performance 3.8 vs. 3.9a

On 21.03.2006, at 01:48, John M McIntosh wrote:

> Well let's see I noticed the default image 3.9a is set to 16bit,
> change that to 32bit because 16bit requires an extra transformation
> step on
> the way to 32bit screens.

Yes, I already ran the test at 32 bits (32-bit will be the default
starting with 7015... it's 2006 after all...).

> Now if I alter things a bit and do
>
> Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024.
> Smalltalk setGCBiasToGrow: 1.
>

Would it make sense to set this as the default in 3.9a?

marcus

johnmci

Re: performance 3.8 vs. 3.9a

On 20-Mar-06, at 5:24 PM, Marcus Denker wrote:

> Would it make sense to set this as the default in 3.9a?

Likely, but we're still waiting for support in the other VMs...


stéphane ducasse-2

Re: performance 3.8 vs. 3.9a

What are you implying? That it will never happen?

On 21 March 06, at 03:41, John M McIntosh wrote:


stéphane ducasse-2

Re: performance 3.8 vs. 3.9a

In reply to this post by Marcus Denker

cool

I think that we should learn from this category speedup.

1- It seems to me that a lot of the design choices that made absolute
sense early in Smalltalk's life, when running anything on those
machines was a real challenge (the way system categories and class
categories are managed, for example), should now be revisited.

2- Lots of small changes can make a difference. An incremental process
is good, and there is no problem throwing those changes away when a
better solution arrives.

So I encourage everybody to help :)
For example, the network code seems to need some help.
Stef

On 20 March 06, at 23:56, Marcus Denker wrote:


johnmci

Re: performance 3.8 vs. 3.9a

In reply to this post by stéphane ducasse-2

I suspect they ran into edge cases with the old solution in Croquet,
so turning this feature on is likely; however, someone has to build
new Unix and Windows VMs first...

On 20-Mar-06, at 11:37 PM, stéphane ducasse wrote:
