The Trunk: Collections-mt.593.mcz

Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Bert Freudenberg
On 16.01.2015, at 17:28, Marcel Taeumel <[hidden email]> wrote:

>
> Okay, the Array >> #do: comes down to Number's:
>
> to: stop do: aBlock
> 	"Normally compiled in-line, and therefore not overridable.
> 	Evaluate aBlock for each element of the interval (self to: stop by: 1)."
> 	| nextValue |
> 	nextValue := self.
> 	[nextValue <= stop]
> 		whileTrue:
> 			[aBlock value: nextValue.
> 			nextValue := nextValue + 1]
No, the to:do: is inlined. Look at the bytecodes ...

do: aBlock
        1 to: self size do:
                [:index | aBlock value: (self at: index)]
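The inlining is easy to confirm from a workspace. A sketch (it assumes CompiledMethod>>#symbolic, the usual way to print a method's bytecodes in Squeak):

```smalltalk
"Print the bytecodes of SequenceableCollection>>#do:.
There is no send of #to:do: in the output: the compiler inlines
it into compare-and-jump bytecodes, so the Number>>#to:do:
source above never actually runs for this loop."
Transcript show: (SequenceableCollection >> #do:) symbolic; cr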

> And the LinkedList >> #do: is:
>
> do: aBlock
> | aLink |
> aLink := firstLink.
> [aLink == nil] whileFalse:
> [aBlock value: aLink.
> aLink := aLink nextLink]
>
> So Array iteration increments the index by 1 on each pass. Is that what
> makes the difference? I thought that #to:do: was optimized, as it does not
> appear on the stack... :)
The difference is that #at: is much more expensive than #nextLink.

Interestingly, on the interpreter VM the Array version is ever so slightly faster.

- Bert -




Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Levente Uzonyi-2
In reply to this post by Eliot Miranda-2
On Fri, 16 Jan 2015, Eliot Miranda wrote:

>
>
> On Jan 16, 2015, at 7:53 AM, Bert Freudenberg <[hidden email]> wrote:
>
>>
>>> On 16.01.2015, at 16:36, Eliot Miranda <[hidden email]> wrote:
>>>
>>> Hi Marcel,
>>>
>>> On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:
>>>
>>>> Ah, okay. I was curious about the performance of iterating over an array vs.
>>>> the linked list:
>>>>
>>>> a := Array new: 1000.
>>>> l := LinkedList new.
>>>>
>>>> 1000 timesRepeat: [l add: (StackLink with: nil)].
>>>>
>>>> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>>>>
>>>> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>>>>
>>>> I expected the opposite! :-O Why is that so?
>>>
>>> Look at the implementations of do:.
>>>
>>> In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
>>>
>>> In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
>>
>> Making it all the more surprising that the linked list is twice as fast.
>
> Ugh.  I was blindsided.  I saw the numbers as times not rates.  Marcel, I'm sorry.  OK indeed the difference would be the cost of the bounds check in each array access.  Interesting to see at what point the poorer memory locality of the list makes it worse.  Difficult to measure given the locality is decided by history and the GC.

In general the locality of the links is worse. In the example they are
allocated almost next to each other.
A slightly better example would be this:

| buffer |
buffer := Array new: 1000. "This will hold the filler objects."
#(1024 2048 4096 8192 16384 32768 65536) collect: [ :fillerSize |
	| list |
	list := LinkedList new.
	1 to: 1000 do: [ :index |
		list add: (StackLink with: nil).
		buffer at: index put: (Array new: fillerSize) ].
	Smalltalk garbageCollect.
	fillerSize -> [ list do: [ :ea | 1 + 2 ] ] bench ]

{ "gap between nodes (x4 bytes)" -> "runs"
1024->'78,000 per second.'.
2048->'78,600 per second.'.
4096->'77,500 per second.'.
8192->'73,300 per second.'.
16384->'70,200 per second.'.
32768->'66,400 per second.'.
65536->'56,400 per second.'}

(The result is '45,700 per second.' for Arrays)

The results depend on the CPU (and probably on the OS too), but the
numbers are still pretty good. Probably a longer list would cause more
problems for the caches.
The same for 10k elements in the list:

{
1024->'3,770 per second.'.
2048->'4,780 per second.'.
4096->'4,220 per second.'.
8192->'2,480 per second.'.
16384->'2,110 per second.'}

(The result is '4,380 per second.' for Arrays)

So the speed of a list depends on the cache locality of its elements. The
performance is still comparable with arrays in practical cases, but it may
be either faster or slower depending on the cache locality.
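For reference, the 10k-element run can be reproduced with the same snippet as above, only with the list and buffer sizes scaled up (a sketch; it assumes the same StackLink class used earlier in the thread):

```smalltalk
| buffer |
buffer := Array new: 10000. "Filler objects keep the links apart in memory."
#(1024 2048 4096 8192 16384) collect: [ :fillerSize |
	| list |
	list := LinkedList new.
	1 to: 10000 do: [ :index |
		list add: (StackLink with: nil).
		buffer at: index put: (Array new: fillerSize) ].
	Smalltalk garbageCollect.
	fillerSize -> [ list do: [ :ea | 1 + 2 ] ] bench ]
```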

Levente

>
>
>>
>> My guess would be that the two inst var reads are way more efficient than the at: primitive.
>>
>> - Bert -
>>
>>
>>
>
>


Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Tobias Pape

On 16.01.2015, at 17:55, Levente Uzonyi <[hidden email]> wrote:

> On Fri, 16 Jan 2015, Eliot Miranda wrote:
>
>>
>>
>> On Jan 16, 2015, at 7:53 AM, Bert Freudenberg <[hidden email]> wrote:
>>
>>>
>>>> On 16.01.2015, at 16:36, Eliot Miranda <[hidden email]> wrote:
>>>>
>>>> Hi Marcel,
[snip]

These kind of discussions is why I love squeak-dev :)
I always lern.

Best
        -Tobias

Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Tobias Pape

On 16.01.2015, at 18:30, Tobias Pape <[hidden email]> wrote:

>
> On 16.01.2015, at 17:55, Levente Uzonyi <[hidden email]> wrote:
>
>> On Fri, 16 Jan 2015, Eliot Miranda wrote:
>>
>>>
>>>
>>> On Jan 16, 2015, at 7:53 AM, Bert Freudenberg <[hidden email]> wrote:
>>>
>>>>
>>>>> On 16.01.2015, at 16:36, Eliot Miranda <[hidden email]> wrote:
>>>>>
>>>>> Hi Marcel,
> [snip]
>
> These kind of discussions is why I love squeak-dev :)
> I always lern.

(save spelling, apparently -.-')

>
> Best
> -Tobias




Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Bert Freudenberg

> On 16.01.2015, at 18:31, Tobias Pape <[hidden email]> wrote:
>
>
> On 16.01.2015, at 18:30, Tobias Pape <[hidden email]> wrote:
>
>>
>> On 16.01.2015, at 17:55, Levente Uzonyi <[hidden email]> wrote:
>>
>>> On Fri, 16 Jan 2015, Eliot Miranda wrote:
>>>
>>>>
>>>>
>>>> On Jan 16, 2015, at 7:53 AM, Bert Freudenberg <[hidden email]> wrote:
>>>>
>>>>>
>>>>>> On 16.01.2015, at 16:36, Eliot Miranda <[hidden email]> wrote:
>>>>>>
>>>>>> Hi Marcel,
>> [snip]
>>
>> These kind of discussions is why I love squeak-dev :)
>> I always lern.
>
> (save spelling, apparently -.-')
That was perfect Germanish ;)

- Bert -






Re: Understanding Object Structures

Chris Muller-3
In reply to this post by Eliot Miranda-2
> I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source it is difficult to develop a good picture or mental model.  For me, I read the blue book first, which is replete with pictures.

I like to envision the physical model in my mind.  I look at the class
definitions to identify the object links (for pointer objects) and
then the methods which modify those vars.

Once I have that picture of the physical model in my head, I think
about that shape model with each node "suited up" with its behaviors.
Knowing the nodes' underlying links to other objects even helps me
evaluate the consistency and nuance of the API.

> I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs.  I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.

For years, I have been /totally/ interested in Alexandre's project,
not just to represent in-memory models but large Magma models,
visually.  Roassal might struggle to perform with millions of nodes, I
don't know, but I know Craig used Walrus to render a large model.
Unfortunately I just haven't had the time to dig into these projects...

> Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework using eg Jun, render it to a bitmap and include it in the class comment.

Squeak's Text model actually supports the inclusion of any Morph
embedded as a single character.  Maui uses that for its "documents",
and it works very well.  Seeing animated, fully-functioning Morphs
as characters in a document is kind'a cool.
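That embedding can be sketched roughly as follows (hedged: it assumes Squeak's TextAnchor text attribute and the convention that Character value: 1 serves as the anchor placeholder; details vary between image versions):

```smalltalk
"Embed a Morph into a Text as a single anchored character.
The placeholder character is replaced by the morph when the
text is rendered, e.g. in a TextMorph."
| text |
text := Text string: (String with: (Character value: 1)).
text addAttribute: (TextAnchor new anchoredMorph: EllipseMorph new)
	from: 1 to: 1.
text
```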

> A more elaborate system could of course include the sample graph and render it dynamically, that would allow exploration.
>
> Either approach would make an interesting project, yes?
>
>>> Best,
>>> Marcel
>>>
>>>
>>>
>>> --
>>> View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4799893.html
>>> Sent from the Squeak - Dev mailing list archive at Nabble.com.
>>>
>


Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Paul DeBruicker
In reply to this post by Eliot Miranda-2
Eliot Miranda-2 wrote
Ugh.  I was blindsided.  I saw the numbers as times not rates.  Marcel, I'm sorry.  OK indeed the difference would be the cost of the bounds check in each array access.  Interesting to see at what point the poorer memory locality of the list makes it worse.  Difficult to measure given the locality is decided by history and the GC.
With Spur could you use object pinning to make a dense linked list? And then use that for arrays that need to be fast?

Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Eliot Miranda-2
Hi Paul,

On Fri, Jan 16, 2015 at 11:36 AM, Paul DeBruicker <[hidden email]> wrote:
Eliot Miranda-2 wrote
> Ugh.  I was blindsided.  I saw the numbers as times not rates.  Marcel,
> I'm sorry.  OK indeed the difference would be the cost of the bounds check
> in each array access.  Interesting to see at what point the poorer memory
> locality of the list makes it worse.  Difficult to measure given the
> locality is decided by history and the GC.

With Spur could you use object pinning to make a dense linked list? And then
use that for arrays that need to be fast?

It might make some difference but I doubt it'll work well.  There is code to cluster pinned objects in the segment or segments that contain pinned objects.  But it's not going to work well: if the object is in new space then it is copied to old space and forwarded, plus some scanning of stack pages.  That's a lot of overhead.  If it's in old space anyway the GC just sets the bit.

Pinning is really there for objects to be passed through the FFI, or for space that needs to be referred to from machine code, such as conditional branch counters in Sista.  It's not there to control locality.  Hence it is unoptimised.  But that's an interesting idea :-).

A more general solution would be to have some sort of reorganisation in the VM.  Currently the compaction algorithm used with global GC scrambles things.  It would be nice if the incremental global compactor did not scramble, and it's my intent to at least avoid scrambling, if not undo it, in the incremental global GC (which has yet to be started).

--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-593-mcz-tp4799750p4800035.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.




--
best,
Eliot



Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Chris Muller-3
In reply to this post by Levente Uzonyi-2
> In general the locality of the links is worse. In the example they are
> allocated almost next to each other.
> A slightly better example would be this:
> [snip]
>
> So the speed of a list depends on the cache locality of its elements. The
> performance is still comparable with arrays in practical cases, but it may
> be either faster or slower depending on the cache locality.

Geez you guys /seriously/ rock.  I never thought down to that level of
(CPU) cache locality.

You've got me feeling like I should go back to some
performance-critical code which is enumerating Arrays and see about
converting them to LinkedLists...  Too bad I don't have time to do
that!


Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Eliot Miranda-2


On Fri, Jan 16, 2015 at 11:49 AM, Chris Muller <[hidden email]> wrote:
>> In general the locality of the links is worse. In the example they are
>> allocated almost next to each other.
>> A slightly better example would be this:
>> [snip]
>>
>> So the speed of a list depends on the cache locality of its elements. The
>> performance is still comparable with arrays in practical cases, but it may
>> be either faster or slower depending on the cache locality.
>
> Geez you guys /seriously/ rock.  I never thought down to that level of
> (CPU) cache locality.
>
> You've got me feeling like I should go back to some
> performance-critical code which is enumerating Arrays and see about
> converting them to LinkedLists...  Too bad I don't have time to do
> that!

Please, folks, unless you're really pressed, don't bother.  We should have Sista deployed this year, and that will take care of this nicely.
--
best,
Eliot



re: Understanding Object Structures

ccrraaiigg
In reply to this post by Chris Muller-3

Hi Chris--

> For years, I have been /totally/ interested in Alexandre's project,
> not just to represent in-memory models but large Magma models,
> visually.  Roassal might struggle to perform with millions of nodes, I
> don't know, but I know Craig used Walrus to render a large model.

     Yes, Walrus is fantastic. There's still nothing else that does
interactive hyperbolic trees, as far as I know.


-C

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)



Re: OrderedDictionary (was: Re: [squeak-dev] Re: The Trunk: Collections-mt.593.mcz)

Eliot Miranda-2
In reply to this post by marcel.taeumel (old)
Hi Marcel,

On Fri, Jan 16, 2015 at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:
Ah, okay. I was curious about the performance of iterating over an array vs.
the linked list:

a := Array new: 1000.
l := LinkedList new.

1000 timesRepeat: [l add: (StackLink with: nil)].

[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"

[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"

I expected the opposite! :-O Why is that so?

Best,
Marcel

OK, here are the profiling results, all on a .2GHz Core i7 MBP (and these numbers are noisy; there's a lot else going on on my machine).  First, using Cog and a slightly modified benchmark that uses the elements of the collections, so that the benchmark is "real" and each block does the same amount of work:

| a l |
a := Array new: 1000 withAll: 1 -> 1.
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
{ [| t | t := 0. a do: [:ea | t := t + ea key]. t] bench.
  [| t | t := 0. l do: [:ea | t := t + ea element]. t] bench} #('51,900 per second.' '70,400 per second.')

Now the same code on Spur:
#('57,100 per second.' '63,300 per second.')

#at: is faster, but (alas) LinkedList>>do: is slower, because in Spur #== requires a read barrier to check for possible forwarders: in "[aLink == nil] whileFalse:" Spur has to fetch a field from aLink to ensure it is not forwarded.
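In other words, the loop's termination test itself touches each link. A sketch of LinkedList>>#do: from earlier in the thread, annotated (the barrier itself is emitted by the JIT and is not visible in the source):

```smalltalk
do: aBlock
	| aLink |
	aLink := firstLink.
	[aLink == nil] whileFalse:	"Spur: must first confirm aLink is not a forwarder"
		[aBlock value: aLink.
		aLink := aLink nextLink]	"second read per element"
```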

At the VM level we can see that Object>>at: is expensive, but the block overhead dominates.  First, Cog:

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.v3/Fast.app/Contents/MacOS/Squeak  1/16/2015 
eden size: 2,097,152  stack pages: 160  code size: 1,048,576

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench

gc prior.  clear prior.  
5.004 seconds; sampling frequency 1471 hz
7340 samples in the VM (7363 samples in the entire program)  99.69% of total

7302 samples in generated vm code 99.48% of entire vm (99.17% of total)
38 samples in vanilla vm code   0.52% of entire vm (  0.52% of total)

% of generated vm code (% of total) (samples) (cumulative)
30.33%    (30.08%) BlockClosure>>value: (2215) (30.33%)
18.47%    (18.32%) Object>>at: (1349) (48.81%)
16.83%    (16.69%) UndefinedObject>>DoIt (1229) (65.64%)
14.43%    (14.31%) SequenceableCollection>>do: (1054) (80.07%)
  9.57%    (  9.49%) LookupKey>>key (699) (89.65%)
  6.35%    (  6.30%) SmallInteger>>+ (464) (96.00%)
  3.75%    (  3.72%) PIC at: (274) (99.75%)
  0.07%    (  0.07%) BlockClosure>>value (5) (99.82%)
  0.05%    (  0.05%) BlockClosure>>bench (4) (99.88%)
  0.04%    (  0.04%) PIC size (3) (99.92%)
  0.04%    (  0.04%) ceClosureCopyTrampoline (3) (99.96%)
  0.03%    (  0.03%) Time class>>mi...condClockValue (2) (99.99%)
  0.01%    (  0.01%) ArrayedCollection>>size (1) (100.0%)


| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench

gc prior.  clear prior.  
5.001 seconds; sampling frequency 1452 hz
7240 samples in the VM (7260 samples in the entire program)  99.72% of total

7186 samples in generated vm code 99.25% of entire vm (98.98% of total)
54 samples in vanilla vm code   0.75% of entire vm (  0.74% of total)

% of generated vm code (% of total) (samples) (cumulative)
29.22%    (28.93%) BlockClosure>>value: (2100) (29.22%)
26.32%    (26.05%) UndefinedObject>>DoIt (1891) (55.54%)
22.92%    (22.69%) LinkedList>>do: (1647) (78.46%)
  8.14%    (  8.06%) SmallInteger>>+ (585) (86.60%)
  6.76%    (  6.69%) StackLink>>element (486) (93.36%)
  6.46%    (  6.39%) Link>>nextLink (464) (99.82%)
  0.08%    (  0.08%) Time class>>...ndClockValue (6) (99.90%)
  0.04%    (  0.04%) BlockClosure>>bench (3) (99.94%)
  0.03%    (  0.03%) BlockClosure>>value (2) (99.97%)
  0.01%    (  0.01%) ceClosureCopyTrampoline (1) (99.99%)
  0.01%    (  0.01%) ceCreateNew...Trampoline (1) (100.0%)


and then Spur:

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak  1/16/2015 
eden size: 4,101,312  stack pages: 160  code size: 1,048,576

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. a do: [:ea | t := t + ea key]. t] bench

gc prior.  clear prior.  
5.002 seconds; sampling frequency 1471 hz
7340 samples in the VM (7359 samples in the entire program)  99.74% of total

7330 samples in generated vm code 99.86% of entire vm (99.61% of total)
10 samples in vanilla vm code   0.14% of entire vm (  0.14% of total)

% of generated vm code (% of total) (samples) (cumulative)
33.06%    (32.93%) BlockClosure>>value: (2423) (33.06%)
18.66%    (18.59%) UndefinedObject>>DoIt (1368) (51.72%)
17.97%    (17.90%) SequenceableCollection>>do: (1317) (69.69%)
13.33%    (13.28%) Object>>at: (977) (83.02%)
  8.84%    (  8.81%) SmallInteger>>+ (648) (91.86%)
  4.90%    (  4.88%) PIC at: (359) (96.75%)
  3.08%    (  3.07%) LookupKey>>key (226) (99.84%)
  0.05%    (  0.05%) ceSmallBlockContext (4) (99.89%)
  0.04%    (  0.04%) BlockClosure>>value (3) (99.93%)
  0.03%    (  0.03%) Time class>>mi...condClockValue (2) (99.96%)
  0.01%    (  0.01%) ArrayedCollection>>size (1) (99.97%)
  0.01%    (  0.01%) BlockClosure>>bench (1) (99.99%)
  0.01%    (  0.01%) EventSensor>>eventTickler (1) (100.0%)

| l |
l := LinkedList new.
1000 timesRepeat: [l add: (StackLink with: 1)].
[| t | t := 0. l do: [:ea | t := t + ea element]. t] bench

gc prior.  clear prior.  
5.002 seconds; sampling frequency 1464 hz
7315 samples in the VM (7321 samples in the entire program)  99.92% of total

7307 samples in generated vm code 99.89% of entire vm (99.81% of total)
8 samples in vanilla vm code   0.11% of entire vm (  0.11% of total)

% of generated vm code (% of total) (samples) (cumulative)
27.33%    (27.28%) UndefinedObject>>DoIt (1997) (27.33%)
27.11%    (27.06%) LinkedList>>do: (1981) (54.44%)
24.39%    (24.34%) BlockClosure>>value: (1782) (78.83%)
11.15%    (11.13%) SmallInteger>>+ (815) (89.98%)
  5.30%    (  5.29%) Link>>nextLink (387) (95.28%)
  4.50%    (  4.49%) StackLink>>element (329) (99.78%)
  0.12%    (  0.12%) BlockClosure>>bench (9) (99.90%)
  0.07%    (  0.07%) Time class>>...ndClockValue (5) (99.97%)
  0.01%    (  0.01%) ceSmallBlockContext (1) (99.99%)
  0.01%    (  0.01%) BlockClosure>>value (1) (100.0%)


So, to try to make sense of this, we can say that the cost of at: vs the cost of block evaluation is, in Cog
1349 + 274 / 2215 ~= 0.73

and in Spur
977 + 359 / 2423 ~= 0.55

at: being much faster in Spur because the object representation makes it much simpler to do the type analysis (since Object>>at: works on any indexable class) and the bounds check.


and, approximately, the difference between the Array and the LinkedList comes down to the relative costs of at: & SequenceableCollection>>#do: vs nextLink & LinkedList>>#do:; in Cog

1349 + 274 + 1054 / (1647 + 464) ~= 1.27

and in Spur

977 + 359 + 1317 / (1981 + 387) ~= 1.12
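As an aside, these expressions are themselves valid Smalltalk: binary messages evaluate strictly left to right, so "1349 + 274 / 2215" really does parse as (1349 + 274) / 2215. The ratios can be checked from the sample counts in a workspace:

```smalltalk
"Recompute the cost ratios from the profiler sample counts above.
Binary messages evaluate left to right, so no extra parentheses
are needed around the sums in the numerators."
{ 1349 + 274 / 2215.0.			"Cog: at: cost vs block-evaluation cost"
  977 + 359 / 2423.0.			"Spur: same ratio"
  1349 + 274 + 1054 / (1647 + 464.0) }	"Cog: array-iteration work vs list-iteration work"
	collect: [:each | (each * 100) rounded / 100.0]
```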


Now Sista will make a big difference because it won't just eliminate the bounds checking overhead in at:, it'll also inline the block, so it'll be much more like

| a |
a := Array new: 1000 withAll: 1 -> 1.
[| t | t := 0. 1 to: 1000 do: [:i| t := t + (a at: i) key]. t] bench

Cog: 72,700 per second.
Spur: 87,300 per second.

but faster still because the bounds check in at: is completely eliminated.
--
best,
Eliot



Re: Understanding Object Structures

Trygve
In reply to this post by Eliot Miranda-2
Eliot,
I suggest using BabySRI to help students internalize the difference between a class-oriented and an object-oriented mindset.  My message "A puzzle for the New Year" of 02.01.2015 12:01 has an example.
--Trygve

On 16.01.2015 16:55, Eliot Miranda wrote:
Hi All,

On Jan 16, 2015, at 7:36 AM, Eliot Miranda [hidden email] wrote:

This is interesting.  (Marcel, Chris, forgive me; I'm presuming; please don't take this personally.)  Marcel above appears to lack an intuition about the structure of Array vs LinkedList.  And in developing a hash algorithm for a 32-bit subset of Floats a few weeks ago, Chris appeared to lack an intuition about Floats being boxed, assuming they were value types, not containers.

As a VM implementer I carry around a clear picture (literally, I am a visual thinker) of objects in my head.  Those pictures are key to my approach to design and optimization.

I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source it is difficult to develop a good picture or mental model.  For me, I read the blue book first, which is replete with pictures.

I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs.  I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.

Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework using eg Jun, render it to a bitmap and include it in the class comment.

A more elaborate system could of course include the sample graph and render it dynamically, that would allow exploration.

Either approach would make an interesting project, yes?

--

The essence of object orientation is that
objects collaborate to achieve a goal.
The class hierarchy is in the nature of a comment;
it has no run time significance.
---
Trygve Reenskaug       [hidden email]
Morgedalsvn. 5A        http://folk.uio.no/trygver/
N-0378 Oslo            http://fullOO.info
Norway                 Tel: (+47) 22 49 57 27


