Hi all,

This is what the Dictionary>>at: implementation looks like in vw7.4.1:

at: key
	"Answer the value at key. If key is not found, raise a signal."

	^self at: key ifAbsent: [self keyNotFoundErrorFor: #at: index: key]

The parameter for the ifAbsent: block is a copying block because it references self and key. This means that for each invocation of the #at: method, a copying block is created with two copied values. This is unfortunate because most of the time the key will be present when sending #at:, so the ifAbsent: block will never be evaluated.
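To make the distinction concrete, here's a small illustration (these two methods are made-up examples, not code from the image):

exampleClean
	"The block references nothing from the enclosing method, so the
	 compiler can emit one shared (clean) block instance."

	^#(1 2 3) detect: [:each | each > 2]

exampleCopying: key
	"Both blocks read state from the enclosing activation (key and
	 self), so a fresh copying block holding copies of those values
	 is created every time the method runs."

	^#(1 2 3) detect: [:each | each = key] ifNone: [self error: 'not found']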
To work around this I added a shared variable called UndefinedValue on the Dictionary class and initialized it with Object new. This allows me to change the #at: method to this:

at: key
	"Answer the value at key. If key is not found, raise a signal."

	| result |
	result := self at: key ifAbsent: [UndefinedValue].
	result == UndefinedValue ifTrue: [self keyNotFoundErrorFor: #at: index: key].
	^result

In this implementation the copying block is gone, and when I run some benchmarks with it, it runs in 75% of the original time. There are other methods on Dictionary that can be optimized in the same way, so it might be interesting to introduce this kind of optimization in the base product.

But this was not the reason for the mail. It made me think that in many cases copying blocks (or even full blocks) are not needed. In the given example there's no need to make a copying block because, when it gets evaluated, the calling stack frame will still exist, and the block could simply have used the stack variables instead of copying those values locally.

It would be very interesting to see the VM evolve in such a way that it can detect these cases. I believe there are techniques available that can achieve this. Strongtalk does it by using method inlining, but perhaps there are other ways to do this? I can imagine that if this found its way into all the Smalltalk code, it could offer a significant speedup, because blocks are used pretty much everywhere.

I wonder whether Cincom has plans to put these kinds of optimizations into their VM? Apart from this, is there documentation available describing the kinds of optimizations that happen in the VisualWorks VM?

Thanks,
Mark
Hmm, this made me think.

As a typical curly brace compiler has switches for optimization, suppose we had something similar. I bet that if method inlining were done at the bytecode level we would see a reasonable performance improvement for some benchmarks and applications.

Doing it at the bytecode level would have a minimal impact on the code base and would probably be fairly easy to do. There would be a little additional work so that the debugger was aware of the change.

Terry
In reply to this post by Mark Plas
Is native code cached at the level of granularity of CompiledMethod or BlockClosure? I think your idea would only be useful for the latter, and if the inlining was restricted to whole BlockClosures, and if the native code caching worked so that each inlined BlockClosure mapped to the same native code block.

Steve
Steve,

Well, I guess I should provide more info. What I was thinking of is how the RB does the inlining.

If #at:ifAbsent: is defined as

at: key ifAbsent: aBlock

	| index obj |
	index := self findKeyOrNil: key.
	obj := self basicAt: index.
	^obj == nil
		ifTrue: aBlock		"<- not really how it is written"
		ifFalse: [obj value]

then inlining of #at: would turn it into

at: key

	| index obj |
	index := self findKeyOrNil: key.
	obj := self basicAt: index.
	^obj == nil
		ifTrue: [self keyNotFoundErrorFor: #at: index: key]
		ifFalse: [obj value]

I think the inliner could be made smarter so it could transform block arguments into the #ifTrue:ifFalse: form, so they would no longer require a copying block.

Terry
I find it remarkable that, with everything that goes on in a dictionary lookup, the cost of a copying versus a clean block would be 25% of the time. I'm inclined to be suspicious of the benchmark.

--
Alan Knight, Engineering Manager, Cincom Smalltalk
In reply to this post by Terry Raymond
Hi Terry,

How would you know when to begin inlining a message send? You'd need to keep track of which parts of the code are 'hot spots'. Would the VM do this for you and send back some notification messages?

I also think you'd need to specialize a lot of methods. In the case of the #at: example, you cannot just inline the #at:ifAbsent: code, because it is overridden several times in the subclasses of Dictionary. You'd have to duplicate the optimized version of #at: for each subclass that overrides #at:ifAbsent:. But if you do this on the Smalltalk level, a developer would all of a sudden see methods appearing/disappearing in his classes.

Or are you speaking of method inlining at compile time?

Mark
In reply to this post by Alan Knight-2
Hi Alan,

Here's my test at the end of the mail. It was not so much a test of the performance of #at: in a Dictionary. I was trying to see what the overhead of a copying block was compared to a clean block, and I tested it by modifying the #at: method. The reason for posting here was to find out if there is anything going on at Cincom concerning VM performance that could address the issue of avoiding copying blocks when they're not needed.

Mark

Here's the test:

| d |
d := IdentityDictionary new.
d at: #ababa put: 1.
d at: #abcacba put: 2.
d at: #wwabwaba put: 3.
d at: #tahhbg put: 4.
Time millisecondsToRun:
	[1000000 timesRepeat:
		[d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg.
		 d at: #ababa. d at: #abcacba. d at: #wwabwaba. d at: #tahhbg]]
In reply to this post by Terry Raymond
The inlining seems like fun; the example got me thinking, though, of another Collection method I'd like to see go faster, which again led me to thinking about removeAll:, which again meant I was in a bad place for the rest of the day. Boo!

Just so the day isn't entirely wasted, here's the one it got me thinking about:

OrderedCollection >> removeAllSuchThat: aBlock
	"Evaluate aBlock for each element of the receiver. Remove each
	 element for which aBlock evaluates to true."

	| element newCollection index removed |
	newCollection := self copyEmpty.
	index := firstIndex.
	removed := 0.
	[index <= lastIndex] whileTrue:
		[element := self basicAt: index.
		(aBlock value: element)
			ifTrue:
				[newCollection add: element.
				removed := removed + 1]
			ifFalse:
				[removed > 0
					ifTrue: [self basicAt: index - removed put: (self basicAt: index)]].
		index := index + 1].
	lastIndex - removed + 1 to: lastIndex
		do: [:nilIndex | self basicAt: nilIndex put: nil].
	lastIndex := lastIndex - removed.
	^newCollection

Testing an example in a workspace:

order := OrderedCollection new.
order2 := OrderedCollection new.
1 to: 100000 do: [:i | order add: i. order2 add: i].
Time millisecondsToRun:
	[order removeAllSuchThat: [:each | each >= 20000 and: [each <= 50000]]].	"31242"
Time millisecondsToRun:
	[order2 removeAllSuchThatNew: [:each | each >= 20000 and: [each <= 50000]]].	"68"
order = order2	"true"

Cheers,
Henry

PS: Why is there no removeAllSuchThat: for all collections? It replaces the common case

	aCollection removeAll: (aCollection select: [:each | each someThing])

quite nicely, has a simple implementation, and can be optimized much more easily for specific collections than removeAll:, due to the return-value constraints posed by the already existing VW implementations.
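A generic fallback along the lines of the PS might be as simple as this (an untested sketch, not code from the VW image; concrete classes like OrderedCollection above would override it with faster, structure-aware versions):

removeAllSuchThat: aBlock
	"Answer the removed elements. Collect the matches first so the
	 receiver is never modified while being iterated over."

	| matches |
	matches := self select: aBlock.
	matches do: [:each | self remove: each].
	^matches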
In reply to this post by Mark Plas
> -----Original Message-----
> From: Mark Plas
> Sent: Wednesday, October 08, 2008 11:04 AM
>
> Hi Terry,
>
> How would you know when to begin inlining a message send? You'd need to
> keep track of which parts of the code are 'hot spots'. Would the VM do
> this for you and send back some notification messages? I also think you'd
> need to specialize a lot of methods. In the case of the #at: example, you
> cannot just inline the #at:ifAbsent: code because it is overridden several
> times in the subclasses of Dictionary. You'd have to duplicate the
> optimized version of #at: for each subclass that overrides #at:ifAbsent:.
> But if you do this on the Smalltalk level, a developer would all of a
> sudden see methods appearing/disappearing in his classes.

There are two problems here. The first is identifying which methods should be inlined. The second is what inlining means.

The inlining I am thinking of would use the RB inliner to dynamically inline a method and compile it. The new compiled method would replace the old one, but it would refer back to the original non-inlined source. This does create an additional problem with the debugger source map. Additionally, the decompiled method would not look at all like the original source.

Identifying which methods to inline, and how deep to inline them, could be done manually or with a special tool. It all depends how sophisticated you want to be.

Basically, I am suggesting that the inlining be done in the image, so that the resulting bytecodes produce improved performance, and not in the VM.
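In code, the in-image step might look roughly like this (a sketch only: the Inliner class and its selectors are hypothetical, and the exact reflection selectors for fetching and recompiling source vary):

"Sketch of the in-image approach: fetch the source of Dictionary>>at:,
 let a (hypothetical) inliner expand its send of #at:ifAbsent:, and
 install the transformed method in place of the original."
| original inlined |
original := (Dictionary compiledMethodAt: #at:) getSource.
inlined := Inliner new inline: #at:ifAbsent: into: original.	"Inliner is hypothetical"
Dictionary compile: inlined.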
Terry
In reply to this post by Henrik Sperre Johansen
To those who care, here's the workspace test for another notoriously "slow to remove from" collection, with removeAllSuchThat: implemented in a similar way:

set := Set withAll: (2500 to: 20000).
set2 := Set withAll: (2500 to: 20000).
Time millisecondsToRun:
	[set removeAll: (set select: [:each | each >= 7500 and: [each <= 12500]])].	"30523"
Time millisecondsToRun:
	[set2 removeAllSuchThat: [:each | each >= 7500 and: [each <= 12500]]].	"61"
set asOrderedCollection = set2 asOrderedCollection	"true"

I planned to run them with the same sizes as the OrderedCollection example, but I gave up on the removeAll: after a minute. removeAllSuchThat: finished in 1.5 seconds...

Cheers,
Henry

PS. I played around some more with Pharo after posting, and even Squeak already has this for OrderedCollection. They're missing the equivalent for Set for some reason, though; probably it's not used often.

PPS. Is it just me that it bothers that, in such a mature technology as VisualWorks, I sometimes have to stop my flow of thoughts -> code to think about whether the collection type I want to remove objects from will make performance blow up in my face?
Henrik,
Note that the example you use with sets exacerbates a known limitation of the current implementation. When Set encounters a hash value that could be an identityHash value, it assumes it is one and scales it. This is so that these don't cluster at the beginning of the set's table. However, this backfires when most elements in the set have consecutive hash values, because the scaling poisons the hash table and so most operations become linear search. This can be clearly seen by doing the following:

Time millisecondsToRun:
	[set removeAll: (set select: [:each | each >= 7500 and: [each <= 12500]])]	"71789"

Time millisecondsToRun:
	[7500 to: 12500 do: [:each | set remove: each ifAbsent: [nil]]]	"70104"

In other words, the select: takes basically no time, and most of the effort is spent rehashing a hashed collection that has degenerated into a linear search collection. If, for example, we use numbers above 16383, then the situation changes quite a bit.

set := Set withAll: (16384 to: 32768).

Time millisecondsToRun:
	[set removeAll: (set select: [:each | each >= 17500 and: [each <= 22500]])]	"6676"

Time millisecondsToRun:
	[17500 to: 22500 do: [:each | set remove: each ifAbsent: [nil]]]	"6837"

In other words, the time it takes to do the select: is still within measurement error, but the time it takes to complete an operation of roughly the same size has decreased by a factor of 11x. This is because now Integer>>hash is being left as is and is not scaled by Set.

<digression>
The choice to scale values in a Set is driven by the fact that a simple Set holding many objects that implement hash as identityHash will suffer performance problems without the scaling. However, it introduces problems when one is using perfect hash functions, as the scaling can break the assumptions made when designing the implementation of hash. So, at least in my personal opinion, what we could do is introduce another message such as scaledIdentityHash (or whatever) that returns an already scaled identityHash value. Then we could implement Object>>scaledIdentityHash as

	^self identityHash bitShift: 14

and reimplement Object>>hash as

	^self scaledIdentityHash

This scales identityHash values inside sets and avoids the issues with perfect hash functions.
</digression>

You may want to redo your tests taking this into account. Of course, if you want to mass remove objects from a Set, you may want to keep a copy of removeAllSuchThat: around to further speed things up depending on usage.

Finally, the selector removeAllSuchThat: sounds like something I could have come up with. However, now I think eject: is much nicer. The original counterpart to eject: was keep:, but as I am writing I get the idea that maybe it should be elect:. This makes eject: and elect: be like reject: and select:, except that the receivers are modified as opposed to answering a copy.

Andres.
On Wed, 8 Oct 2008 18:40:46 -0400, "Valloud, Andres" <[hidden email]> wrote:

> However, now I think eject: is much nicer.

Now I am going to dream of ejected objects dangling from tiny parachutes and slowly drifting downwards ...

Good thing you didn't choose evict:, because homeless objects would just make me weep with sorrow.

SCNR
s.
Stefan Schmiedl wrote:
>> However, now I think eject: is much nicer.
>
> Now I am going to dream of ejected objects dangling from tiny
> parachutes and slowly drifting downwards ...
>
> Good thing you didn't choose evict:, because homeless objects would just
> make me weep with sorrow.

evicted.
In reply to this post by Alan Knight-2
Hello,

Out of curiosity I tried (using Mark's benchmark) the following experiments.

1) On my machine (iMac Intel OS X / VW7.6) the time for his solution is about 2.8".

2) Adding (the "k" argument is of course redundant here)

at: key ifAbsent: aBlock receiver: receiver key: k

	| index obj |
	index := self findKeyOrNil: key.
	obj := self basicAt: index.
	^obj == nil
		ifTrue: [aBlock value: receiver value: k]
		ifFalse: [obj value]

and using

at: key

	^self at: key
		ifAbsent: [:rcvr :k | rcvr keyNotFoundErrorFor: #at: index: k]
		receiver: self key: key

results in about 2.3". We have a clean block here, and little overhead.

3) Using a more 'generic' variant

at: key ifAbsent: aBlock context: context

	| index obj |
	index := self findKeyOrNil: key.
	obj := self basicAt: index.
	^obj == nil
		ifTrue: [aBlock value: context]
		ifFalse: [obj value]

with

at: key
	"Answer the value at key. If key is not found, raise a signal."

	^self at: key
		ifAbsent: [:ctxt | ctxt first keyNotFoundErrorFor: #at: index: ctxt last]
		context: (Array with: self with: key)

takes about 3.6". Again we have a clean block, but we added the overhead of dealing with arrays.

4) Adding instance variables "a" and "b" to dictionaries to force the use of copying blocks as follows (and to avoid having to tinker with the compiler):

at: key

	a := self.
	b := key.
	^self at: key ifAbsent: [a keyNotFoundErrorFor: #at: index: b]

yields a time of about 3.4". (We ignore all threading issues here.)

5) If we (could) remove the assignments to "a" and "b" (for the sake of the argument), then execution time drops to about 2.9".

michel
In reply to this post by Alan Knight-2
Alan Knight wrote:
> I find it remarkable that, with everything that goes on in a dictionary > lookup, that the cost of a copying versus a clean block would be 25% of > the time. I'm inclined to be suspicious of the benchmark. > Rather more than 25%, actually. Benchmarks are often surprising... The time to run an empty loop: ^Time microsecondsToRun: [10000000 timesRepeat: []] 30907 30870 29836 average: 30538 | dict | dict := Dictionary new. dict at: #foo put: #bar. ^Time microsecondsToRun: [10000000 timesRepeat: [dict at: #foo]] 1123834 1102892 1245824 average: 1157517, less loop overhead 1126979 for an average of 8.87 lookups per microsecond. | dict | dict := TestDictionary new. dict at: #foo put: #bar. ^Time microsecondsToRun: [10000000 timesRepeat: [dict at: #foo]] 734595 758221 747092 average: 746636, less loop overhead 716098 for an average of 13.96 lookups per microsecond That's a *57%* increase in lookup speed. The difference between Dictionary and TestDictionary is Mark's change, with the exception of using an instance variable instead of a shared variable, since shared variable access is quite slow compared to instvar access. Looking at why there is this much speed difference, the MultiTimeProfiler tells us: Dictionary 92 samples in 1.62 seconds real time 1452 scavenges, 0 incGCs TestDictionary 57 samples in 1.02 seconds real time 792 scavenges, 0 incGCs So we're saving a lot of scavenges, which implies we're creating fewer (or smaller) objects. But we're still creating objects. The profiler reports for the ScavengeProcess: Dictionary 33.0% * 92 samples * 17650 microseconds / sample = 535854 microseconds spent in scavenge. TestDictionary 17.3% * 57 samples * 17820 microseconds / sample = 175723 microseconds spent in scavenge. The difference is 360131 microseconds. So the difference in benchmarks is almost entirely accounted for by savings in GC. Let's see if we can reduce scavenges even further. Test2Dictionary>>at: key | index obj | index := self findKeyOrNil: key. obj := self basicAt: index. ^obj == nil ifTrue: [self keyNotFoundErrorFor: #at: index: key] ifFalse: [obj value] The profiler now reports that we've eliminated scavenges altogether. Excellent. The timings: | dict | dict := Test2Dictionary new. dict at: #foo put: #bar. ^Time microsecondsToRun: [10000000 timesRepeat: [dict at: #foo]] 585597 583622 575339 average: 581519, less loop overhead 550981 for an average of 18.15 lookups per microsecond. That's a 105% speedup. Test2Dictionary>>at: is more than twice as fast as Dictionary>>at: in the non-failure case. A self-style VM could do this kind of inlining automatically. The bytecode-level inlining that Terry's talking about could also do it, though without the frequency feedback it's a bit harder to know where to do this, and a mechanism would be needed to keep track of all of the methods that had inlined a particular method in order to automatically recompile all inlining methods whenever an inlined method is recompiled. Regards, -Martin _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
So technically, copying takes just a little bit longer than a clean block; it's the cleanup that takes a lot of the time.

Makes me wonder how much practical impact this would have in a real situation, considering the amount of garbage you'd create elsewhere in the course of doing 10 million Dictionary lookups ;)

Cheers,
Henry
In reply to this post by Martin McClure
On Fri, Oct 10, 2008 at 1:03 PM, Martin McClure <[hidden email]> wrote:

[excellent analysis snipped]

Not only that, it would use an at: that avoids the bounds check and isInteger check, because it would be able to prove that the result of findKeyOrNil: is an index in range. So the speedups would be even higher than achieved by this experiment.

A Self-style VM can still be implemented to target bytecode and have the VM generate the machine code. The VM needs additional bytecodes to express things like a non-bounds-checking at:, etc.

But you and I know this. We've been talking about it for a while now. I'm looking forward to working on this in Cog.

Eliot
Qwaq, Inc.
Hello,

I looked again at my small tests (I noticed my previous mail was a bit confusing; the Dictionary>>#at: method also uses copying blocks, of course). I also tried to get a rough idea of scavenges using AllocationProfiler:

1) standard #at: - about 6.2", about 5800 scavenges, and 1 incGC
2) Mark's #at: - about 2.8", 0 scavenges and 0 incGCs
3) variant on standard (with a "b" instance variable, as below) - about 3.2", about 3170 scavenges and 1 incGC

at: key
	"Answer the value at key. If key is not found, raise a signal."

	b := key.
	^self at: key ifAbsent: [self keyNotFoundErrorFor: #at: index: b]

1	5093
2	2401
3	842
4	253
5	82
6	25
7	16
8	5
9	2
10	3
11	1
13	1
17	1

Would it make sense to try to store the copied values in another way?

Regards,
michel
> I looked again at my small tests (I noticed my previous mail was a bit
> confusing; the Dictionary>>#at: method also uses copying blocks, of
> course). I also tried to get a rough idea of scavenges using
> AllocationProfiler:
>
> 1) standard #at: - about 6.2", about 5800 scavenges, and 1 incGC
> 2) Mark's #at: - about 2.8", 0 scavenges and 0 incGCs
> 3) variant on standard (with a "b" instance variable, as below) -
> about 3.2", about 3170 scavenges and 1 incGC

Reading this thread, I have mixed feelings. Yes, it is a good idea to avoid garbage. BUT (and I meant to write this in capital letters ;-) ) the large performance differences probably depend on the configuration of ObjectMemory as well. Given that the default values are still relatively small, it might be interesting to see the effect of setting the size of Eden and SurvivorSpace to larger values. This should reduce the number of scavenges. E.g., setting the sizes of Eden and SurvivorSpace to 10 or 20 times the default value should reduce the number of scavenges by a factor of 10 or 20. It might also reduce tenuring of objects into OldSpace, thus avoiding the incGC runs as well. Would the difference in execution times still be as large, or would it shrink to less impressive values?

If garbage collection (scavenges in this case) is expensive, one can either avoid it by not creating garbage, or make it more efficient. So far, the proposed solutions have concentrated on the first of the two options only. Having said that, the most efficient solution will of course be a combination of both.

Joachim Geidel
In reply to this post by Michel Tilman
To Martin:

You wrote "since shared variable access is quite slow compared to instvar access", so I changed my method to use an instvar, but it was a bit slower than the version with a shared variable. What would be the reason for this?

Then I also did some experiments where I inlined the at:ifAbsent: in the at: method, avoiding a message send and a copying block:

at: key

	| index obj |
	index := self findKeyOrNil: key.
	obj := self basicAt: index.
	^obj == nil
		ifTrue: [self keyNotFoundErrorFor: #at: index: key]
		ifFalse: [obj value]

and then increasingly inlined some more messages to finally come to something like this:

at: key

	| index obj length pass probe aHashValue |
	length := self basicSize.
	pass := 1.
	aHashValue := key hash.
	index := (length > 4095 and: [aHashValue <= 16383])
		ifTrue: [aHashValue * (length // 4095 + 1) \\ length + 1]
		ifFalse: [aHashValue \\ length + 1].
	[(probe := self basicAt: index) == nil or: [probe key = key]] whileFalse:
		[(index := index + 1) > length ifTrue:
			[index := 1.
			pass := pass + 1.
			pass > 2 ifTrue: [index := self grow findKeyOrNil: key]]].
	obj := self basicAt: index.
	^obj == nil
		ifTrue: [self keyNotFoundErrorFor: #at: index: key]
		ifFalse: [obj value]

I've summarized the results (benchmarks in VW7.6; the memory size has been set to 128MB):

1) Standard Dictionary>>at: implementation: 6.3 sec
2) Using a shared var to avoid the copying block: 4.2 sec
3) Using an instvar to avoid the copying block: 4.3 sec
4) Using Michel's technique to use stack variables to avoid copying blocks: 4.0 sec
5) Inlining the at:ifAbsent: into the at: method: 3.4 sec
6) Inlined the #findKeyOrNil: into the previous method: 3.1 sec
7) Inlined the #initialIndexFor:boundedBy: into the previous: 3.0 sec
8) Inlined the constant values into the previous: 2.8 sec

The difference between nr 4 & 5 is remarkable: 15% faster by avoiding ONE message send.

The difference between the standard and the final inlined version (nr 8) results in a speedup of x2.25. There seems to be quite some overhead associated with a message send (going from step 5 to step 8 only removed message sends; there were no extra copying blocks removed).

I still used my original benchmark (listed earlier in the thread), not taking loop overhead into account.

Mark