Hi. I have to say that I hate that the compiler emits a special bytecode for #class and that the VM doesn't even send the message. I want to be able to override #class in a proxy, for example, or to debug it.

I did some benchmarks with that optimization disabled by doing:

    (ParseNode classVarNamed: 'StdSelectors') removeKey: #class ifAbsent: [].
    Compiler recompileAll.

And the difference is NOTHING in my tests (maybe I am doing something wrong).

With Standard VM and #class optimization:
    [SystemNavigation default allObjectsDo: [:each | each class]] timeToRun
    -> approx. 50 milliseconds

With Standard VM and WITHOUT #class optimization:
    [SystemNavigation default allObjectsDo: [:each | each class]] timeToRun
    -> approx. 60 milliseconds

So...only 10 milliseconds more for asking the class of all the objects in the system (done in a PharoCore 1.3).

With Cog VM and #class optimization:
    [SystemNavigation default allObjectsDo: [:each | each class]] timeToRun
    -> approx. 13 milliseconds

With Cog VM and WITHOUT #class optimization:
    [SystemNavigation default allObjectsDo: [:each | each class]] timeToRun
    -> approx. 17 milliseconds

So...only 5 milliseconds more for asking the class of all the objects in the system (done in a PharoCore 1.3).

Considering (5 / 324535) asFloat -> 0.000015406... of overhead.

What do you think?

Mariano
On Mar 25, 2011, at 5:26 PM, Mariano Martinez Peck wrote:
> Hi. I have to say that I hate that the compiler associates a special bytecode for #class and that the VM doesn't even send the message. I want to be able to overwrite #class in a proxy for example, or to debug it.

What is the constant overhead of doing just the iteration?

    [SystemNavigation default allObjectsDo: [:each | ]] timeToRun

I am for... For the JIT it is a monomorphic send to a primitive, so the inline cache (IC) will be very effective and should ensure that the lookup is almost never actually done. (And the lookup is the only thing the #class bytecode saves, as the target method is a primitive.)

Marcus

--
Marcus Denker -- http://www.marcusdenker.de
INRIA Lille -- Nord Europe. Team RMoD.
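One way to answer the question about the constant overhead is to time the empty iteration and subtract it from the run that sends #class. The following is a minimal sketch, not something posted in the thread: the temporary names are made up, and it uses `Time millisecondsToRun:` rather than `timeToRun` only so that the result is a plain number of milliseconds.

```smalltalk
"Sketch: measure the bare iteration separately and subtract it.
 The temporaries baseline, withClass and net are illustrative names."
| baseline withClass net |
baseline := Time millisecondsToRun: [
    SystemNavigation default allObjectsDo: [:each | ]].
withClass := Time millisecondsToRun: [
    SystemNavigation default allObjectsDo: [:each | each class]].
"What remains is attributable to the #class expression, within timer noise."
net := withClass - baseline.
Transcript
    show: 'loop only: ', baseline printString, ' ms'; cr;
    show: 'loop + #class: ', withClass printString, ' ms'; cr;
    show: 'difference: ', net printString, ' ms'; cr.
```

A single pair of runs like this is still noisy; the difference can even come out negative when the effect is smaller than the timer resolution.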
In reply to this post by Mariano Martinez Peck
I am against non-transparent optimizations as well.
However, I think special selectors like ifTrue: / class and so forth should still be used in the VM, since it's a cheap way of getting some performance improvements. But to make them work in a decent OO way they should also be able to deal with a non-standard implementation (like a customized Boolean class).

On 2011-03-25, at 17:26, Mariano Martinez Peck wrote:
> Hi. I have to say that I hate that the compiler associates a special
> bytecode for #class and that the VM don't even send the message. I want to
> be able to overwrite #class in a proxy for example, or to debug it.
>
> [SystemNavigation default allObjectsDo: [:each | each class]] timeToRun
> -> aprox 17 miliseconds
>
> So...only 5 miliseconds more for asking the class to all the objects in the
> system.....(done in a PharoCore 1.3)
>
> Considering (5 ms / 324535) asFloat 0,000015406 ms... of overhead

Arguing with absolute values is a bit dangerous. It makes:

    13ms / 17ms * 100% = 76% => 25% speed improvement! in COG
    50ms / 60ms * 100% = 83% => 17% faster

So the impact is still quite big. But indeed I would be in favor of a normal message send...
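The relative figures depend on which run is taken as the base. As a small worked example using only the four timings reported earlier in the thread (50/60 ms and 13/17 ms), the sketch below computes both the slowdown relative to the optimized run and the saving relative to the unoptimized run. The block names are illustrative, and the inputs still include the loop overhead, so none of these numbers isolates the cost of #class itself.

```smalltalk
"Relative cost computed two ways from the timings reported in the thread."
| slowdown saving |
"Extra time, relative to the run WITH the optimization."
slowdown := [:with :without | (without - with) / with * 100.0].
"Time saved, relative to the run WITHOUT the optimization."
saving := [:with :without | (without - with) / without * 100.0].

Transcript
    show: 'Standard VM slowdown %: ', (slowdown value: 50 value: 60) printString; cr;
    show: 'Standard VM saving %: ', (saving value: 50 value: 60) printString; cr;
    show: 'Cog slowdown %: ', (slowdown value: 13 value: 17) printString; cr;
    show: 'Cog saving %: ', (saving value: 13 value: 17) printString; cr.
"Roughly 20 and 16.7 for the Standard VM, 30.8 and 23.5 for Cog."
```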
In reply to this post by Mariano Martinez Peck
On Mar 25, 2011, at 5:46 PM, Camillo Bruni wrote:
> Arguing with absolute values is a bit dangerous:
>
> But it makes
> 13ms / 17ms * 100% = 76% => 25% speed improvement! in COG
> 50ms / 60ms * 100% = 83% => 17% faster
>
> So the impact is still quite big. But indeed I would be in favor of a normal message send...

And are you sure that your doit is actually JITed in Cog?

1) Always take the overhead of the loop into account. Especially if it is a large percentage, it can make your "slowdown" seem very small even though it is huge. E.g. the overhead of the loop in your case should be *very* high. 90% of the execution time?

2) Always make sure that you actually JIT:
   -> put it in a normal method in a class
   -> execute it at least 2 times

Marcus
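The two pieces of advice in point 2 could look roughly like the sketch below. It is an assumption-laden illustration rather than anything from the thread: the class name `ClassSendBench`, the selector `benchClassSend` and the category are made up, and the class-creation message is the one used by images of the PharoCore 1.3 era, so it may differ in newer images.

```smalltalk
"Sketch only: ClassSendBench and #benchClassSend are made-up names."
| bench |
bench := Object subclass: #ClassSendBench
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Scratch-Benchmarks'.

"Install the loop as a class-side method so it is compiled like normal code."
bench class compile: 'benchClassSend
    SystemNavigation default allObjectsDo: [:each | each class]'.

"Run once to warm up, then time a few further runs."
bench benchClassSend.
3 timesRepeat: [
    Transcript
        show: (Time millisecondsToRun: [bench benchClassSend]) printString;
        cr].
```

The class is held in a temporary so the whole snippet can be evaluated in one go, before the global name exists.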
On Mar 25, 2011, at 5:51 PM, Marcus Denker wrote:
> On Mar 25, 2011, at 5:46 PM, Camillo Bruni wrote:
>> Arguing with absolute values is a bit dangerous:
>>
>> But it makes
>> 13ms / 17ms * 100% = 76% => 25% speed improvement! in COG
>> 50ms / 60ms * 100% = 83% => 17% faster

So for Cog, you get this counter-intuitive result because Cog executes the overhead loop faster, in combination with not JITing what you wanted to test.

As the doit is *not* JITed, it executes as fast as in the interpreter case. You see a higher percentage just because the base overhead loop is executed faster and the thing you test gets a larger share of the runtime.

Marcus
On 2011-03-25, at 18:12, Marcus Denker wrote:
> So for Cog, you get this counter-intuitive result is because Cog
> executes the overhead loop faster in combination with not jitting
> what you wanted to test.

I didn't run the benchmarks, I just listed the results Mariano provided, just to show that they do not provide a valid argument for removal of the method.

We have a nice benchmarking framework on SqueakSource, which we should use instead of relying on some pseudo-valid results. I do not have the time right now to do so... but I expect the overhead to be negligible in Cog, since it should be fairly easy to JIT this in...

> As the doit is *not* Jited, it executes as fast as in the interpreter case.
> You see a higher percentage just because the base overhead loop
> is executed faster and the thing you test gets a larger share of the
> runtime.
In reply to this post by Marcus Denker-4
>> What do you think?
>
> I am for... for the JIT it is a monomorphic send to a primitive, so
> the IC will be very effective
> and should take care that the lookup is actually almost never done.
> (and the lookup is the
> only thing the #class bytecode saves, as the target method is a
> primitive).

I think that programmers should have the illusion that the VM interacts with the system through message sends. By being able to overwrite this particular message you break this illusion. The VM will just fetch the real class; but the user doesn't get access to it anymore. Telling the system that this particular message shouldn't be overwritten is probably a good thing.

If you want to overwrite the method class, maybe your tools should send another message than class in the first place?

I do however agree that many other optimizations become slightly superfluous, such as bytecodes for integer operations ... that somehow seems like compile-time inline-cached behavior without the ability to flush the cache :)

cheers,
Toon
In reply to this post by Marcus Denker-4
On Mar 25, 2011, at 7:09 PM, Toon Verwaest wrote:
> I think that programmers should have the illusion that the VM interacts with the system through message sends. By being able to overwrite this particular message you break this illusion.

Why that? I think exactly the opposite is true. If I write

    self class

I think that I send #class to self. But in the current system I do not.

If #class is not sent, one can never ever do proxies that work for real. A proxy can never stand in for an object, as #class always answers "Proxy", even though it should not. Or for a Future it will say "Future".

> The VM will just fetch the real class; but the user doesn't get access to it anymore. Telling the system that this particular message shouldn't be overwritten is probably a good thing.

One should take care, but a system that just does not allow things like this *even if you take care* is bad.

> If you want to overwrite the method class, maybe your tools should send another message than class in the first place?

So what if it's not a tool? It might be domain code that really does

    self class == something

as part of execution. Now I want to use a proxy here, but I cannot.

Marcus
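The proxy problem can be made concrete with a small sketch. Everything named here (`SketchProxy`, `setTarget:`, the category) is hypothetical and only illustrates the argument; the class-creation message is the PharoCore 1.3 era one. On an image and VM that inline #class, the plain `proxy class` expression is expected to bypass the override, while `perform:` does a real lookup and reaches it. On a system that sends #class normally, both lines should answer the same thing.

```smalltalk
"Sketch only: SketchProxy and #setTarget: are made-up names."
| proxyClass proxy |
proxyClass := Object subclass: #SketchProxy
    instanceVariableNames: 'target'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Scratch-Proxies'.
proxyClass compile: 'setTarget: anObject
    target := anObject'.
"Override #class so the proxy answers the class of the object it stands in for."
proxyClass compile: 'class
    ^ target class'.

proxy := proxyClass new.
proxy setTarget: 'hello'.

"Ordinary expression: where #class is inlined, the override is never called."
Transcript show: proxy class printString; cr.
"#perform: goes through method lookup, so the override is reached."
Transcript show: (proxy perform: #class) printString; cr.
```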
In reply to this post by Toon Verwaest-2
On Fri, Mar 25, 2011 at 7:08 PM, Toon Verwaest <[hidden email]> wrote:
Of course I can do that. I can also do

    (ParseNode classVarNamed: 'StdSelectors') removeKey: #class ifAbsent: [].
    Compiler recompileAll.

I don't care. I have it working for my "usage". This was just an example.

And don't compare #class with all the boolean methods like #ifNil: and friends, or the bytecodes for integer operations... I think #class is sent far fewer times than those.
On 25 March 2011 19:22, Mariano Martinez Peck <[hidden email]> wrote:
> And don't compare #class with all the boolean methods like #ifNil: and
> friends or the bytecodes for integer operations....I think #class is sent
> far less times than those...

Yes. And I predict that the performance impact in macro benchmarks, where you are running normal code (which is not sending this message intentionally in a loop), will be at noise level.

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Camillo Bruni
Hi:
On 25 Mar 2011, at 18:15, Camillo Bruni wrote:
>>>> 13ms / 17ms * 100% = 76% => 25% speed improvement! in COG
>>>> 50ms / 60ms * 100% = 83% => 17% faster
>
> So we have a nice Benchmarking framework on sqeaksource which we should use instead of relying on some pseudo valid results.

It is not "there" yet in terms of usability, and it is not geared towards this kind of microbenchmark. But it would be great to establish it as a standard to avoid those common traps, like the code not getting JITed...

Mariano, if you do this kind of measurement often, please have a look at [1] and [2]. And perhaps we can extend [3] to your liking.

Best regards
Stefan

[1] A. Georges, D. Buytaert, and L. Eeckhout, “Statistically Rigorous Java Performance Evaluation,” SIGPLAN Not., vol. 42, no. 10, p. 57-76, 2007.
[2] http://code.google.com/p/caliper/
[3] http://www.squeaksource.com/SMark.html

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax: +32 2 629 3525
In reply to this post by Igor Stasenko
On Fri, Mar 25, 2011 at 8:27 PM, Igor Stasenko <[hidden email]> wrote:
So...what about doing something more real? For example, run all the test cases and compare the times. Wouldn't that be a good "benchmark"?

BTW, can I run all tests from code (not using the TestRunner UI)?
> So...what about doing something more real...for example, run all the
> testcases and compare time.
> Wouldn't that be a good "benchmark" ?

Sounds better already. Make sure to run it something like 100 times, and get the average and the standard deviation. Otherwise the results don't provide much information.

> BTW, can I run all test from code? (not using the TestRunner UI)

Something like the following code should do it:

    TestCase allSubclasses do: [ :cls|
        cls isAbstract
            ifFalse: [cls run]].
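The advice to repeat the run and report the average and standard deviation can be sketched as a small helper block. This is an illustration under stated assumptions: the name `measure` and the stand-in workload are made up, the sample (n - 1) form of the standard deviation is one reasonable choice rather than something the thread prescribes, and running the full test suite 100 times would take far longer than the toy workload used here.

```smalltalk
"Sketch: time a workload several times and summarize the samples."
| measure result |
measure := [:workload :runs | | samples mean variance |
    samples := (1 to: runs) collect: [:i | Time millisecondsToRun: workload].
    mean := (samples inject: 0 into: [:sum :each | sum + each]) / runs.
    "Sample variance; needs runs > 1."
    variance := (samples
        inject: 0
        into: [:sum :each | sum + (each - mean) squared]) / (runs - 1).
    Array with: mean asFloat with: variance sqrt].

"Stand-in workload; swap in the TestCase loop from the message above."
result := measure value: [100000 timesRepeat: [Object new class]] value: 10.
Transcript
    show: 'mean ms: ', result first printString; cr;
    show: 'std dev ms: ', result last printString; cr.
```

Keeping the workload as a block argument means the same helper can time the run with and without the #class optimization.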
So....I tried with running all tests of a PharoCore 1.3 and I've got this:
cog vm with optimization
Time to run all tests: 116506

cog vm without optimization
Time to run all tests: 121930

121930 - 116506 ----> 5424
((5424 * 100) / 116506) asFloat ---> 4.655554220383499

So...in Cog, running all tests, there is an overhead of 4.6%.

Do we want to pay it?

I would...but I am all ears.

Cheers

Mariano

On Mon, Mar 28, 2011 at 11:08 AM, Camillo Bruni <[hidden email]> wrote:
--
Mariano
http://marianopeck.wordpress.com
On 5 April 2011 11:33, Mariano Martinez Peck <[hidden email]> wrote:
> So....I tried with running all tests of a PharoCore 1.3 and I've got this:
>
> cog vm with optimization
> Time to run all tests:116506
>
> cog vm without optimization
> Time to run all tests:121930
>
> 121930 - 116506 ----> 5424
> ((5424 * 100) / 116506) asFloat ---> 4.655554220383499
>
> So...in Cog, running all test, there is an overhead of 4.6%
>
> Do we want to pay it ?
>
> I would...but I am all ears.

5% for the #class message? This is too much. Or did you also remove other optimized sends?

--
Best regards,
Igor Stasenko AKA sig.
On Tue, Apr 5, 2011 at 12:09 PM, Igor Stasenko <[hidden email]> wrote:
Only that one. For #== I got 4.3%. And doing both together (#class and #==) I got 9.05% :(

And removing the #class optimization in an Interpreter VM gives me almost the same as in Cog: 4.7%.

Maybe I am doing something wrong...

--
Mariano
In reply to this post by Igor Stasenko
On Apr 5, 2011, at 12:32 PM, Mariano Martinez Peck wrote:
> Only that one. For #== I got 4.3. And doing both together (#class and #==) I got 9.05% :(
>
> And removing the #class optimization in a Interpreter VM gives me almost as in cog: 4.7%
>
> maybe I am doing something wrong...

e.g. Issue 3940: BlockNode has undeclared ivar optimizedMessageNode

--
Marcus Denker
VariableNode initialize.
Compiler recompileAll.

[
TestCase allSubclasses do: [ :cls|
    cls isAbstract
        ifFalse: [cls suite run]].
] timeToRun

178938
183963

(ParseNode classVarNamed: 'StdSelectors') removeKey: #class ifAbsent: [].
Compiler recompileAll.

[
TestCase allSubclasses do: [ :cls|
    cls isAbstract
        ifFalse: [cls suite run]].
] timeToRun

187168
184992

The deviation is too big to see whether the overhead is really that big.

If you compare the worst, you get 187/178 ~ 5%, and if you compare the best you get 184/183 ~ 0.5%.

--
Best regards,
Igor Stasenko AKA sig.
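The worst-case and best-case comparisons above can be reproduced from the four reported timings. The sketch below only rearranges those numbers; the temporary names are illustrative. It also prints the spread between the two optimized runs, which is of the same order as the effect being measured.

```smalltalk
"The four timings reported above, two per configuration."
| with without pct |
with := #(178938 183963).
without := #(187168 184992).
"Percentage by which b exceeds a."
pct := [:a :b | (b - a) / a * 100.0].

Transcript
    show: 'largest gap %: ', (pct value: with min value: without max) printString; cr;
    show: 'smallest gap %: ', (pct value: with max value: without min) printString; cr;
    show: 'spread of optimized runs %: ', (pct value: with min value: with max) printString; cr.
"Roughly 4.6, 0.56 and 2.8 respectively."
```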
This is exactly why you have to provide some confidence interval / deviation, otherwise it is hard to make any reasonable conclusion.
Run it 100 times, take the average and provide the standard deviation.

I am not a big fan of relying on incomplete benchmarking results. Please read: http://portal.acm.org/citation.cfm?id=1297033

http://www.squeaksource.com/p.html provides a basic benchmarking framework under the NBenchmark package. You subclass from PBenchmarkSuite, implement a method #benchXXX and run it:

    r := PBFloat run: 100.
    r asString

which will give decent results back :). This way it is much easier to make sense of the numbers.

So here again, to remember:

- number of samples
- average run times
- standard deviation

If one of these results is missing, the benchmark results are incomplete.

best regards,
camillo
Better than average, take the median
Nicolas

2011/4/5 Camillo Bruni <[hidden email]>:
> This is exactly why you have to provide some confidence interval / deviation, otherwise it is hard to make any reasonable conclusion.
>
> run it 100 times and take the average and provide the standard deviation.
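A short sketch of why the median can be preferable: a single slow run (for example one interrupted by a garbage collection) pulls the average up but leaves the median almost untouched. The sample values below are invented for illustration and are not measurements from this thread.

```smalltalk
"Hypothetical timings in ms; the 450 stands for one disturbed run."
| samples sorted n median mean |
samples := #(120 118 119 450 121).
sorted := samples asSortedCollection asArray.
n := sorted size.
"Middle element for odd sizes, mean of the two middle elements otherwise."
median := n odd
    ifTrue: [sorted at: n + 1 // 2]
    ifFalse: [((sorted at: n // 2) + (sorted at: n // 2 + 1)) / 2].
mean := (samples inject: 0 into: [:sum :each | sum + each]) / n.

Transcript
    show: 'median: ', median printString; cr;
    show: 'mean: ', mean asFloat printString; cr.
"median is 120, mean is 185.6"
```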