Hi folks!
Just stumbled on this little tiny benchmark:

  http://butunclebob.com/ArticleS.UncleBob.SpeedOfJavaCppRuby

As you can see, Squeak is referenced a bit down in the comments with code:

  "Java is the winner, for 2100000 it spends 250 msec on average.
  Ruby 6500 msec on average. Squeak Smalltalk 2269 msec,
  VisualWorks Smalltalk 460 msec."

I threw the same code into my Gjallar 3.8 image using the Exupery VM for Win32 and cobbled together this (excluding the actual #runOn: methods):

----------
bench
	"self bench"

	Transcript show: 'Normal:', ([self runOn: 2100000] timeToRun) asString; cr.
	Exupery initialiseExupery.
	ExuperyProfiler optimise: [self runOnX: 2100000].
	Transcript show: 'ExuperyOptimised:', ([self runOnX: 2100000] timeToRun) asString; cr.
	Exupery dynamicallyInline.
	Transcript show: 'ExuperyOptimisedInlined:', ([self runOnX: 2100000] timeToRun) asString; cr.
------------------------

Which gave this (three runs):

  Normal:2698
  ExuperyOptimised:5217
  ExuperyOptimisedInlined:1807

  Normal:2666
  ExuperyOptimised:5030
  ExuperyOptimisedInlined:1812

  Normal:2672
  ExuperyOptimised:5182
  ExuperyOptimisedInlined:1770

Note: #runOnX: is exactly the same as #runOn:, I just didn't want to get Exupery mixed up with "normal" code.

So... well, evidently Exupery manages to do some inlining here :) and the end result is roughly 32% faster than interpreted. I did look at the exupery.log file but couldn't make much out of it. I also played around a little bit with the constants in #optimise, but that only made things *worse* - possibly it decided to compile more and then didn't manage to inline? Have no idea. :)

regards, Göran

PS. This was using Exupery 0.10 - I see there is a 0.11 on SM... Darn! :)
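Since the thread omits the actual #runOn: source, here is a minimal sketch of what a prime-sieve benchmark of this shape typically looks like in Smalltalk. The selector, the counting convention, and the exact loop bounds are assumptions for illustration, not the code Göran actually ran:

```smalltalk
runOn: size
	"Sieve of Eratosthenes up to size, answering the number of
	primes found. Hypothetical reconstruction of the benchmark
	body -- not the actual code from the thread."
	| flags count |
	flags := Array new: size withAll: true.
	count := 0.
	2 to: size do: [:i |
		(flags at: i) ifTrue: [
			count := count + 1.
			i * 2 to: size by: i do: [:j |
				flags at: j put: false]]].
	^count
```

Any sieve of this shape spends its time in the two nested #to:...do: loops and the per-element Array accesses, which is what makes it sensitive to how the VM handles blocks and loop sends.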
Hello Göran et al,
Friday, February 16, 2007, 8:20:30 AM, you wrote:

GK> "Java is the winner, for 2100000 it spends 250 msec on average.
GK> Ruby 6500 msec on average. Squeak Smalltalk 2269 msec, VisualWorks
GK> Smalltalk 460 msec."

I found the webpage a bit confusing. For example, the original code snippets calculate square roots while the later ones don't. Nevertheless, I found the Smalltalk code to be worth improving. For example, switching to byte arrays and using 0 and 1 as booleans results in a 25% speedup in VW. The scaled times would be 250 ms for Java and 370 ms for VW.

Something important to note: not pinning a single-threaded program to one of multiple available cores can cause severe performance degradation under Windows XP (anywhere from no degradation up to 2x slower). The degradation itself is not stable either, as it swings from moderate to really bad from one second to the next. This is something we're investigating in the context of VW, and it seems to be related to Windows' process scheduler moving single-threaded programs between cores, leading to CPU cache thrashing. Other single-threaded programs such as WinRAR exhibit the same behavior. I do not have access to an AMD dual-core CPU - would somebody be able to run some tests with and without core affinity set?

The "on average" description of run times may have something to do with this, and to me it casts a bit of doubt on the precision of the results. We also observed that running two images on a dual-core CPU, with each image pinned to one of the cores, results in both images running faster than a single image with core affinity, by about 10%. As you can see, further research is required. If anybody has good pointers on this, I'd appreciate a link.

-- 
Best regards,
 Andres                            mailto:[hidden email]
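The byte-array trick Andres describes can be sketched like this in Smalltalk: the Array of Boolean objects is replaced by a ByteArray, with 1 and 0 standing in for true and false. This assumes the benchmark is the sieve discussed in the thread; the selector is hypothetical and the exact VW code is not shown in the thread:

```smalltalk
sieveBytesOn: size
	"Same sieve, but the flags live in a ByteArray with 1/0 in
	place of true/false. A ByteArray stores raw unboxed bytes,
	so each flag access avoids dealing with Boolean objects.
	Hypothetical sketch of the optimization, not the VW code."
	| flags count |
	flags := ByteArray new: size withAll: 1.
	count := 0.
	2 to: size do: [:i |
		(flags at: i) = 1 ifTrue: [
			count := count + 1.
			i * 2 to: size by: i do: [:j |
				flags at: j put: 0]]].
	^count
```

The compactness matters too: a ByteArray of 2,100,000 flags is roughly 2 MB of contiguous bytes versus 8 MB of object pointers on a 32-bit image, which is friendlier to the CPU cache behavior Andres discusses.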
Göran Krampe writes:
> Hi folks!
>
> Just stumbled on this little tiny benchmark:
>
> http://butunclebob.com/ArticleS.UncleBob.SpeedOfJavaCppRuby

Interesting. For what it's worth, Exupery should be as fast as, or slightly faster than, VW for that benchmark. I may have lost a little time recently, but there's plenty of room for a little tuning to make it back up.

First I had a look at what Exupery's doing with:

  tail -f Exupery.log | grep -v block

Which shows the following:

  9:38:21 pm: Initializing the Code Cache
  9:38:24 pm: Compiling SmallInteger>>to:by:do: inlining #()
  9:38:25 pm: Compiling WBKToys>>runOn: inlining #()
  9:38:29 pm: Compiling SmallInteger>>to:by:do: inlining {{61 . ExuperyBlockContext}}
  9:38:29 pm: Failed to inline ExuperyBlockContext>>value:
  9:38:29 pm: Compiling WBKToys>>runOn: inlining {{33 . Array} . {44 . Array} . {61 . Array}}

What's interesting is the #to:by:do:. If you look at the bytecodes, it's there. If I wrote a compiled version of ExuperyBlockContext>>value: then, even with the current bytecodes, it should improve noticeably. Only compiled primitives can be inlined at the moment.

Here's Exupery's benchmark suite:

  arithmaticLoopBenchmark 1387 compiled  127 ratio: 10.920
  bytecodeBenchmark       2139 compiled  484 ratio:  4.419
  sendBenchmark           1582 compiled  728 ratio:  2.173
  doLoopsBenchmark        1063 compiled  843 ratio:  1.261
  pointCreation           1075 compiled 1030 ratio:  1.044
  largeExplorers           585 compiled  595 ratio:  0.983
  compilerBenchmark        474 compiled  454 ratio:  1.044
  Cumulative Time     1058.337 compiled 521.541 ratio 2.028

The bytecode benchmark is a prime number sieve similar to what you were using, though coded to avoid sends and real blocks.

Bryce
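Bryce's remark about the bytecode benchmark being "coded to avoid sends and real blocks" can be illustrated with a rewrite of a sieve-style inner loop. In Squeak, #whileTrue: with literal receiver and argument blocks is compiled inline to plain jump bytecodes, so no BlockContext is created and no #to:by:do: send occurs. This is a sketch of the general technique, not Exupery's actual benchmark code, and it assumes flags, size, and i are in scope from an enclosing sieve loop:

```smalltalk
	"Inner marking loop rewritten with an explicit counter and
	#whileTrue:, which the Squeak bytecode compiler inlines --
	avoiding the #to:by:do: send that the log shows Exupery
	could not fully inline. Hypothetical fragment."
	| j |
	j := i * 2.
	[j <= size] whileTrue: [
		flags at: j put: false.
		j := j + i]
```

The trade-off is readability: #to:by:do: states the intent directly, while the #whileTrue: form exists only to keep the loop in inlined bytecodes for the compiler's benefit.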