Hi:

Not sure whether I’ll get to write a more detailed report, but I wanted to briefly share a few pieces of data on the performance of the CogVM and StackVM. (Spur benchmarks are still running.)

I set up a collection of benchmarks to be able to compare the performance of Java, my SOM implementations, and Cog/StackVM [1].

The set contains the following benchmarks:
- DeltaBlue
- Richards
- GraphSearch (search in a graph data structure)
- Json (a minimal JSON parser benchmark)
- PageRank (a page rank algorithm implementation)
-- NBody, Mandelbrot, Bounce, BubbleSort, QuickSort, Fannkuch
-- Permute, Queens, Sieve, Storage, Towers

The Java implementations are here [2] and the SOM implementations here [3].

Naturally, the comparison between languages is not ideal. Java isn’t Smalltalk, and neither is Pharo/Squeak exactly the same as SOM. However, the benchmarks are ported to resemble the implementations in the other languages as closely as possible, with an emphasis on modern/Smalltalk-ish style where possible. For instance, the DeltaBlue implementation in Java is updated to use Java 8 lambdas and other modern APIs.

The Results
———————————

The most interesting one is peak performance, after warmup, with 100 iterations of each benchmark. The results are normalized to Java, so the numbers are slowdown factors (less is better). I also report the minimal and maximal values to show the range over all benchmarks.

                 geomean   min    max
Java 8              1.0    1.0    1.0
latest PharoVM     12.9    2.5  182.4   (not sure which exact version of the CogVM that is)
TruffleSOM          2.3    1.0    4.9
RTruffleSOM         3.0    1.5   11.5

TruffleSOM is SOM implemented as a self-optimizing interpreter on top of Truffle, a Java framework. RTruffleSOM is SOM as a self-optimizing interpreter on top of RPython’s meta-tracing framework (think PyPy).

So, what we see here is that the CogVM is on average 13x slower than Java 8. I think that’s not bad at all, considering that it is not doing any adaptive compilation yet.
The slowest benchmark is PageRank; the fastest one is DeltaBlue. Compared to the CogVM, my SOM implementations are doing a little better :)

Another interesting data point is the pure interpreter performance:

                 geomean   min    max
Java 8 interp       1.0    1.0    1.0
PharoVM Stack       1.6    0.5   15.3   (not sure which exact version of the StackVM that is)
TruffleSOM          6.3    1.9   15.7
RTruffleSOM         5.6    1.6   15.7

What we see here is that the StackVM is actually sometimes faster than the Java interpreter. While the PageRank benchmark is still the slowest, for the following benchmarks the StackVM is faster than Java’s bytecode interpreter: DeltaBlue, Json, NBody, Permute, Richards, Storage, Towers.

Well, that’s it for the moment. I hope that Clément and Eliot find those benchmarks useful, especially for the work on Sista. And, I wonder whether that makes the SOMs the fastest open source Smalltalk implementations? ;)

Best regards
Stefan

[1] http://smalltalkhub.com/#!/~StefanMarr/SMark/versions/SOM-Benchmarks-StefanMarr.4
[2] https://github.com/smarr/Classic-Benchmarks/tree/master/benchmarks/som
[3] https://github.com/SOM-st/SOM/tree/master/Examples/Benchmarks

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
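For readers who want to reproduce the geomean/min/max summary above from per-benchmark results, this is the usual aggregation of normalized slowdown factors. A minimal Python sketch; the factor values below are made-up illustrations, not the measured data:

```python
import math

def summarize(slowdowns):
    """Aggregate per-benchmark slowdown factors (runtime / Java runtime)
    into the geometric mean, minimum, and maximum, as in the tables above."""
    geomean = math.exp(sum(math.log(s) for s in slowdowns) / len(slowdowns))
    return geomean, min(slowdowns), max(slowdowns)

# Hypothetical slowdown factors for a handful of benchmarks:
factors = [2.5, 8.0, 20.0, 182.4]
g, lo, hi = summarize(factors)
print(round(g, 1), lo, hi)
```

The geometric mean is the right choice here because the inputs are ratios; an arithmetic mean would let one outlier like 182.4 dominate the summary.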
Hi:

Now with the numbers for the Spur VM (3306):

                 geomean   min    max
Java 8              1.0    1.0    1.0
latest PharoVM     12.9    2.5  182.4   (not sure which exact version of the CogVM that is)
Spur 3306           9.4    2.3  139.2
TruffleSOM          2.3    1.0    4.9
RTruffleSOM         3.0    1.5   11.5

You’re probably interested in the difference between the Spur and non-Spur VM. Here is the speedup of Spur for the different benchmarks:

Bounce        9%
BubbleSort   34%
DeltaBlue    10%
Fannkuch     25%
GraphSearch  10%
Json          9%
Mandelbrot   54%
NBody        50%
PageRank     24%
Permute      38%
Queens       20%
QuickSort    28%
Richards     -7%
Sieve        40%
Storage      36%
Towers       26%

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
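The per-benchmark percentages above presumably come from comparing runtimes of the two VMs; one common convention (an assumption, the post does not state the formula) is the relative runtime reduction expressed against the new VM's time. A small Python sketch with hypothetical runtimes:

```python
def speedup_percent(t_old, t_new):
    """Percent speedup of the new VM relative to the old one, under the
    convention speedup = t_old / t_new - 1: positive when the new VM is
    faster, negative when it is slower (as with Richards' -7% above)."""
    return round((t_old / t_new - 1.0) * 100)

# Hypothetical runtimes in milliseconds (not measured data):
print(speedup_percent(1340, 1000))  # new VM 34% faster -> 34
print(speedup_percent(930, 1000))   # new VM slower -> -7
```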
In reply to this post by Stefan Marr-3
Hi Stefan,
On Mon, Apr 6, 2015 at 3:12 AM, Stefan Marr <[hidden email]> wrote:
No response to specifics yet because Clément and I are busy with Sista internals, but YES! Thank you *very much*! As soon as possible we'll be looking at this in detail.

> And, I wonder whether that makes the SOMs the fastest open source Smalltalk implementations? ;)

best,
Eliot |
In reply to this post by Stefan Marr-3
I've checked the code and ran some benchmarks in Squeak.

When I tried to load the code, I got complaints from the system, because the names of the class variables begin with lowercase letters. ScriptCollector is also missing from Squeak, though it's easy to work around. There are still plenty of #% sends in the code, which I had to rewrite to #\\.

The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way too slow. One can consider this a weakness of the system, but it's also a weak point of the benchmark that it relies so heavily on a fast PRNG implementation. The code is also pretty bad, because it uses only a few bits out of the generated 32, and it has to fight with the signed result. Whoever came up with using that "PRNG" hasn't really put much thought into it... I tried it with another PRNG which is another 6x faster (so the overall speed is 36x the original version), but that's still way too slow. Squeak is rather slow here: an optimized PRNG written in C generates about three orders of magnitude more random bits than an optimized PRNG in Squeak in the same time.

About porting: I don't know what your original goal was, but I don't see why you would keep 0-based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?

Another thing is about types: the [1 - SomPageRank DFactor / n] expression is calculated in an n*n loop during the PageRank benchmark, where n is a constant of the benchmark, and SomPageRank DFactor is also a constant - 0.85. Let's see how this adds to the runtime:

n := 100.
[ 1 - SomPageRank DFactor / n ] bench. '5,110,000 per second. 196 nanoseconds per run.'.
[ 1.0 - SomPageRank DFactor / n ] bench. '23,500,000 per second. 42.5 nanoseconds per run.'.
nFloat := n asFloat.
[ 1.0 - 0.85 / nFloat ] bench. '26,000,000 per second. 38.5 nanoseconds per run.'.
[ 0.15 / nFloat ] bench. '41,400,000 per second. 24.2 nanoseconds per run.'.
[ 0.0015 ] bench. '118,000,000 per second. 8.46 nanoseconds per run.'.
[] bench. '125,000,000 per second. 8.01 nanoseconds per run.'

So the code is calculating the same constant over and over again. Due to type conversions this is about 25x slower than using a precalculated constant (and ~5x slower than the code with proper types). Of course an adaptive optimizer could optimize this, but so could any programmer who cares about performance. The corresponding Java code is:

private static double D_FACTOR = 0.85; // damping factor

and

((1 - D_FACTOR)/n)

If I were to port this and wanted to stick to the implementation, then D_FACTOR would be a class variable, and the code would read as [1.0 - DFactor / n]. But knowing that the constant is not used anywhere else, I see no problem with precalculating the value.

Levente

|
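Levente's point about recomputing [1 - SomPageRank DFactor / n] in the inner loop is not Smalltalk-specific; it can be illustrated in any language. A minimal Python sketch (names are hypothetical) contrasting the repeatedly evaluated expression with a hoisted constant:

```python
D_FACTOR = 0.85  # damping factor, as in the benchmark

def rank_term_recomputed(n, iterations):
    """Evaluates (1 - D_FACTOR) / n on every iteration, as the
    original benchmark code does."""
    total = 0.0
    for _ in range(iterations):
        total += (1 - D_FACTOR) / n
    return total

def rank_term_hoisted(n, iterations):
    """Hoists the loop-invariant constant out of the loop, as an
    optimizing JIT (or a performance-minded programmer) would."""
    c = (1 - D_FACTOR) / n
    total = 0.0
    for _ in range(iterations):
        total += c
    return total

# Both compute the same value; only the work per iteration differs.
print(rank_term_recomputed(100, 10) == rank_term_hoisted(100, 10))  # True
```

The two versions are observationally equivalent, which is exactly why loop-invariant code motion is a safe optimization for a compiler to perform automatically.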
Hi Levente:

My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code. It is not a goal to write the most efficient version of the code, taking the known implementation details into account. I want those benchmarks to be reasonably high-level, reasonably idiomatic code.

> On 06 Apr 2015, at 22:24, Levente Uzonyi <[hidden email]> wrote:
>
> When I tried to load the code, I got complaints from the system, because the names of the class variables begin with lowercase letters. ScriptCollector is also missing from Squeak, though it's easy to work around. There are still plenty of #% sends in the code, which I had to rewrite to #\\.

Yes, I did not commit the Squeak-compatible code. It's only in my image. If someone wants it, I could probably upload it somewhere.

> The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way too slow.

Yes, the RNG is slow. If you've got a faster version, I could surely integrate it, though, as long as it isn't fundamentally changing things. As mentioned above, for me the benchmarks are intended to measure how well the optimizers work.

> About porting:
> I don't know what your original goal was, but I don't see why you would keep 0-based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?

The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.

> Another thing is about types:
> [1 - SomPageRank DFactor / n]
> private static double D_FACTOR = 0.85; // damping factor
>
> If I were to port this and wanted to stick to the implementation, then D_FACTOR would be a class variable, and the code would read as [1.0 - DFactor / n]. But knowing that the constant is not used anywhere else, I see no problem with precalculating the value.

That's the job of the optimizer/JIT compiler. And SOM also doesn't have class variables. So I'll stick with the current code. That's a puzzle for Sista to solve, not something I would want to change in the code.

You might be tempted to do such optimizations because you know the current limits of the execution mechanisms. But for me, those benchmarks are about measuring how well the VMs optimize, not how well you know your VMs.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
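Stefan's point that "not all indexing is pure indexing" is easy to see in a sketch: when an index also feeds arithmetic, shifting the base changes the result unless every such use is adjusted. A hypothetical Python illustration (Python is 0-based; the naive 1-based port is simulated):

```python
def weighted_sum_0_based(values):
    """Each element is weighted by its 0-based position: the index is
    used both to address the array and as a value in the arithmetic."""
    return sum(i * values[i] for i in range(len(values)))

def weighted_sum_1_based_naive(values):
    """A naive 1-based port: shifting the loop bounds alone also shifts
    the weights, so the result differs from the original."""
    return sum(i * values[i - 1] for i in range(1, len(values) + 1))

vals = [10, 20, 30]
print(weighted_sum_0_based(vals))        # 0*10 + 1*20 + 2*30 = 80
print(weighted_sum_1_based_naive(vals))  # 1*10 + 2*20 + 3*30 = 140
```

A faithful port must subtract 1 wherever the index is used as a value, which is exactly the error-prone part of converting between indexing bases.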
Hi Stefan,

really interesting; thanks.

> On 06.04.2015 at 22:50, Stefan Marr <[hidden email]> wrote:
> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.

It might seem the pure interpreter comparison adds little value then, but it's still interesting. Can you share some more insight about the characteristics of the benchmarks with the "surprising" results in interpreter-only mode?

> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.

The indexing base really shouldn't matter to the compiler.

Best,
Michael

|
Hi Michael:

> On 06 Apr 2015, at 23:06, Michael Haupt <[hidden email]> wrote:
>
>> On 06.04.2015 at 22:50, Stefan Marr <[hidden email]> wrote:
>> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.
>
> It might seem the pure interpreter comparison adds little value then, but it's still interesting.

It depends. For the self-optimizing interpreters it is also interesting to see the effect of the optimizations. And for Sista, and the new bytecode set with quickening, this will also be interesting. In general, I think interpreter performance is relevant for warmup and average application performance. But I am still looking for a better way of measuring that more directly.

> Can you share some more insight about the characteristics of the benchmarks with the "surprising" results in interpreter-only mode?

So, the faster-than-Java benchmarks were DeltaBlue, Json, NBody, Permute, Richards, Storage, Towers. Honestly, I don't know whether there is one common characteristic between them; I wouldn't normally think so. NBody is mostly object field access and floating-point operations. Storage is a tree of arrays, rather a GC benchmark. Permute is very array-heavy. It could well be that the Smalltalk bytecode set is just slightly more optimized for the relevant performance-critical operations in those benchmarks. However, the average is still about 60% slower, so on the whole set of benchmarks the Java interpreter is still better. I would say, overall, it only says that the interpreter is pretty well optimized.

>> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.
>
> The indexing base really shouldn't matter to the compiler.

Right, Graal as well as RPython take care of that nicely. Perhaps Sista will some day, too.
Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
In reply to this post by Stefan Marr-3
On Mon, 6 Apr 2015, Stefan Marr wrote:

> Hi Levente:
>
> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code.
> It is not a goal to write the most efficient version of the code, taking the known implementation details into account. I want those benchmarks to be reasonably high-level, reasonably idiomatic code.

In that case maybe the best is to leave everything as it is, but then the results of these benchmarks are hardly comparable between the ports.

>> There are still plenty of #% sends in the code, which I had to rewrite to #\\.
>
> Yes, I did not commit the Squeak-compatible code. It's only in my image.
> If someone wants it, I could probably upload it somewhere.

I don't need it. Does Pharo have a #% method?

>> The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way too slow.
>
> Yes, the RNG is slow. If you got a faster version, I could surely integrate it, though, as long as it isn't fundamentally changing things. As mentioned above, for me the benchmarks are intended to measure how well the optimizers work.

It's just 6x faster and uses ThirtyTwoBitRegister. If you're still interested in it, I'll upload it somewhere, along with the missing method of ThirtyTwoBitRegister.

>> About porting:
>> I don't know what your original goal was, but I don't see why you would keep 0-based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?
>
> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.

The methods I found in the PageRank benchmark are easy to port to 1-based indexes. I did that in my image.

> That's the job of the optimizer/JIT compiler. And SOM also doesn't have class variables.
> So I'll stick with the current code. That's a puzzle for Sista to solve, not something I would want to change in the code.
>
> You might be tempted to do such optimizations because you know the current limits of the execution mechanisms. But for me, those benchmarks are about measuring how well the VMs optimize, not how well you know your VMs.

Well, then you shouldn't tell Java that the variable is "private static constant", because the VM should be able to find that out. :)

Levente

|
Answering some of my own questions:

Yes, Pharo has a #% method.

Yes, it's possible to reproduce the Jenkins hash based PRNG with NativeBoost [1]. It's about 10x faster than the one with ThirtyTwoBitRegister, so the overall speedup is ~60x compared to the pure Smalltalk one.

No, the code doesn't work out of the box in Pharo. I had to change SomAll's superclass from Object to SomBenchmarkHarness.

No, SomPageRank doesn't run on Pharo either. It gets into an infinite loop, because #generateRandomPagesN:outLinks:divisor: is evaluated with n = 1, and that makes the following loop infinite:

k := SomJenkinsRandom random abs % n.
[i = k] whileTrue: [
	k := SomJenkinsRandom random abs % n ].

Levente

[1] http://leves.web.elte.hu/squeak/NBJenkinsRandom.st
|
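For readers curious what a Jenkins-style PRNG looks like, here is a Python sketch of Bob Jenkins' "small fast" 32-bit generator (jsf32) with explicit 32-bit wrap-around. Note this is an assumption about what SomJenkinsRandom is based on, not a verified port of the benchmark's code:

```python
MASK = 0xFFFFFFFF  # keep all arithmetic in 32 bits, like ThirtyTwoBitRegister

def _rot(x, k):
    """32-bit left rotation."""
    return ((x << k) | (x >> (32 - k))) & MASK

class SmallFastRandom:
    """Bob Jenkins' 'small fast' 32-bit PRNG (jsf32). Every operation
    wraps at 32 bits, which is why a pure Smalltalk version needs a
    helper such as ThirtyTwoBitRegister for speed."""

    def __init__(self, seed):
        self.a = 0xF1EA5EED
        self.b = self.c = self.d = seed & MASK
        for _ in range(20):  # warm up the state
            self.next()

    def next(self):
        e = (self.a - _rot(self.b, 27)) & MASK
        self.a = self.b ^ _rot(self.c, 17)
        self.b = (self.c + self.d) & MASK
        self.c = (self.d + e) & MASK
        self.d = (e + self.a) & MASK
        return self.d

rng = SmallFastRandom(42)
print([rng.next() % 100 for _ in range(5)])  # five pseudo-random values in 0..99
```

The explicit `& MASK` after every add and subtract is the part that a VM with tagged small integers pays for on each operation, and that an optimizing compiler could map directly onto native 32-bit arithmetic.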
Hi Levente:

> On 07 Apr 2015, at 22:57, Levente Uzonyi <[hidden email]> wrote:
>
> Answering some of my own questions:

Sorry, didn't find time earlier.

> No, the code doesn't work out of the box in Pharo. I had to change SomAll's superclass from Object to SomBenchmarkHarness.

Ehm, I think I did not even port SomAll. That's an empty class?

> No, SomPageRank doesn't run on Pharo either. It gets into an infinite loop, because #generateRandomPagesN:outLinks:divisor: is evaluated with n = 1, and that makes the following loop infinite:

That's a bug in the original code, yes. I didn't fix it.

Not sure how you even managed to execute the code, since I didn't document anything… Sorry. But the general idea is to execute it from the command line.

From within the image, you can execute it with:

SomBenchmarkHarness new run: { ''. 'PageRank'. '10'. '0'. '5' }.

On the command line it would look something like:

$yourVM $yourImage SomBenchmarkHarness PageRank 500 0 1

(Assuming that the Scripting package is loaded.)

Sorry, I am currently traveling. I will try to create a proper configuration when I am back.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
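The n = 1 hang that Levente found is a generic rejection-sampling pitfall: when the loop rejects the only possible candidate, it can never terminate. A Python sketch of the same pattern (hypothetical names, with the standard-library PRNG standing in for SomJenkinsRandom):

```python
import random

def pick_other_page(i, n):
    """Pick a random page index in [0, n) different from i, mirroring the
    benchmark's rejection loop. With n = 1 the only candidate is always
    i itself, so the original loop never terminates; here an explicit
    precondition check surfaces the failure instead of hanging."""
    if n < 2:
        raise ValueError("need at least two pages to pick a different one")
    k = random.randrange(n)
    while k == i:
        k = random.randrange(n)
    return k

k = pick_other_page(0, 5)
print(0 <= k < 5 and k != 0)  # True
```

A guard like this (or falling back to `(i + 1) % n` style arithmetic) is the usual fix for such rejection loops.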
In reply to this post by Levente Uzonyi-2
Hi Levente: > On 07 Apr 2015, at 16:58, Levente Uzonyi <[hidden email]> wrote: > > On Mon, 6 Apr 2015, Stefan Marr wrote: > >> My goal with those benchmarks is to compare how well the just-in-time compilers optimize the code. >> It is not a goal to write the most efficient version of the code, taking the known implementation details into account. I want those benchmarks to be reasonably high-level, reasonably idiomatic code. > > In that case maybe the best is to leave everything as it is, but then the results of these benchmarks are hardly comparable between the ports. Cross-language comparisons are always fishy… However, on platforms with ‘sufficiently smart compilers’, the results are a useful indication for where to look for further optimization potential. On platforms with less extensive optimizations, yes, I agree, one might have the feeling that it is a little unfair. >>> The PageRank benchmark is so slow that I stopped running it after about 30 minutes. The profiler shows, that it spends over 95% of the time in SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that PRNG, but it's still way to slow. One can consider this as a weakness of the system, but it's also a weak point of the benchmark, that relies so heavily on a fast PRNG implementation. The code is also pretty bad, because it uses only a few bits out of the generated 32, and it has to fight with the signed result. Whoever came up with using that “PRNG" hasn't really put much thought in it... >> >> Yes, the RNG is slow. If you got a faster version, I could surely integrate it thought. As long as it isn’t fundamentally changing things. As mentioned about, for me, the benchmarks are intended to measure how well the optimizers work. > > It's just 6x faster and uses ThirtyTwoBitRegister. If you’re still interested in it, I'll upload it somewhere, along with the missing method of ThirtyTwoBitRegister. 
Sure, ThirtyTwoBitRegister is part of Squeak/Pharo, so that seems like a valid approach to me. And, since it is used for performance-critical stuff, a compiler writer could decide to recognize it and map it directly onto 32-bit operations…

>>> About porting:
>>> I don't know what your original goal was, but I don't see why you would keep 0-based indexing in the code. Smalltalk uses 1-based indexing, and this definitely has a negative impact on the Smalltalk results. If you were to port code from Smalltalk to Java, would you keep the 1-based indexes?
>>
>> The problem here is that not all indexing is pure indexing. Sometimes the indexes are used as values. Getting that right is a pain. So for some benchmarks I gave up after trying to convert to 1-based indexes.
>
> The methods I found in the PageRank benchmark are easy to port to 1-based indexes. I did that in my image.

Ok, I think I tried and failed. But perhaps that was another benchmark.

> Well, then you shouldn't tell Java that the variable is "private static constant", because the VM should be able to find that out. :)

Ehm, well, actually, I even forgot the `final` keyword in the Java version :) But there is a difference between representing something as a constant with the available language features and doing manual constant propagation. The latter is arguably rather bad for code readability and maintainability, because the intention gets lost. Since we don't have final fields in Smalltalk, a method returning a literal value is as close as we get, and it is sufficient for good compilers. Putting a precomputed result in some field that's modifiable isn't something I like. Especially with Smalltalk images and reflection, it's always a little unclear whether the value is actually what you expect it to be.

Anyway, yes, cross-language and cross-dialect comparisons are inherently problematic.

Thanks for the comments
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
Thanks Stefan for those benchmarks! Just for fun, we took them and ran them in Squeak 4.6 on Cog, the Stack VM, and RSqueak/VM.
This omits PageRank, because I couldn't be bothered to wait that long, and GraphSearch, because there seems to be a bug with RSqueak/VM so it fails with an out-of-bounds error.

The numbers are normalized against the interpreter VM. Here are the winners per benchmark:

     mean.norm    benchmark    vm
1    0.02566508   Mandelbrot   RSqueak
2    0.05702497   IntegerLoop  RSqueak
3    0.06013495   FieldLoop    RSqueak
4    0.06748896   WhileLoop    RSqueak
5    0.08052232   Richards     RSqueak
6    0.09240031   Towers       RSqueak
7    0.15941078   Bounce       Cog
8    0.19163855   Sieve        RSqueak
9    0.22643333   Queens       RSqueak
10   0.25765379   NBody        Cog
11   0.26811460   Storage      RSqueak
12   0.27148559   Fannkuch     RSqueak
13   0.29138479   DeltaBlue    Cog
14   0.39468583   Json         Cog
15   0.44837529   Permute      RSqueak

And here are the numbers for Cog and RSqueak/VM (Interpreter is 1):

              RSqueak      Cog
Bounce        0.44569231   0.1594108
DeltaBlue     0.50761783   0.2913848
Fannkuch      0.27148559   0.4696399
FieldLoop     0.06013495   0.2631032
IntegerLoop   0.05702497   0.1894448
Json          1.48456633   0.3946858
Mandelbrot    0.02566508   0.3334912
NBody         0.73934825   0.2576538
Permute       0.44837529   0.4645503
Queens        0.22643333   0.3669021
Richards      0.08052232   0.1307220
Sieve         0.19163855   0.6768656
Storage       0.26811460   0.4415439
Towers        0.09240031   0.2979555
WhileLoop     0.06748896   0.2240831
|
Hi Tim:

By the way, I put the Spur image I used here: http://stefan-marr.de/downloads/Benchmarks-Spur-2015-04-06.zip

> On 10 Apr 2015, at 13:53, timfelgentreff <[hidden email]> wrote:
>
> The numbers are normalized against the interpreter VM. Here are the winners per benchmark:

Interesting, but we should really have a talk about how to present data :-P I can't make sense out of your raw R output… If I didn't do anything wrong, these should be the speedup numbers over the interpreter. IMHO, this is actually understandable:

              RSqueak   Cog
Bounce           2.2    6.3
DeltaBlue        2.0    3.4
Fannkuch         3.7    2.1
FieldLoop       16.6    3.8
IntegerLoop     17.5    5.3
Json             0.7    2.5
Mandelbrot      39.0    3.0
NBody            1.4    3.9
Permute          2.2    2.2
Queens           4.4    2.7
Richards        12.4    7.6
Sieve            5.2    1.5
Storage          3.7    2.3
Towers          10.8    3.4
WhileLoop       14.8    4.5

I am a little surprised that Json is so bad. In RTruffleSOM, which also uses meta-tracing, it's not slower than the interpreter.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
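The conversion Stefan applied to Tim's numbers is simply the reciprocal of the normalized runtime. A small Python sketch using two of the values from the tables above:

```python
def speedup_over_interpreter(normalized_runtime):
    """Tim's numbers are runtimes normalized to the interpreter
    (smaller = faster); the reciprocal gives the speedup factor
    Stefan reports (larger = faster)."""
    return round(1.0 / normalized_runtime, 1)

# Values taken from the Bounce row of the table above:
print(speedup_over_interpreter(0.44569231))  # RSqueak -> 2.2
print(speedup_over_interpreter(0.1594108))   # Cog -> 6.3
```

Note that a normalized runtime above 1 (like Json's 1.48 on RSqueak) turns into a speedup below 1, i.e. a slowdown relative to the interpreter.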
In reply to this post by Stefan Marr-3
On Wed, 8 Apr 2015, Stefan Marr wrote:

> Hi Levente:
>
>> No, the code doesn't work out of the box in Pharo. I had to change SomAll's superclass from Object to SomBenchmarkHarness.
>
> Ehm, I think I did not even port SomAll. That's an empty class?
>
>> No, SomPageRank doesn't run on Pharo either. It gets into an infinite loop, because #generateRandomPagesN:outLinks:divisor: is evaluated with n = 1, and that makes the following loop infinite:
>
> That's a bug in the original code, yes. I didn't fix it.
>
> Not sure how you even managed to execute the code, since I didn't document anything… Sorry.

It didn't work right away, because SomAll doesn't implement this and that, but it seemed like SomBenchmarkHarness does, so I simply changed SomAll to be a subclass of SomBenchmarkHarness.

> But the general idea is to execute it from the command line.
>
> From within the image, you can execute it with:
> SomBenchmarkHarness new run: { ''. 'PageRank'. '10'. '0'. '5' }.
>
> On the command line it would look something like:
> $yourVM $yourImage SomBenchmarkHarness PageRank 500 0 1
>
> (Assuming that the Scripting package is loaded.)
> Sorry, I am currently traveling. I will try to create a proper configuration when I am back.

Levente

|
In reply to this post by Stefan Marr-3
We guessed JSON may be bad because we never looked at string performance.
I haven't checked RTruffleSOM in a while, but one issue that affects RSqueak (as well as Cog to some extent, I imagine) is the interrupt timer. For us, it adds a couple of guards and bridges to pretty much every loop. SOM doesn't have an interactive image, does it? |
Hi Tim:

> On 10 Apr 2015, at 15:13, timfelgentreff <[hidden email]> wrote:
>
> We guessed JSON may be bad because we never looked at string performance.

Ok, sounds plausible. Indeed, Json uses purely strings for everything.

> I haven't checked RTruffleSOM in a while, but one issue that affects RSqueak (as well as Cog to some extent, I imagine) is the interrupt timer. For us, it adds a couple of guards and bridges to pretty much every loop.

Ok, right. Yes, that's going to cost performance. Perhaps you should have a proper 'headless/event-less' execution mode for benchmarking ;)

> SOM doesn't have an interactive image, does it?

No, no images, no UI, no event handling. It's really just a straightforward sequential language.

Best regards
Stefan

--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/
|
We have that, but I wanted a fair comparison in the environment that an actual user of Squeak would run :) |