Hi All, I saw that some of my benchmarks produce different results when running on PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks from the same image on both VMs and the results were surprising: "CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec' "Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec' Slightly more bytecodes, and a lot more sends. I tried playing with compiler flags, but with no luck, so I came to the conclusion that the difference comes from somewhere else. I checked where the PharoVM was forked from CogVM (somewhere around 2014-06-10), so I downloaded a version of CogVM from around that time: CogVM 3048. And the results are '932604735 bytecodes/sec; 118660385 sends/sec'. Somewhere between 3048 and 3343 the performance has dropped significantly (sends/sec). Note that this is not something that's only visible in micro benchmarks. Does anyone know why? Levente |
> On 16 May 2015, at 17:08, Levente Uzonyi <[hidden email]> wrote: > > Hi All, > > I saw that some of my benchmarks produce different results when running on PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks from the same image on both VMs and the results were surprising: > > "CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec' > "Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec' > > Slightly more bytecodes, and a lot more sends. I tried playing with compiler flags, but with no luck, so I came to the conclusion that the difference comes from somewhere else. I checked where the PharoVM was forked from CogVM (somewhere around 2014-06-10), so I downloaded a version of CogVM from around that time: CogVM 3048. > > And the results are '932604735 bytecodes/sec; 118660385 sends/sec'. > > Somewhere between 3048 and 3343 the performance has dropped significantly (sends/sec). Note that this is not something that's only visible in micro benchmarks. > > Does anyone know why? It has to be a flags issue. btw… We did not fork… I branched to work on spur and I still didn’t get back to trunk, in the mean time we just incorporated to trunk minor bugfixes :) > > > Levente |
On Sat, 16 May 2015, Esteban Lorenzano wrote: > > >> On 16 May 2015, at 17:08, Levente Uzonyi <[hidden email]> wrote: >> >> Hi All, >> >> I saw that some of my benchmarks produce different results when running on PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks from the same image on both VMs and the results were surprising: >> >> "CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec' >> "Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec' >> >> Slightly more bytecodes, and a lot more sends. I tried playing with compiler flags, but with no luck, so I came to the conclusion that the difference comes from somewhere else. I checked where the PharoVM was forked from CogVM (somewhere around 2014-06-10), so I downloaded a version of CogVM from around that time: CogVM 3048. >> >> And the results are '932604735 bytecodes/sec; 118660385 sends/sec'. >> >> Somewhere between 3048 and 3343 the performance has dropped significantly (sends/sec). Note that this is not something that's only visible in micro benchmarks. >> >> Does anyone know why? > > It has to be a flags issue. can squeeze out a few percents with -O3 -fomit-frame-pointer -march=native, but what that really affects are the primitives. I narrowed it down, it happened between 3306 and 3308: 3048 '932604735 bytecodes/sec; 118660385 sends/sec' 3163 '885813148 bytecodes/sec; 118216795 sends/sec' 3164 '856187290 bytecodes/sec; 109993018 sends/sec' 3254 '932604735 bytecodes/sec; 119670738 sends/sec' 3306 '943778801 bytecodes/sec; 125949728 sends/sec' 3308 '924187725 bytecodes/sec; 44489533 sends/sec' 3311 '937728937 bytecodes/sec; 46190853 sends/sec' 3312 '894323144 bytecodes/sec; 45239356 sends/sec' 3343 '904593639 bytecodes/sec; 46368169 sends/sec' Levente > btw… We did not fork… I branched to work on spur and I still didn’t get back to trunk, in the mean time we just incorporated to trunk minor bugfixes :) > > >> >> >> Levente > > |
Hi Levente,
On Sat, May 16, 2015 at 8:59 AM, Levente Uzonyi <[hidden email]> wrote:
I'll take a look soon. Could you tell me what platform you're on? [snip] -- best,
Eliot |
Hi Eliot, This happens on Ubuntu 14.04, using coglinuxht. Levente On Sat, 16 May 2015, Eliot Miranda wrote: |
In reply to this post by Levente Uzonyi-2
On a side note: an undebuggable VM optimized for my machine[1] with gcc 4.8.2, built from the latest sources gives '1037487335 bytecodes/sec; 45666949 sends/sec', which is about +15% bytecodes/sec, but doesn't affect sends/sec. Plugin performance is probably also better. Levente [1] with -march=native -mfpmath=sse -O3 -fomit-frame-pointer On Sat, 16 May 2015, Levente Uzonyi wrote: |
In reply to this post by Eliot Miranda-2
I also checked spur (cogspurlinuxht), and it's very similar: 3306 '1574173712 bytecodes/sec; 96854495 sends/sec' 3308 '1629276054 bytecodes/sec; 45580785 sends/sec' Levente |
I tried to look at it but I am not sure. Is it something related to the memory leak checker being enabled or something like that ? I am not sure this is a C compiler flag issue as this bench runs mostly jitted code. It looks like the jitted code is the same between both versions. I'm confused. 2015-05-19 2:39 GMT+02:00 Levente Uzonyi <[hidden email]>:
|
The difference is somewhere in the sources. I just checked out revision 3306 of the src directory, and rebuilt a CogVM (ht one, and kept the platform files at 3345. I used some extra CFLAGS which are responsible for the higher bytecodes/sec, ignore that.). It gives the following: '941176470 bytecodes/sec; 122927085 sends/sec' While 3343 gives: '896672504 bytecodes/sec; 46102703 sends/sec' Levente |
And after that I checked out 3308 of src, and rebuilt the VM again: '934306569 bytecodes/sec; 46368169 sends/sec' The files which were changed from 3306 to 3308 are: U src/vm/cointerp.c U src/vm/cointerpmt.c U src/vm/cogit.c U src/vm/cointerp.h U src/vm/gcc3x-cointerp.c U src/vm/cointerpmt.h U src/vm/gcc3x-cointerpmt.c U src/vm/cogit.h U src/plugins/QuicktimePlugin/QuicktimePlugin.c Levente On Tue, 19 May 2015, Levente Uzonyi wrote: > > The difference is somewhere in the sources. I just checked out revision 3306 > of the src directory, and rebuilt a CogVM (ht one, and kept the platform > files at 3345. I used some extra CFLAGS which are responsible for the higher > bytecodes/sec, ignore that.). It gives the following: > > '941176470 bytecodes/sec; 122927085 sends/sec' > > While 3343 gives: > > '896672504 bytecodes/sec; 46102703 sends/sec' > > Levente > |
The generated source files have a comment block that indicate the VMMaker sources from which they were generated. The SVN 3306 src files came from VMMaker.oscog-eem.1155, and the SVN 3308 src files were generated from VMMaker.oscog-eem.1178. The changes that produce the performances difference will be found somewhere in the VMMaker commits it that range. Dave On Tue, May 19, 2015 at 10:40:15PM +0200, Levente Uzonyi wrote: > > And after that I checked out 3308 of src, and rebuilt the VM again: > > '934306569 bytecodes/sec; 46368169 sends/sec' > > The files which were changed from 3306 to 3308 are: > > U src/vm/cointerp.c > U src/vm/cointerpmt.c > U src/vm/cogit.c > U src/vm/cointerp.h > U src/vm/gcc3x-cointerp.c > U src/vm/cointerpmt.h > U src/vm/gcc3x-cointerpmt.c > U src/vm/cogit.h > U src/plugins/QuicktimePlugin/QuicktimePlugin.c > > Levente > > On Tue, 19 May 2015, Levente Uzonyi wrote: > > > > >The difference is somewhere in the sources. I just checked out revision > >3306 of the src directory, and rebuilt a CogVM (ht one, and kept the > >platform files at 3345. I used some extra CFLAGS which are responsible for > >the higher bytecodes/sec, ignore that.). It gives the following: > > > >'941176470 bytecodes/sec; 122927085 sends/sec' > > > >While 3343 gives: > > > >'896672504 bytecodes/sec; 46102703 sends/sec' > > > >Levente > > |
Could it be related to the change in 1166 to use XCHG between register/mem locations instead of a MOV to a (now unavailable) temp register in genPushRegisterArgsForNumArgs:? According to http://www.agner.org/optimize/instruction_tables.pdf , XCHG with m/r operands has major penalties on most platforms compared to m/r MOV, which would be a problem if it's used for a large percent of sends... Cheers, Henry
|
Hi Henry,
On Wed, May 20, 2015 at 5:41 AM, Henrik Johansen <[hidden email]> wrote:
I think you're right, but I'll check. The XCHG is only needed on certain two-argument sends, where we're short a register (I can't remember the details right now). I'll see if I can use XCHG less.
best,
Eliot |
In reply to this post by Henrik Sperre Johansen
Henrik, Levente, David, thank you so much for this. Henrik, indeed it is the XCHG instruction that slows things down. Here's a rough before and after measure, a huge difference! ----QUIT----{15 May 2015 . 8:46:55 am} trunk4.6.image priorSource: 21959994 (1 to: 10) do: [:i| Transcript cr; print: 0 tinyBenchmarks; flush] '225948808 bytecodes/sec; 15841140 sends/sec' '226749335 bytecodes/sec; 16492340 sends/sec' '247821878 bytecodes/sec; 16536674 sends/sec' '239252336 bytecodes/sec; 18218093 sends/sec' '279628618 bytecodes/sec; 17426750 sends/sec' '249269717 bytecodes/sec; 17848093 sends/sec' '254220456 bytecodes/sec; 16581247 sends/sec' '270184696 bytecodes/sec; 18146439 sends/sec' '233576642 bytecodes/sec; 17056311 sends/sec' '230008984 bytecodes/sec; 17848093 sends/sec' ----QUIT----{20 May 2015 . 10:27:31 am} trunk4.6.image priorSource: 21960768 '612074118 bytecodes/sec; 92940463 sends/sec' '614277144 bytecodes/sec; 89647038 sends/sec' '634448574 bytecodes/sec; 88517824 sends/sec' '632098765 bytecodes/sec; 93145781 sends/sec' '638006230 bytecodes/sec; 85932045 sends/sec' '632098765 bytecodes/sec; 91660848 sends/sec' '641604010 bytecodes/sec; 93145781 sends/sec' '623250152 bytecodes/sec; 91132543 sends/sec' '636815920 bytecodes/sec; 92195314 sends/sec' '589522164 bytecodes/sec; 80877219 sends/sec' On Wed, May 20, 2015 at 5:41 AM, Henrik Johansen <[hidden email]> wrote:
best,
Eliot |
Free forum by Nabble | Edit this page |