CogVM performance

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

CogVM performance

Levente Uzonyi-2
 
Hi All,

I saw that some of my benchmarks produce different results when running on
PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks
from the same image on both VMs and the results were surprising:

"CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec'
"Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec'

Slightly more bytecodes, and a lot more sends. I tried playing with
compiler flags, but with no luck, so I came to the conclusion that the
difference comes from somewhere else. I checked where the PharoVM was
forked from CogVM (somewhere around 2014-06-10), so I downloaded a version
of CogVM from around that time: CogVM 3048.

And the results are '932604735 bytecodes/sec; 118660385 sends/sec'.

Somewhere between 3048 and 3343 the performance has dropped significantly
(sends/sec). Note that this is not something that's only visible in micro
benchmarks.

Does anyone know why?


Levente
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

EstebanLM


> On 16 May 2015, at 17:08, Levente Uzonyi <[hidden email]> wrote:
>
> Hi All,
>
> I saw that some of my benchmarks produce different results when running on PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks from the same image on both VMs and the results were surprising:
>
> "CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec'
> "Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec'
>
> Slightly more bytecodes, and a lot more sends. I tried playing with compiler flags, but with no luck, so I came to the conclusion that the difference comes from somewhere else. I checked where the PharoVM was forked from CogVM (somewhere around 2014-06-10), so I downloaded a version of CogVM from around that time: CogVM 3048.
>
> And the results are '932604735 bytecodes/sec; 118660385 sends/sec'.
>
> Somewhere between 3048 and 3343 the performance has dropped significantly (sends/sec). Note that this is not something that's only visible in micro benchmarks.
>
> Does anyone know why?

It has to be a flags issue.
btw… We did not fork… I branched to work on spur and I still didn’t get back to trunk, in the mean time we just incorporated to trunk minor bugfixes :)


>
>
> Levente

Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Levente Uzonyi-2
 
On Sat, 16 May 2015, Esteban Lorenzano wrote:

>
>
>> On 16 May 2015, at 17:08, Levente Uzonyi <[hidden email]> wrote:
>>
>> Hi All,
>>
>> I saw that some of my benchmarks produce different results when running on PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks from the same image on both VMs and the results were surprising:
>>
>> "CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec'
>> "Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec'
>>
>> Slightly more bytecodes, and a lot more sends. I tried playing with compiler flags, but with no luck, so I came to the conclusion that the difference comes from somewhere else. I checked where the PharoVM was forked from CogVM (somewhere around 2014-06-10), so I downloaded a version of CogVM from around that time: CogVM 3048.
>>
>> And the results are '932604735 bytecodes/sec; 118660385 sends/sec'.
>>
>> Somewhere between 3048 and 3343 the performance has dropped significantly (sends/sec). Note that this is not something that's only visible in micro benchmarks.
>>
>> Does anyone know why?
>
> It has to be a flags issue.
I don't think so, because flags don't have much effect on jitted code. I
can squeeze out a few percents with -O3 -fomit-frame-pointer
-march=native, but what that really affects are the primitives.

I narrowed it down, it happened between 3306 and 3308:

3048 '932604735 bytecodes/sec; 118660385 sends/sec'
3163 '885813148 bytecodes/sec; 118216795 sends/sec'
3164 '856187290 bytecodes/sec; 109993018 sends/sec'
3254 '932604735 bytecodes/sec; 119670738 sends/sec'
3306 '943778801 bytecodes/sec; 125949728 sends/sec'
3308 '924187725 bytecodes/sec; 44489533 sends/sec'
3311 '937728937 bytecodes/sec; 46190853 sends/sec'
3312 '894323144 bytecodes/sec; 45239356 sends/sec'
3343 '904593639 bytecodes/sec; 46368169 sends/sec'

Levente

> btw… We did not fork… I branched to work on spur and I still didn’t get back to trunk, in the mean time we just incorporated to trunk minor bugfixes :)
>
>
>>
>>
>> Levente
>
>
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Eliot Miranda-2
 
Hi Levente,

On Sat, May 16, 2015 at 8:59 AM, Levente Uzonyi <[hidden email]> wrote:
 
On Sat, 16 May 2015, Esteban Lorenzano wrote:



On 16 May 2015, at 17:08, Levente Uzonyi <[hidden email]> wrote:

Hi All,

I saw that some of my benchmarks produce different results when running on PharoVM instead of CogVM, and I was wondering why. I ran #tinyBenchmarks from the same image on both VMs and the results were surprising:

"CogVM 3343" '904593639 bytecodes/sec; 46368169 sends/sec'
"Pharo vm50" '941176470 bytecodes/sec; 120698445 sends/sec'

Slightly more bytecodes, and a lot more sends. I tried playing with compiler flags, but with no luck, so I came to the conclusion that the difference comes from somewhere else. I checked where the PharoVM was forked from CogVM (somewhere around 2014-06-10), so I downloaded a version of CogVM from around that time: CogVM 3048.

And the results are '932604735 bytecodes/sec; 118660385 sends/sec'.

Somewhere between 3048 and 3343 the performance has dropped significantly (sends/sec). Note that this is not something that's only visible in micro benchmarks.

Does anyone know why?

It has to be a flags issue.

I don't think so, because flags don't have much effect on jitted code. I can squeeze out a few percents with -O3 -fomit-frame-pointer -march=native, but what that really affects are the primitives.

I narrowed it down, it happened between 3306 and 3308:

3048 '932604735 bytecodes/sec; 118660385 sends/sec'
3163 '885813148 bytecodes/sec; 118216795 sends/sec'
3164 '856187290 bytecodes/sec; 109993018 sends/sec'
3254 '932604735 bytecodes/sec; 119670738 sends/sec'
3306 '943778801 bytecodes/sec; 125949728 sends/sec'
3308 '924187725 bytecodes/sec; 44489533 sends/sec'
3311 '937728937 bytecodes/sec; 46190853 sends/sec'
3312 '894323144 bytecodes/sec; 45239356 sends/sec'
3343 '904593639 bytecodes/sec; 46368169 sends/sec'


I'll take a look soon.  Could you tell me what platform you're on?

[snip]
--
best,
Eliot
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Levente Uzonyi-2
 
Hi Eliot,

This happens on Ubuntu 14.04, using coglinuxht.

Levente

On Sat, 16 May 2015, Eliot Miranda wrote:

Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Levente Uzonyi-2
In reply to this post by Levente Uzonyi-2
 
On a side note: an undebuggable VM optimized for my machine[1] with gcc
4.8.2, built from the latest sources gives '1037487335 bytecodes/sec;
45666949 sends/sec', which is about +15% bytecodes/sec, but doesn't
affect sends/sec. Plugin performance is probably also better.

Levente

[1] with -march=native -mfpmath=sse -O3 -fomit-frame-pointer

On Sat, 16 May 2015, Levente Uzonyi wrote:

Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Levente Uzonyi-2
In reply to this post by Eliot Miranda-2
 
I also checked spur (cogspurlinuxht), and it's very similar:

3306 '1574173712 bytecodes/sec; 96854495 sends/sec'
3308 '1629276054 bytecodes/sec; 45580785 sends/sec'

Levente
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Clément Béra
 
I tried to look at it but I am not sure. Is it something related to the memory leak checker being enabled or something like that ? 

I am not sure this is a C compiler flag issue as this bench runs mostly jitted code. It looks like the jitted code is the same between both versions. I'm confused.

2015-05-19 2:39 GMT+02:00 Levente Uzonyi <[hidden email]>:

I also checked spur (cogspurlinuxht), and it's very similar:

3306 '1574173712 bytecodes/sec; 96854495 sends/sec'
3308 '1629276054 bytecodes/sec; 45580785 sends/sec'

Levente

Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Levente Uzonyi-2
 
The difference is somewhere in the sources. I just checked out revision
3306 of the src directory, and rebuilt a CogVM (ht one, and kept the
platform files at 3345. I used some extra CFLAGS which are responsible
for the higher bytecodes/sec, ignore that.). It gives the following:

'941176470 bytecodes/sec; 122927085 sends/sec'

While 3343 gives:

'896672504 bytecodes/sec; 46102703 sends/sec'

Levente
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Levente Uzonyi-2
 
And after that I checked out 3308 of src, and rebuilt the VM again:

'934306569 bytecodes/sec; 46368169 sends/sec'

The files which were changed from 3306 to 3308 are:

U    src/vm/cointerp.c
U    src/vm/cointerpmt.c
U    src/vm/cogit.c
U    src/vm/cointerp.h
U    src/vm/gcc3x-cointerp.c
U    src/vm/cointerpmt.h
U    src/vm/gcc3x-cointerpmt.c
U    src/vm/cogit.h
U    src/plugins/QuicktimePlugin/QuicktimePlugin.c

Levente

On Tue, 19 May 2015, Levente Uzonyi wrote:

>
> The difference is somewhere in the sources. I just checked out revision 3306
> of the src directory, and rebuilt a CogVM (ht one, and kept the platform
> files at 3345. I used some extra CFLAGS which are responsible for the higher
> bytecodes/sec, ignore that.). It gives the following:
>
> '941176470 bytecodes/sec; 122927085 sends/sec'
>
> While 3343 gives:
>
> '896672504 bytecodes/sec; 46102703 sends/sec'
>
> Levente
>
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

David T. Lewis
 
The generated source files have a comment block that indicate the VMMaker
sources from which they were generated. The SVN 3306 src files came from
VMMaker.oscog-eem.1155, and the SVN 3308 src files were generated from
VMMaker.oscog-eem.1178. The changes that produce the performances difference
will be found somewhere in the VMMaker commits it that range.

Dave


On Tue, May 19, 2015 at 10:40:15PM +0200, Levente Uzonyi wrote:

>
> And after that I checked out 3308 of src, and rebuilt the VM again:
>
> '934306569 bytecodes/sec; 46368169 sends/sec'
>
> The files which were changed from 3306 to 3308 are:
>
> U    src/vm/cointerp.c
> U    src/vm/cointerpmt.c
> U    src/vm/cogit.c
> U    src/vm/cointerp.h
> U    src/vm/gcc3x-cointerp.c
> U    src/vm/cointerpmt.h
> U    src/vm/gcc3x-cointerpmt.c
> U    src/vm/cogit.h
> U    src/plugins/QuicktimePlugin/QuicktimePlugin.c
>
> Levente
>
> On Tue, 19 May 2015, Levente Uzonyi wrote:
>
> >
> >The difference is somewhere in the sources. I just checked out revision
> >3306 of the src directory, and rebuilt a CogVM (ht one, and kept the
> >platform files at 3345. I used some extra CFLAGS which are responsible for
> >the higher bytecodes/sec, ignore that.). It gives the following:
> >
> >'941176470 bytecodes/sec; 122927085 sends/sec'
> >
> >While 3343 gives:
> >
> >'896672504 bytecodes/sec; 46102703 sends/sec'
> >
> >Levente
> >
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Henrik Sperre Johansen
 
Could it be related to the change in 1166 to use XCHG between register/mem locations instead of a MOV to a (now unavailable) temp register in genPushRegisterArgsForNumArgs:?
According to http://www.agner.org/optimize/instruction_tables.pdf , XCHG with m/r operands has major penalties on most platforms compared to m/r MOV, which would be a problem if it's used for a large percent of sends...

Cheers,
Henry

On 20 May 2015, at 3:07 , David T. Lewis <[hidden email]> wrote:


The generated source files have a comment block that indicate the VMMaker
sources from which they were generated. The SVN 3306 src files came from
VMMaker.oscog-eem.1155, and the SVN 3308 src files were generated from
VMMaker.oscog-eem.1178. The changes that produce the performances difference
will be found somewhere in the VMMaker commits it that range.

Dave


On Tue, May 19, 2015 at 10:40:15PM +0200, Levente Uzonyi wrote:

And after that I checked out 3308 of src, and rebuilt the VM again:

'934306569 bytecodes/sec; 46368169 sends/sec'

The files which were changed from 3306 to 3308 are:

U    src/vm/cointerp.c
U    src/vm/cointerpmt.c
U    src/vm/cogit.c
U    src/vm/cointerp.h
U    src/vm/gcc3x-cointerp.c
U    src/vm/cointerpmt.h
U    src/vm/gcc3x-cointerpmt.c
U    src/vm/cogit.h
U    src/plugins/QuicktimePlugin/QuicktimePlugin.c

Levente

On Tue, 19 May 2015, Levente Uzonyi wrote:


The difference is somewhere in the sources. I just checked out revision
3306 of the src directory, and rebuilt a CogVM (ht one, and kept the
platform files at 3345. I used some extra CFLAGS which are responsible for
the higher bytecodes/sec, ignore that.). It gives the following:

'941176470 bytecodes/sec; 122927085 sends/sec'

While 3343 gives:

'896672504 bytecodes/sec; 46102703 sends/sec'

Levente


Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Eliot Miranda-2
 
Hi Henry,

On Wed, May 20, 2015 at 5:41 AM, Henrik Johansen <[hidden email]> wrote:
 
Could it be related to the change in 1166 to use XCHG between register/mem locations instead of a MOV to a (now unavailable) temp register in genPushRegisterArgsForNumArgs:?
According to http://www.agner.org/optimize/instruction_tables.pdf , XCHG with m/r operands has major penalties on most platforms compared to m/r MOV, which would be a problem if it's used for a large percent of sends...

I think you're right, but I'll check.  The XCHG is only needed on certain two-argument sends, where we're short a register (I can't remember the details right now).  I'll see if I can use XCHG less.
 
Cheers,
Henry

On 20 May 2015, at 3:07 , David T. Lewis <[hidden email]> wrote:


The generated source files have a comment block that indicate the VMMaker
sources from which they were generated. The SVN 3306 src files came from
VMMaker.oscog-eem.1155, and the SVN 3308 src files were generated from
VMMaker.oscog-eem.1178. The changes that produce the performances difference
will be found somewhere in the VMMaker commits it that range.

Dave


On Tue, May 19, 2015 at 10:40:15PM +0200, Levente Uzonyi wrote:

And after that I checked out 3308 of src, and rebuilt the VM again:

'934306569 bytecodes/sec; 46368169 sends/sec'

The files which were changed from 3306 to 3308 are:

U    src/vm/cointerp.c
U    src/vm/cointerpmt.c
U    src/vm/cogit.c
U    src/vm/cointerp.h
U    src/vm/gcc3x-cointerp.c
U    src/vm/cointerpmt.h
U    src/vm/gcc3x-cointerpmt.c
U    src/vm/cogit.h
U    src/plugins/QuicktimePlugin/QuicktimePlugin.c

Levente

On Tue, 19 May 2015, Levente Uzonyi wrote:


The difference is somewhere in the sources. I just checked out revision
3306 of the src directory, and rebuilt a CogVM (ht one, and kept the
platform files at 3345. I used some extra CFLAGS which are responsible for
the higher bytecodes/sec, ignore that.). It gives the following:

'941176470 bytecodes/sec; 122927085 sends/sec'

While 3343 gives:

'896672504 bytecodes/sec; 46102703 sends/sec'

Levente






--
best,
Eliot
Reply | Threaded
Open this post in threaded view
|

Re: CogVM performance

Eliot Miranda-2
In reply to this post by Henrik Sperre Johansen
 
Henrik, Levente, David,

    thank you so much for this.  Henrik, indeed it is the XCHG instruction that slows things down.  Here's a rough before and after measure, a huge difference!

----QUIT----{15 May 2015 . 8:46:55 am} trunk4.6.image priorSource: 21959994
(1 to: 10) do: [:i| Transcript cr; print: 0 tinyBenchmarks; flush]
'225948808 bytecodes/sec; 15841140 sends/sec'
'226749335 bytecodes/sec; 16492340 sends/sec'
'247821878 bytecodes/sec; 16536674 sends/sec'
'239252336 bytecodes/sec; 18218093 sends/sec'
'279628618 bytecodes/sec; 17426750 sends/sec'
'249269717 bytecodes/sec; 17848093 sends/sec'
'254220456 bytecodes/sec; 16581247 sends/sec'
'270184696 bytecodes/sec; 18146439 sends/sec'
'233576642 bytecodes/sec; 17056311 sends/sec'
'230008984 bytecodes/sec; 17848093 sends/sec'
----QUIT----{20 May 2015 . 10:27:31 am} trunk4.6.image priorSource: 21960768
'612074118 bytecodes/sec; 92940463 sends/sec'
'614277144 bytecodes/sec; 89647038 sends/sec'
'634448574 bytecodes/sec; 88517824 sends/sec'
'632098765 bytecodes/sec; 93145781 sends/sec'
'638006230 bytecodes/sec; 85932045 sends/sec'
'632098765 bytecodes/sec; 91660848 sends/sec'
'641604010 bytecodes/sec; 93145781 sends/sec'
'623250152 bytecodes/sec; 91132543 sends/sec'
'636815920 bytecodes/sec; 92195314 sends/sec'
'589522164 bytecodes/sec; 80877219 sends/sec'

On Wed, May 20, 2015 at 5:41 AM, Henrik Johansen <[hidden email]> wrote:
 
Could it be related to the change in 1166 to use XCHG between register/mem locations instead of a MOV to a (now unavailable) temp register in genPushRegisterArgsForNumArgs:?
According to http://www.agner.org/optimize/instruction_tables.pdf , XCHG with m/r operands has major penalties on most platforms compared to m/r MOV, which would be a problem if it's used for a large percent of sends...

Cheers,
Henry

On 20 May 2015, at 3:07 , David T. Lewis <[hidden email]> wrote:


The generated source files have a comment block that indicate the VMMaker
sources from which they were generated. The SVN 3306 src files came from
VMMaker.oscog-eem.1155, and the SVN 3308 src files were generated from
VMMaker.oscog-eem.1178. The changes that produce the performances difference
will be found somewhere in the VMMaker commits it that range.

Dave


On Tue, May 19, 2015 at 10:40:15PM +0200, Levente Uzonyi wrote:

And after that I checked out 3308 of src, and rebuilt the VM again:

'934306569 bytecodes/sec; 46368169 sends/sec'

The files which were changed from 3306 to 3308 are:

U    src/vm/cointerp.c
U    src/vm/cointerpmt.c
U    src/vm/cogit.c
U    src/vm/cointerp.h
U    src/vm/gcc3x-cointerp.c
U    src/vm/cointerpmt.h
U    src/vm/gcc3x-cointerpmt.c
U    src/vm/cogit.h
U    src/plugins/QuicktimePlugin/QuicktimePlugin.c

Levente

On Tue, 19 May 2015, Levente Uzonyi wrote:


The difference is somewhere in the sources. I just checked out revision
3306 of the src directory, and rebuilt a CogVM (ht one, and kept the
platform files at 3345. I used some extra CFLAGS which are responsible for
the higher bytecodes/sec, ignore that.). It gives the following:

'941176470 bytecodes/sec; 122927085 sends/sec'

While 3343 gives:

'896672504 bytecodes/sec; 46102703 sends/sec'

Levente






--
best,
Eliot