Re: [squeak-dev] 64bit VMs some thoughts.

Re: [squeak-dev] 64bit VMs some thoughts.

David T. Lewis
 
On Wed, Dec 02, 2009 at 05:28:33PM -0800, John M McIntosh wrote:

> So you sit there smug about the fact you built a 64bit VM, likely for hosting on your 64bit Linux OS.
> {Or the unix one for Darwin, or that new fangled cocoa one}
>
> However it's possible that it's running 1/3 the performance of the 32bit VM.
> Did you check? Thought not...
>
> So let's talk.
>
> Are you using the gnuified version of interp.c?  If you don't know, well, go check.
> Are you using GCC 4.1 or higher?
>
> The interpreter loop is a highly tuned monster that suffers from compiler optimization issues. With
> carefully tuned parameters, as found in the Macintosh Xcode build project for the Carbon VM using gcc 4.0,
> you'll get the best performance.
>
> GCC 4.2+ ?
>
> Michael Rueger and I spent a few days attempting to get good performance out of GCC 4.2
> WITHOUT success. I think that can account for at least a 33% slowdown.
>
> So where does the other 33% slowdown come from?  
>
> Well, when we compile the VM in 64bit to use a 32bit image, each reference to an oop requires us
> to add a 64bit memory start address to the 32bit oop value to resolve it to a 64bit memory address.
> Unfortunately GCC 4.2 growls, and produces the lousiest code possible to do this.
> Maybe higher versions of GCC are better? Anyone care to test?
>
> So some solutions.
>
> (a) Ensure the Squeak oop memory block loads within the 0-4GB address space.
> See pagezero size for Darwin. Then alter the logic a bit so that sqMemoryBase is zero
> and the Squeak memory accessors skip the add of sqMemoryBase=0 to the oop address.
> Although you might have to use GCC 4.2, you'll run 100% faster.
>
> (b) Use the (non-free) Intel compiler
>
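
For readers following the address-calculation point above, here is a minimal sketch of the two access styles John describes. It is not taken from the VM sources: the type name oop32, both function names, and the heap_bytes bounds check are hypothetical and exist only for illustration. The first variant adds a 64-bit base to every 32-bit oop; the second assumes the heap has been mapped below 4GB with a base of zero, so the oop can be used as the address directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: a 32-bit object pointer as stored in a 32-bit image. */
typedef uint32_t oop32;

/* Variant 1: the heap may live anywhere in the 64-bit address space,
 * so every access pays for a 64-bit add of the heap base. */
static inline int32_t fetch_word_with_base(const char *memory_base,
                                           size_t heap_bytes, oop32 oop)
{
    int32_t word;
    /* Illustrative debug check only; not something the thread describes. */
    assert((size_t)oop + sizeof word <= heap_bytes);
    memcpy(&word, memory_base + oop, sizeof word);
    return word;
}

/* Variant 2: assumes the heap was mapped inside the low 4GB with base 0,
 * so the oop is already the address and no add is needed.
 * Only valid if the platform really maps the heap there. */
static inline int32_t fetch_word_zero_base(oop32 oop)
{
    int32_t word;
    memcpy(&word, (const void *)(uintptr_t)oop, sizeof word);
    return word;
}
```

Whether the compiler folds the add in the first variant into an addressing mode or emits something worse is exactly the code-generation question raised above, so the sketch shows the shape of the problem rather than its cost.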

Hi John,

I get very different results, but they certainly support your observation
that newer GCC compilers are a problem.

If I compare a 64-bit VM built on my computer to a 32-bit VM downloaded from
Ian's site, running both on the same hardware and OS (AMD Turion, 64 bit Linux),
the 64-bit VM is running about twice as fast as the 32-bit VM.

In the past (over several years), I have never measured this carefully, but
I have the general impression that 64-bit and 32-bit VMs run at similar speeds
on my hardware and OS (I guess I should figure out how to compile in 32-bit
mode so I can really find out).

I would guess that the difference I am seeing now is due to compiler version.
Ian's VM was compiled with gcc 4.3.3 and I am using an older gcc 4.1.2 compiler.

For the record, here are the results I got (copied from CommandShell windows
in a Squeak trunk image).

For a 64-bit VM that I compiled locally, installed in /usr/local:

  $ cat /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 15
  model : 36
  model name : AMD Turion(tm) 64 Mobile Technology ML-34
  stepping : 2
  cpu MHz : 1600.000
  cache size : 1024 KB
  fpu : yes
  fpu_exception : yes
  cpuid level : 1
  wp : yes
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm
  bogomips : 3203.59
  TLB size : 1024 4K pages
  clflush size : 64
  cache_alignment : 64
  address sizes : 40 bits physical, 48 bits virtual
  power management: ts fid vid ttp tm stc
 
  $ cat /proc/version
  Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
  $ /usr/local/bin/squeak -version
  SQUEAK_ENCODING=UTF-8
  SQUEAK_PATHENC=UTF-8
  SQUEAK_PLUGINS=/usr/local/lib/squeak/3.11.9-2145
  + exec /usr/local/lib/squeak/3.11.9-2145/squeakvm -version
  3.11.9-2145 #1 XShm Thu Dec  3 10:54:44 EST 2009 gcc 4.1.2
  Linux linux-6xfc 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
  plugin path: /usr/local/lib/squeak/3.11.9-2145 [default: /usr/local/lib/squeak/3.11.9-2145/]
  $ strings /usr/local/lib/squeak/3.11.9-2145/squeakvm | grep gcc
  gcc 4.1.2
  $ 0 tinyBenchmarks
  154031287 bytecodes/sec; 5145368 sends/sec
  $ 0 tinyBenchmarks
  153201675 bytecodes/sec; 5183202 sends/sec
  $ 0 tinyBenchmarks
  151658767 bytecodes/sec; 5268426 sends/sec
  $

For a 32-bit VM from Ian's site, running the same image from a local directory:

  $ cat /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 15
  model : 36
  model name : AMD Turion(tm) 64 Mobile Technology ML-34
  stepping : 2
  cpu MHz : 1600.000
  cache size : 1024 KB
  fpu : yes
  fpu_exception : yes
  cpuid level : 1
  wp : yes
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm
  bogomips : 3203.59
  TLB size : 1024 4K pages
  clflush size : 64
  cache_alignment : 64
  address sizes : 40 bits physical, 48 bits virtual
  power management: ts fid vid ttp tm stc
 
  $ cat /proc/version
  Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
  $ pwd
  /home/lewis/squeak/VMM-Ian/Squeak-3.11.3.2135-linux_i386/lib/squeak/3.11.3-2135
  $ ls -l squeakvm
  -rwxr-xr-x 1 lewis users 2376017 2009-09-16 17:46 squeakvm
  $ strings squeakvm | grep gcc
  gcc 4.3.3
  $ ./squeakvm -version
  3.11.3-2135 #1 XShm Wed Sep 16 14:25:10 PDT 2009 gcc 4.3.3
  Linux ubuntu 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 18:40:08 UTC 2009 i686 GNU/Linux
  plugin path: /usr/local/lib/squeak/3.11.9-2145 [default: /home/lewis/squeak/VMM-Ian/Squeak-3.11.3.2135-linux_i386/lib/squeak/3.11.3-2135/]
  $ 0 tinyBenchmarks
  62135922 bytecodes/sec; 3330746 sends/sec
  $ 0 tinyBenchmarks
  62256809 bytecodes/sec; 3425013 sends/sec
  $ 0 tinyBenchmarks
  62317429 bytecodes/sec; 3346096 sends/sec
  $

Dave
 


Re: [squeak-dev] 64bit VMs some thoughts.

David T. Lewis
 
On Thu, Dec 03, 2009 at 03:03:48PM -0500, David T. Lewis wrote:

>
> I would guess that the difference I am seeing now is due to compiler version.
> Ian's VM was compiled with gcc 4.3.3 and I am using an older gcc 4.1.2 compiler.
>
> The results I got were:
>
> For a 64-bit VM that I compiled locally, installed in /usr/local:
> CPU: AMD Turion(tm) 64 Mobile Technology ML-34, 1600 MHz
> OS: Linux version 2.6.18.2-34-default
> Compiler for VM: gcc 4.1.2
> Results:
>
>     0 tinyBenchmarks ==> 154031287 bytecodes/sec; 5145368 sends/sec
>     0 tinyBenchmarks ==> 153201675 bytecodes/sec; 5183202 sends/sec
>     0 tinyBenchmarks ==> 151658767 bytecodes/sec; 5268426 sends/sec
>
> For a 32-bit VM from Ian's site, running the same image from a local directory:
> CPU: AMD Turion(tm) 64 Mobile Technology ML-34, 1600 MHz
> OS: Linux version 2.6.18.2-34-default
> Compiler for VM: gcc 4.3.3
>   Results:
>
>     0 tinyBenchmarks ==> 62135922 bytecodes/sec; 3330746 sends/sec
>     0 tinyBenchmarks ==> 62256809 bytecodes/sec; 3425013 sends/sec
>     0 tinyBenchmarks ==> 62317429 bytecodes/sec; 3346096 sends/sec
>

After installing a prodigious number of 32-bit libraries on my 64-bit
Linux, I can now build a 32-bit VM for comparison. Here are the results
of a 64-bit versus 32-bit VM using the same compiler, operating system,
and hardware:

Compiled in 64-bit mode:
0 tinyBenchmarks '155339805 bytecodes/sec; 5304104 sends/sec'
0 tinyBenchmarks '155812538 bytecodes/sec; 5393385 sends/sec'
0 tinyBenchmarks '155151515 bytecodes/sec; 5272367 sends/sec'

Compiled in 32-bit mode:
0 tinyBenchmarks '136679124 bytecodes/sec; 4652907 sends/sec'
0 tinyBenchmarks '135521439 bytecodes/sec; 4659058 sends/sec'
0 tinyBenchmarks '135665076 bytecodes/sec; 4690056 sends/sec'

So overall I see about a 14% speed advantage for the 64-bit VM versus
the 32-bit VM on this platform. Again, this is with the older gcc 4.1.2
compiler.

Dave


Re: Re: [squeak-dev] 64bit VMs some thoughts.

David T. Lewis
 
On Thu, Dec 03, 2009 at 05:33:09PM -0500, David T. Lewis wrote:

>  
> After installing a prodigious number of 32-bit libraries on my 64-bit
> Linux, I can now build a 32-bit VM for comparison. Here are the results
> of a 64-bit versus 32-bit VM using the same compiler, operating system,
> and hardware:
>
> Compiled in 64-bit mode:
> 0 tinyBenchmarks '155339805 bytecodes/sec; 5304104 sends/sec'
> 0 tinyBenchmarks '155812538 bytecodes/sec; 5393385 sends/sec'
> 0 tinyBenchmarks '155151515 bytecodes/sec; 5272367 sends/sec'
>
> Compiled in 32-bit mode:
> 0 tinyBenchmarks '136679124 bytecodes/sec; 4652907 sends/sec'
> 0 tinyBenchmarks '135521439 bytecodes/sec; 4659058 sends/sec'
> 0 tinyBenchmarks '135665076 bytecodes/sec; 4690056 sends/sec'
>
> So overall I see about a 14% speed advantage for the 64-bit VM versus
> the 32-bit VM on this platform. Again, this is with the older gcc 4.1.2
> compiler.

With apologies, I have to retract that last set of numbers. I may have
inadvertently let my CPU fall into power save mode (I'm not sure). But
in any case, I wanted to repeat the experiment, so I built two VMs from
scratch, and the results I get this time are:

VM compiled in 64-bit mode:
0 tinyBenchmarks '158024691 bytecodes/sec; 4354017 sends/sec'
0 tinyBenchmarks '156670746 bytecodes/sec; 5187016 sends/sec'
0 tinyBenchmarks '155623100 bytecodes/sec; 5198491 sends/sec'
0 tinyBenchmarks '157635467 bytecodes/sec; 5179393 sends/sec'
0 tinyBenchmarks '157732593 bytecodes/sec; 5104384 sends/sec'
0 tinyBenchmarks '157927205 bytecodes/sec; 5221596 sends/sec'

VM compiled in 32-bit mode:
0 tinyBenchmarks '160300563 bytecodes/sec; 5179393 sends/sec'
0 tinyBenchmarks '160200250 bytecodes/sec; 5256640 sends/sec'
0 tinyBenchmarks '160703075 bytecodes/sec; 5126658 sends/sec'
0 tinyBenchmarks '160905091 bytecodes/sec; 5085970 sends/sec'
0 tinyBenchmarks '157635467 bytecodes/sec; 5202328 sends/sec'
0 tinyBenchmarks '158907510 bytecodes/sec; 5233225 sends/sec'

So the 32-bit VM may be two or three percent faster than the 64-bit
version on this platform.
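
As a cross-check on that estimate, the bytecodes/sec figures from the six runs above can be averaged and compared. This is only a sketch: the function and array names are made up for illustration, and the numbers are copied from the two result lists in this message.

```c
#include <assert.h>
#include <stddef.h>

/* bytecodes/sec from the six runs of each VM listed above */
static const double vm64_bytecodes[] = {
    158024691, 156670746, 155623100, 157635467, 157732593, 157927205
};
static const double vm32_bytecodes[] = {
    160300563, 160200250, 160703075, 160905091, 157635467, 158907510
};

/* Arithmetic mean of a non-empty sample array. */
static double mean(const double *samples, size_t count)
{
    double sum = 0.0;
    assert(count > 0);
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    return sum / (double)count;
}

/* How much faster the 32-bit VM is than the 64-bit VM, in percent,
 * based on mean bytecodes/sec. */
static double bytecode_gap_percent(void)
{
    double m32 = mean(vm32_bytecodes, sizeof vm32_bytecodes / sizeof vm32_bytecodes[0]);
    double m64 = mean(vm64_bytecodes, sizeof vm64_bytecodes / sizeof vm64_bytecodes[0]);
    return (m32 / m64 - 1.0) * 100.0;
}
```

On these samples the bytecode gap works out to a bit under two percent. The sends/sec gap is somewhat larger, mostly because of the slow first 64-bit run.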

Summary: I do not see evidence of a big performance hit from use
of 64-bit pointers, but there may well be a large performance hit
from use of the latest gcc compiler.

Dave

 

Re: Re: [squeak-dev] 64bit VMs some thoughts.

johnmci
In reply to this post by David T. Lewis

Ok, is this a 64bit VM built with
#define SQ_VI_BYTES_PER_WORD 4 ?
or
#define SQ_VI_BYTES_PER_WORD 8  ?

What that is set to does change things.

BTW, on a MacBook Pro with a 2.33 GHz Intel Core 2 Duo,
a 4.x Squeak Macintosh Carbon VM does:
'533611255 bytecodes/sec; 11577747 sends/sec'
'535005224 bytecodes/sec; 11599578 sends/sec'
'533889468 bytecodes/sec; 11280518 sends/sec'
'535284892 bytecodes/sec; 11837670 sends/sec'
'535005224 bytecodes/sec; 11405773 sends/sec'
'533611255 bytecodes/sec; 9517756 sends/sec'
'533333333 bytecodes/sec; 11769725 sends/sec'
'533611255 bytecodes/sec; 11420129 sends/sec'
'533611255 bytecodes/sec; 11563238 sends/sec'
'535284892 bytecodes/sec; 11304036 sends/sec'

which is 8.5x faster in bytecodes/sec and 3.3x faster in sends/sec than the 32bit VM example you give.
Maybe someone can run this on an equivalent Intel Core 2 Duo to understand why your 1.6 GHz machine numbers are so dreadful.


5.x is  with #define SQ_VI_BYTES_PER_WORD 4 in 32bit mode CHEATING
below, but I've not tuned it yet.
'504930966 bytecodes/sec; 12435935 sends/sec'
'506429277 bytecodes/sec; 12229906 sends/sec'
'506429277 bytecodes/sec; 12623070 sends/sec'
'505429417 bytecodes/sec; 12649026 sends/sec'
'505928853 bytecodes/sec; 12588628 sends/sec'
'504433497 bytecodes/sec; 11868121 sends/sec'
'505429417 bytecodes/sec; 12649026 sends/sec'
'505429417 bytecodes/sec; 12614442 sends/sec'
'506429277 bytecodes/sec; 11769725 sends/sec'
'505928853 bytecodes/sec; 11505566 sends/sec'

5.x with SQ_VI_BYTES_PER_WORD 8 in 64bit mode  GCC 4.2
'482563619 bytecodes/sec; 14152552 sends/sec'
'483474976 bytecodes/sec; 14174292 sends/sec'
'482563619 bytecodes/sec; 14076986 sends/sec'
'481655691 bytecodes/sec; 14217973 sends/sec'
'482563619 bytecodes/sec; 14141708 sends/sec'
'483018867 bytecodes/sec; 13907256 sends/sec'
'483018867 bytecodes/sec; 14120068 sends/sec'
'483474976 bytecodes/sec; 14012854 sends/sec'
'483474976 bytecodes/sec; 14076986 sends/sec'
'483474976 bytecodes/sec; 13896783 sends/sec'

5.x  with SQ_VI_BYTES_PER_WORD 4 in 64bit mode   CHEATING
below, but I've not tuned it yet.  
'433530906 bytecodes/sec; 13051576 sends/sec'
'432432432 bytecodes/sec; 13042352 sends/sec'
'433898305 bytecodes/sec; 13051576 sends/sec'
'433530906 bytecodes/sec; 13042352 sends/sec'
'433898305 bytecodes/sec; 12941745 sends/sec'
'433898305 bytecodes/sec; 13023944 sends/sec'
'414910858 bytecodes/sec; 12815922 sends/sec'
'433898305 bytecodes/sec; 13005587 sends/sec'
'433164128 bytecodes/sec; 13005587 sends/sec'
'434266327 bytecodes/sec; 12444321 sends/sec'


--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: Re: [squeak-dev] 64bit VMs some thoughts.

David T. Lewis
 
In all cases I was using a 32-bit image, hence SQ_VI_BYTES_PER_WORD 4.

My computer is a small laptop, several years old, with an AMD processor
that was designed for low power rather than performance. I would not be
surprised if a newer mac is quite a bit faster.

I think you can expect the 64 bit image (SQ_VI_BYTES_PER_WORD 8) to be
relatively slow. Entirely aside from address calculation issues, the image
is going to be full of 64-bit integer arithmetic in places where 32-bit
arithmetic would normally be happening, and that is bound to take a toll.
There is plenty of room for optimization.
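
To make the two settings concrete, here is a rough sketch of how a word-size macro like SQ_VI_BYTES_PER_WORD can select the integer type used for image words. Only the macro name comes from this thread; the typedef image_word and the helper functions are hypothetical and are not the VM's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default to a 32-bit image word when the build does not say otherwise. */
#ifndef SQ_VI_BYTES_PER_WORD
#define SQ_VI_BYTES_PER_WORD 4
#endif

/* Hypothetical typedef: every image word, and all arithmetic on it,
 * is carried out at this width. */
#if SQ_VI_BYTES_PER_WORD == 8
typedef int64_t image_word;
#else
typedef int32_t image_word;
#endif

/* With an 8-byte word this add is 64-bit arithmetic even when the
 * values would fit comfortably in 32 bits. */
static image_word add_words(image_word a, image_word b)
{
    return a + b;
}

/* Reports the configured word size and checks it matches the typedef. */
static size_t image_word_bytes(void)
{
    assert(sizeof(image_word) == SQ_VI_BYTES_PER_WORD);
    return sizeof(image_word);
}
```

Building the same file with -DSQ_VI_BYTES_PER_WORD=8 switches every such operation to the wider type, which is the cost being described here.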

Dave


Re: Re: [squeak-dev] 64bit VMs some thoughts.

Ian Piumarta
 
Hi Dave,

> I think you can expect the 64 bit image (SQ_VI_BYTES_PER_WORD 8) to be
> relatively slow. Entirely aside from address calculation issues, the image
> is going to be full of 64-bit integer arithmetic in places where 32-bit
> arithmetic would normally be happening, and that is bound to take a toll.

My experience with em64t has been entirely positive.  Programs  
consistently run faster in 64-bit mode than in 32-bit mode.  Much of  
the improvement is probably due to the 8 additional registers and the  
passing of the first few arguments in registers rather than on the  
stack.  The Squeak VM likely does not benefit the way most other  
programs do because of the aggressive inlining of methods in the C  
code generator, eliminating entirely the impact of a better argument  
passing convention.

I have to say I am disgusted at gcc 4.3 though.  Maybe replacing all
the -O/-f options with '-Os -fno-cse-follow-jumps -fomit-frame-pointer'
would help?  It has worked wonders for me on (non-Squeak) bytecode
interpreters; the Core2 in particular seems hypersensitive to locality
and alignment of loops/jumps at the start of cache lines.
(Attempts to manually allocate machine registers to VM registers
always reduce performance with gcc-4.3, but I haven't experimented
with explicit register assignments in the Squeak VM on 64-bit hardware.)
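
For anyone unfamiliar with the kind of dispatch loop being tuned here, the following is a minimal sketch of a threaded bytecode interpreter using GCC's labels-as-values extension. As far as I understand it, a gnuified interp.c relies on this style of dispatch, but this is not Squeak code: the opcodes, the function name, and the tiny stack are hypothetical, and it needs GCC or a compatible compiler.

```c
#include <assert.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_PUSH1, OP_ADD, OP_DOUBLE, OP_HALT };

#define STACK_DEPTH 16

/* Runs a bytecode sequence that must end with OP_HALT.
 * Opcodes are not validated; this is a sketch, not a safe interpreter. */
long run_bytecodes(const unsigned char *code)
{
    /* GCC "labels as values": one label address per opcode. */
    static void *const dispatch[] = {
        &&op_push1, &&op_add, &&op_double, &&op_halt
    };
    long stack[STACK_DEPTH];
    long *sp = stack;
    const unsigned char *ip = code;

/* Each handler ends with its own indirect jump instead of
 * returning to one shared switch statement. */
#define NEXT() goto *dispatch[*ip++]

    NEXT();

op_push1:
    assert(sp < stack + STACK_DEPTH);
    *sp++ = 1;
    NEXT();
op_add:
    assert(sp - stack >= 2);
    sp--;
    sp[-1] += sp[0];
    NEXT();
op_double:
    assert(sp - stack >= 1);
    sp[-1] *= 2;
    NEXT();
op_halt:
    assert(sp - stack >= 1);
    return sp[-1];

#undef NEXT
}
```

Because every handler carries its own indirect jump, the placement and alignment of those jumps is decided by the compiler's optimization passes, which is one plausible reason results vary so much between gcc versions and flag sets.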

Cheers,
Ian