Switching to use foo struct on Windows VM

Igor Stasenko
I tried to introduce a VM pointers table for use by Exupery, but found
that there's no common way to add this code, because all platforms
except win32 use the foo struct for globals.

I investigated how easy it would be to patch the win32 VM to use the
foo struct and found that there are few places to change in the
platform-specific code.

So I decided to make a patch.

1 tinyBenchmarks
using old VM:
 '118518518 bytecodes/sec; 3351243 sends/sec'
 '121673003 bytecodes/sec; 3338403 sends/sec'
 '121788772 bytecodes/sec; 3335847 sends/sec'
 '122020972 bytecodes/sec; 3323125 sends/sec'

using VM with foo struct:
 '121327014 bytecodes/sec; 3387727 sends/sec'
 '122020972 bytecodes/sec; 3379842 sends/sec'
 '120075046 bytecodes/sec; 3536215 sends/sec'
 '120640904 bytecodes/sec; 3335847 sends/sec'

The benchmark shows no noticeable difference between using the foo
struct and not using it. Maybe this is a bad benchmark for this case.
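
To put a number on "no noticeable difference", the four runs of each VM can be averaged and compared. The following is only a rough sketch in C; the helper names mean and percentChange are made up for illustration, and four samples per VM are far too few for a real statistical claim.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. The values used here are far below
   2^53, so summing them in a double loses no precision. */
double mean(const double *samples, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return sum / (double)n;
}

/* Relative change from baseline to candidate, in percent. */
double percentChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}

/* The tinyBenchmarks figures quoted in this post, copied verbatim. */
const double oldBytecodes[] = { 118518518, 121673003, 121788772, 122020972 };
const double fooBytecodes[] = { 121327014, 122020972, 120075046, 120640904 };
const double oldSends[]     = { 3351243, 3338403, 3335847, 3323125 };
const double fooSends[]     = { 3387727, 3379842, 3536215, 3335847 };
```

With these inputs, percentChange(mean(oldBytecodes, 4), mean(fooBytecodes, 4)) comes out at about 0.01%, and the same calculation for sends at about 2%, most of which is due to the single 3536215 run. Both are smaller than the roughly 3% spread between the slowest and fastest bytecode runs of the old VM alone.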

Please let me know if my patch is acceptable; the way I implement the
VM pointers table depends on it. :)



Attachments: win32-foo.1.cs (2K), sqWin32.rar (47K)

Re: Switching to use foo struct on Windows VM

Igor Stasenko
Added as issue 0006561 on Mantis.


Re: Switching to use foo struct on Windows VM

Andreas.Raab
In reply to this post by Igor Stasenko
sig wrote:
> I tried to introduce VM pointers table for use by Exupery, but found
> that there's no common way for adding this code because all platforms,
> except win32 using foo struct for globals.

Can you say what the requirements for this patch are? E.g., why exactly
does it matter if the VM is compiled with struct foo or not?

> benchmark shows no noticeable difference using foo struct or not.
> Maybe this is bad benchmark for this case..

This result is quite surprising. When John originally introduced this
option, x86 was significantly slower when compiling with than without
it. As a matter of fact, given that probably some 90+% of all Squeak
platforms are now x86 I was thinking about removing it altogether (after
all, it's just a pointless memory dereferencing which is only
advantageous on platforms that don't have direct addressing modes).

> Please , let me know, if my patch is acceptable, from this depends the
> way how i implement VM pointers table. :)

To be blunt, there are two things I don't like about it: First, it
introduces the need for another dereferencing in an already
register-deprived model. Second, anything containing "struct foo fum" is
immediately on my list of things I never want to see in my code.
Changing these names to something sensible would make it a lot easier to
convince me about the changes.

However, I can probably fix up the support code so that it's possible to
compile a "struct foo VM", which I presume is your main need. Although,
given that a "struct foo VM" will compile trivially without the
indirection, it may be easier for you to compile Unix and Mac VMs
without the extra indirection.

Cheers,
   - Andreas


Re: Switching to use foo struct on Windows VM

johnmci

On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:

> This result is quite surprising. When John originally introduced  
> this option, x86 was significantly slower when compiling with than  
> without it. As a matter of fact, given that probably some 90+% of  
> all Squeak platforms are now x86 I was thinking about removing it  
> altogether (after all, it's just a pointless memory dereferencing  
> which is only advantageous on platforms that don't have direct  
> addressing modes).
>
>> Please , let me know, if my patch is acceptable, from this depends  
>> the
>> way how i implement VM pointers table. :)
>
> To be blunt, there are two things I don't like about it: First, it  
> introduces the need for another dereferencing in an already  
> register-deprived model. Second, anything containing "struct foo  
> fum" is immediately on my list of things I never want to see in my  
> code. Changing these names to something sensible would make it a  
> lot easier to convince me about the changes.

Ah, well, the history of why it was Foo: I had discovered that under
PPC the use of a structure would remove one instruction for each read
or write to a VM memory location. This made a significant change to
the performance of the PowerPC VM; if you run 1/3 fewer instructions
you get more work done. I set out one weekend to alter the VM and
named the structure Foo as a joke, and then dug deep into Slang to
figure out how to change it so that references to global variables
would refer to the Foo structure, because I really didn't think I was
going to be able to change it. However, I was successful and left it
named Foo as a reminder of how well built Slang was; oddly, no one
complained until tonight (took years, I note). Also, of course, I had
to make it so that you could build the VM with or without the feature
because, as Andreas pointed out, it did not produce good assembler on
the Intel platform, so getting all that to work was non-trivial.

Lurking in here also was some comments from people wanting to build  
VMs for some special purpose CPUS where they would hang all the  
globals off a single structure pointed to by a register versus having  
1000 separate globals, plus a thought about making a VM with multiple  
VM threads that would only require a register switch to change squeak  
VM processes.

Other notes.

(a) Sometimes, depending on the compiler version, arrays are or are
not allocated into the structure because of how the compiler feels it
should generate the code. Sometimes it does insane things, other times
it removes one or two instructions for PowerPC references. This
behaviour is tied to the compiler version. Truthfully, I've not
checked this on macintel to see if it makes any difference; likely
not.

(b) The other few non-foo structure variables are variables
initialized to constants. These could have been moved into foo and an
initialization routine used to populate them, but work on that never
happened. I guess if someone wants to change the foo name then those
few initialized variables should be dragged into the structure for
completeness as part of the cleanup.


A few years back I noticed Ian was compiling the Unix Intel VM with
the foo structure and I asked him why, since I had earlier noted the
Intel performance degradation. I think Ian said he had checked and
there was no longer an issue and there was no harm in compiling with
foo for the Intel platform. I believe what happens now is that,
because it's declared as struct foo * foo = &fum; you just end up with
a reference into the dynamic storage area for the VM, with the
precomputed offset being the location of fum and the variable offset.
Earlier compilers, I guess, would first reference the storage area to
get the pointer, then reference the variable in the structure, which
gave the poor performance values.

PowerPC is not yet dead (don't all the game consoles use it?), so it
would not be wise to abandon this feature just because today all
mainstream platforms are Intel-based register-deprived solutions;
someday that might change.
Well, that and PowerPC-based Macintosh machines will likely still be
around for 5 to 7 more years, given the historical longevity of
Macintosh hardware.


> However, I can probably fix up the support code so that it's  
> possible to compile a "struct foo VM", which I presume is your main  
> need. Although, given that a "struct foo VM" will compile trivially  
> without the indirection, it may be easier for you to compile Unix  
> and Mac VMs without the extra indirection.


A few years back I changed all the mac support code to avoid  
referring to foo or fum or interp.c globals directly and use the vm  
supplied accessors via the interpreterProxy or via interp.c accessor  
routine.


--
========================================================================
===
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
===




Re: Switching to use foo struct on Windows VM

johnmci
In reply to this post by Andreas.Raab
> anything containing "struct foo fum"

In case anyone was slow, it was "Fe Fi Fo Fum" and building things  
(aka standing) on the shoulders of Giants
for this minor change to all the VM work that came before.





Re: Switching to use foo struct on Windows VM

Bryce Kampjes
In reply to this post by Andreas.Raab
Andreas Raab writes:
 > sig wrote:
 > > I tried to introduce VM pointers table for use by Exupery, but found
 > > that there's no common way for adding this code because all platforms,
 > > except win32 using foo struct for globals.
 >
 > Can you say what the requirements for this patch are? E.g., why exactly
 > does it matter if the VM is compiled with struct foo or not?

The goal is to provide a generic way of getting pointers to the
interpreter's variables and functions. Exupery needs these because it
generates code that does the same thing as the interpreter. Sig needs
these as he's interested in allowing low-level programming to be done
inside the image. At the moment Exupery has a lot of trivial accessor
functions to return the addresses.

The problem is that you can't put "&foo->activeContext" into an
initialiser in C, as at compile time C cannot know where foo points.

Using #returnPrefixFromVariable: to generate the variable-accessing
code will also allow generated code to work in VMs that use foo or
don't use foo. #returnPrefixFromVariable: is called when translating
addressOf: for this reason.

I'm guessing that the problem could also be solved by generating
accessors the way that your #addressOf: operation does.
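
The initialiser problem can be shown with a small C sketch. Everything here is a cut-down, hypothetical stand-in: the real struct has many more members, and the names staticTable, pointerTable, initPointerTable and readVariable are invented for this example rather than taken from the VM sources.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for the generated global struct. */
struct foo {
    long activeContext;
    long method;
};

struct foo fum;
struct foo *foo = &fum;

/* Legal at file scope: the address of a member of a static object
   is an address constant. */
long *const staticTable[] = { &fum.activeContext, &fum.method };

/* Not legal at file scope: foo is a variable, so &foo->activeContext
   is not a constant expression and this line would not compile.
   long *const brokenTable[] = { &foo->activeContext };              */

/* Alternative that works with or without the indirection: fill the
   table once at startup instead of in an initialiser. */
enum { ACTIVE_CONTEXT, METHOD, POINTER_COUNT };
long *pointerTable[POINTER_COUNT];

void initPointerTable(void) {
    pointerTable[ACTIVE_CONTEXT] = &foo->activeContext;
    pointerTable[METHOD]         = &foo->method;
}

/* Read a VM variable through the table (after initPointerTable). */
long readVariable(int index) {
    assert(index >= 0 && index < POINTER_COUNT);
    return *pointerTable[index];
}
```

Filling the table at runtime costs one call during VM startup, but the same source then compiles whether the variables live in a struct or as plain globals, as long as the code generator emits the right prefix for each name.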

 > > benchmark shows no noticeable difference using foo struct or not.
 > > Maybe this is bad benchmark for this case..
 >
 > This result is quite surprising. When John originally introduced this
 > option, x86 was significantly slower when compiling with than without
 > it. As a matter of fact, given that probably some 90+% of all Squeak
 > platforms are now x86 I was thinking about removing it altogether (after
 > all, it's just a pointless memory dereferencing which is only
 > advantageous on platforms that don't have direct addressing modes).

Low-level performance is getting more complex as it gets faster. The
interpreter does not execute many instructions per clock (sorry, I
don't have the numbers handy and they will change depending on
architecture). Given how low the instructions-per-clock figure is,
adding extra work to the interpreter doesn't matter so long as the
extra work stays inside the delays (probably branch mispredicts) that
are currently limiting the interpreter's speed. That's the magic of
out-of-order execution.

I'd guess that on slower in-order x86 CPUs using foo will have more of
an adverse impact on performance. And having foo is likely to be
most important on slower CPUs, including ARMs in phones/handhelds.

Bryce


Re: Switching to use foo struct on Windows VM

Bert Freudenberg
In reply to this post by johnmci

On Jul 15, 2007, at 10:51 , John M McIntosh wrote:

>
> On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:
>
>> This result is quite surprising. When John originally introduced  
>> this option, x86 was significantly slower when compiling with than  
>> without it. As a matter of fact, given that probably some 90+% of  
>> all Squeak platforms are now x86 I was thinking about removing it  
>> altogether (after all, it's just a pointless memory dereferencing  
>> which is only advantageous on platforms that don't have direct  
>> addressing modes).
>>
>>> Please , let me know, if my patch is acceptable, from this  
>>> depends the
>>> way how i implement VM pointers table. :)
>>
>> To be blunt, there are two things I don't like about it: First, it  
>> introduces the need for another dereferencing in an already  
>> register-deprived model. Second, anything containing "struct foo  
>> fum" is immediately on my list of things I never want to see in my  
>> code. Changing these names to something sensible would make it a  
>> lot easier to convince me about the changes.
>
> Ah, well the history why it was Foo was because I had discovered  
> that under PPC the usage of a structure would remove one  
> instruction for each read or write to a VM memory location. This  
> made a significant change to the performance of the PowerPC VM, if  
> you run 1/3 less instructions you get more work done. I set out one  
> weekend to alter the VM and named the structure Foo as a joke, and  
> then dug deep into SLang to figure out how to change it so that  
> references to global variables would refer to the Foo structure  
> because I really didn't think I was going to be able to change it.  
> However I was successful and left it named Foo as a reminder how  
> well build slang was, oddly no one complained until tonight (took  
> years I note).  Also of course I had to make it so that you could  
> build the VM with or without the feature because as Andreas pointed  
> out it did not produce good assembler on the Intel Platform, so  
> getting all that to work was non-trival.
>
> Lurking in here also was some comments from people wanting to build  
> VMs for some special purpose CPUS where they would hang all the  
> globals off a single structure pointed to by a register versus  
> having 1000 separate globals, plus a thought about making a VM with  
> multiple VM threads that would only require a register switch to  
> change squeak VM processes.
>
> Other notes.
>
> (a) Sometimes depending on the compiler version Arrays are, or are  
> not allocated into the structure because of  how the compiler feels  
> it should generate the code.  Sometimes it does insane things,  
> other times it removed one or two instructions for PowerPC  
> references. This behaviour is tied to the compiler version.  
> Truthfully I've not check this on macintel to see if it makes any  
> difference, likely not.
>
> (b) The other few none-foo structure variables are variables  
> initialized to constants, these could have been moved into foo and  
> an initialization routine used to populate them, but work on that  
> never happen. I guess if someone wants to change the foo name then  
> those few initialized variables should be dragged into the  
> structure for completeness as part of the cleanup.
>
>
> A few years back I noticed Ian was compiling the Unix Intel VM with  
> the foo structure and I asked him why? Since I had earlier noted  
> the intel performance degradation. I think Ian said he had checked  
> and there was no longer an issue and there was no harm in compiling  
> with foo for the intel platform.  I believe now what happens is  
> because it's declared as struct foo * foo = &fum; you just end up  
> with a reference into the dynamic storage area for the VM with the  
> precomputed offset being the location of the fum and the variable  
> offset. Earlier compilers I guess would first reference the storage  
> area to the pointer, then reference the variable into the structure  
> which gave the poor performance values.
>
> Because PowerPC is not yet dead, don't all the game consoles use  
> it? It would not be wise to abandon this feature because today all  
> mainstream platforms are Intel based register-deprived solutions,  
> someday that might change.
> Well that and PowerPC based macintosh machines likely will still be  
> around for 5 to 7  more years given the historical longevity of  
> macintosh hardware.
>
>
>> However, I can probably fix up the support code so that it's  
>> possible to compile a "struct foo VM", which I presume is your  
>> main need. Although, given that a "struct foo VM" will compile  
>> trivially without the indirection, it may be easier for you to  
>> compile Unix and Mac VMs without the extra indirection.
>
>
> A few years back I changed all the mac support code to avoid  
> referring to foo or fum or interp.c globals directly and use the vm  
> supplied accessors via the interpreterProxy or via interp.c  
> accessor routine.

Wonder how that would affect the AMD Geode, which is a not-so-modern  
x86 processor, but still quite important for Squeak. Once we get a  
Geode LX we need to seriously measure performance ... what magic bit  
do I need to flip to disable/enable foo fum?

- Bert -



Re: Switching to use foo struct on Windows VM

johnmci
MacOSPowerPCOS9VMMaker>>createCodeGenerator
        "set up a CCodeGenerator for this VMMaker - Mac OS uses the global struct and local def of the structure"
        ^CCodeGeneratorGlobalStructure new initialize; globalStructDefined: true


overrides

VMMaker>>createCodeGenerator
        "set up a CCodeGenerator for this VMMaker"
        ^CCodeGenerator new initialize


This override happens for unix, risc, and mac, but not for windows,
which is the VMMakerWithFileCopying/Win32VMMaker subclass structure.




> Wonder how that would affect the AMD Geode, which is a not-so-
> modern x86 processor, but still quite important for Squeak. Once we  
> get a Geode LX we need to seriously measure performance ... what  
> magic bit do I need to flip to disable/enable foo fum?
>
> - Bert -
>
>





Re: Switching to use foo struct on Windows VM

Igor Stasenko
In reply to this post by Bert Freudenberg
On 15/07/07, Bert Freudenberg <[hidden email]> wrote:

>
> On Jul 15, 2007, at 10:51 , John M McIntosh wrote:
>
> >
> > On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:
> >
> >> This result is quite surprising. When John originally introduced
> >> this option, x86 was significantly slower when compiling with than
> >> without it. As a matter of fact, given that probably some 90+% of
> >> all Squeak platforms are now x86 I was thinking about removing it
> >> altogether (after all, it's just a pointless memory dereferencing
> >> which is only advantageous on platforms that don't have direct
> >> addressing modes).
> >>

Wherever a method uses the foo struct, the generator places the
following line in the function:
register struct foo * foo = &fum;

and then uses foo->bar everywhere.
So the difference in compiled code between using the foo struct or not
is minimal:

   mov reg, [bar]   <- using globals
   mov reg, [foo + bar_offset]  <- with foo

Of course, this depends on how well GCC optimizes the code, but in the
optimal case the difference between loading a value through a direct
pointer and through base+offset is just a few cycles. I don't think
this can cause a major speed degradation.

The only platform that uses another level of indirection is RiscOS
(which passes 'globalStructDefined: false' to
CCodeGeneratorGlobalStructure). With globalStructDefined: false, the
generator does not emit the line (register struct foo * foo = &fum;)
in each function and uses foo directly (it seems that 'foo' is
declared somewhere in the platform code, because
CCodeGeneratorGlobalStructure omits the declaration of foo when
globalStructDefined: false).
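
As a rough illustration of the two code-generation styles, here is a C sketch in which one source compiles either way. The macros DECLARE_FOO and VMVAR, the switch USE_GLOBAL_STRUCT, and the tiny stack are all hypothetical stand-ins invented for this example; the real generator emits the prefix directly rather than going through macros.

```c
#include <assert.h>

/* Define USE_GLOBAL_STRUCT for the "foo" flavour; leave it undefined
   for plain globals. All names here are illustrative only. */
#ifdef USE_GLOBAL_STRUCT
struct foo {
    long stackPointer;
    long stack[16];
};
struct foo fum;
#define DECLARE_FOO register struct foo *foo = &fum;
#define VMVAR(name) (foo->name)
#else
long stackPointer;
long stack[16];
#define DECLARE_FOO
#define VMVAR(name) (name)
#endif

/* Push a value and return the new stack pointer. */
long push(long value) {
    DECLARE_FOO
    assert(VMVAR(stackPointer) < 15);   /* stay inside stack[16] */
    VMVAR(stackPointer) += 1;
    VMVAR(stack)[VMVAR(stackPointer)] = value;
    return VMVAR(stackPointer);
}

/* Return the value at the top of the stack. */
long top(void) {
    DECLARE_FOO
    return VMVAR(stack)[VMVAR(stackPointer)];
}
```

Compiling this once with and once without -DUSE_GLOBAL_STRUCT and comparing the assembler output is one way to see what a given compiler actually does with the extra base pointer.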


> >>> Please , let me know, if my patch is acceptable, from this
> >>> depends the
> >>> way how i implement VM pointers table. :)
> >>
> >> To be blunt, there are two things I don't like about it: First, it
> >> introduces the need for another dereferencing in an already
> >> register-deprived model. Second, anything containing "struct foo
> >> fum" is immediately on my list of things I never want to see in my
> >> code. Changing these names to something sensible would make it a
> >> lot easier to convince me about the changes.
> >
> > Ah, well the history why it was Foo was because I had discovered
> > that under PPC the usage of a structure would remove one
> > instruction for each read or write to a VM memory location. This
> > made a significant change to the performance of the PowerPC VM, if
> > you run 1/3 less instructions you get more work done. I set out one
> > weekend to alter the VM and named the structure Foo as a joke, and
> > then dug deep into SLang to figure out how to change it so that
> > references to global variables would refer to the Foo structure
> > because I really didn't think I was going to be able to change it.
> > However I was successful and left it named Foo as a reminder how
> > well build slang was, oddly no one complained until tonight (took
> > years I note).  Also of course I had to make it so that you could
> > build the VM with or without the feature because as Andreas pointed
> > out it did not produce good assembler on the Intel Platform, so
> > getting all that to work was non-trival.
> >
> > Lurking in here also was some comments from people wanting to build
> > VMs for some special purpose CPUS where they would hang all the
> > globals off a single structure pointed to by a register versus
> > having 1000 separate globals, plus a thought about making a VM with
> > multiple VM threads that would only require a register switch to
> > change squeak VM processes.
> >
> > Other notes.
> >
> > (a) Sometimes depending on the compiler version Arrays are, or are
> > not allocated into the structure because of  how the compiler feels
> > it should generate the code.  Sometimes it does insane things,
> > other times it removed one or two instructions for PowerPC
> > references. This behaviour is tied to the compiler version.
> > Truthfully I've not check this on macintel to see if it makes any
> > difference, likely not.
> >
> > (b) The other few none-foo structure variables are variables
> > initialized to constants, these could have been moved into foo and
> > an initialization routine used to populate them, but work on that
> > never happen. I guess if someone wants to change the foo name then
> > those few initialized variables should be dragged into the
> > structure for completeness as part of the cleanup.
> >
> >
> > A few years back I noticed Ian was compiling the Unix Intel VM with
> > the foo structure and I asked him why? Since I had earlier noted
> > the intel performance degradation. I think Ian said he had checked
> > and there was no longer an issue and there was no harm in compiling
> > with foo for the intel platform.  I believe now what happens is
> > because it's declared as struct foo * foo = &fum; you just end up
> > with a reference into the dynamic storage area for the VM with the
> > precomputed offset being the location of the fum and the variable
> > offset. Earlier compilers I guess would first reference the storage
> > area to the pointer, then reference the variable into the structure
> > which gave the poor performance values.
> >
> > Because PowerPC is not yet dead, don't all the game consoles use
> > it? It would not be wise to abandon this feature because today all
> > mainstream platforms are Intel based register-deprived solutions,
> > someday that might change.
> > Well that and PowerPC based macintosh machines likely will still be
> > around for 5 to 7  more years given the historical longevity of
> > macintosh hardware.
> >
> >
> >> However, I can probably fix up the support code so that it's
> >> possible to compile a "struct foo VM", which I presume is your
> >> main need. Although, given that a "struct foo VM" will compile
> >> trivially without the indirection, it may be easier for you to
> >> compile Unix and Mac VMs without the extra indirection.
> >
 What I would like to see is unified sources for the different platforms.
The situation is simple: I made modifications to the VM and everything
works fine, but only on the Win32 platform, because I was not aware
that the others use the foo struct.
Well, I can make things work regardless of whether CCodeGenerator uses
the foo struct or not.

> >
> > A few years back I changed all the mac support code to avoid
> > referring to foo or fum or interp.c globals directly and use the vm
> > supplied accessors via the interpreterProxy or via interp.c
> > accessor routine.
>
> Wonder how that would affect the AMD Geode, which is a not-so-modern
> x86 processor, but still quite important for Squeak. Once we get a
> Geode LX we need to seriously measure performance ... what magic bit
> do I need to flip to disable/enable foo fum?
>
See the overridden method #createCodeGenerator:
to use foo, it uses CCodeGeneratorGlobalStructure;
to use globals, a plain CCodeGenerator.

I don't think that switching back to globals will introduce problems
in the generated code that prevent it from building. Even if it does,
the code will require only a few fixes.

> - Bert -
>
>
>


Re: Switching to use foo struct on Windows VM

Igor Stasenko
In reply to this post by Bryce Kampjes
Another reason why I'd prefer to use a single struct (call it foo, or
anything else) for interpreter globals is to encapsulate all global
values in a single place:
- VM variables
- pointers to VM functions.

The generated code would then use foo->bar for values and
foo->bar(...) for function calls.

This would give me the ability to replace a function pointer with my
own code on the fly in a running VM, without recompiling anything.
Moreover, it eliminates the need for an InterpreterProxy variable in
each plugin.
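
A minimal C sketch of that idea follows, under the assumption that functions are reached through pointers stored next to the variables. The members primitiveCalls and primitiveAdd and all function names are invented for illustration and do not exist in the VM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct holding both a VM variable and a pointer to a
   VM function. */
struct foo {
    long primitiveCalls;
    long (*primitiveAdd)(long, long);
};

static long defaultPrimitiveAdd(long a, long b) {
    return a + b;
}

/* A function's address is a constant, so it may appear in a static
   initialiser. */
struct foo fum = { 0, defaultPrimitiveAdd };
struct foo *foo = &fum;

/* What generated code would do: count the call, then dispatch through
   the pointer instead of calling the function by name. */
long callPrimitiveAdd(long a, long b) {
    assert(foo->primitiveAdd != NULL);
    foo->primitiveCalls += 1;
    return foo->primitiveAdd(a, b);
}

/* Replacement behaviour that clamps the result at 100. */
static long clampedPrimitiveAdd(long a, long b) {
    long sum = a + b;
    return sum > 100 ? 100 : sum;
}

/* Swap the implementation in a running program, no recompilation. */
void installClampedAdd(void) {
    foo->primitiveAdd = clampedPrimitiveAdd;
}
```

Before installClampedAdd() runs, callPrimitiveAdd(60, 70) returns 130; afterwards the same call returns 100. The price is an indirect call on every dispatch, which is the kind of cost the rest of this thread argues about.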


Re: Switching to use foo struct on Windows VM

johnmci
In reply to this post by Igor Stasenko

On Jul 15, 2007, at 11:55 AM, sig wrote:
> Everywhere when some method uses foo struct, generator places
> following line in function:
> register struct foo * foo = &fum;

I believe we only generate that if the foo structure is used in the
routine more than once.
On PowerPC this was a clue that the structure pointer should be in a
register, which gained us some performance in earlier versions of GCC.
Later GCC compilers seem to ignore the register hint. I once tried to
use the GCC global register hint, which worked quite well, but was
fraught with issues if all the plugins were not recompiled and if foo
was not set up before anyone invoked an interp.c routine as part of VM
setup.


>
> and then uses everywhere  foo->bar.
> So, the difference in compiled code when using foo struct or not is  
> minimal:
>
>   mov reg, [bar]   <- using globals
>   mov reg, [foo + bar_offset]  <- with foo
>
> Of course, this depends how well GCC optimizes code, but in optimal
> case - difference between loading value using direct pointer or using
> base+offset is a just few cycles. And i don't think that this may
> cause a major speed degradation.

A cycle here, a cycle there, add up to real cycles.
This is the first byte code in intel assembler properly optimized.

L10161:
        addl $1, %esi
        movzbl (%esi), %ebx
        addl $4, %edi
        movl _foo, %eax
        movl 84(%eax), %eax
        movl 4(%eax), %eax
        movl %eax, (%edi)
        movl 512(%esp,%ebx,4), %eax
L10421:
        jmp *%eax


Less-than-optimal compiles can result in 12 instructions; 9 versus 12
instructions does equal a difference in real physical time.






Re: Switching to use foo struct on Windows VM

Igor Stasenko
On 15/07/07, John M McIntosh <[hidden email]> wrote:

>
> On Jul 15, 2007, at 11:55 AM, sig wrote:
> > Everywhere when some method uses foo struct, generator places
> > following line in function:
> > register struct foo * foo = &fum;
>
> I believe we only generate that if the foo structure was used in the
> routine more than once.
> On powerpc this was a clue that the structure pointer should be in a
> register which gain us some performance
> in earlier versions of GC. In later GCC compilers it seems they
> ignore the register hint now.  I once tried to use
> the GCC global register hint, which worked quite well, but was
> fraught with issues if all the plugins were not
> recompiled and if foo was not setup before anyone invoked a interp.c
> routine as part of VM setup.
>


>
> >
> > and then uses everywhere  foo->bar.
> > So, the difference in compiled code when using foo struct or not is
> > minimal:
> >
> >   mov reg, [bar]   <- using globals
> >   mov reg, [foo + bar_offset]  <- with foo
> >
> > Of course, this depends how well GCC optimizes code, but in optimal
> > case - difference between loading value using direct pointer or using
> > base+offset is a just few cycles. And i don't think that this may
> > cause a major speed degradation.
>
> A cycle here, a cycle there, add up to real cycles.
> This is the first byte code in intel assembler properly optimized.
>
> L10161:
>         addl    $1, %esi
>         movzbl  (%esi), %ebx
>         addl    $4, %edi
>         movl    _foo, %eax
>         movl    84(%eax), %eax
>         movl    4(%eax), %eax
>         movl    %eax, (%edi)
>         movl    512(%esp,%ebx,4), %eax
> L10421:
>         jmp     *%eax
>
>
> less than optimal compiles can result in 12 instructions,  9 versus
> 12 instructions does equal a difference in real physical time.
>

While you people are fighting with different GCC compilers to force
them to produce optimal code, my intent is to PROVIDE this optimal
code, written by hand and compiled by Exupery. In my case, if things
go well, the example above will prove nothing, because I will be able
to reimplement any VM function (even interpret()) and have much better
control over how to avoid producing extra jumps/calls.


Re: Switching to use foo struct on Windows VM

Bryce Kampjes
In reply to this post by Igor Stasenko
sig writes:
 > On 15/07/07, Bert Freudenberg <[hidden email]> wrote:
 > Everywhere when some method uses foo struct, generator places
 > following line in function:
 > register struct foo * foo = &fum;
 >
 > and then uses everywhere  foo->bar.
 > So, the difference in compiled code when using foo struct or not is minimal:
 >
 >    mov reg, [bar]   <- using globals
 >    mov reg, [foo + bar_offset]  <- with foo
 >
 > Of course, this depends how well GCC optimizes code, but in optimal
 > case - difference between loading value using direct pointer or using
 > base+offset is a just few cycles. And i don't think that this may
 > cause a major speed degradation.

The cost is that, to be efficient, you need to use a register to
hold foo. The x86 is register-starved, with only 6 or 7 registers
available.

It's so bad that people will commonly compile with a compiler flag to
free up the frame pointer, which makes debugging much harder as the
debugger can no longer reliably find the stack. This frees up 1
register, which can provide a 20% performance improvement.

Bryce


Re: Switching to use foo struct on Windows VM

Bryce Kampjes
In reply to this post by Igor Stasenko
sig writes:
 > Another reason I'd prefer to use a single struct (call it foo, or
 > anything else) for interpreter globals is to encapsulate all global
 > values in a single place:
 > - VM variables
 > - pointers to VM functions.
 >
 > And in generated code use foo->bar for values, and foo->bar(...) for
 > function calls.
 >
 > This will give me the ability to replace a function pointer with my
 > own code on the fly in a running VM, without recompiling anything.
 > Moreover, this eliminates the need for an InterpreterProxy
 > variable in each plugin.

There are two separate questions here:
 * Should you be able to always use foo?
 * Should other people be able to not use foo?

In my opinion the ideal answer is yes to both questions.
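
One way to get "yes" to both is a compile-time switch in the generated code. The sketch below is only an illustration under assumptions: USE_GLOBAL_STRUCT, DECL_FOO, GIV and the helper functions are hypothetical names, not what the code generator emits today.

```c
#include <assert.h>

#define USE_GLOBAL_STRUCT 1   /* hypothetical build switch; set to 0 for plain globals */

#if USE_GLOBAL_STRUCT
struct foo { int stackPointer; int argumentCount; };
static struct foo fum;
# define DECL_FOO register struct foo *foo = &fum
# define GIV(name) (foo->name)
#else
static int stackPointer;
static int argumentCount;
# define DECL_FOO (void)0
# define GIV(name) (name)
#endif

/* The function body is identical in both configurations. */
void pop_arguments(void) {
    DECL_FOO;
    assert(GIV(stackPointer) >= GIV(argumentCount));
    GIV(stackPointer) -= GIV(argumentCount);
}

/* Hypothetical helpers so the state can be set and inspected. */
void set_state(int sp, int argc) {
    DECL_FOO;
    GIV(stackPointer) = sp;
    GIV(argumentCount) = argc;
}

int stack_pointer(void) {
    DECL_FOO;
    return GIV(stackPointer);
}
```

With this shape, a platform that loses performance with the struct can build with the switch off, while tools that need the struct build with it on.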

Bryce


Re: Switching to use foo struct on Windows VM

johnmci
In reply to this post by Igor Stasenko

> While you people are fighting with different GCC compilers to force them
> to produce optimal code, my intent is to PROVIDE this optimal code,
> written by hand and compiled by Exupery. In my case, if things go
> well, the example above will prove nothing, because I will be able to
> reimplement any VM function (even interpret()) and have much better
> control over how to avoid producing extra jumps/calls.

Well sure all you need to do is take

pushReceiverVariableBytecode

        self fetchNextBytecode.
        "this bytecode will be expanded so that refs to currentBytecode
        below will be constant"
        self pushReceiverVariable: (currentBytecode bitAnd: 16rF).


which requires all these routines

fetchNextBytecode
        "This method fetches the next instruction (bytecode). Each bytecode
        method is responsible for fetching the next bytecode, preferably as
        early as possible to allow the memory system time to process the
        request before the next dispatch."

        currentBytecode := self fetchByte.


fetchByte
        "This method uses the preIncrement builtin function which has no
        Smalltalk equivalent. Thus, it must be overridden in the simulator."

        ^ self byteAtPointer: localIP preIncrement


pushReceiverVariable: fieldIndex

        self internalPush:
                (self fetchPointer: fieldIndex ofObject: receiver).


fetchPointer: fieldIndex ofObject: oop
        "index by word size, and return a pointer as long as the word size"

        ^ self longAt: oop + BaseHeaderSize + (fieldIndex << ShiftForWord)

internalPush: object

        self longAtPointer: (localSP := localSP + BytesPerWord) put: object.


longAtPointer: pointer put: longValue
        "This gets implemented by Macros in C, where its types will also be checked.
        pointer is a raw address, and longValue is the width of a machine word."

        ^ self longAt: pointer put: longValue


which SLANG mushes into

                CASE(0)
                        /* pushReceiverVariableBytecode */
                        {
                                /* begin fetchNextBytecode */
                                currentBytecode = byteAtPointer(++localIP);
                                /* begin pushReceiverVariable: */
                                /* begin internalPush: */
                                longAtPointerput(localSP += BytesPerWord, longAt((foo->receiver + BaseHeaderSize) + ((0 & 15) << ShiftForWord)));
                        }



Then provide proper assembler for Intel (AMD/variations), powerpc,
Risc, unknown. You could argue for ignoring 10% or less of the
population and just doing intel, but the compiler and instruction
purists would argue that not all intel-like CPUs like the same
instruction mixes.

Likely, of course, hand-coded assembler *might* be better, although I
think people now seem to feel that, with multiple-execution-unit hardware
and smarter compilers, that statement is becoming difficult to prove.

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




Re: Switching to use foo struct on Windows VM

Igor Stasenko
In reply to this post by Bryce Kampjes
On 16/07/07, [hidden email] <[hidden email]> wrote:

> sig writes:
>  > Another reason I'd prefer to use a single struct (call it foo, or
>  > anything else) for interpreter globals is to encapsulate all global
>  > values in a single place:
>  > - VM variables
>  > - pointers to VM functions.
>  >
>  > And in generated code use foo->bar for values, and foo->bar(...) for
>  > function calls.
>  >
>  > This will give me the ability to replace a function pointer with my
>  > own code on the fly in a running VM, without recompiling anything.
>  > Moreover, this eliminates the need for an InterpreterProxy
>  > variable in each plugin.
>
> There are two separate questions here:
>  * Should you be able to always use foo?
>  * Should other people be able to not use foo?
>
> In my opinion the ideal answer is yes to both questions.
>
This depends on how well the VM infrastructure is organized.
In the ideal situation, there would be a single global variable:
static foo * VM.
This variable can be a pointer to the foo struct or simply a value,
depending on whether you want to be able to switch between different
interpreters within a single executable, as someone suggested.
In the current code, foo is always assigned &fum, so it is not possible
to switch between different VMs, and semantically foo->bar is the
same as fum.bar.
All plugins use InterpreterProxy and so already call VM functions
indirectly.
I see no big harm in making the VM behave similarly and call its own
functions indirectly.
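
A rough sketch of the single-pointer arrangement, to show what switching interpreters would look like. Only the "foo" struct name and the "VM" variable come from the text above; the fields, the step functions, and the two interpreter instances are hypothetical.

```c
#include <assert.h>

/* Hypothetical interpreter state: one variable and one VM function
   that is reached indirectly through the struct. */
struct foo {
    int instructionPointer;
    int (*step)(void);
};

static struct foo *VM;   /* the single global variable */

/* Two hypothetical step functions, standing in for two interpreters. */
static int stepByOne(void) { return VM->instructionPointer += 1; }
static int stepByTwo(void) { return VM->instructionPointer += 2; }

static struct foo interpreterA = { 0, stepByOne };
static struct foo interpreterB = { 100, stepByTwo };

/* Switching interpreters is just re-pointing VM. */
void select_interpreter(int which) {
    VM = which ? &interpreterB : &interpreterA;
}

/* Runs n steps on the selected interpreter; returns its instruction pointer. */
int run_steps(int n) {
    int last;
    assert(VM != 0);
    last = VM->instructionPointer;
    while (n-- > 0)
        last = VM->step();
    return last;
}
```

If VM were a plain struct value instead of a pointer, the calls would still be indirect, but there would be only one interpreter to select.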

> Bryce
>
>


Re: Switching to use foo struct on Windows VM

Igor Stasenko
In reply to this post by Bryce Kampjes
On 16/07/07, [hidden email] <[hidden email]> wrote:

> sig writes:
>  > Another reason I'd prefer to use a single struct (call it foo, or
>  > anything else) for interpreter globals is to encapsulate all global
>  > values in a single place:
>  > - VM variables
>  > - pointers to VM functions.
>  >
>  > And in generated code use foo->bar for values, and foo->bar(...) for
>  > function calls.
>  >
>  > This will give me the ability to replace a function pointer with my
>  > own code on the fly in a running VM, without recompiling anything.
>  > Moreover, this eliminates the need for an InterpreterProxy
>  > variable in each plugin.
>

I changed the code to generate indirect calls everywhere in interp.c.
See the results:

1 tinyBenchmarks
direct calls:
'120640904 bytecodes/sec; 3180012 sends/sec'
'118518518 bytecodes/sec; 3260940 sends/sec'
'119962511 bytecodes/sec; 3253634 sends/sec'
'119180633 bytecodes/sec; 3227123 sends/sec'
'117323556 bytecodes/sec; 3227123 sends/sec'


indirect calls:
'119626168 bytecodes/sec; 3263383 sends/sec'
'118848653 bytecodes/sec; 3219968 sends/sec'
'118408880 bytecodes/sec; 3305475 sends/sec'
'118628359 bytecodes/sec; 3441245 sends/sec'
'117972350 bytecodes/sec; 3273190 sends/sec'

As you suggested earlier, the main bottleneck is branch misprediction.
As the benchmark results show, the difference lies within the error
bounds. It may be slower than direct calls (by 1/1000, maybe).
At least on my AMD Athlon 1.1 GHz, I see no reason why I must
sacrifice a VM with the ability to replace different functions at run
time for a 1/1000 speed boost.
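
For readers who want to see what is being traded, here is a small sketch of a direct call next to an indirect one, with the pointer replaced at run time. All names except "foo" and "fum" are hypothetical, and saturating_add merely stands in for code that a tool like Exupery might install.

```c
#include <assert.h>

/* Hypothetical function table: one VM function reached through a pointer. */
struct foo {
    int (*primitiveAdd)(int a, int b);
};

static int defaultAdd(int a, int b) { return a + b; }

static struct foo fum = { defaultAdd };

/* Direct call: the target is fixed when the VM is compiled. */
int direct_add(int a, int b) { return defaultAdd(a, b); }

/* Indirect call: the target is read from the struct on every call. */
int indirect_add(int a, int b) { return fum.primitiveAdd(a, b); }

/* Replace the function in a running program, with no recompilation. */
void replace_add(int (*replacement)(int, int)) {
    assert(replacement != 0);
    fum.primitiveAdd = replacement;
}

/* A hypothetical replacement: clamps the sum at 100. */
int saturating_add(int a, int b) {
    int sum = a + b;
    return sum > 100 ? 100 : sum;
}
```

After replace_add(saturating_add), indirect_add follows the new pointer while direct_add keeps calling the original, which is the flexibility the indirect version buys.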


Re: Switching to use foo struct on Windows VM

timrowledge
In reply to this post by johnmci
>
It also makes a significant improvement on ARM machines; y'know, the  
*other* 50% of all the 32bit cpus in the world (or thereabouts) such  
as pretty much every cellphone, fax, router, camera and, oh yes the  
iPhone.

I couldn't care less how silly the name seems. If it bothers you that  
much then change it.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Full of wisdumb.




Re: Switching to use foo struct on Windows VM

timrowledge
In reply to this post by Igor Stasenko

On 15-Jul-07, at 11:55 AM, sig wrote:

>
> The only platform that uses another level of indirection is RiscOS
> (which passes
> 'globalStructDefined: false' to CCodeGeneratorGlobalStructure).
> When globalStructDefined: is false, it does not generate the line
> (register struct foo * foo = &fum;) in each function and uses foo
> directly (it seems that 'foo' is declared somewhere in the platform
> code, because CCodeGeneratorGlobalStructure omits the declaration of
> foo when globalStructDefined: is false).
The ARM compiler makes it nice and easy to declare global register  
variables and foo is so declared. It means that all those globals are  
accessible by a nice simple
LDR val, [foo, #offsetforval]
instead of
LDR val, [stackframe base, #offset1]
LDR val,[val, #offsetforval]
which also gets replicated for stores.

The idea for global register variables was (so far as I know) another  
bit of genius from Eliot; he had been faking it by spoofing the SUN  
compiler and since I couldn't be bothered to try the same trickery on  
the ARM cc I spoke to the guys at ARM that wrote the compiler and  
persuaded them to add the facility as a proper pragma. IIRC it was  
worth about 30% performance back in 1988 on a 12MHz ARM3 system. At  
some later date I believe Eliot was able to persuade the gcc people  
to add a similar capability.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
If it was easy, the hardware people would take care of it.




Re: Switching to use foo struct on Windows VM

Bryce Kampjes
In reply to this post by timrowledge
tim Rowledge writes:
 > >
 > It also makes a significant improvement on ARM machines; y'know, the  
 > *other* 50% of all the 32bit cpus in the world (or thereabouts) such  
 > as pretty much every cellphone, fax, router, camera and, oh yes the  
 > iPhone.

And gaining fast if only because x86 is slowly moving to 64 bit.

Bryce