Switching to use foo struct on Windows VM

Igor Stasenko

Switching to use foo struct on Windows VM

I tried to introduce a VM pointers table for use by Exupery, but found
that there's no common way of adding this code, because all platforms
except win32 use the foo struct for globals.

I investigated how easy it would be to patch the win32 VM to use the
foo struct and found that there are only a few places to change in the
platform-specific code.

So I decided to make a patch.

1 tinyBenchmarks
using old VM:
'118518518 bytecodes/sec; 3351243 sends/sec'
'121673003 bytecodes/sec; 3338403 sends/sec'
'121788772 bytecodes/sec; 3335847 sends/sec'
'122020972 bytecodes/sec; 3323125 sends/sec'

using VM with foo struct
'121327014 bytecodes/sec; 3387727 sends/sec'
'122020972 bytecodes/sec; 3379842 sends/sec'
'120075046 bytecodes/sec; 3536215 sends/sec'
'120640904 bytecodes/sec; 3335847 sends/sec'

The benchmark shows no noticeable difference between using the foo
struct and not using it. Maybe this is a bad benchmark for this case.

Please let me know if my patch is acceptable; the way I implement the
VM pointers table depends on it. :)

Attachments: win32-foo.1.cs (2K), sqWin32.rar (47K)

Igor Stasenko

Re: Switching to use foo struct on Windows VM

Added as issue 0006561 on Mantis.

Andreas.Raab

Re: Switching to use foo struct on Windows VM

In reply to this post by Igor Stasenko

sig wrote:
> I tried to introduce VM pointers table for use by Exupery, but found
> that there's no common way for adding this code because all platforms,
> except win32 using foo struct for globals.

Can you say what the requirements for this patch are? E.g., why exactly
does it matter if the VM is compiled with struct foo or not?

> benchmark shows no noticeable difference using foo struct or not.
> Maybe this is bad benchmark for this case..

This result is quite surprising. When John originally introduced this
option, x86 was significantly slower when compiling with than without
it. As a matter of fact, given that probably some 90+% of all Squeak
platforms are now x86 I was thinking about removing it altogether (after
all, it's just a pointless memory dereferencing which is only
advantageous on platforms that don't have direct addressing modes).

> Please , let me know, if my patch is acceptable, from this depends the
> way how i implement VM pointers table. :)

To be blunt, there are two things I don't like about it: First, it
introduces the need for another dereferencing in an already
register-deprived model. Second, anything containing "struct foo fum" is
immediately on my list of things I never want to see in my code.
Changing these names to something sensible would make it a lot easier to
convince me about the changes.

However, I can probably fix up the support code so that it's possible to
compile a "struct foo VM", which I presume is your main need. Although,
given that a "struct foo VM" will compile trivially without the
indirection, it may be easier for you to compile Unix and Mac VMs
without the extra indirection.

Cheers,
- Andreas

johnmci

Re: Switching to use foo struct on Windows VM

On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:

> This result is quite surprising. When John originally introduced
> this option, x86 was significantly slower when compiling with than
> without it. As a matter of fact, given that probably some 90+% of
> all Squeak platforms are now x86 I was thinking about removing it
> altogether (after all, it's just a pointless memory dereferencing
> which is only advantageous on platforms that don't have direct
> addressing modes).
>
>> Please , let me know, if my patch is acceptable, from this depends
>> the
>> way how i implement VM pointers table. :)
>
> To be blunt, there are two things I don't like about it: First, it
> introduces the need for another dereferencing in an already
> register-deprived model. Second, anything containing "struct foo
> fum" is immediately on my list of things I never want to see in my
> code. Changing these names to something sensible would make it a
> lot easier to convince me about the changes.

Ah, well, the history of why it was Foo: I had discovered that
under PPC the use of a structure would remove one instruction for
each read or write to a VM memory location. This made a significant
change to the performance of the PowerPC VM; if you run 1/3 fewer
instructions you get more work done. I set out one weekend to alter
the VM and named the structure Foo as a joke, and then dug deep into
Slang to figure out how to change it so that references to global
variables would refer to the Foo structure, because I really didn't
think I was going to be able to change it. However, I was successful
and left it named Foo as a reminder of how well built Slang was. Oddly,
no one complained until tonight (took years, I note). Also, of course,
I had to make it so that you could build the VM with or without the
feature because, as Andreas pointed out, it did not produce good
assembler on the Intel platform, so getting all that to work was
non-trivial.

Lurking in here also were some comments from people wanting to build
VMs for some special-purpose CPUs where they would hang all the
globals off a single structure pointed to by a register, versus having
1000 separate globals, plus a thought about making a VM with multiple
VM threads that would only require a register switch to change Squeak
VM processes.
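
The "register switch" idea can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the two-field struct foo, the interpreterStates array and the functions switchInterpreter and advanceInstructionPointer are hypothetical names, not code from any VM.

```c
/* Hypothetical, cut-down interpreter state; the real struct foo has far
   more fields. */
struct foo {
    long instructionPointer;
    long stackPointer;
};

#define INTERPRETER_COUNT 2

/* One block of "globals" per VM thread (assumed layout). */
struct foo interpreterStates[INTERPRETER_COUNT];

/* All generated code reaches its globals through this one pointer. */
struct foo *foo = &interpreterStates[0];

/* Switching VM threads is just repointing foo; returns 1 on success and
   0 if the index is out of range. */
int switchInterpreter(int index)
{
    if (index < 0 || index >= INTERPRETER_COUNT)
        return 0;
    foo = &interpreterStates[index];
    return 1;
}

/* Stand-in for generated interpreter code that touches a "global". */
long advanceInstructionPointer(void)
{
    return ++foo->instructionPointer;
}
```

Each state keeps its own counters, so work done after a switch does not disturb the other interpreter's variables.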

Other notes.

(a) Sometimes, depending on the compiler version, arrays are or are
not allocated into the structure because of how the compiler feels
it should generate the code. Sometimes it does insane things; other
times it removes one or two instructions for PowerPC references. This
behaviour is tied to the compiler version. Truthfully, I've not checked
this on macintel to see if it makes any difference; likely not.

(b) The few other non-foo structure variables are variables
initialized to constants. These could have been moved into foo and an
initialization routine used to populate them, but work on that never
happened. I guess if someone wants to change the foo name then those
few initialized variables should be dragged into the structure for
completeness as part of the cleanup.

A few years back I noticed Ian was compiling the Unix Intel VM with
the foo structure and I asked him why, since I had earlier noted the
Intel performance degradation. I think Ian said he had checked and
there was no longer an issue and there was no harm in compiling with
foo for the Intel platform. I believe what happens now is that, because
it's declared as struct foo * foo = &fum; you just end up with a
reference into the dynamic storage area for the VM, with the
precomputed offset being the location of fum plus the variable
offset. Earlier compilers, I guess, would first reference the storage
area to get the pointer, then reference the variable in the structure,
which gave the poor performance values.

PowerPC is not yet dead (don't all the game consoles use it?), so
it would not be wise to abandon this feature just because today all
mainstream platforms are Intel-based, register-deprived solutions;
someday that might change.
Well, that and PowerPC-based Macintosh machines will likely still be
around for 5 to 7 more years, given the historical longevity of
Macintosh hardware.

> However, I can probably fix up the support code so that it's
> possible to compile a "struct foo VM", which I presume is your main
> need. Although, given that a "struct foo VM" will compile trivially
> without the indirection, it may be easier for you to compile Unix
> and Mac VMs without the extra indirection.

A few years back I changed all the Mac support code to avoid
referring to foo or fum or interp.c globals directly and to use the
VM-supplied accessors via the interpreterProxy or via interp.c accessor
routines.

--
========================================================================
===
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===

johnmci

Re: Switching to use foo struct on Windows VM

In reply to this post by Andreas.Raab

> anything containing "struct foo fum"

In case anyone was slow, it was "Fe Fi Fo Fum" and building things
(aka standing) on the shoulders of Giants
for this minor change to all the VM work that came before.

Bryce Kampjes

Re: Switching to use foo struct on Windows VM

In reply to this post by Andreas.Raab

Andreas Raab writes:
> sig wrote:
> > I tried to introduce VM pointers table for use by Exupery, but found
> > that there's no common way for adding this code because all platforms,
> > except win32 using foo struct for globals.
>
> Can you say what the requirements for this patch are? E.g., why exactly
> does it matter if the VM is compiled with struct foo or not?

The goal is to provide a generic way of getting pointers to the
interpreter's variables and functions. Exupery needs these because it
generates code that does the same thing as the interpreter. Sig needs
these as he's interested in allowing low-level programming to be done
inside the image. At the moment Exupery has a lot of trivial accessor
functions to return the addresses.

The problem is that you can't put "&foo->activeContext" into an
initialiser in C, as at compile time C cannot know where foo points.

Using #returnPrefixFromVariable: to generate the variable-accessing
code will also allow generated code to work in VMs that use foo and in
VMs that don't. #returnPrefixFromVariable: is called when translating
addressOf: for this reason.

I'm guessing that the problem could also be solved by generating
accessors the way that your #addressOf: operation does.
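
To illustrate the initialiser problem, here is a minimal C sketch. It assumes a cut-down struct foo with just two fields; vmPointers, initVMPointers and vmPointerAt are hypothetical names and are not taken from the VM sources or from the patch. The table is filled at run time instead of in a static initialiser.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the interpreter's global structure. */
struct foo {
    long activeContext;
    long receiver;
};

struct foo fum;
struct foo *foo = &fum;

/* Hypothetical indices into the pointer table. */
enum { VM_ACTIVE_CONTEXT, VM_RECEIVER, VM_POINTER_COUNT };

/* This would be rejected at file scope, because foo is a variable and so
   &foo->activeContext is not an address constant:

       void *vmPointers[] = { &foo->activeContext };
*/
void *vmPointers[VM_POINTER_COUNT];

/* Fill the table at run time, once foo is known to point somewhere. */
void initVMPointers(void)
{
    vmPointers[VM_ACTIVE_CONTEXT] = &foo->activeContext;
    vmPointers[VM_RECEIVER]       = &foo->receiver;
}

/* Look up an entry; returns NULL for an out-of-range index. */
void *vmPointerAt(int index)
{
    if (index < 0 || index >= VM_POINTER_COUNT)
        return NULL;
    return vmPointers[index];
}
```

In a build without the struct, the same init function could take the addresses of plain globals instead, so the table itself would look identical to its users.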

> > benchmark shows no noticeable difference using foo struct or not.
> > Maybe this is bad benchmark for this case..
>
> This result is quite surprising. When John originally introduced this
> option, x86 was significantly slower when compiling with than without
> it. As a matter of fact, given that probably some 90+% of all Squeak
> platforms are now x86 I was thinking about removing it altogether (after
> all, it's just a pointless memory dereferencing which is only
> advantageous on platforms that don't have direct addressing modes).

Low-level performance is getting more complex as it gets faster. The
interpreter does not execute many instructions per clock (sorry, I
don't have the numbers handy and they will change depending on
architecture). Given how low the instructions per clock is, adding
extra work to the interpreter doesn't matter so long as the extra work
stays inside the delays (probably branch mispredicts) that are
currently limiting the interpreter's speed. That's the magic of
out-of-order execution.

I'd guess that on slower in-order x86 CPUs using foo will have more of
an adverse impact on performance. And having foo is likely to be
most important on slower CPUs, including ARMs in phones/handhelds.

Bryce

Bert Freudenberg

Re: Switching to use foo struct on Windows VM

In reply to this post by johnmci

On Jul 15, 2007, at 10:51 , John M McIntosh wrote:

> A few years back I changed all the mac support code to avoid
> referring to foo or fum or interp.c globals directly and use the vm
> supplied accessors via the interpreterProxy or via interp.c
> accessor routine.

Wonder how that would affect the AMD Geode, which is a not-so-modern
x86 processor, but still quite important for Squeak. Once we get a
Geode LX we need to seriously measure performance ... what magic bit
do I need to flip to disable/enable foo fum?

- Bert -

johnmci

Re: Switching to use foo struct on Windows VM

MacOSPowerPCOS9VMMaker>>createCodeGenerator
    "set up a CCodeGenerator for this VMMaker - Mac OS uses the global struct and local def of the structure"
    ^CCodeGeneratorGlobalStructure new initialize; globalStructDefined: true

overrides

VMMaker>>createCodeGenerator
    "set up a CCodeGenerator for this VMMaker"
    ^CCodeGenerator new initialize

This override happens for unix, risc, mac, but not for windows which
is the VMMakerWithFileCopying/Win32VMMaker subclass structure.

> Wonder how that would affect the AMD Geode, which is a not-so-
> modern x86 processor, but still quite important for Squeak. Once we
> get a Geode LX we need to seriously measure performance ... what
> magic bit do I need to flip to disable/enable foo fum?
>
> - Bert -

Igor Stasenko

Re: Switching to use foo struct on Windows VM

In reply to this post by Bert Freudenberg

On 15/07/07, Bert Freudenberg <[hidden email]> wrote:

>
> On Jul 15, 2007, at 10:51 , John M McIntosh wrote:
>
> >
> > On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:
> >
> >> This result is quite surprising. When John originally introduced
> >> this option, x86 was significantly slower when compiling with than
> >> without it. As a matter of fact, given that probably some 90+% of
> >> all Squeak platforms are now x86 I was thinking about removing it
> >> altogether (after all, it's just a pointless memory dereferencing
> >> which is only advantageous on platforms that don't have direct
> >> addressing modes).
> >>

Whenever a method uses the foo struct, the generator places the
following line in the function:
register struct foo * foo = &fum;

and then uses foo->bar everywhere.
So the difference in compiled code between using the foo struct and not using it is minimal:

mov reg, [bar] <- using globals
mov reg, [foo + bar_offset] <- with foo

Of course, this depends on how well GCC optimizes the code, but in the
optimal case the difference between loading a value using a direct
pointer and using base+offset is just a few cycles. And I don't think
that this can cause a major speed degradation.
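
The two access styles can be put side by side in a small C sketch. The names bar_global, readGlobal and readViaStruct and the two-field struct are made up for illustration, and the instructions a compiler actually emits for each function depend on the compiler and its flags.

```c
/* Style 1: a plain global, as in a VM generated without the struct. */
long bar_global;

long readGlobal(void)
{
    return bar_global;              /* direct access to a fixed address */
}

/* Style 2: the same variable as a member of one global structure
   (hypothetical, cut-down layout). */
struct foo {
    long other;
    long bar;
};

struct foo fum;

long readViaStruct(void)
{
    /* The line the generator places at the top of a function. */
    register struct foo *foo = &fum;
    return foo->bar;                /* base pointer plus member offset */
}
```

Both functions return the value of the variable; only the addressing differs.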

The only platform which uses another level of indirection is RiscOS
(which passes 'globalStructDefined: false' to
CCodeGeneratorGlobalStructure). When globalStructDefined: is false, the
generator does not emit the line (register struct foo * foo = &fum;) in
each function and uses foo directly. It seems that 'foo' is declared
somewhere in the platform code, because CCodeGeneratorGlobalStructure
omits the declaration of foo when globalStructDefined: is false.

> >>> Please , let me know, if my patch is acceptable, from this
> >>> depends the
> >>> way how i implement VM pointers table. :)
> >>
> >> To be blunt, there are two things I don't like about it: First, it
> >> introduces the need for another dereferencing in an already
> >> register-deprived model. Second, anything containing "struct foo
> >> fum" is immediately on my list of things I never want to see in my
> >> code. Changing these names to something sensible would make it a
> >> lot easier to convince me about the changes.
> >
> >> However, I can probably fix up the support code so that it's
> >> possible to compile a "struct foo VM", which I presume is your
> >> main need. Although, given that a "struct foo VM" will compile
> >> trivially without the indirection, it may be easier for you to
> >> compile Unix and Mac VMs without the extra indirection.
> >

What I would like to see is the sources unified across the different platforms.
The situation is simple: I made modifications to the VM and everything
works fine, but only on the Win32 platform, because I was not aware
that the others use the foo struct.
Well, I can make things work regardless of whether CCodeGenerator uses the foo struct or not.

> >
> > A few years back I changed all the mac support code to avoid
> > referring to foo or fum or interp.c globals directly and use the vm
> > supplied accessors via the interpreterProxy or via interp.c
> > accessor routine.
>
> Wonder how that would affect the AMD Geode, which is a not-so-modern
> x86 processor, but still quite important for Squeak. Once we get a
> Geode LX we need to seriously measure performance ... what magic bit
> do I need to flip to disable/enable foo fum?
>

See the overridden method #createCodeGenerator:
to use foo, it uses CCodeGeneratorGlobalStructure;
to use globals, the plain CCodeGenerator.

I don't think that switching back to globals will introduce problems
in the generated code which prevent it from building. Even if it does,
the code will require only a few fixes.

> - Bert -
>
>
>

Igor Stasenko

Re: Switching to use foo struct on Windows VM

In reply to this post by Bryce Kampjes

Another reason why I'd prefer to use a single struct (call it foo, or
anything else) for interpreter globals is to encapsulate all global
values in a single place:
- VM variables
- pointers to VM functions.

Generated code would then use foo->bar for values, and foo->bar(...)
for function calls.

This will give me the ability to replace a function pointer with my own
code on the fly in a running VM, without recompiling code at all.
Moreover, this eliminates the need for an InterpreterProxy
variable in each plugin.
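
A rough C sketch of that idea, with made-up names throughout (the counter and step members, defaultStep, replacementStep and runStep are illustrative only and do not come from the VM): values and function pointers live in the same struct, and a function can be swapped at run time by assigning to its slot.

```c
/* Hypothetical struct holding both a VM variable and a pointer to a VM
   function. */
struct foo {
    long counter;               /* stands in for a VM variable */
    long (*step)(long);         /* stands in for a VM function */
};

/* The implementation compiled into the VM. */
long defaultStep(long n)
{
    return n + 1;
}

/* A replacement supplied later, for example by generated code. */
long replacementStep(long n)
{
    return n + 2;
}

struct foo fum = { 0, defaultStep };
struct foo *foo = &fum;

/* Generated code calls through the struct: foo->bar for values and
   foo->bar(...) for function calls. */
long runStep(void)
{
    foo->counter = foo->step(foo->counter);
    return foo->counter;
}
```

Assigning `foo->step = replacementStep;` changes what runStep does from then on, with no recompilation of the callers. The price is an indirect call for every function reached this way.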

johnmci

Re: Switching to use foo struct on Windows VM

In reply to this post by Igor Stasenko

On Jul 15, 2007, at 11:55 AM, sig wrote:
> Everywhere when some method uses foo struct, generator places
> following line in function:
> register struct foo * foo = &fum;

I believe we only generate that if the foo structure is used in the
routine more than once.
On PowerPC this was a clue that the structure pointer should be in a
register, which gained us some performance
in earlier versions of GCC. Later GCC compilers seem to
ignore the register hint. I once tried to use
the GCC global register hint, which worked quite well, but was
fraught with issues if all the plugins were not
recompiled and if foo was not set up before anyone invoked an interp.c
routine as part of VM setup.

>
> and then uses everywhere foo->bar.
> So, the difference in compiled code when using foo struct or not is
> minimal:
>
> mov reg, [bar] <- using globals
> mov reg, [foo + bar_offset] <- with foo
>
> Of course, this depends how well GCC optimizes code, but in optimal
> case - difference between loading value using direct pointer or using
> base+offset is a just few cycles. And i don't think that this may
> cause a major speed degradation.

A cycle here, a cycle there: they add up to real cycles.
This is the first bytecode in Intel assembler, properly optimized.

L10161:
addl $1, %esi
movzbl (%esi), %ebx
addl $4, %edi
movl _foo, %eax
movl 84(%eax), %eax
movl 4(%eax), %eax
movl %eax, (%edi)
movl 512(%esp,%ebx,4), %eax
L10421:
jmp *%eax

Less-than-optimal compiles can result in 12 instructions; 9 versus
12 instructions does equal a difference in real physical time.

Igor Stasenko

Re: Switching to use foo struct on Windows VM

On 15/07/07, John M McIntosh <[hidden email]> wrote:

> A cycle here, a cycle there, add up to real cycles.
> This is the first byte code in intel assembler properly optimized.
>
> L10161:
> addl $1, %esi
> movzbl (%esi), %ebx
> addl $4, %edi
> movl _foo, %eax
> movl 84(%eax), %eax
> movl 4(%eax), %eax
> movl %eax, (%edi)
> movl 512(%esp,%ebx,4), %eax
> L10421:
> jmp *%eax
>
>
> less than optimal compiles can result in 12 instructions, 9 versus
> 12 instructions does equal a difference in real physical time.
>

While you people are fighting with different GCC compilers to force them
to produce optimal code, my intent is to PROVIDE this optimal code,
written by hand and compiled by Exupery. And in my case, if things go
well, the example above will prove nothing, because I will be able to
reimplement any VM function (even interpret()) and have much better
control over how to avoid producing extra jumps/calls.

Bryce Kampjes

Re: Switching to use foo struct on Windows VM

In reply to this post by Igor Stasenko

sig writes:
> On 15/07/07, Bert Freudenberg <[hidden email]> wrote:
> Everywhere when some method uses foo struct, generator places
> following line in function:
> register struct foo * foo = &fum;
>
> and then uses everywhere foo->bar.
> So, the difference in compiled code when using foo struct or not is minimal:
>
> mov reg, [bar] <- using globals
> mov reg, [foo + bar_offset] <- with foo
>
> Of course, this depends how well GCC optimizes code, but in optimal
> case - difference between loading value using direct pointer or using
> base+offset is a just few cycles. And i don't think that this may
> cause a major speed degradation.

The cost is that, to be efficient, you need to use a register to
hold foo. The x86 is register starved, with only 6 or 7 registers
available.

It's so bad that people will commonly compile with a compiler flag to
free up the frame pointer, which makes debugging much harder as the
debugger can no longer reliably find the stack. This frees up 1
register, which can provide a 20% performance improvement.

Bryce

Bryce Kampjes

Re: Switching to use foo struct on Windows VM

In reply to this post by Igor Stasenko

sig writes:
> Another point why i'd prefer to use a single struct (call it foo, or
> anything else) for interpreter globals, is to encapsulate all global
> values in single place:
> - VM variables
> - pointers to VM functions.
>
> And in generated code use foo->bar for values, and foo->bar(...) for
> function calls.
>
> This will give me ability to replace a function pointer with own code
> on the fly in running VM, without recompiling code at all.
> And moreover, this eliminates the need in having InterpreterProxy
> variable for each plugin.

There are two separate questions here:
* Should you be able to always use foo?
* Should other people be able to not use foo?

In my opinion the ideal answer is yes to both questions.

Bryce

johnmci

Re: Switching to use foo struct on Windows VM

In reply to this post by Igor Stasenko

> While you, people, fighting with different GCC compilers to force them
> produce optimal code, my intent is to PROVIDE this optimal code
> written by hands and compiled by Exupery. And in my case, if things go
> well, example above will prove nothing, because i will be able to
> reimplement any VM function (even interpret() ) and have much better
> control on how to avoid producing extra jumps/calls.

Well sure all you need to do is take

pushReceiverVariableBytecode

self fetchNextBytecode.
"this bytecode will be expanded so that refs to currentBytecode
below will be constant"
self pushReceiverVariable: (currentBytecode bitAnd: 16rF).

which requires all these routines

fetchNextBytecode
"This method fetches the next instruction (bytecode). Each bytecode
method is responsible for fetching the next bytecode, preferably as
early as possible to allow the memory system time to process the
request before the next dispatch."

currentBytecode := self fetchByte.

fetchByte
"This method uses the preIncrement builtin function which has no
Smalltalk equivalent. Thus, it must be overridden in the simulator."

^ self byteAtPointer: localIP preIncrement

pushReceiverVariable: fieldIndex

self internalPush:
(self fetchPointer: fieldIndex ofObject: receiver).

fetchPointer: fieldIndex ofObject: oop
"index by word size, and return a pointer as long as the word size"

^ self longAt: oop + BaseHeaderSize + (fieldIndex << ShiftForWord)

internalPush: object

self longAtPointer: (localSP := localSP + BytesPerWord) put: object.

longAtPointer: pointer put: longValue
"This gets implemented by Macros in C, where its types will also be
checked.
pointer is a raw address, and longValue is the width of a machine
word."

^ self longAt: pointer put: longValue

which SLANG mushes into

CASE(0)
/* pushReceiverVariableBytecode */
{
/* begin fetchNextBytecode */
currentBytecode = byteAtPointer(++localIP);
/* begin pushReceiverVariable: */
/* begin internalPush: */
longAtPointerput(localSP += BytesPerWord, longAt((foo->receiver +
BaseHeaderSize) + ((0 & 15) << ShiftForWord)));
}
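
For anyone who wants to step through that expansion outside the VM, here is a self-contained sketch. The macro definitions, the sqInt typedef, the one-word header and the tiny fake object are all assumptions made so the fragment compiles on its own; the real definitions live in the VM headers.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t sqInt;

/* Assumed stand-ins for the VM's macros and constants. */
#define BytesPerWord   ((sqInt)sizeof(sqInt))
#define ShiftForWord   (sizeof(sqInt) == 8 ? 3 : 2)
#define BaseHeaderSize BytesPerWord          /* assumption: one-word header */

#define byteAtPointer(p)       (*(unsigned char *)(p))
#define longAt(addr)           (*(sqInt *)(addr))
#define longAtPointerput(p, v) (*(sqInt *)(p) = (v))

struct foo { sqInt receiver; };
static struct foo fum;

/* Same two statements as the generated CASE(0) body, except that the
   field index is a parameter here instead of the constant (0 & 15).
   Returns the prefetched next bytecode. */
static sqInt push_receiver_variable(struct foo *foo, int index,
                                    unsigned char **localIP, char **localSP)
{
    sqInt currentBytecode = byteAtPointer(++*localIP);
    longAtPointerput(*localSP += BytesPerWord,
                     longAt((foo->receiver + BaseHeaderSize)
                            + ((sqInt)(index & 15) << ShiftForWord)));
    return currentBytecode;
}

sqInt demo_push_receiver_variable(void)
{
    sqInt object[3] = { 0, 111, 222 };       /* header, field 0, field 1 */
    sqInt stack[4] = { 0, 0, 0, 0 };
    unsigned char code[2] = { 0x01, 0x7C };  /* current bytecode, next one */
    unsigned char *ip = code;
    char *sp = (char *)stack;

    fum.receiver = (sqInt)object;
    sqInt next = push_receiver_variable(&fum, code[0], &ip, &sp);

    assert(next == 0x7C);                    /* next bytecode was prefetched */
    assert(sp == (char *)&stack[1]);         /* stack grew by one word */
    return stack[1];                         /* the pushed field */
}
```
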

Then provide proper assembler for Intel (AMD/variations), PowerPC,
RISC, unknown. You could argue for ignoring 10% or less of the
population and just doing Intel, but the compiler and instruction
purists would argue that not all Intel-like CPUs like the same
instruction mixes.

Likely, of course, hand-coded assembler *might* be better, although
people now seem to think that, with multiple-execution-unit hardware and
smarter compilers, that statement is becoming difficult to prove.

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Igor Stasenko

Re: Switching to use foo struct on Windows VM

In reply to this post by Bryce Kampjes

On 16/07/07, [hidden email] <[hidden email]> wrote:

> sig writes:
> > Another point why i'd prefer to use a single struct (call it foo, or
> > anything else) for interpreter globals, is to encapsulate all global
> > values in single place:
> > - VM variables
> > - pointers to VM functions.
> >
> > And in generated code use foo->bar for values, and foo->bar(...) for
> > function calls.
> >
> > This will give me the ability to replace a function pointer with my own
> > code on the fly in a running VM, without recompiling code at all.
> > Moreover, this eliminates the need for an InterpreterProxy
> > variable in each plugin.
>
> There are two separate questions here:
> * Should you be able to always use foo?
> * Should other people be able to not use foo?
>
> In my opinion the ideal answer is yes to both questions.
>

This depends on how well the VM infrastructure is organized.
In the ideal situation, there would be a single global variable:
static foo * VM.
This variable can be a pointer to a foo struct or simply a value,
depending on whether you want to be able to switch between different
interpreters within a single executable, as someone suggested.
In the current code, foo is always assigned &fum, so it's not possible to
switch between different VMs, and semantically foo->bar is the
same as fum.bar.
All plugins use InterpreterProxy, so they already call VM functions
indirectly.
I see no big harm in making the VM behave similarly and call its own
functions indirectly.
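
A minimal sketch of that arrangement, assuming a struct that holds both variables and function pointers. The field names, the tagged-integer helper and the counting replacement are hypothetical, chosen only to show a function being swapped in a running program.

```c
#include <assert.h>

/* Hypothetical layout: one VM variable and one VM function pointer. */
struct foo {
    long primFailCode;
    long (*integerValueOf)(long oop);
};

/* Default implementation: decode a tagged integer (value * 2 + 1). */
static long tagged_integer_value(long oop)
{
    return (oop - 1) / 2;
}

/* Replacement installed at run time; it counts calls, then delegates. */
static long call_count;
static long counting_integer_value(long oop)
{
    call_count++;
    return tagged_integer_value(oop);
}

static struct foo fum = { 0, tagged_integer_value };
static struct foo *VM = &fum;   /* the single global pointer */

long demo_replace_at_runtime(void)
{
    long before = VM->integerValueOf(7);          /* goes through the default */
    VM->integerValueOf = counting_integer_value;  /* swap, no recompilation */
    long after = VM->integerValueOf(7);           /* goes through the new code */

    assert(before == after);
    assert(fum.integerValueOf == VM->integerValueOf);  /* VM->x is fum.x */
    return call_count;
}
```

Pointing VM at a second struct instance would give the interpreter switching mentioned above; that part is left out to keep the sketch short.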

> Bryce
>
>

Igor Stasenko

Re: Switching to use foo struct on Windows VM

In reply to this post by Bryce Kampjes

On 16/07/07, [hidden email] <[hidden email]> wrote:

I changed the code to generate indirect calls everywhere in interp.c.
See the results:

1 tinyBenchmarks
direct calls:
'120640904 bytecodes/sec; 3180012 sends/sec'
'118518518 bytecodes/sec; 3260940 sends/sec'
'119962511 bytecodes/sec; 3253634 sends/sec'
'119180633 bytecodes/sec; 3227123 sends/sec'
'117323556 bytecodes/sec; 3227123 sends/sec'

indirect calls:
'119626168 bytecodes/sec; 3263383 sends/sec'
'118848653 bytecodes/sec; 3219968 sends/sec'
'118408880 bytecodes/sec; 3305475 sends/sec'
'118628359 bytecodes/sec; 3441245 sends/sec'
'117972350 bytecodes/sec; 3273190 sends/sec'

As you suggested earlier, the main bottleneck is branch misprediction.
As the benchmark results show, the difference lies within the error
bounds. It may be slower than direct calls (by 1/1000, maybe).
At least on my 1.1 GHz AMD Athlon, I see no reason to
sacrifice a VM that can replace functions at run
time for a 1/1000 speed boost.
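
For anyone who wants to repeat the comparison outside the VM, a rough timing harness could look like the sketch below. It is not the tinyBenchmarks code; the function names are made up, and the volatile pointer is only there to keep the compiler from turning the indirect call back into a direct one.

```c
#include <assert.h>
#include <time.h>

static long add_one(long x)
{
    return x + 1;
}

/* volatile forces a real load of the pointer before every call. */
static long (*volatile add_one_ptr)(long) = add_one;

/* Returns elapsed CPU seconds; the final accumulator goes to *result. */
double time_direct(long iterations, long *result)
{
    clock_t start = clock();
    long acc = 0;
    for (long i = 0; i < iterations; i++)
        acc = add_one(acc);
    *result = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

double time_indirect(long iterations, long *result)
{
    clock_t start = clock();
    long acc = 0;
    for (long i = 0; i < iterations; i++)
        acc = add_one_ptr(acc);
    *result = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Both variants must compute the same value, whatever the timings are. */
long demo_calls_agree(long iterations)
{
    long direct, indirect;
    (void)time_direct(iterations, &direct);
    (void)time_indirect(iterations, &indirect);
    assert(direct == indirect);
    return direct;
}
```

An optimizing compiler may inline or fold the direct loop entirely, so numbers from a harness like this say little about a real interpreter loop, where dispatch and branch prediction dominate.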

timrowledge

Re: Switching to use foo struct on Windows VM

In reply to this post by johnmci

>
It also makes a significant improvement on ARM machines; y'know, the
*other* 50% of all the 32bit cpus in the world (or thereabouts) such
as pretty much every cellphone, fax, router, camera and, oh yes the
iPhone.

I couldn't care less how silly the name seems. If it bothers you that
much then change it.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Full of wisdumb.

timrowledge

Re: Switching to use foo struct on Windows VM

In reply to this post by Igor Stasenko

On 15-Jul-07, at 11:55 AM, sig wrote:

>
> The only platform which uses another level of indirection is RiscOS
> (which passes
> 'globalStructDefined: false' to CCodeGeneratorGlobalStructure).
> When globalStructDefined: is false, it does not generate the line
> (register struct foo * foo = &fum;) in each function and uses foo
> directly (it seems that 'foo' is declared somewhere in the platform code,
> because CCodeGeneratorGlobalStructure omits the declaration of foo when
> globalStructDefined: is false).
The ARM compiler makes it nice and easy to declare global register
variables and foo is so declared. It means that all those globals are
accessible by a nice simple
LDR val, [foo, #offsetforval]
instead of
LDR val, [stackframe base, #offset1]
LDR val,[val, #offsetforval]
which also gets replicated for stores.
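
The two variants can be put side by side in C. This is a sketch under assumptions: USE_GLOBAL_REGISTER_FOO, DECLARE_FOO and INIT_FOO are hypothetical names, the struct fields are illustrative, and the choice of r15 is arbitrary. The opt-in branch uses GCC's global register variable extension on x86-64; the default branch is the portable per-function declaration quoted above.

```c
#include <assert.h>

struct foo { long stackPointer; long instructionPointer; };
static struct foo fum;

#if defined(USE_GLOBAL_REGISTER_FOO) && defined(__GNUC__) && \
    !defined(__clang__) && defined(__x86_64__)
/* GCC extension: reserve one callee-saved register for foo in this file. */
register struct foo *foo __asm__("r15");
#define DECLARE_FOO              /* nothing: foo already lives in a register */
#define INIT_FOO()  (foo = &fum)
#else
/* Portable default: reload the base pointer at the top of each function. */
#define DECLARE_FOO register struct foo *foo = &fum;
#define INIT_FOO()  ((void)0)
#endif

static void push_word(void)
{
    DECLARE_FOO
    foo->stackPointer += (long)sizeof(long);
}

static void next_instruction(void)
{
    DECLARE_FOO
    foo->instructionPointer += 1;
}

long demo_register_foo(void)
{
    INIT_FOO();
    fum.stackPointer = 0;
    fum.instructionPointer = 0;
    push_word();
    push_word();
    next_instruction();
    assert(fum.instructionPointer == 1);
    return fum.stackPointer;
}
```

With the global register variant every function in the file gives up that register for good, which is cheap on ARM and expensive on 32-bit x86.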

The idea for global register variables was (so far as I know) another
bit of genius from Eliot; he had been faking it by spoofing the SUN
compiler and since I couldn't be bothered to try the same trickery on
the ARM cc I spoke to the guys at ARM that wrote the compiler and
persuaded them to add the facility as a proper pragma. IIRC it was
worth about 30% performance back in 1988 on a 12MHz ARM3 system. At
some later date I believe Eliot was able to persuade the gcc people
to add a similar capability.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
If it was easy, the hardware people would take care of it.

Bryce Kampjes

Re: Switching to use foo struct on Windows VM

In reply to this post by timrowledge

tim Rowledge writes:
> >
> It also makes a significant improvement on ARM machines; y'know, the
> *other* 50% of all the 32bit cpus in the world (or thereabouts) such
> as pretty much every cellphone, fax, router, camera and, oh yes the
> iPhone.

And gaining fast if only because x86 is slowly moving to 64 bit.

Bryce