I tried to introduce a VM pointers table for use by Exupery, but found
that there is no common way to add this code, because all platforms except win32 use the foo struct for globals. I investigated how easy it would be to patch the win32 VM to use the foo struct and found that there are only a few places to change in the platform-specific code, so I decided to make a patch.

1 tinyBenchmarks

using the old VM:
'118518518 bytecodes/sec; 3351243 sends/sec'
'121673003 bytecodes/sec; 3338403 sends/sec'
'121788772 bytecodes/sec; 3335847 sends/sec'
'122020972 bytecodes/sec; 3323125 sends/sec'

using the VM with the foo struct:
'121327014 bytecodes/sec; 3387727 sends/sec'
'122020972 bytecodes/sec; 3379842 sends/sec'
'120075046 bytecodes/sec; 3536215 sends/sec'
'120640904 bytecodes/sec; 3335847 sends/sec'

The benchmark shows no noticeable difference between using the foo struct and not using it. Maybe this is a bad benchmark for this case.

Please let me know if my patch is acceptable; how I implement the VM pointers table depends on it. :)
|
added as issue 0006561 on mantis
|
In reply to this post by Igor Stasenko
sig wrote:
> I tried to introduce VM pointers table for use by Exupery, but found
> that there's no common way for adding this code because all platforms,
> except win32 using foo struct for globals.

Can you say what the requirements for this patch are? E.g., why exactly does it matter if the VM is compiled with struct foo or not?

> benchmark shows no noticeable difference using foo struct or not.
> Maybe this is bad benchmark for this case..

This result is quite surprising. When John originally introduced this option, x86 was significantly slower when compiling with than without it. As a matter of fact, given that probably some 90+% of all Squeak platforms are now x86 I was thinking about removing it altogether (after all, it's just a pointless memory dereferencing which is only advantageous on platforms that don't have direct addressing modes).

> Please , let me know, if my patch is acceptable, from this depends the
> way how i implement VM pointers table. :)

To be blunt, there are two things I don't like about it: First, it introduces the need for another dereferencing in an already register-deprived model. Second, anything containing "struct foo fum" is immediately on my list of things I never want to see in my code. Changing these names to something sensible would make it a lot easier to convince me about the changes.

However, I can probably fix up the support code so that it's possible to compile a "struct foo VM", which I presume is your main need. Although, given that a "struct foo VM" will compile trivially without the indirection, it may be easier for you to compile Unix and Mac VMs without the extra indirection.

Cheers,
- Andreas
|
On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:

> This result is quite surprising. When John originally introduced
> this option, x86 was significantly slower when compiling with than
> without it. As a matter of fact, given that probably some 90+% of
> all Squeak platforms are now x86 I was thinking about removing it
> altogether (after all, it's just a pointless memory dereferencing
> which is only advantageous on platforms that don't have direct
> addressing modes).
>
>> Please , let me know, if my patch is acceptable, from this depends
>> the way how i implement VM pointers table. :)
>
> To be blunt, there are two things I don't like about it: First, it
> introduces the need for another dereferencing in an already
> register-deprived model. Second, anything containing "struct foo
> fum" is immediately on my list of things I never want to see in my
> code. Changing these names to something sensible would make it a
> lot easier to convince me about the changes.

Ah, well, the history of why it was named Foo: I had discovered that under PPC the use of a structure would remove one instruction for each read or write to a VM memory location. This made a significant difference to the performance of the PowerPC VM; if you run 1/3 fewer instructions you get more work done. I set out one weekend to alter the VM and named the structure Foo as a joke, and then dug deep into Slang to figure out how to change it so that references to global variables would refer to the Foo structure, because I really didn't think I was going to be able to change it. However, I was successful and left it named Foo as a reminder of how well built Slang was. Oddly, no one complained until tonight (it took years, I note).

Also, of course, I had to make it so that you could build the VM with or without the feature because, as Andreas pointed out, it did not produce good assembler on the Intel platform, so getting all that to work was non-trivial.

Lurking in here also were some comments from people wanting to build VMs for some special-purpose CPUs, where they would hang all the globals off a single structure pointed to by a register versus having 1000 separate globals, plus a thought about making a VM with multiple VM threads that would only require a register switch to change Squeak VM processes.

Other notes.

(a) Depending on the compiler version, arrays sometimes are and sometimes are not allocated into the structure, because of how the compiler feels it should generate the code. Sometimes it does insane things; other times it removes one or two instructions for PowerPC references. This behaviour is tied to the compiler version. Truthfully, I've not checked this on macintel to see if it makes any difference; likely not.

(b) The few other variables outside the foo structure are variables initialized to constants. These could have been moved into foo and an initialization routine used to populate them, but work on that never happened. I guess if someone wants to change the foo name then those few initialized variables should be dragged into the structure for completeness as part of the cleanup.

A few years back I noticed Ian was compiling the Unix Intel VM with the foo structure and I asked him why, since I had earlier noted the Intel performance degradation. I think Ian said he had checked and there was no longer an issue, and there was no harm in compiling with foo for the Intel platform. I believe what happens now is that, because it's declared as struct foo * foo = &fum;, you just end up with a reference into the dynamic storage area for the VM at a precomputed offset: the location of fum plus the offset of the variable. Earlier compilers, I guess, would first reference the storage area to get the pointer and then reference the variable within the structure, which gave the poor performance values.

PowerPC is not yet dead (don't all the game consoles use it?), so it would not be wise to abandon this feature just because today all mainstream platforms are Intel-based, register-deprived solutions; someday that might change. Well, that, and PowerPC-based Macintosh machines will likely still be around for 5 to 7 more years given the historical longevity of Macintosh hardware.

> However, I can probably fix up the support code so that it's
> possible to compile a "struct foo VM", which I presume is your main
> need. Although, given that a "struct foo VM" will compile trivially
> without the indirection, it may be easier for you to compile Unix
> and Mac VMs without the extra indirection.

A few years back I changed all the Mac support code to avoid referring to foo, fum or interp.c globals directly, and to use the VM-supplied accessors via the interpreterProxy or via an interp.c accessor routine.

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
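The two build variants described in this message can be sketched in C as follows. This is only a minimal illustration, not generated VM code: the variable names (g_stackPointer, g_activeContext), the two-field struct, and the helper functions are assumptions made for the example; only the declaration pattern "struct foo * foo = &fum;" comes from the thread.

```c
#include <assert.h>

typedef long sqInt;  /* hypothetical stand-in for the VM's integer type */

/* Variant A: plain globals, each variable has its own absolute address. */
static sqInt g_stackPointer;
static sqInt g_activeContext;

/* Variant B: the same variables gathered into one structure ("foo"),
   with a single static instance ("fum"). */
struct foo {
    sqInt stackPointer;
    sqInt activeContext;
};
static struct foo fum;

/* Hypothetical helper: store the same values in both variants. */
void setBoth(sqInt sp, sqInt ctx) {
    g_stackPointer = sp;
    g_activeContext = ctx;
    fum.stackPointer = sp;
    fum.activeContext = ctx;
}

/* Variant A access: two independent global addresses. */
sqInt sumGlobals(void) {
    return g_stackPointer + g_activeContext;
}

/* Variant B access: one base pointer, fields at fixed offsets.
   The local declaration mirrors the pattern quoted in the thread. */
sqInt sumStruct(void) {
    register struct foo *foo = &fum;
    return foo->stackPointer + foo->activeContext;
}

/* Small self-check showing both variants agree. */
void checkVariantsAgree(void) {
    setBoth(3, 4);
    assert(sumGlobals() == sumStruct());
}
```

Whether variant B is faster or slower than variant A depends on the CPU and compiler, which is exactly the point under discussion; comparing the assembler output of sumGlobals and sumStruct on a given toolchain is one way to check.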
In reply to this post by Andreas.Raab
> anything containing "struct foo fum"
In case anyone was slow, it was "Fe Fi Fo Fum" and building things (aka standing) on the shoulders of Giants for this minor change to all the VM work that came before.

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
In reply to this post by Andreas.Raab
Andreas Raab writes:
> sig wrote:
> > I tried to introduce VM pointers table for use by Exupery, but found
> > that there's no common way for adding this code because all platforms,
> > except win32 using foo struct for globals.
>
> Can you say what the requirements for this patch are? E.g., why exactly
> does it matter if the VM is compiled with struct foo or not?

The goal is to provide a generic way of getting pointers to the interpreter's variables and functions. Exupery needs these because it generates code that does the same thing as the interpreter. Sig needs these as he's interested in allowing low-level programming to be done inside the image.

At the moment Exupery has a lot of trivial accessor functions to return the addresses. The problem is that you can't put "&foo->activeContext" into an initialiser in C, because at compile time C cannot know where foo points.

Using #returnPrefixFromVariable: to generate the variable-accessing code will also allow generated code to work in VMs that use foo and in VMs that don't. #returnPrefixFromVariable: is called when translating addressOf: for this reason.

I'm guessing that the problem could also be solved by generating accessors the way that your #addressOf: operation does.

> > benchmark shows no noticeable difference using foo struct or not.
> > Maybe this is bad benchmark for this case..
>
> This result is quite surprising. When John originally introduced this
> option, x86 was significantly slower when compiling with than without
> it. As a matter of fact, given that probably some 90+% of all Squeak
> platforms are now x86 I was thinking about removing it altogether (after
> all, it's just a pointless memory dereferencing which is only
> advantageous on platforms that don't have direct addressing modes).

Low-level performance is getting more complex as it gets faster. The interpreter does not execute many instructions per clock (sorry, I don't have the numbers handy and they will change depending on architecture). Given how low the instructions-per-clock figure is, adding extra work to the interpreter doesn't matter so long as the extra work stays inside the delays (probably branch mispredicts) that currently limit the interpreter's speed. That's the magic of out-of-order execution.

I'd guess that on slower in-order x86 CPUs using foo will have more of an adverse impact on performance. And having foo is likely to be most important on slower CPUs, including the ARMs in phones/handhelds.

Bryce
|
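The initialiser problem described in this message can be illustrated with a short C sketch. Everything here is an assumption made for illustration (the two-field struct, the table layout, and the names staticTable, vmPointers, initVmPointers and vmPointerAt); it is not the Exupery or VMMaker code. It shows that the address of a field of the static instance fum is a constant, that the same field reached through the pointer foo is not, and two possible workarounds: accessor functions, or a table filled at run time.

```c
#include <assert.h>

typedef long sqInt;  /* hypothetical stand-in for the VM's integer type */

struct foo {
    sqInt activeContext;
    sqInt stackPointer;
};
static struct foo fum;
static struct foo *foo = &fum;  /* C only knows where foo points at run time */

/* Legal: &fum.activeContext is an address constant, so it may appear
   in a static initialiser. */
static sqInt *const staticTable[] = { &fum.activeContext, &fum.stackPointer };

/* Not legal at file scope, because it requires reading the value of foo:
       static sqInt *badTable[] = { &foo->activeContext };
   A C compiler rejects this as a non-constant initialiser. */

/* Workaround 1: trivial accessor functions that compute the address
   when called. */
sqInt *activeContextAddress(void) { return &foo->activeContext; }
sqInt *stackPointerAddress(void)  { return &foo->stackPointer; }

/* Workaround 2: a pointer table populated by an init routine. */
enum { VM_ACTIVE_CONTEXT, VM_STACK_POINTER, VM_POINTER_COUNT };
static void *vmPointers[VM_POINTER_COUNT];

void initVmPointers(void) {
    vmPointers[VM_ACTIVE_CONTEXT] = &foo->activeContext;
    vmPointers[VM_STACK_POINTER]  = &foo->stackPointer;
}

void *vmPointerAt(int index) { return vmPointers[index]; }

/* Accessor for the statically initialised table, for comparison. */
sqInt *staticPointerAt(int index) { return staticTable[index]; }
```

The run-time table works the same way whether the VM is built with the struct or with plain globals, since only the body of the init routine has to change.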
In reply to this post by johnmci
On Jul 15, 2007, at 10:51 , John M McIntosh wrote:

> [...]
> A few years back I noticed Ian was compiling the Unix Intel VM with
> the foo structure and I asked him why? Since I had earlier noted
> the intel performance degradation. I think Ian said he had checked
> and there was no longer an issue and there was no harm in compiling
> with foo for the intel platform.
> [...]
> A few years back I changed all the mac support code to avoid
> referring to foo or fum or interp.c globals directly and use the vm
> supplied accessors via the interpreterProxy or via interp.c
> accessor routine.

Wonder how that would affect the AMD Geode, which is a not-so-modern x86 processor, but still quite important for Squeak. Once we get a Geode LX we need to seriously measure performance ... what magic bit do I need to flip to disable/enable foo fum?

- Bert -
|
MacOSPowerPCOS9VMMaker>>createCodeGenerator
    "set up a CCodeGenerator for this VMMaker - Mac OS uses the global struct and local def of the structure"
    ^CCodeGeneratorGlobalStructure new initialize; globalStructDefined: true

overrides

VMMaker>>createCodeGenerator
    "set up a CCodeGenerator for this VMMaker"
    ^CCodeGenerator new initialize

This override happens for unix, risc and mac, but not for windows, which is the VMMakerWithFileCopying/Win32VMMaker subclass structure.

> Wonder how that would affect the AMD Geode, which is a not-so-
> modern x86 processor, but still quite important for Squeak. Once we
> get a Geode LX we need to seriously measure performance ... what
> magic bit do I need to flip to disable/enable foo fum?
>
> - Bert -

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
In reply to this post by Bert Freudenberg
On 15/07/07, Bert Freudenberg <[hidden email]> wrote:
> > On Jul 14, 2007, at 7:45 PM, Andreas Raab wrote:
> >> This result is quite surprising. When John originally introduced
> >> this option, x86 was significantly slower when compiling with than
> >> without it. As a matter of fact, given that probably some 90+% of
> >> all Squeak platforms are now x86 I was thinking about removing it
> >> altogether (after all, it's just a pointless memory dereferencing
> >> which is only advantageous on platforms that don't have direct
> >> addressing modes).

Wherever a method uses the foo struct, the generator places the following line in the function:

register struct foo * foo = &fum;

and then uses foo->bar everywhere. So the difference in compiled code between using the foo struct and not using it is minimal:

mov reg, [bar]                <- using globals
mov reg, [foo + bar_offset]   <- with foo

Of course, this depends on how well GCC optimizes the code, but in the optimal case the difference between loading a value through a direct pointer and through base+offset is just a few cycles. I don't think this can cause a major speed degradation.

The only platform which uses another level of indirection is RiscOS (which passes 'globalStructDefined: false' to CCodeGeneratorGlobalStructure). With globalStructDefined: false, the generator does not emit that line in each function (register struct foo * foo = &fum;) and uses foo directly. It seems that 'foo' is declared somewhere in the platform code, because CCodeGeneratorGlobalStructure omits the declaration of foo when globalStructDefined: false.

> > [...]
> >> However, I can probably fix up the support code so that it's
> >> possible to compile a "struct foo VM", which I presume is your
> >> main need. Although, given that a "struct foo VM" will compile
> >> trivially without the indirection, it may be easier for you to
> >> compile Unix and Mac VMs without the extra indirection.

The situation is simple: I made modifications to the VM and everything works fine, but only on the Win32 platform, because I was not aware that the others use the foo struct. Well, I can make things work regardless of whether CCodeGenerator uses the foo struct or not.

> > A few years back I changed all the mac support code to avoid
> > referring to foo or fum or interp.c globals directly and use the vm
> > supplied accessors via the interpreterProxy or via interp.c
> > accessor routine.
>
> Wonder how that would affect the AMD Geode, which is a not-so-modern
> x86 processor, but still quite important for Squeak. Once we get a
> Geode LX we need to seriously measure performance ... what magic bit
> do I need to flip to disable/enable foo fum?

To use foo, it uses CCodeGeneratorGlobalStructure; to use globals, the plain CCodeGenerator. I don't think that switching back to globals will introduce problems in the generated code that prevent it from building. Even if it does, the code will require only a few fixes.

> - Bert -
|
In reply to this post by Bryce Kampjes
Another reason why I'd prefer to use a single struct (call it foo, or
anything else) for interpreter globals is to encapsulate all global values in a single place:
- VM variables
- pointers to VM functions.

The generated code would then use foo->bar for values and foo->bar(...) for function calls.

This will give me the ability to replace a function pointer with my own code on the fly in a running VM, without recompiling any code at all. Moreover, it eliminates the need for an InterpreterProxy variable in each plugin.
|
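The idea of keeping function pointers next to the variables, and swapping them in a running VM, might look roughly like the C sketch below. This is an illustration under assumptions, not the proposed patch: the struct layout, the fixed 16-slot stack, and the names vmState, defaultPush, doublingPush, pushValue, replacePush and currentDepth are all hypothetical.

```c
#include <assert.h>

typedef long sqInt;  /* hypothetical stand-in for the VM's integer type */

struct vmState;
typedef sqInt (*vmPushFn)(struct vmState *vm, sqInt value);

/* One structure holds both VM variables and pointers to VM functions. */
struct vmState {
    sqInt    stack[16];
    int      stackDepth;
    vmPushFn push;        /* called as foo->push(...) */
};

/* Default implementation: push the value unchanged.
   Returns -1 if the (hypothetical, fixed-size) stack is full. */
sqInt defaultPush(struct vmState *vm, sqInt value) {
    if (vm->stackDepth >= 16) return -1;
    vm->stack[vm->stackDepth++] = value;
    return value;
}

/* Replacement implementation, standing in for hand-written or
   dynamically generated code installed at run time. */
sqInt doublingPush(struct vmState *vm, sqInt value) {
    if (vm->stackDepth >= 16) return -1;
    vm->stack[vm->stackDepth++] = value * 2;
    return value * 2;
}

static struct vmState fum = { {0}, 0, defaultPush };
static struct vmState *foo = &fum;

/* Callers always go through the pointer stored in the struct. */
sqInt pushValue(sqInt value) { return foo->push(foo, value); }

/* Swap the implementation without recompiling any caller. */
void replacePush(vmPushFn replacement) { foo->push = replacement; }

int currentDepth(void) { return foo->stackDepth; }
```

The price of this flexibility is that every call becomes an indirect call through the struct, which is the same kind of cost the thread is debating for variable access.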
In reply to this post by Igor Stasenko
On Jul 15, 2007, at 11:55 AM, sig wrote:

> Everywhere when some method uses foo struct, generator places
> following line in function:
> register struct foo * foo = &fum;

I believe we only generate that if the foo structure is used in the routine more than once. On PowerPC this was a clue that the structure pointer should be in a register, which gained us some performance in earlier versions of GCC. Later GCC compilers seem to ignore the register hint. I once tried to use the GCC global register hint, which worked quite well, but was fraught with issues if all the plugins were not recompiled, or if foo was not set up before anyone invoked an interp.c routine as part of VM setup.

> and then uses everywhere foo->bar.
> So, the difference in compiled code when using foo struct or not is
> minimal:
>
> mov reg, [bar] <- using globals
> mov reg, [foo + bar_offset] <- with foo
>
> Of course, this depends how well GCC optimizes code, but in optimal
> case - difference between loading value using direct pointer or using
> base+offset is a just few cycles. And i don't think that this may
> cause a major speed degradation.

A cycle here, a cycle there, add up to real cycles. This is the first bytecode in Intel assembler, properly optimized:

L10161:
    addl $1, %esi
    movzbl (%esi), %ebx
    addl $4, %edi
    movl _foo, %eax
    movl 84(%eax), %eax
    movl 4(%eax), %eax
    movl %eax, (%edi)
    movl 512(%esp,%ebx,4), %eax
L10421:
    jmp *%eax

Less-than-optimal compiles can result in 12 instructions, and 9 versus 12 instructions does equal a difference in real physical time.

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
On 15/07/07, John M McIntosh <[hidden email]> wrote:
> [...]
> A cycle here, a cycle there, add up to real cycles.
> This is the first byte code in intel assembler properly optimized.
>
> L10161:
> addl $1, %esi
> movzbl (%esi), %ebx
> addl $4, %edi
> movl _foo, %eax
> movl 84(%eax), %eax
> movl 4(%eax), %eax
> movl %eax, (%edi)
> movl 512(%esp,%ebx,4), %eax
> L10421:
> jmp *%eax
>
> less than optimal compiles can result in 12 instructions, 9 versus
> 12 instructions does equal a difference in real physical time.

While you people are fighting with different GCC compilers to force them to produce optimal code, my intent is to PROVIDE this optimal code, written by hand and compiled by Exupery. In my case, if things go well, the example above will prove nothing, because I will be able to reimplement any VM function (even interpret()) and have much better control over how to avoid producing extra jumps/calls.
|
In reply to this post by Igor Stasenko
sig writes:
> On 15/07/07, Bert Freudenberg <[hidden email]> wrote:
> Everywhere when some method uses foo struct, generator places
> following line in function:
> register struct foo * foo = &fum;
>
> and then uses everywhere foo->bar.
> So, the difference in compiled code when using foo struct or not is minimal:
>
> mov reg, [bar] <- using globals
> mov reg, [foo + bar_offset] <- with foo
>
> Of course, this depends how well GCC optimizes code, but in optimal
> case - difference between loading value using direct pointer or using
> base+offset is a just few cycles. And i don't think that this may
> cause a major speed degradation.

The cost is that, to be efficient, you need to use a register to hold foo. The x86 is register starved, with only 6 or 7 registers available. It's so bad that people will commonly compile with a compiler flag that frees up the frame pointer, which makes debugging much harder as the debugger can no longer reliably find the stack. This frees up 1 register, which can provide a 20% performance improvement.

Bryce
|
In reply to this post by Igor Stasenko
sig writes:
> Another point why i'd prefer to use a single struct (call it foo, or
> anything else) for interpreter globals, is to encapsulate all global
> values in single place:
> - VM variables
> - pointers to VM functions.
>
> And in generated code use foo->bar for values, and foo->bar(...) for
> function calls.
>
> This will give me ability to replace a function pointer with own code
> on the fly in running VM, without recompiling code at all.
> And moreover, this eliminates the need in having InterpreterProxy
> variable for each plugin.

There are two separate questions here:
* Should you be able to always use foo?
* Should other people be able to not use foo?

In my opinion the ideal answer is yes to both questions.

Bryce
|
In reply to this post by Igor Stasenko
> While you, people, fighting with different GCC compilers to force them
> produce optimal code, my intent is to PROVIDE this optimal code
> written by hands and compiled by Exupery. And in my case, if things go
> well, example above will prove nothing, because i will be able to
> reimplement any VM function (even interpret() ) and have much better
> control on how to avoid producing extra jumps/calls.

Well sure, all you need to do is take

pushReceiverVariableBytecode
    self fetchNextBytecode.
    "this bytecode will be expanded so that refs to currentBytecode below will be constant"
    self pushReceiverVariable: (currentBytecode bitAnd: 16rF).

which requires all these routines

fetchNextBytecode
    "This method fetches the next instruction (bytecode). Each bytecode method is responsible for fetching the next bytecode, preferably as early as possible to allow the memory system time to process the request before the next dispatch."
    currentBytecode := self fetchByte.

fetchByte
    "This method uses the preIncrement builtin function which has no Smalltalk equivalent. Thus, it must be overridden in the simulator."
    ^ self byteAtPointer: localIP preIncrement

pushReceiverVariable: fieldIndex
    self internalPush: (self fetchPointer: fieldIndex ofObject: receiver).

fetchPointer: fieldIndex ofObject: oop
    "index by word size, and return a pointer as long as the word size"
    ^ self longAt: oop + BaseHeaderSize + (fieldIndex << ShiftForWord)

internalPush: object
    self longAtPointer: (localSP := localSP + BytesPerWord) put: object.

longAtPointer: pointer put: longValue
    "This gets implemented by Macros in C, where its types will also be checked. pointer is a raw address, and longValue is the width of a machine word."
    ^ self longAt: pointer put: longValue

which SLANG mushes into

CASE(0)
    /* pushReceiverVariableBytecode */
    {
        /* begin fetchNextBytecode */
        currentBytecode = byteAtPointer(++localIP);
        /* begin pushReceiverVariable: */
        /* begin internalPush: */
        longAtPointerput(localSP += BytesPerWord, longAt((foo->receiver + BaseHeaderSize) + ((0 & 15) << ShiftForWord)));
    }

Then provide proper assembler for Intel (AMD/variations), powerpc, Risc, unknown. Although you can argue you could ignore 10% or less of the population and just do Intel, the compiler and instruction purists would argue that not all Intel-like CPUs like the same sequence of instruction mixes.

Likely, of course, hand-coded assembler *might* be better, although I think people now seem to think that, with multiple execution unit hardware and smarter compilers, that statement is becoming difficult to prove.

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
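To make the inlined case body above easier to follow, here is a minimal C sketch of what it does at run time. It is not the generated interp.c: the one-word header, the tiny arrays standing in for object memory, the stack and the method, and the function names are all assumptions made up for illustration. The point it shows is that the field index is a constant derived from the case number, while the fetch at the top only prefetches the next bytecode.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one-word object header and word-sized fields. */
enum { BytesPerWord = sizeof(intptr_t), BaseHeaderSize = sizeof(intptr_t) };

/* Hypothetical stand-ins for object memory, the stack and a method. */
static intptr_t receiverObject[4] = { 0 /* header */, 11, 22, 33 };
static intptr_t stack[4];
static unsigned char method[4] = { 0, 2, 0, 1 };

static char *receiver = (char *)receiverObject;
static char *localSP = (char *)stack;
static unsigned char *localIP = method;
static unsigned char currentBytecode;

/* Hand-written equivalent of one expanded case: 'bytecode' plays the
   role of the case number, so (bytecode & 15) is a constant per case. */
void stepPushReceiverVariable(int bytecode)
{
    int fieldIndex = bytecode & 15;

    /* fetchNextBytecode: prefetch the NEXT bytecode for the dispatcher. */
    currentBytecode = *++localIP;

    /* internalPush: pre-increment the stack pointer, then store the field. */
    localSP += BytesPerWord;
    *(intptr_t *)localSP =
        ((intptr_t *)(receiver + BaseHeaderSize))[fieldIndex];
}

/* Read back the value most recently pushed. */
intptr_t topOfStack(void)
{
    return *(intptr_t *)localSP;
}
```

Calling `stepPushReceiverVariable(2)` pushes the third field of the stand-in receiver; everything else in the Smalltalk call chain has been folded away, which is the effect the SLANG inlining is after.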
In reply to this post by Bryce Kampjes
On 16/07/07, [hidden email] <[hidden email]> wrote:
> sig writes:
> > Another point why i'd prefer to use a single struct (call it foo, or
> > anything else) for interpreter globals, is to encapsulate all global
> > values in single place:
> > - VM variables
> > - pointers to VM functions.
> >
> > And in generated code use foo->bar for values, and foo->bar(...) for
> > function calls.
> >
> > This will give me ability to replace a function pointer with own code
> > on the fly in running VM, without recompiling code at all.
> > And moreover, this eliminates the need in having InterpreterProxy
> > variable for each plugin.
>
> There are two separate questions here:
> * Should you be able to always use foo?
> * Should other people be able to not use foo?
>
> In my opinion the ideal answer is yes to both questions.
>

In an ideal situation, there would be a single global variable, static foo * VM. This variable can be a pointer to the foo struct or simply a value, depending on whether you want to be able to switch between different interpreters within a single executable, as someone suggested.
In the current code, foo is always assigned &fum, so it's not possible to switch between different VMs, and semantically foo->bar is the same as fum.bar.

All plugins use InterpreterProxy, and so already call VM functions indirectly. I see no big harm in making the VM behave similarly and call its own functions indirectly.

> Bryce
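The proposal in this post can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from the patch: the members `argumentCount` and `stackDepth` and all function names are hypothetical, and only the `foo`/`fum` naming follows the thread.

```c
#include <assert.h>

/* Hypothetical miniature of the interpreter-globals struct: VM
   variables and pointers to VM functions live in one place. */
struct foo {
    long argumentCount;                /* a VM variable         */
    long (*stackDepth)(struct foo *);  /* a VM function pointer */
};

static long defaultStackDepth(struct foo *vm)
{
    return vm->argumentCount;
}

/* Stand-in for code supplied later, e.g. by a run-time compiler. */
static long replacementStackDepth(struct foo *vm)
{
    return vm->argumentCount + 1;
}

static struct foo fum = { 2, defaultStackDepth };
static struct foo *foo = &fum;

/* Generated code would use foo->bar for values and foo->bar(...) for calls. */
long callStackDepth(void)
{
    return foo->stackDepth(foo);
}

/* Swap the implementation in a running program, with no recompilation. */
void replaceStackDepth(void)
{
    foo->stackDepth = replacementStackDepth;
}
```

After `replaceStackDepth()` every caller that goes through `foo->stackDepth` reaches the new code. The cost is one extra load per call, which is the trade-off being argued over in the rest of the thread.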
In reply to this post by Bryce Kampjes
On 16/07/07, [hidden email] <[hidden email]> wrote:
> sig writes:
> > Another point why i'd prefer to use a single struct (call it foo, or
> > anything else) for interpreter globals, is to encapsulate all global
> > values in single place:
> > - VM variables
> > - pointers to VM functions.
> >
> > And in generated code use foo->bar for values, and foo->bar(...) for
> > function calls.
> >
> > This will give me ability to replace a function pointer with own code
> > on the fly in running VM, without recompiling code at all.
> > And moreover, this eliminates the need in having InterpreterProxy
> > variable for each plugin.
>

I changed the code to generate indirect calls everywhere in interp.c. See the results:

1 tinyBenchmarks

direct calls:
'120640904 bytecodes/sec; 3180012 sends/sec'
'118518518 bytecodes/sec; 3260940 sends/sec'
'119962511 bytecodes/sec; 3253634 sends/sec'
'119180633 bytecodes/sec; 3227123 sends/sec'
'117323556 bytecodes/sec; 3227123 sends/sec'

indirect calls:
'119626168 bytecodes/sec; 3263383 sends/sec'
'118848653 bytecodes/sec; 3219968 sends/sec'
'118408880 bytecodes/sec; 3305475 sends/sec'
'118628359 bytecodes/sec; 3441245 sends/sec'
'117972350 bytecodes/sec; 3273190 sends/sec'

As you suggested earlier, the main bottleneck is branch misprediction.
As you can see from the benchmark results, the difference is within the error bounds. It may be slower than direct calls (by 1/1000 maybe).
At least on my AMD Athlon 1.1 GHz, I see no reason why I must sacrifice a VM with the ability to replace functions at run time for a 1/1000 speed boost.
In reply to this post by johnmci
>
It also makes a significant improvement on ARM machines; y'know, the *other* 50% of all the 32-bit cpus in the world (or thereabouts) such as pretty much every cellphone, fax, router, camera and, oh yes, the iPhone.

I couldn't care less how silly the name seems. If it bothers you that much then change it.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Full of wisdumb.
In reply to this post by Igor Stasenko
On 15-Jul-07, at 11:55 AM, sig wrote:
>
> The only platform , which uses another level of indirection is RiscOS
> (which passes
> 'globalStructDefined: false' to CCodeGeneratorGlobalStructure).
> when globalStructDefined: false, it not generates a line in each
> function (register struct foo * foo = &fum;) and uses foo directly (it
> seems that 'foo' declared somewhere in platform code, because
> CCodeGeneratorGlobalStructure omits declaration of foo, when
> globalStructDefined: false).

The ARM compiler makes it nice and easy to declare global register variables and foo is so declared. It means that all those globals are accessible by a nice simple

    LDR val, [foo, #offsetforval]

instead of

    LDR val, [stackframe base, #offset1]
    LDR val, [val, #offsetforval]

which also gets replicated for stores.

The idea for global register variables was (so far as I know) another bit of genius from Eliot; he had been faking it by spoofing the SUN compiler and since I couldn't be bothered to try the same trickery on the ARM cc I spoke to the guys at ARM that wrote the compiler and persuaded them to add the facility as a proper pragma. IIRC it was worth about 30% performance back in 1988 on a 12MHz ARM3 system. At some later date I believe Eliot was able to persuade the gcc people to add a similar capability.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
If it was easy, the hardware people would take care of it.
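For readers following the two code-generation shapes discussed above, here is a small C sketch, written under assumptions rather than taken from any platform source. The struct member and function names are hypothetical. The second shape uses an ordinary global pointer as a portable stand-in; on RiscOS the thread says `foo` is a global register variable, and GCC documents a similar extension, but the register name is target-specific and so is left out here.

```c
#include <assert.h>

/* Hypothetical miniature of the globals struct. */
struct foo {
    long argumentCount;
};

static struct foo fum;

/* Shape 1: each generated function re-derives the pointer locally,
   as in the line quoted above (register struct foo * foo = &fum;). */
long argumentCountViaLocalPointer(void)
{
    register struct foo *foo = &fum;
    return foo->argumentCount;
}

/* Shape 2: foo is declared once, outside the generated functions.
   A plain global pointer stands in for the global register variable;
   with a real register binding the access becomes a single load
   relative to that register. */
static struct foo *globalFoo = &fum;

long argumentCountViaGlobalPointer(void)
{
    return globalFoo->argumentCount;
}

void setArgumentCount(long n)
{
    fum.argumentCount = n;
}
```

Both functions read the same storage; what differs is where the base address comes from, which is what the one-instruction versus two-instruction LDR comparison above is about.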
In reply to this post by timrowledge
tim Rowledge writes:
> It also makes a significant improvement on ARM machines; y'know, the
> *other* 50% of all the 32bit cpus in the world (or thereabouts) such
> as pretty much every cellphone, fax, router, camera and, oh yes the
> iPhone.

And gaining fast if only because x86 is slowly moving to 64 bit.

Bryce