We often talk about making the VM faster. How about making it slower? In 1980, there were some optimizations that were needed for Smalltalk to be even usable, but now:
- Moore's Law has theoretically given us 131072x more computing power (2^((2014-1980)/2) = 2^17)
- Cog runs up to 3x slower than C [1]
- Ruby, which is widely accepted, seems to be much slower than Cog [2]

For example, inlined functions can be baffling for new users. I just ran into this myself when writing an #ifNil:ifNotNil: that was not picked up by the system [3], and Ungar and Smith describe several cases in the History of Self (pg. 9-5).

How many of these are premature optimizations that can be eliminated, or at least turned off by default until they're actually needed? I know Clement mentioned in [3] that some make a big difference, but it would certainly make the system more uniform and easy to understand.

[1] http://lists.gforge.inria.fr/pipermail/pharo-project/2011-February/042489.html
[2] http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=yarv&lang2=gcc
[3] https://www.mail-archive.com/pharo-dev@lists.pharo.org/msg11694.html
Cheers,
Sean
Hello Sean,

It's true that the Ruby interpreter and CPython are around 20x slower than Cog. But the use cases are different. Firstly, their tool suite is not written in Ruby/Python, so they do not need speed to have a good IDE. For example, look at the new SqueakJS VM: because it is slower, Morphic is hardly usable, so they had to fall back on the old and fast MVC UI. We do not want to have to do that in Pharo/Squeak.
In addition, Ruby/Python work well thanks to their good integration with C, because a Ruby/Python programmer binds their performance-critical methods to C functions. In most cases, we do not bind performance-critical methods in Pharo/Squeak to C functions because we don't need to, and I don't think we want to do that.
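For comparison, binding a performance-critical method to C is possible in Pharo too. A hypothetical sketch using Pharo's FFI call syntax (the pragma shape, module name, and availability vary by FFI framework and platform; treat every name here as an assumption):

	clock
		"Answer the processor time used, by calling clock() from libc.
		Illustrative only - module name and call syntax are assumptions."
		^ self ffiCall: #( long clock () ) module: 'libc.so.6'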
So I wouldn't say that we can have Pharo/Squeak running 20x slower and still be happy.

One thing that you didn't mention is the Stack VM. This interpreter-based VM is less efficient than Cog (2x-10x slower) but much more flexible IMO. For example, overriding the interpretation of each message send or adding new bytecodes is quite easy. So in a sense we already have a slower, more flexible VM.
In addition, Opal's compiler options allow disabling some of the optimized constructs you mention, but these are static settings expressed with pragmas. Disabling the inlining of conditionals would decrease the performance of Pharo/Squeak by 2.5x according to Urs Hölzle's PhD thesis, but recent attempts showed that the speed problem is even worse, because the kernel was optimized in the knowledge that these constructs were inlined.
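As a sketch of what such a pragma looks like (the option name below is an assumption; check the Opal compiler options available in your Pharo version):

	demo: anObject
		"Compile this method without inlining #ifNil:ifNotNil:, so the
		send below becomes a real message send. Option name is hypothetical."
		<compilerOptions: #(- optionInlineIfNil)>
		^ anObject ifNil: [ 0 ] ifNotNil: [ :o | o ]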
Solution for this problem: as you may have seen, the Self VM does not have these optimized constructs, not because the Self folks accepted slower code, but because they have an adaptive recompiler. Currently I am working with Eliot on speculative inlining and other optimizations for Cog. You can see a description of the project here: http://clementbera.wordpress.com/2014/01/09/the-sista-chronicles-i-an-introduction-to-adaptive-recompilation/ . I wrote that article quickly, so there might be some typos and English errors, but the overall content should be OK. This is a big project, so we will have a production-ready result in several months, perhaps even a few years.
This will allow us both to increase Cog's performance and to reduce the code complexity due to optimizations of inlined constructs. Precise solutions need to be discussed and benchmarked, but since their performance impact will be lowered, we could have:
- #ifNil:/#ifNotNil: not inlined.
- all the special selectors as regular message sends in all cases (including #==):
  #(#+ 1 #- 1 #< 1 #> 1 #<= 1 #>= 1 #= 1 #~= 1 #* 1 #/ 1 #\\ 1 #@ 1 #bitShift: 1 #// 1 #bitAnd: 1 #bitOr: 1 #at: 1 #at:put: 2 #size 0 #next 0 #nextPut: 1 #atEnd 0 #== 1 nil 0 #blockCopy: 1 #value 0 #value: 1 #do: 1 #new 0 #new: 1 #x 0 #y 0)
#ifTrue:ifFalse: is the most complex case. I know Eliot has a plan for it. You can look at the video at the bottom of the Sista article, where at the end Eliot explains AOStA (the ancestor of Sista) and mentions something about #mustBeBoolean.
Best, Clément 2014-02-09 5:37 GMT+01:00 Sean P. DeNigris <[hidden email]>:
In reply to this post by Sean P. DeNigris
Sean P. DeNigris wrote:
> We often talk about making the VM faster. How about making it slower?
> [...]
> For example, inlined functions can be baffling for new users.

Not VM related but it sparks a random idea - how about syntax highlighting inlined messages with a different colour?

cheers -ben
In reply to this post by Sean P. DeNigris
On 09.02.2014, at 05:37, Sean P. DeNigris <[hidden email]> wrote:
> How many of these are premature optimizations that can be eliminated, or at
> least turned off by default until they're actually needed?

If the VM encounters an #ifNil:ifNotNil: send, it will faithfully do a method lookup and execute that. It will even do that if it sees an #ifTrue:. There is no short-circuiting of actual message sends in the VM.

What *does* happen is that the Compiler replaces an #ifNil:ifNotNil: send with "== nil ifTrue:ifFalse:" and then compiles the latter into jump bytecodes. That means the VM never sees the original #ifNil:ifNotNil: message.

It is pretty simple to turn off the Compiler's inlining of ifNil:ifNotNil:. It should also be pretty simple to make ifTrue:/ifFalse: be actual message sends, although I would expect a pretty big slowdown since it would need real blocks. But at least their Smalltalk implementation would be "executable".

It's harder for whileTrue:/whileFalse: because if you wanted to implement them with real messages you would need tail call optimization, which Smalltalk VMs don't usually do. Hence the implementation in the image that relies on compiler inlining.
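One way to see this from within the image - a sketch; the exact reflective protocol (#symbolic here) varies between Squeak and Pharo versions:

	"Compile a method that uses #ifNil:ifNotNil: and print its bytecodes.
	The listing shows a comparison against nil and conditional jumps,
	but no send of #ifNil:ifNotNil: - the Compiler inlined it away."
	Object compile: 'demoIfNil: x
		^ x ifNil: [ 0 ] ifNotNil: [ :v | v ]'.
	Transcript show: (Object >> #demoIfNil:) symbolic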
- Bert
On Sun, Feb 9, 2014 at 9:08 AM, Bert Freudenberg <[hidden email]> wrote:
> It's harder for whileTrue:/whileFalse: because if you wanted to implement
> them with real messages you would need tail call optimization, which
> Smalltalk VMs don't usually do.

Well, since we're talking about de-optimizing here, you *could* do #whileTrue: without optimizing tail calls. It's just that it would be really slow, especially if you wanted to guard against run-away memory use for loops with lots of iterations. If you want to make things slower, the sky's the limit!
Colin
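The in-image definition Bert and Colin are talking about is, roughly, a recursive one that only works because the compiler inlines the construct. As a de-optimized sketch (real sends, no tail-call elimination), every iteration would leave a frame on the stack:

	whileTrue: aBlock
		"Sketch: evaluate aBlock repeatedly while the receiver block
		answers true. With real message sends and no tail-call
		optimization, the recursive send below grows the stack on
		every iteration - hence Colin's run-away memory concern."
		^ self value ifTrue: [ aBlock value. self whileTrue: aBlock ]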
In reply to this post by Sean P. DeNigris
On Sat, Feb 08, 2014 at 08:37:46PM -0800, Sean P. DeNigris wrote:
>
> We often talk about making the VM faster. How about making it slower?

We do not usually get too many requests to make the VM slower, what a refreshing change of perspective ;-)

http://www.ispot.tv/ad/Y94D/xfinity-internet-traffic-featuring-bill-and-karolyn-slowsky

Joking aside, there actually is one legitimate reason for wanting a slow VM. With high-performance VMs and with ever faster hardware, it is very easy to implement sloppy things in the image that go unnoticed until someone runs the image on an old machine or on limited hardware. It is sometimes useful to test on old hardware or on a slow VM to check for this.

I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The Slang inliner is extremely effective, and turning it off produces impressively sluggish results.

Dave
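For anyone who wants to try this, the old class-side entry point looked roughly like the following (a from-memory sketch; the exact VMMaker API and keyword arguments differ between versions, so treat the selector as an assumption):

	"Generate interpreter sources with the Slang inlining pass disabled.
	Selector and arguments are recalled from old VMMaker versions and
	may not match yours."
	Interpreter translate: 'interp.c' doInlining: false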
In reply to this post by Colin Putney-3
On 09.02.2014, at 17:49, Colin Putney <[hidden email]> wrote:
True. But in any case you would have to touch the implementation, otherwise you just get an infinite recursion :)

- Bert
In reply to this post by David T. Lewis
On 09-02-2014, at 10:07 AM, David T. Lewis <[hidden email]> wrote:
> Joking aside, there actually is one legitimate reason for wanting a slow VM.
> With high performance VMs and with ever faster hardware, it is very easy to
> implement sloppy things in the image that go unnoticed until someone runs the
> image on an old machine or on limited hardware. It is sometimes useful to
> test on old hardware or on a slow VM to check for this.

The cheapest and easiest way to do it these days is to buy a Raspberry Pi. You’ll learn very quickly where you have used crappy algorithms or poor technique… though of course you do have to put up with X windows as well. Unless you try RISC OS, which although not able to make the raw compute performance faster at least has a window system that doesn’t send every pixel to the screen via the Deep Space Network to the relay on Sedna.

> I think someone mentioned it earlier, but a very easy way to produce an
> intentionally slow VM is to generate the sources from VMMaker with the
> inlining step disabled. The slang inliner is extremely effective, and turning
> it off produces impressively sluggish results.

Does that actually work these days? Last I remember was that turning inlining off wouldn’t produce a buildable interp.c file. If someone has had the patience to make it work then I’m impressed.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: SDLI: Shift Disk Left Immediate
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
>
> Does that actually work these days? Last I remember was that turning
> inlining off wouldn't produce a buildable interp.c file. If someone has
> had the patience to make it work then I'm impressed.

Dang it, you're right, it's not working. I guess I have not tried this in a while, though I know that it used to work.

Making things go slower seems like a worthwhile thing to do on a Sunday afternoon, so I think I'll see if I can fix it.

Dave
On Sun, Feb 9, 2014 at 11:46 AM, David T. Lewis <[hidden email]> wrote:
I *think* the issue is the internal/external split brought about by the introduction of the localFoo variables, such as localSP and localIP. This optimization absolutely depends on inlining. Which reminds me that anyone who is interested in creating a StackInterpreter or CoInterpreter that *doesn't* use the internal methods and uses only stackPointer, framePointer and instructionPointer would have my full support. I'm very curious to see what the performance of stack+internal vs stack-internal, and cog+internal vs cog-internal will be. I'm hoping that the performance of the -internal versions is good enough that we could eliminate all that duplication.
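A rough sketch of the duplication Eliot describes (simplified, not the actual VMMaker code): the "internal" variants use local variables of the interpret loop that a C compiler can keep in registers, which only pays off once Slang has inlined them into that loop.

	push: object
		"External version: works on the interpreter-state variable."
		stackPointer _ stackPointer + BytesPerWord.
		self longAt: stackPointer put: object

	internalPush: object
		"Internal version: works on a local of interpret(), which the
		C compiler can register-allocate - but only after Slang inlines
		this method into the dispatch loop."
		localSP _ localSP + BytesPerWord.
		self longAt: localSP put: object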
best, Eliot
On 10-02-2014, at 11:53 AM, Eliot Miranda <[hidden email]> wrote:
>
> I *think* the issue is the internal/external split brought about by the introduction of the localFoo variables, such as localSP and localIP.

It’s really hard to be sure but I suspect that this isn’t the (only) issue. IIRC we used to be able to make non-inlined VMs at one point and that was well after the internalFoo code was added.

OK, some quick email searching reveals some work done in ’03 by johnMcI, Craig & me. Craig found the following code helped -

!'From Squeak3.6alpha of ''17 March 2003'' [latest update: #5325] on 21 July 2003 at 1:11:25 pm'!

!Interpreter methodsFor: 'contexts' stamp: 'crl 7/19/2003 15:59'!
primitiveFindNextUnwindContext
	"Primitive. Search up the context stack for the next method context marked for unwind handling from the receiver up to but not including the argument. Return nil if none found."
	| thisCntx nilOop aContext isUnwindMarked header meth pIndex |
	aContext _ self popStack.
	thisCntx _ self fetchPointer: SenderIndex ofObject: self popStack.
	nilOop _ nilObj.
	[(thisCntx = aContext) or: [thisCntx = nilOop]] whileFalse: [
		header _ self baseHeader: aContext.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: aContext.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [
			self push: thisCntx.
			^nil].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
	^self push: nilOop! !

!Interpreter methodsFor: 'interpreter shell' stamp: 'crl 7/19/2003 15:33'!
interpret
	"This is the main interpreter loop. It normally loops forever, fetching and executing bytecodes. When running in the context of a browser plugin VM, however, it must return control to the browser periodically. This should be done only when the state of the currently running Squeak thread is safely stored in the object heap. Since this is the case at the moment that a check for interrupts is performed, that is when we return to the browser if it is time to do so. Interrupt checks happen quite frequently."

	"record entry time when running as a browser plug-in"
	"self browserPluginInitialiseIfNeeded"
	self internalizeIPandSP.
	self fetchNextBytecode.
	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
	localIP _ localIP - 1.  "undo the pre-increment of IP before returning"
	self externalizeIPandSP.! !

!Interpreter methodsFor: 'return bytecodes' stamp: 'crl 7/19/2003 16:05'!
returnValueTo
	"Note: Assumed to be inlined into the dispatch loop."
	| nilOop thisCntx contextOfCaller localCntx localVal isUnwindMarked header meth pIndex |
	self inline: true.
	self sharedCodeNamed: 'commonReturn' inCase: 120.

	nilOop _ nilObj. "keep in a register"
	thisCntx _ activeContext.
	localCntx _ cntx.
	localVal _ val.

	"make sure we can return to the given context"
	((localCntx = nilOop) or:
	 [(self fetchPointer: InstructionPointerIndex ofObject: localCntx) = nilOop]) ifTrue: [
		"error: sender's instruction pointer or context is nil; cannot return"
		^self internalCannotReturn: localVal].

	"If this return is not to our immediate predecessor (i.e. from a method to its sender, or from a block to its caller), scan the stack for the first unwind marked context and inform this context and let it deal with it. This provides a chance for ensure unwinding to occur."
	thisCntx _ self fetchPointer: SenderIndex ofObject: activeContext.

	"Just possibly a faster test would be to compare the homeContext and activeContext - they are of course different for blocks. Thus we might be able to optimise a touch by having a different returnTo for the block return (since we know that must return to caller) and then if active ~= home we must be doing a non-local return. I think. Maybe."
	[thisCntx = localCntx] whileFalse: [
		thisCntx = nilObj ifTrue: [
			"error: sender's instruction pointer or context is nil; cannot return"
			^self internalCannotReturn: localVal].
		"Climb up stack towards localCntx. Break out to a send of #aboutToReturn:through: if an unwind marked context is found"
		header _ self baseHeader: thisCntx.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: thisCntx.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [
			"context is marked; break out"
			^self internalAboutToReturn: localVal through: thisCntx].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].

	"If we get here there is no unwind to worry about. Simply terminate the stack up to the localCntx - often just the sender of the method"
	thisCntx _ activeContext.
	[thisCntx = localCntx] whileFalse: [
		"climb up stack to localCntx"
		contextOfCaller _ self fetchPointer: SenderIndex ofObject: thisCntx.
		"zap exited contexts so any future attempted use will be caught"
		self storePointerUnchecked: SenderIndex ofObject: thisCntx withValue: nilOop.
		self storePointerUnchecked: InstructionPointerIndex ofObject: thisCntx withValue: nilOop.
		reclaimableContextCount > 0 ifTrue: [
			"try to recycle this context"
			reclaimableContextCount _ reclaimableContextCount - 1.
			self recycleContextIfPossible: thisCntx].
		thisCntx _ contextOfCaller].

	activeContext _ thisCntx.
	(thisCntx < youngStart) ifTrue: [self beRootIfOld: thisCntx].

	self internalFetchContextRegisters: thisCntx. "updates local IP and SP"
	self fetchNextBytecode.
	self internalPush: localVal.! !

Shortly after that I released VMMaker 3.6 with a note that it couldn’t produce a completely non-inlined VM because of a problem in fetchByte if globalstruct was enabled, and some odd problems in B2DPlugin. When VMMaker 3.7 was released a year later (March ’04) I apparently thought it could make the core VM non-inlined.

Since this is all a bazillion years ago I can’t remember any context to help extend the history.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Science is imagination equipped with grappling hooks.
I was looking at the trunk VMM yesterday and found that most of the issues were just caused by accessor methods, where #foo and #foo: generate conflicting foo(void) and foo(aParameter). In most cases, a convention of #setFoo: rather than #foo: takes care of the problem. There were a few other miscellaneous issues as well, but nothing that looked serious.

The variable 'memory' is a challenge because it is used extensively both directly and through #memory and #memory:. I was considering changing the variable name to something like memoryBase, and leaving the accessors alone, though I'm not sure that would be a very good idea.

I ran out of time yesterday and did not pursue it beyond this.

Dave

> On 10-02-2014, at 11:53 AM, Eliot Miranda <[hidden email]> wrote:
>> I *think* the issue is the internal/external split brought about by the
>> introduction of the localFoo variables, such as localSP and localIP.
>
> It's really hard to be sure but I suspect that this isn't the (only)
> issue. IIRC we used to be able to make non-inlined VMs at one point and
> that was well after the internalFoo code was added.
> [rest of quoted message and code snipped]
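To illustrate the clash Dave describes - an illustrative sketch of Slang's selector-to-C mapping, not actual generated output:

	"Slang maps both the getter #foo and the setter #foo: onto the same
	C identifier 'foo', so a non-inlined build ends up declaring
		sqInt foo(void);           from:  foo        ^foo
		sqInt foo(sqInt value);    from:  foo: value  ^foo := value
	which C rejects as conflicting declarations. Renaming the setter
	to #setFoo: yields setFoo(sqInt value) and avoids the collision."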
Hi David,

Do you realize that Eliot is (ab)using this in COG in order to eliminate some direct cCode: '...' inclusion? So setFoo: is not an option (or I misunderstood something).

2014-02-10 21:51 GMT+01:00 David T. Lewis <[hidden email]>:
In reply to this post by David T. Lewis
On Mon, Feb 10, 2014 at 12:51 PM, David T. Lewis <[hidden email]> wrote:
There's a more convenient hack:

	memory
		<cmacro: '() GIV(memory)'>
		^memory

	memory: aValue
		^memory := aValue
See above.
best, Eliot
In reply to this post by Nicolas Cellier
On Mon, Feb 10, 2014 at 10:12:32PM +0100, Nicolas Cellier wrote: > > Hi David, > do you realize that Eliot is (ab)using this in COG in order to eliminate > some direct cCode: '...' inclusion? > So setFoo: is not an option (or i misunderstood something) > Hi Nicolas, Actually I am not sure what you are referring to here, so probably I am missing something. Can you explain why setFoo: would be a problem in Cog? I cannot check it myself right now but I am interested to know if I am missing something important. Thanks, Dave > > 2014-02-10 21:51 GMT+01:00 David T. Lewis <[hidden email]>: > > > > > I was looking at the trunk VMM yesterday and found that most of the issues > > were just caused by accessor methods, where #foo and #foo: generate > > conflicting foo(void) and foo(aParameter). In most cases, a convention of > > #setFoo: rather than #foo: takes care of the problem. There were a few > > other miscellaneous issues as well, but nothing that looked serious. > > > > The variable 'memory' is a challenge because it is used extensively both > > directly and through #memory and #memory:. I was considering changing the > > variable name to something like memoryBase, and leaving the accessors > > alone though I'm not sure that would be a very good idea. > > > > I ran out of time yesterday and did not pursue it beyond this. > > > > Dave > > > > > > > > > > > On 10-02-2014, at 11:53 AM, Eliot Miranda <[hidden email]> > > wrote: > > >> > > >> I *think* the issue is the internal/external split brought abut by the > > >> introduction of the localFoo variables, such as localSP and localIP. > > > > > > It's really hard to be sure but I suspect that this isn't the (only) > > > issue. IIRC we used to be able to make non-inlined VMs at one point and > > > that was well after the internalFoo code was added. > > > > > > OK, some quick email searching reveals some work done in '03 by johnMcI, > > > Craig & me. 
> > > Craig found the following code helped -
> > >
> > > !'From Squeak3.6alpha of ''17 March 2003'' [latest update: #5325] on 21 July 2003 at 1:11:25 pm'!
> > >
> > > !Interpreter methodsFor: 'contexts' stamp: 'crl 7/19/2003 15:59'!
> > > primitiveFindNextUnwindContext
> > > 	"Primitive. Search up the context stack for the next method context marked for unwind handling from the receiver up to but not including the argument. Return nil if none found."
> > > 	| thisCntx nilOop aContext isUnwindMarked header meth pIndex |
> > > 	aContext _ self popStack.
> > > 	thisCntx _ self fetchPointer: SenderIndex ofObject: self popStack.
> > > 	nilOop _ nilObj.
> > >
> > > 	[(thisCntx = aContext) or: [thisCntx = nilOop]] whileFalse: [
> > > 		header _ self baseHeader: aContext.
> > > 		(self isMethodContextHeader: header)
> > > 			ifTrue: [
> > > 				meth _ self fetchPointer: MethodIndex ofObject: aContext.
> > > 				pIndex _ self primitiveIndexOf: meth.
> > > 				isUnwindMarked _ pIndex == 198]
> > > 			ifFalse: [isUnwindMarked _ false].
> > > 		isUnwindMarked ifTrue: [
> > > 			self push: thisCntx.
> > > 			^nil].
> > > 		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
> > >
> > > 	^self push: nilOop! !
> > >
> > > !Interpreter methodsFor: 'interpreter shell' stamp: 'crl 7/19/2003 15:33'!
> > > interpret
> > > 	"This is the main interpreter loop. It normally loops forever, fetching and executing bytecodes. When running in the context of a browser plugin VM, however, it must return control to the browser periodically. This should be done only when the state of the currently running Squeak thread is safely stored in the object heap. Since this is the case at the moment that a check for interrupts is performed, that is when we return to the browser if it is time to do so. Interrupt checks happen quite frequently."
> > >
> > > 	"record entry time when running as a browser plug-in"
> > > 	"self browserPluginInitialiseIfNeeded"
> > > 	self internalizeIPandSP.
> > > 	self fetchNextBytecode.
> > > 	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
> > > 	localIP _ localIP - 1. "undo the pre-increment of IP before returning"
> > > 	self externalizeIPandSP.
> > > ! !
> > >
> > > !Interpreter methodsFor: 'return bytecodes' stamp: 'crl 7/19/2003 16:05'!
> > > returnValueTo
> > > 	"Note: Assumed to be inlined into the dispatch loop."
> > >
> > > 	| nilOop thisCntx contextOfCaller localCntx localVal isUnwindMarked header meth pIndex |
> > > 	self inline: true.
> > > 	self sharedCodeNamed: 'commonReturn' inCase: 120.
> > >
> > > 	nilOop _ nilObj. "keep in a register"
> > > 	thisCntx _ activeContext.
> > > 	localCntx _ cntx.
> > > 	localVal _ val.
> > >
> > > 	"make sure we can return to the given context"
> > > 	((localCntx = nilOop) or:
> > > 	 [(self fetchPointer: InstructionPointerIndex ofObject: localCntx) = nilOop]) ifTrue: [
> > > 		"error: sender's instruction pointer or context is nil; cannot return"
> > > 		^self internalCannotReturn: localVal].
> > >
> > > 	"If this return is not to our immediate predecessor (i.e. from a method to its sender, or from a block to its caller), scan the stack for the first unwind marked context and inform this context and let it deal with it. This provides a chance for ensure unwinding to occur."
> > > 	thisCntx _ self fetchPointer: SenderIndex ofObject: activeContext.
> > >
> > > 	"Just possibly a faster test would be to compare the homeContext and activeContext - they are of course different for blocks. Thus we might be able to optimise a touch by having a different returnTo for the blockreturn (since we know that must return to caller) and then if active ~= home we must be doing a non-local return. I think. Maybe."
> > > 	[thisCntx = localCntx] whileFalse: [
> > > 		thisCntx = nilObj ifTrue: [
> > > 			"error: sender's instruction pointer or context is nil; cannot return"
> > > 			^self internalCannotReturn: localVal].
> > > 		"Climb up stack towards localCntx. Break out to a send of #aboutToReturn:through: if an unwind marked context is found"
> > > 		header _ self baseHeader: thisCntx.
> > > 		(self isMethodContextHeader: header)
> > > 			ifTrue: [
> > > 				meth _ self fetchPointer: MethodIndex ofObject: thisCntx.
> > > 				pIndex _ self primitiveIndexOf: meth.
> > > 				isUnwindMarked _ pIndex == 198]
> > > 			ifFalse: [isUnwindMarked _ false].
> > > 		isUnwindMarked ifTrue: [
> > > 			"context is marked; break out"
> > > 			^self internalAboutToReturn: localVal through: thisCntx].
> > > 		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
> > >
> > > 	"If we get here there is no unwind to worry about. Simply terminate the stack up to the localCntx - often just the sender of the method"
> > > 	thisCntx _ activeContext.
> > > 	[thisCntx = localCntx] whileFalse: [
> > > 		"climb up stack to localCntx"
> > > 		contextOfCaller _ self fetchPointer: SenderIndex ofObject: thisCntx.
> > > 		"zap exited contexts so any future attempted use will be caught"
> > > 		self storePointerUnchecked: SenderIndex ofObject: thisCntx withValue: nilOop.
> > > 		self storePointerUnchecked: InstructionPointerIndex ofObject: thisCntx withValue: nilOop.
> > > 		reclaimableContextCount > 0 ifTrue: [
> > > 			"try to recycle this context"
> > > 			reclaimableContextCount _ reclaimableContextCount - 1.
> > > 			self recycleContextIfPossible: thisCntx].
> > > 		thisCntx _ contextOfCaller].
> > >
> > > 	activeContext _ thisCntx.
> > > 	(thisCntx < youngStart) ifTrue: [self beRootIfOld: thisCntx].
> > >
> > > 	self internalFetchContextRegisters: thisCntx. "updates local IP and SP"
> > > 	self fetchNextBytecode.
> > > 	self internalPush: localVal.
> > > ! !
> > >
> > > Shortly after that I released the VMMaker3.6 with a note that it couldn't
> > > produce a completely non-inlined VM because of a problem in fetchByte if
> > > globalstruct was enabled, and some odd problems in B2DPlugin. When
> > > VMMaker3.7 was released a year later (march 04) I apparently thought it
> > > could make the core vm non-inlined. Since this is all a bazillion years
> > > ago I can't remember any context to help extend the history.
> > >
> > > tim
> > > --
> > > tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> > > Science is imagination equipped with grappling hooks.
> > >
|
Hi David,

I wanted to say that COG depends on (self malloc: n) being translated as malloc(n); and not setMalloc(n); for example (you can find many others by browsing unimplemented calls). But maybe foo was not a generic ID in your case?

2014-02-11 15:05 GMT+01:00 David T. Lewis <[hidden email]>:
|
OK thank you, I am aware of that trick so not a problem. (But you should not blame Eliot, I think I started abusing slang that way in OSProcessPlugin many years ago, so you can blame me just as well)

Thanks a lot,
Dave

> Hi David,
> I wanted to say that COG depends on (self malloc: n) to be translated
> malloc(n); and not setMalloc(n); for example (you can have many others by
> browsing unimplemented calls), but maybe foo was not a generic ID in your
> case?
>
> 2014-02-11 15:05 GMT+01:00 David T. Lewis <[hidden email]>:
>
>> On Mon, Feb 10, 2014 at 10:12:32PM +0100, Nicolas Cellier wrote:
>> >
>> > Hi David,
>> > do you realize that Eliot is (ab)using this in COG in order to eliminate
>> > some direct cCode: '...' inclusion?
>> > So setFoo: is not an option (or i misunderstood something)
>> >
>>
>> Hi Nicolas,
>>
>> Actually I am not sure what you are referring to here, so probably I am
>> missing something. Can you explain why setFoo: would be a problem in Cog?
>> I cannot check it myself right now but I am interested to know if I am
>> missing something important.
>>
>> Thanks,
>> Dave
>>
>> > 2014-02-10 21:51 GMT+01:00 David T. Lewis <[hidden email]>:
>> > >
>> > > I was looking at the trunk VMM yesterday and found that most of the issues
>> > > were just caused by accessor methods, where #foo and #foo: generate
>> > > conflicting foo(void) and foo(aParameter). In most cases, a convention of
>> > > #setFoo: rather than #foo: takes care of the problem. There were a few
>> > > other miscellaneous issues as well, but nothing that looked serious.
>> > >
>> > > The variable 'memory' is a challenge because it is used extensively both
>> > > directly and through #memory and #memory:. I was considering changing the
>> > > variable name to something like memoryBase, and leaving the accessors
>> > > alone, though I'm not sure that would be a very good idea.
>> > >
>> > > I ran out of time yesterday and did not pursue it beyond this.
>> > >
>> > > Dave
>> > >
>> > > > On 10-02-2014, at 11:53 AM, Eliot Miranda <[hidden email]> wrote:
>> > > >>
>> > > >> I *think* the issue is the internal/external split brought about by the
>> > > >> introduction of the localFoo variables, such as localSP and localIP.
>> > > >
>> > > > It's really hard to be sure but I suspect that this isn't the (only)
>> > > > issue. IIRC we used to be able to make non-inlined VMs at one point and
>> > > > that was well after the internalFoo code was added.
>> > > >
>> > > > OK, some quick email searching reveals some work done in '03 by johnMcI,
>> > > > Craig & me.
>> > > > Craig found the following code helped -
>> > > >
>> > > > [...snip: Craig's Interpreter changes, quoted in full earlier in the thread...]
>> > > >
>> > > > Shortly after that I released the VMMaker3.6 with a note that it couldn't
>> > > > produce a completely non-inlined VM because of a problem in fetchByte if
>> > > > globalstruct was enabled, and some odd problems in B2DPlugin. When
>> > > > VMMaker3.7 was released a year later (march 04) I apparently thought it
>> > > > could make the core vm non-inlined. Since this is all a bazillion years
>> > > > ago I can't remember any context to help extend the history.
>> > > >
>> > > > tim
>> > > > --
>> > > > tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> > > > Science is imagination equipped with grappling hooks.
|
On Tue, Feb 11, 2014 at 11:05 AM, David T. Lewis <[hidden email]> wrote:
Personally I find it far less of an abuse than the horrible cCode: 'aString...' idiom. With "self malloc: n" I can look for senders etc, but more importantly I can actually implement it in the simulator. You'll see in the Cog branch working implementations of str:n:cmp:, mem:mo:ve: etc which are actually required by the simulator.

Let me plead for those of you writing VM code to avoid cCode: as much as possible. Use it to include code that only the simulator should use by all means, but please try and generate your C calls from Smalltalk code.
Here's the kind of thing I mean. This coerces an address into a simulator's CogMethod:

printCogMethod: cogMethod
	<api>
	<var: #cogMethod type: #'CogMethod *'>
	| address primitive |
	self cCode: ''
		inSmalltalk:
			[self transcript ensureCr.
			 cogMethod isInteger ifTrue:
				[^self printCogMethod: (self cCoerceSimple: cogMethod to: #'CogMethod *')]].
	address := cogMethod asInteger.
	self printHex: address; print: ' <-> '; printHex: address + cogMethod blockSize.
	cogMethod cmType = CMMethod ifTrue: ...

Here's the kind of thing to be avoided:

	interpreterProxy success:
		((interpreterProxy isBytes: oop)
		 and: [(interpreterProxy slotSizeOf: oop) = (self cCode: 'sizeof(AsyncFile)')]).

It could be written as (and if so, simulated!):

	interpreterProxy success:
		((interpreterProxy isBytes: oop)
		 and: [(interpreterProxy slotSizeOf: oop) = (self sizeof: #AsyncFile)]).

cheers!

Thanks a lot, best,
Eliot
|
> Let me plead for those of you writing VM code to avoid cCode: as much
> as possible. Use it to include code that only the simulator should
> use by all means, but please try and generate your C calls from
> Smalltalk code.

Hear, hear!

-C

--
Craig Latta
www.netjam.org/resume
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)
|
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
>
> On 09-02-2014, at 10:07 AM, David T. Lewis <[hidden email]> wrote:
> >
> > I think someone mentioned it earlier, but a very easy way to produce an
> > intentionally slow VM is to generate the sources from VMMaker with the
> > inlining step disabled. The slang inliner is extremely effective, and turning
> > it off produces impressively sluggish results.
>
> Does that actually work these days? Last I remember was that turning
> inlining off wouldn't produce a buildable interp.c file. If someone has
> had the patience to make it work then I'm impressed.
>

You're right about one thing, it required a lot of patience ;-)

I did manage to get it working though, and the results are in VMMaker-dtl.342. This turned out to be a useful exercise, as I flushed out a couple of type declaration bugs along the way.

The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined. I went with the approach of naming the accessors getFoo and setFoo: as well as, for the case of array access, fooAt: and fooAt:put:. This is not very pleasing from a readability point of view, but it is simple and it works.

If I compile a VM with inlining disabled and compiler optimization turned off, the result is about 1/8th the speed of the same interpreter VM built normally.

Dave
|