Klaus D. Witzel writes:
> Hi Bryce,
>
> turning the suggested primitive behind BlockContext>>#apply:from:to: into
> a #primitiveValue is, after checking receiver and arguments, easy.
>
> Problems come as soon as the block later sends, whereafter #commonReturn
> does no longer know anything about #primitiveApplyFromTo. This asks for
> something that stores such state, for example by making a
> MethodContext>>#apply:from:to:on: solution (with last argument the block).
>
> I can imagine that Exupery encounters similar situations (i.e. native code
> does a full send for whatever reason). Is there something that can be
> learned (or, perhaps, even copied) from Exupery.

Exupery modifies commonReturn: and transferTo:, amongst other methods, to allow its modified contexts to be re-entered from compiled code.

I suspect that implementing #apply:from:to:on: in Squeak is a mistake. First, because it'll make the code less clear. Second, because it's going to be very hard to do with the current VM and contexts; VisualAge doesn't have context objects, which would make this sort of thing easier but also means that it can't support Seaside. Third, because I'd need to re-implement any such primitive in Exupery. Fourth, because you'll need to implement the primitive in the debugger so you can still single-step through do:. Fifth, because such a primitive will be harder to understand than our current do: implementation. And finally, because the current implementation, or even one using count:, could be fully optimised by using full method inlining and a decent optimising compiler.

Full method inlining will optimise much more than just do: and friends. Full method inlining would also allow us to stop inlining methods such as ifNotNil: without any drop in speed. Either optimise a very general case or optimise a specific case based on profiling.

Bryce
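For reference, the "current do: implementation" Bryce mentions is, in Squeak images of that era, essentially a one-liner over the compiler-inlined #to:do: (a sketch; exact wording may differ between Squeak versions):

```smalltalk
SequenceableCollection >> do: aBlock
	"Evaluate aBlock with each of the receiver's elements as the argument."
	1 to: self size do:
		[:index | aBlock value: (self at: index)]
```

All the other enumeration methods (collect:, select:, inject:into:, ...) bottom out in loops of this shape, which is why the thread treats do: as the representative case.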
In reply to this post by Jon Hylands
Hi Jon,
on Wed, 13 Sep 2006 19:43:38 +0200, you wrote:

> On Wed, 13 Sep 2006 18:10:30 +0200, "Klaus D. Witzel" wrote:
>
>> I doubt that a primitive implementation would show a significantly better
>> factor, because the VM already is fast! and because in practice the
>> #to:do: intervals are usually smaller than in the above. The only
>> difference I saw in the bytecodes was related to the #to:do: and #value:
>> message send.
>
> The point of the #apply... primitive is so that all the enumeration methods
> run as fast as #to:do:,

Absolutely. The primitive code has to be shadowed (because of the always possible #primitiveFailed situation), and the one shadow that is also inlined by the compiler is

  start to: stop do: [:index |
    self "I'm the block" value: (seqColl at: index)]

#primitiveApply:from:to: can do faster but not better; there is no solution with less code.

> so you don't have to worry about trading off code
> clarity and re-use for performance.

I cannot say that I understand this sentence (read it time and again, but no clue). What's the connection?

/Klaus

> Later,
> Jon
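The shadowing Klaus describes follows the usual Squeak primitive pattern: the method carries a primitive pragma, and the Smalltalk body after it runs only when the primitive fails. A hypothetical sketch (the selector and primitive number 164 are placeholders from this thread, not a real Squeak primitive):

```smalltalk
SequenceableCollection >> applyTo: aBlock from: start to: stop
	"Hypothetical: let the VM drive the iteration; on #primitiveFailed,
	 fall through to the shadow, which the compiler inlines via #to:do:."
	<primitive: 164>
	start to: stop do:
		[:index | aBlock value: (self at: index)]
```

The primitive and its shadow must compute exactly the same result, since callers cannot tell which path was taken.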
In reply to this post by danil osipchuk
Hi danil,
on Wed, 13 Sep 2006 20:55:39 +0200, you wrote:

> But I wonder, if VAST problems with seaside are related to something
> like that ;)

That would be crazy, wouldn't it ;)

/Klaus

> Danil
>>
>> :) Nice question, important as well.
>>
>>> I guess the state
>>> (iteration index) is stored in the C stack and cannot be captured by
>>> looking at thisContext, right?
>>
>> No, nothing can be stored in the C stack. Say we have ... apply:
>> seqColl from: here to: there, then at each iteration the VM does here
>> := here + 1 (that's a privilege of the VM). Later when #commonReturn
>> comes back, the situation is precisely as it was at the time of the
>> first call, except that argument 'here' reflects the progress. And
>> kangaroo and debugger and friends are not affected.
>>
>> It is because of the latter that I'd say that that works in harmony
>> with continuations.
>>
>> /Klaus
>>
>>> Lukas
In reply to this post by danil osipchuk
danil osipchuk schrieb:
> But I wonder, if VAST problems with seaside are related to something
> like that ;)

Not primarily. VAST uses a single stack per process instead of Context objects, as the Smalltalk-80 derived systems (Squeak and VisualWorks) do, and the stack can not be safely manipulated by non-VM code. I've tried quite hard, and I did not find a way to implement continuations correctly.

The iterator primitive could be implemented with Context objects, but there are some tricky issues:
1. There needs to be a slot in a context where the iteration variable can be stored. One possibility would be to keep the block, collection, limit and current index on the stack of the calling context, instead of popping them off the stack.
2. The return value from the block needs to be discarded somehow. Normally, every return from a method or block pushes the result onto the caller's stack. The problem is that the primitive can only handle the activation of the block, not its return.
3. The primitive needs to ensure that the return will happen to the right place so that the loop can continue, preferably just to the position of the send bytecode. But since bytecodes may be 1 or more bytes long, it is difficult for the primitive to find out what's the correct place.

Overall, I think that the performance and readability improvements gained by such a primitive might not be worth the effort.

Cheers,
Hans-Martin
In reply to this post by Klaus D. Witzel
On Wed, 13 Sep 2006 21:35:01 +0200, "Klaus D. Witzel"
<[hidden email]> wrote:

> > so you don't have to worry about trading off code
> > clarity and re-use for performance.
>
> I cannot say that I understand this sentence (read it time and again, but
> no clue). What's the connection.

After having done some simple benchmarks, there doesn't appear to be much of a difference between the two techniques:

  | time collection sum |
  collection := (1 to: 100000) asOrderedCollection.
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [sum := collection sum].
  sum.
  time

versus:

  | time collection sum |
  collection := (1 to: 100000) asOrderedCollection.
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [
    sum := 0.
    1 to: collection size do: [:index |
      sum := sum + (collection at: index)]].
  sum.
  time

They both give more or less identical times (about 280 ms on my machine).

It may be that this is a moot point in Squeak...

Later,
Jon

--------------------------------------------------------------
Jon Hylands  [hidden email]  http://www.huv.com/jon

Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
http://www.huv.com
In reply to this post by Bryce Kampjes
On Wed, 13 Sep 2006 21:26:13 +0200, Bryce Kampjes wrote:
> Klaus D. Witzel writes:
> > Hi Bryce,
...
> > I can imagine that Exupery encounters similar situations (i.e. native
> > code does a full send for whatever reason). Is there something that can
> > be learned (or, perhaps, even copied) from Exupery.
>
> Exupery modifies commonReturn: and transferTo: amongst other methods
> to allow its modified contexts to be re-entered from compiled code.

Yes, that's what I thought.

> I suspect that implementing #apply:from:to:on: in Squeak is a mistake.

A possible performance improvement by a factor of 3 is not what people would count a mistake.

> First because it'll make the code less clear.

  "old"
  start to: stop do: [:index | aBlock value: (self at: index)]

  "new"
  self applyTo: aBlock from: start to: stop

The differences are: method name, receiver and arguments swapped. No difference: self is typed, aBlock is typed by convention.

> Second because it's going to be very hard to do with the current VM and
> contexts,

Yes, certainly. I'd therefore attempt to have a context to which the activation can return. 4 LOC in #commonSend, 6 LOC in #commonReturn (not counting comments).

> VisualAge doesn't have context objects which would make this sort of
> thing easier but also means that it can't support Seaside.

Yes, that's what danil conjectured. Can't say much about VA St.

> Third because I'd need to re-implement any such primitive in Exupery.

Since Exupery either compiles or interprets, I'd suggest that it ignores the <primitive: 164> header mark (number is fake here) and just compiles the shadow (same as below).

> Fourth because you'll need to implement the primitive in the debugger so
> you can still single step through do:.

Let the naughty stack manipulator just ignore (primitive = 164) and step through the shadow (even on return from some activation).

> Fifth because such a primitive will be harder to understand than our
> current do: implementation.

It is a Stream>>#next followed by a BlockContext>>#value: (with some type checks mixed in). All that can be "reused" from the respective existing primitives in the VM. But those who don't understand Stream>>#next and BlockContext>>#value: would have a hard time ;-)

> And finally because the current implementation or even one using count:
> could be fully optimised by using full method inlining and a decent
> optimising compiler.

The same would be the case with #applyTo:from:to:, so what could be a possible difference when comparing the above snippets?

> Full method inlining will optimise much more than just do: and friends.

Absolutely. Just ignore cases with (primitive = 164) and do a full inline of the respective shadow.

/Klaus

> Full method inlining would also allow us to stop inlining methods such
> as ifNotNil: without any drop in speed. Either optimise a very general
> case or optimise a specific case based on profiling.
>
> Bryce
In reply to this post by Jon Hylands
Sorry Jon, you timed boxing of LargeIntegers (and perhaps garbage collection of the boxes). In the below, the number 10565520 is short for the interval I suggested, 1 to: (0.5 * SmallInteger maxVal) sqrt truncated * 456.

Almost factor 5, that's what I expected (asymptotically 6). My other demos did almost nothing inside the loops. So: should we care and ask the VM what she can do better than that ;-)

/Klaus

  | time collection sum |
  collection := (1 to: 10565520) collect: [:none | 1].
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [
    sum := 0.
    1 to: collection size do: [:index |
      sum := sum + (collection at: index)]].
  sum.
  time
  => 2096

  | time collection sum |
  collection := (1 to: 10565520) collect: [:none | 1].
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [sum := collection sum].
  sum.
  time
  => 9681

On Wed, 13 Sep 2006 22:09:55 +0200, Jon Hylands wrote:

> On Wed, 13 Sep 2006 21:35:01 +0200, "Klaus D. Witzel"
> <[hidden email]> wrote:
>
>> > so you don't have to worry about trading off code
>> > clarity and re-use for performance.
>>
>> I cannot say that I understand this sentence (read it time and again, but
>> no clue). What's the connection.
>
> After having done some simple benchmarks, there doesn't appear to be much
> of a difference in the two techniques:
>
> | time collection sum |
> collection := (1 to: 100000) asOrderedCollection.
> Smalltalk garbageCollect.
> time := Time millisecondsToRun: [sum := collection sum].
> sum.
> time
>
> versus:
>
> | time collection sum |
> collection := (1 to: 100000) asOrderedCollection.
> Smalltalk garbageCollect.
> time := Time millisecondsToRun: [
>   sum := 0.
>   1 to: collection size do: [:index |
>     sum := sum + (collection at: index)]].
> sum.
> time
>
> They both give more or less identical time (about 280 ms on my machine).
>
> It may be that this is a moot point in Squeak...
>
> Later,
> Jon
>
> --------------------------------------------------------------
> Jon Hylands  [hidden email]  http://www.huv.com/jon
>
> Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
> http://www.huv.com
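The boxing Klaus refers to: summing 1 to 100000 yields 5000050000, which exceeds SmallInteger maxVal, so from that point on every addition in Jon's benchmark allocates a LargePositiveInteger object. Klaus's version sums 1s instead, keeping the total within SmallInteger range so the loop itself is what gets timed. On a 32-bit Squeak image of that era the boundary can be seen directly:

```smalltalk
SmallInteger maxVal.              "1073741823 on a 32-bit image"
(SmallInteger maxVal + 1) class.  "LargePositiveInteger"
```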
In reply to this post by Klaus D. Witzel
Klaus D. Witzel writes:
> > I suspect that implementing #apply:from:to:on: in Squeak is a mistake.
>
> A possible performance improvement by a factor of 3 is not what people
> would count a mistake.

Here are some numbers from Exupery now:

  arithmaticLoopBenchmark 1376 compiled  92 ratio: 14.957
  bytecodeBenchmark       2201 compiled 430 ratio:  5.119
  sendBenchmark           1580 compiled 691 ratio:  2.287
  doLoopsBenchmark        1069 compiled 729 ratio:  1.466
  largeExplorers           372 compiled 432 ratio:  0.861
  compilerBenchmark        807 compiled 812 ratio:  0.994

The first column of numbers is the time in milliseconds to interpret the benchmark. The second is the time to run the compiled benchmark. The third is the ratio between the two.

arithmaticLoopBenchmark is a simple 1 to: do: loop; it's just SmallInteger additions and looping. bytecodeBenchmark and sendBenchmark are the two tiny benchmarks. doLoopsBenchmark is a basic do: loop; the time is almost certainly going into the call from compiled code to value:. value: is currently implemented by calling ExuperyBlockContext>>value:, which is implemented in Slang; re-implementing it as compiled code will speed that up. Improving send performance by a factor of 2.2 is done now, so long as both the receiver and sender are compiled.

largeExplorers and compilerBenchmark are two macro benchmarks. Both spend a lot of time in primitives, enough that speeding up the do: loop execution doesn't seem to matter. I haven't done that much experimentation with these benchmarks recently. It's possible that they may benefit from a different compilation strategy.

Bryce

P.S. This is all the answer I have time to write tonight. Hopefully, it provides something to think about.
In reply to this post by Hans-Martin Mosner
Hi Hans-Martin,
thank you for sharing your experience. Now I know whom to ask for a code review on iterators who make a living in contexts ;-)

I started with the idea to keep the three arguments aBlock, index and limit located where they are and rewrite index+1 in-place. So this is invariant at #commonSend time. And this is invariant at #commonReturn time (which discards possible return values with a simple ifFalse:). And it implies linear code in the primitive; the ifTrue:'s and ifFalse:'s all test only successFlag. A straightforward implementation of Stream>>#next concatenated with BlockContext>>#value:.

Since the primitive either successfully populates and activates aBlock (like #value: does, initialIP, argument) or else fails, it does not need to know about any previous bytecode position.

Also invariant: the primitive is called by both #commonSend and #commonReturn, and always does the same, except when index <= limit no longer holds. The latter, (index <= limit) not, has 2 cases:
- at #commonSend time, the receiver object is returned
- at #commonReturn time, the receiver object is returned

Invariant, again.

/Klaus

On Wed, 13 Sep 2006 21:47:48 +0200, Hans-Martin Mosner wrote:

> danil osipchuk schrieb:
>> But I wonder, if VAST problems with seaside are related to something
>> like that ;)
> Not primarily. VAST uses a single stack per process instead of Context
> objects as the Smalltalk-80 derived systems (Squeak and VisualWorks) do,
> and the stack can not be safely manipulated by non-VM code. I've tried
> it quite hard, and I did not find a way to implement continuations
> correctly.
>
> The iterator primitive could be implemented with Context objects, but
> there are some tricky issues:
> 1. There needs to be a slot in a context where the iteration variable
> can be stored. One possibility would be to keep the block, collection,
> limit and current index on the stack of the calling context, instead of
> popping them off the stack.
> 2. The return value from the block needs to be discarded somehow.
> Normally, every return from a method or block pushes the result onto the
> caller's stack. The problem is that the primitive can only handle the
> activation of the block, not its return.
> 3. The primitive needs to ensure that the return will happen to the
> right place so that the loop can continue, preferably just to the
> position of the send bytecode. But since bytecodes may be 1 or more
> bytes long, it is difficult for the primitive to find out what's the
> correct place.
>
> Overall, I think that the performance and readability improvements
> gained by such a primitive might not be worth the effort.
>
> Cheers,
> Hans-Martin
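Pulling Klaus's invariants together, the primitive's control flow might be sketched like this (pseudocode in Slang style only; the slot names and helper selectors are invented for illustration, not actual interpreter code):

```smalltalk
primitiveApplyFromTo
	"Stack layout, unchanged across iterations:
	 receiver, aBlock, index, limit.
	 Reached from both #commonSend and #commonReturn."
	index > limit ifTrue:
		[^ self popArgumentsAndReturn: receiver].	"loop finished: same answer from either caller"
	element := receiver at: index.
	self rewriteIndexInPlace: index + 1.	"the VM's privilege: index+1 stored where index was"
	self discardBlockResultIfAny.	"the simple ifFalse: that drops the block's return value"
	self activateBlock: aBlock withArgument: element	"like #value: - initialIP, one argument"
```

This is only a restatement of the design discussed in the thread; any real implementation would also have to test successFlag at each step and fall back to the Smalltalk shadow on failure.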
In reply to this post by Bryce Kampjes
Great, thank you Bryce!
On Thu, 14 Sep 2006 00:57:13 +0200, Bryce Kampjes wrote:
...
> P.S. This is all the answer I have time to write tonight. Hopefully,
> it provides something to think about.

Yes, it provides many things to think about for an as-yet-unwritten primitive :)

Thanks again.

/Klaus
In reply to this post by Klaus D. Witzel
On Thu, 14 Sep 2006 00:36:09 +0200, "Klaus D. Witzel"
<[hidden email]> wrote:

> Sorry Jon, you timed boxing of LargeIntegers (and perhaps garbage
> collection of the boxes). In the below the number 10565520 is short for
> the interval I suggested, 1 to: (0.5 * SmallInteger maxVal) sqrt
> truncated * 456.

Ahhh, okay, that makes sense. Silly mistake.

Later,
Jon

--------------------------------------------------------------
Jon Hylands  [hidden email]  http://www.huv.com/jon

Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
http://www.huv.com