Klaus D. Witzel writes:
> Hi Bryce,
>
> turning the suggested primitive behind BlockContext>>#apply:from:to: into
> a #primitiveValue is, after checking receiver and arguments, easy.
>
> Problems come as soon as the block later sends, whereafter #commonReturn
> does no longer know anything about #primitiveApplyFromTo. This asks for
> something that stores such state, for example by making a
> MethodContext>>#apply:from:to:on: solution (with last argument the block).
>
> I can imagine that Exupery encounters similar situations (i.e. native code
> does a full send for whatever reason). Is there something that can be
> learned (or, perhaps, even copied) from Exupery.

Exupery modifies commonReturn: and transferTo:, amongst other methods, to allow its modified contexts to be re-entered from compiled code.

I suspect that implementing #apply:from:to:on: in Squeak is a mistake. First, because it'll make the code less clear. Second, because it's going to be very hard to do with the current VM and contexts; VisualAge doesn't have context objects, which would make this sort of thing easier but also means that it can't support Seaside. Third, because I'd need to re-implement any such primitive in Exupery. Fourth, because you'll need to implement the primitive in the debugger so you can still single-step through do:. Fifth, because such a primitive will be harder to understand than our current do: implementation. And finally, because the current implementation, or even one using count:, could be fully optimised by using full method inlining and a decent optimising compiler.

Full method inlining will optimise much more than just do: and friends. Full method inlining would also allow us to stop inlining methods such as ifNotNil: without any drop in speed. Either optimise a very general case or optimise a specific case based on profiling.

Bryce
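For reference, the "current do: implementation" Bryce mentions is, in Squeak images of that era, essentially a one-liner over the compiler-inlined #to:do: (a sketch; exact wording may differ between Squeak versions):

```smalltalk
SequenceableCollection >> do: aBlock
	"Evaluate aBlock with each of the receiver's elements as the argument."
	1 to: self size do:
		[:index | aBlock value: (self at: index)]
```

All the other enumeration methods (collect:, select:, inject:into:, ...) bottom out in loops of this shape, which is why the thread treats do: as the representative case.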
In reply to this post by Jon Hylands
Hi Jon,
on Wed, 13 Sep 2006 19:43:38 +0200, you wrote:

> On Wed, 13 Sep 2006 18:10:30 +0200, "Klaus D. Witzel" wrote:
>
>> I doubt that a primitive implementation would show a significantly better
>> factor, because the VM already is fast! and because in practice the
>> #to:do: intervals are usually smaller than in the above. The only
>> difference I saw in the bytecodes was related to the #to:do: and #value:
>> message send.
>
> The point of the #apply... primitive is so that all the enumeration methods
> run as fast as #to:do:,

Absolutely. The primitive code has to be shadowed (because of the always possible #primitiveFailed situation), and the one shadow that is also inlined by the compiler is

  start to: stop do: [:index |
    self "I'm the block" value: (seqColl at: index)]

#primitiveApply:from:to: can do faster but not better; there is no solution with less code.

> so you don't have to worry about trading off code
> clarity and re-use for performance.

I cannot say that I understand this sentence (read it time and again, but no clue). What's the connection?

/Klaus

> Later,
> Jon
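The shadowing Klaus describes follows the usual Squeak primitive pattern: the method carries a primitive pragma, and the Smalltalk body after it runs only when the primitive fails. A hypothetical sketch (the selector and primitive number 164 are placeholders from this thread, not a real Squeak primitive):

```smalltalk
SequenceableCollection >> applyTo: aBlock from: start to: stop
	"Hypothetical: let the VM drive the iteration; on #primitiveFailed,
	 fall through to the shadow, which the compiler inlines via #to:do:."
	<primitive: 164>
	start to: stop do:
		[:index | aBlock value: (self at: index)]
```

The primitive and its shadow must compute exactly the same result, since callers cannot tell which path was taken.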
In reply to this post by danil osipchuk
Hi danil,
on Wed, 13 Sep 2006 20:55:39 +0200, you wrote:

> But I wonder, if VAST problems with seaside are related to something
> like that ;)

That would be crazy, wouldn't it ;)

/Klaus

> Danil
>>
>> :) Nice question, important as well.
>>
>>> I guess the state
>>> (iteration index) is stored in the C stack and cannot be captured by
>>> looking at thisContext, right?
>>
>> No, nothing can be stored in the C stack. Say we have ... apply:
>> seqColl from: here to: there, then at each iteration the VM does here
>> := here + 1 (that's a privilege of the VM). Later when #commonReturn
>> comes back, the situation is precisely as it was at the time of the
>> first call, except that argument 'here' reflects the progress. And
>> kangaroo and debugger and friends are not affected.
>>
>> It is because of the latter that I'd say that that works in harmony
>> with continuations.
>>
>> /Klaus
>>
>>> Lukas
In reply to this post by danil osipchuk
danil osipchuk schrieb:
> But I wonder, if VAST problems with seaside are related to something
> like that ;)

Not primarily. VAST uses a single stack per process instead of Context objects, as the Smalltalk-80 derived systems (Squeak and VisualWorks) do, and the stack can not be safely manipulated by non-VM code. I've tried quite hard, and I did not find a way to implement continuations correctly.

The iterator primitive could be implemented with Context objects, but there are some tricky issues:
1. There needs to be a slot in a context where the iteration variable can be stored. One possibility would be to keep the block, collection, limit and current index on the stack of the calling context, instead of popping them off the stack.
2. The return value from the block needs to be discarded somehow. Normally, every return from a method or block pushes the result onto the caller's stack. The problem is that the primitive can only handle the activation of the block, not its return.
3. The primitive needs to ensure that the return will happen to the right place so that the loop can continue, preferably just to the position of the send bytecode. But since bytecodes may be 1 or more bytes long, it is difficult for the primitive to find out what's the correct place.

Overall, I think that the performance and readability improvements gained by such a primitive might not be worth the effort.

Cheers,
Hans-Martin
In reply to this post by Klaus D. Witzel
On Wed, 13 Sep 2006 21:35:01 +0200, "Klaus D. Witzel"
<[hidden email]> wrote:

> > so you don't have to worry about trading off code
> > clarity and re-use for performance.
>
> I cannot say that I understand this sentence (read it time and again, but
> no clue). What's the connection.

After having done some simple benchmarks, there doesn't appear to be much of a difference between the two techniques:

  | time collection sum |
  collection := (1 to: 100000) asOrderedCollection.
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [sum := collection sum].
  sum.
  time

versus:

  | time collection sum |
  collection := (1 to: 100000) asOrderedCollection.
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [
    sum := 0.
    1 to: collection size do: [:index |
      sum := sum + (collection at: index)]].
  sum.
  time

They both give more or less identical times (about 280 ms on my machine).

It may be that this is a moot point in Squeak...

Later,
Jon

--------------------------------------------------------------
Jon Hylands  [hidden email]  http://www.huv.com/jon

Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
http://www.huv.com
In reply to this post by Bryce Kampjes
On Wed, 13 Sep 2006 21:26:13 +0200, Bryce Kampjes wrote:
> Klaus D. Witzel writes:
> > Hi Bryce,
...
> > I can imagine that Exupery encounters similar situations (i.e. native
> > code does a full send for whatever reason). Is there something that can
> > be learned (or, perhaps, even copied) from Exupery.
>
> Exupery modifies commonReturn: and transferTo: amongst other methods
> to allow its modified contexts to be re-entered from compiled code.

Yes, that's what I thought.

> I suspect that implementing #apply:from:to:on: in Squeak is a mistake.

A possible performance improvement by a factor of 3 is not what people would count a mistake.

> First because it'll make the code less clear.

  "old"
  start to: stop do: [:index | aBlock value: (self at: index)]

  "new"
  self applyTo: aBlock from: start to: stop

The differences are: method name, receiver and arguments swapped. No difference: self is typed, aBlock is typed by convention.

> Second because it's going to be very hard to do with the current VM and
> contexts,

Yes, certainly. I'd therefore attempt to have a context to which the activation can return. 4 LOC in #commonSend, 6 LOC in #commonReturn (not counting comments).

> VisualAge doesn't have context objects which would make this sort of
> thing easier but also means that it can't support Seaside.

Yes, that's what danil conjectured. Can't say much about VA St.

> Third because I'd need to re-implement any such primitive in Exupery.

Since Exupery either compiles or interprets, I'd suggest that it ignores the <primitive: 164> header mark (number is fake here) and just compiles the shadow (same as below).

> Fourth because you'll need to implement the primitive in the debugger so
> you can still single step through do:.

Let the naughty stack manipulator just ignore (primitive = 164) and step through the shadow (even on return from some activation).

> Fifth because such a primitive will be harder to understand than our
> current do: implementation.

It is a Stream>>#next followed by a BlockContext>>#value: (with some type checks mixed in). All that can be "reused" from the respective existing primitives in the VM. But those who don't understand Stream>>#next and BlockContext>>#value: would have a hard time ;-)

> And finally because the current implementation or even one using count:
> could be fully optimised by using full method inlining and a decent
> optimising compiler.

The same would be the case with #applyTo:from:to:, so what could be a possible difference when comparing the above snippets?

> Full method inlining will optimise much more than just do: and friends.

Absolutely. Just ignore cases with (primitive = 164) and do a full inline of the respective shadow.

/Klaus

> Full method inlining would also allow us to stop inlining methods such
> as ifNotNil: without any drop in speed. Either optimise a very general
> case or optimise a specific case based on profiling.
>
> Bryce
In reply to this post by Jon Hylands
Sorry Jon, you timed boxing of LargeIntegers (and perhaps garbage collection of the boxes). In the below, the number 10565520 is short for the interval I suggested, 1 to: (0.5 * SmallInteger maxVal) sqrt truncated * 456.

Almost factor 5, that's what I expected (asymptotically 6). My other demos did almost nothing inside the loops. So: should we care and ask the VM what she can do better than that ;-)

/Klaus

  | time collection sum |
  collection := (1 to: 10565520) collect: [:none | 1].
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [
    sum := 0.
    1 to: collection size do: [:index |
      sum := sum + (collection at: index)]].
  sum.
  time
  => 2096

  | time collection sum |
  collection := (1 to: 10565520) collect: [:none | 1].
  Smalltalk garbageCollect.
  time := Time millisecondsToRun: [sum := collection sum].
  sum.
  time
  => 9681

On Wed, 13 Sep 2006 22:09:55 +0200, Jon Hylands wrote:

> On Wed, 13 Sep 2006 21:35:01 +0200, "Klaus D. Witzel"
> <[hidden email]> wrote:
>
>> > so you don't have to worry about trading off code
>> > clarity and re-use for performance.
>>
>> I cannot say that I understand this sentence (read it time and again, but
>> no clue). What's the connection.
>
> After having done some simple benchmarks, there doesn't appear to be much
> of a difference in the two techniques:
>
> | time collection sum |
> collection := (1 to: 100000) asOrderedCollection.
> Smalltalk garbageCollect.
> time := Time millisecondsToRun: [sum := collection sum].
> sum.
> time
>
> versus:
>
> | time collection sum |
> collection := (1 to: 100000) asOrderedCollection.
> Smalltalk garbageCollect.
> time := Time millisecondsToRun: [
>   sum := 0.
>   1 to: collection size do: [:index |
>     sum := sum + (collection at: index)]].
> sum.
> time
>
> They both give more or less identical time (about 280 ms on my machine).
>
> It may be that this is a moot point in Squeak...
>
> Later,
> Jon
>
> --------------------------------------------------------------
> Jon Hylands  [hidden email]  http://www.huv.com/jon
>
> Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
> http://www.huv.com
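The boxing Klaus refers to: summing 1 to 100000 yields 5000050000, which exceeds SmallInteger maxVal, so from that point on every addition in Jon's benchmark allocates a LargePositiveInteger object. Klaus's version sums 1s instead, keeping the total within SmallInteger range so the loop itself is what gets timed. On a 32-bit Squeak image of that era the boundary can be seen directly:

```smalltalk
SmallInteger maxVal.              "1073741823 on a 32-bit image"
(SmallInteger maxVal + 1) class.  "LargePositiveInteger"
```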
In reply to this post by Klaus D. Witzel
Klaus D. Witzel writes:
> > I suspect that implementing #apply:from:to:on: in Squeak is a mistake.
>
> A possible performance improvement by a factor of 3 is not what people
> would count a mistake.

Here are some numbers from Exupery now:

  arithmaticLoopBenchmark 1376 compiled  92 ratio: 14.957
  bytecodeBenchmark       2201 compiled 430 ratio:  5.119
  sendBenchmark           1580 compiled 691 ratio:  2.287
  doLoopsBenchmark        1069 compiled 729 ratio:  1.466
  largeExplorers           372 compiled 432 ratio:  0.861
  compilerBenchmark        807 compiled 812 ratio:  0.994

The first column of numbers is the time in milliseconds to interpret the benchmark. The second is the time to run the compiled benchmark. The third is the ratio between the two.

arithmaticLoopBenchmark is a simple 1 to: do: loop; it's just SmallInteger additions and looping. bytecodeBenchmark and sendBenchmark are the two tiny benchmarks. doLoopsBenchmark is a basic do: loop; the time is almost certainly going into the call from compiled code to value:. value: is currently implemented by calling ExuperyBlockContext>>value:, which is implemented in Slang; re-implementing it as compiled code will speed that up. Improving send performance by a factor of 2.2 is done now, so long as both the receiver and sender are compiled.

largeExplorers and compilerBenchmark are two macro benchmarks. Both spend a lot of time in primitives, enough that speeding up the do: loop execution doesn't seem to matter. I haven't done that much experimentation with these benchmarks recently. It's possible that they may benefit from a different compilation strategy.

Bryce

P.S. This is all the answer I have time to write tonight. Hopefully, it provides something to think about.
In reply to this post by Hans-Martin Mosner
Hi Hans-Martin,
thank you for sharing your experience. Now I know whom to ask for a code review on iterators who make a living in contexts ;-)

I started with the idea to keep the three arguments aBlock, index and limit located where they are and rewrite index+1 in-place. So this is invariant at #commonSend time. And this is invariant at #commonReturn time (which discards possible return values with a simple ifFalse:). And it implies linear code in the primitive; the ifTrue:'s and ifFalse:'s all test only successFlag. A straightforward implementation of Stream>>#next concatenated with BlockContext>>#value:.

Since the primitive either successfully populates and activates aBlock (like #value: does, initialIP, argument) or else fails, it does not need to know about any previous bytecode position.

Also invariant: the primitive is called by both #commonSend and #commonReturn, and always does the same, except when index <= limit no longer holds. The latter, (index <= limit) not, has 2 cases:
- at #commonSend time, the receiver object is returned
- at #commonReturn time, the receiver object is returned

Invariant, again.

/Klaus

On Wed, 13 Sep 2006 21:47:48 +0200, Hans-Martin Mosner wrote:

> danil osipchuk schrieb:
>> But I wonder, if VAST problems with seaside are related to something
>> like that ;)
> Not primarily. VAST uses a single stack per process instead of Context
> objects as the Smalltalk-80 derived systems (Squeak and VisualWorks) do,
> and the stack can not be safely manipulated by non-VM code. I've tried
> it quite hard, and I did not find a way to implement continuations
> correctly.
>
> The iterator primitive could be implemented with Context objects, but
> there are some tricky issues:
> 1. There needs to be a slot in a context where the iteration variable
> can be stored. One possibility would be to keep the block, collection,
> limit and current index on the stack of the calling context, instead of
> popping them off the stack.
> 2. The return value from the block needs to be discarded somehow.
> Normally, every return from a method or block pushes the result onto the
> caller's stack. The problem is that the primitive can only handle the
> activation of the block, not its return.
> 3. The primitive needs to ensure that the return will happen to the
> right place so that the loop can continue, preferably just to the
> position of the send bytecode. But since bytecodes may be 1 or more
> bytes long, it is difficult for the primitive to find out what's the
> correct place.
>
> Overall, I think that the performance and readability improvements
> gained by such a primitive might not be worth the effort.
>
> Cheers,
> Hans-Martin
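Pulling Klaus's invariants together, the primitive's control flow might be sketched like this (pseudocode in Slang style only; the slot names and helper selectors are invented for illustration, not actual interpreter code):

```smalltalk
primitiveApplyFromTo
	"Stack layout, unchanged across iterations:
	 receiver, aBlock, index, limit.
	 Reached from both #commonSend and #commonReturn."
	index > limit ifTrue:
		[^ self popArgumentsAndReturn: receiver].	"loop finished: same answer from either caller"
	element := receiver at: index.
	self rewriteIndexInPlace: index + 1.	"the VM's privilege: index+1 stored where index was"
	self discardBlockResultIfAny.	"the simple ifFalse: that drops the block's return value"
	self activateBlock: aBlock withArgument: element	"like #value: - initialIP, one argument"
```

This is only a restatement of the design discussed in the thread; any real implementation would also have to test successFlag at each step and fall back to the Smalltalk shadow on failure.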
In reply to this post by Bryce Kampjes
Great, thank you Bryce!
On Thu, 14 Sep 2006 00:57:13 +0200, Bryce Kampjes wrote:
...
> P.S. This is all the answer I have time to write tonight. Hopefully,
> it provides something to think about.

Yes, it provides many things to think about for an as-yet-unwritten primitive :)

Thanks again.

/Klaus
In reply to this post by Klaus D. Witzel
On Thu, 14 Sep 2006 00:36:09 +0200, "Klaus D. Witzel"
<[hidden email]> wrote:

> Sorry Jon, you timed boxing of LargeIntegers (and perhaps garbage
> collection of the boxes). In the below the number 10565520 is short for
> the interval I suggested, 1 to: (0.5 * SmallInteger maxVal) sqrt
> truncated * 456.

Ahhh, okay, that makes sense. Silly mistake.

Later,
Jon

--------------------------------------------------------------
Jon Hylands  [hidden email]  http://www.huv.com/jon

Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
http://www.huv.com