Attached is #primitiveApplyToFromTo as it compiles and runs using the
win32 Squeak 3.7.1 build environment here; I'm testing it with the 3.8 and 3.9 images.

The performance is compared against plain #do: (apples vs. apples) and #to:do: (apples vs. oranges) further down below.

Compared to #do: with a block, #applyTo:from:to: with a block is definitely faster (smaller hidden constants), and there is also at least one item waiting for optimization (and for me having time to do that).

I'd say that the factor can arrive at about 2 (#do: vs. #applyTo:from:to:). But the primitive will not outperform an inlined #to:do:; this was clear from the beginning. As Jon wrote, the primitive is good for "fixing" the standard enumeration methods so they all have (more or less) identical performance.

The current implementation is for non-Strings only (as dictated by an atCache parameter) but can of course be extended to work with Strings. Another item that's missing is returning self from the primitive; this is currently handled by falling into the shadow, which then returns (and some superfluous bytecode cycles are burned).

Example method in SequenceableCollection:

	applyTo: aBlock from: index to: limit
		<primitive: 164>
		[index <= limit] whileTrue:
			[aBlock value: (self basicAt: index).
			 thisContext tempAt: 2 put: index + 1]

The shadow code is exactly what is done by the primitive. As discussed earlier, the primitive can be called from #commonSend and from #commonReturn and in both cases might decide to exit with primitiveFail.

The overrides for #commonSend and #commonReturn are not attached, but if you want them, ask me.

Any questions?

/Klaus

--------------------

| time array sum |
array := (1 to: 10565520) collect: [:none | 1].
Smalltalk garbageCollect.
time := Time millisecondsToRun: [
	sum := array sum].
sum.
time
=> 9093

--------------------

| time array sum |
array := (1 to: 10565520) collect: [:none | 1].
Smalltalk garbageCollect.
time := Time millisecondsToRun: [
	sum := 0.
	array do: [:each | sum := sum + each]].
sum.
time
=> 4764

--------------------

| time array sum |
array := (1 to: 10565520) collect: [:none | 1].
Smalltalk garbageCollect.
time := Time millisecondsToRun: [
	sum := 0.
	array applyTo: [:each | sum := sum + each]
		from: 1 to: array size].
sum.
time
=> 3499

--------------------

| time array sum |
array := (1 to: 10565520) collect: [:none | 1].
Smalltalk garbageCollect.
time := Time millisecondsToRun: [
	sum := 0.
	1 to: array size do: [:index | sum := sum + (array at: index)]].
sum.
time
=> 2089

--------------------

[Attachment: PrimitiveApplyFromTo-kwl.1.cs (3K)]
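For orientation, the reported timings work out to the following speedup factors. This is a quick back-of-the-envelope check of the numbers above, not part of the original post:

```python
# Reported wall-clock times (ms) for summing a 10,565,520-element array,
# taken from the four benchmarks in the post above.
times_ms = {
    "sum":              9093,  # generic #sum
    "do:":              4764,  # #do: with a block
    "applyTo:from:to:": 3499,  # the new primitive
    "to:do:":           2089,  # inlined #to:do: loop
}

# Factor the primitive gains over plain #do: (Klaus expects ~2 after
# further optimization; the measured factor here is ~1.36).
do_vs_apply = times_ms["do:"] / times_ms["applyTo:from:to:"]
print(round(do_vs_apply, 2))      # 1.36

# The inlined #to:do: remains fastest, as stated in the post.
apply_vs_todo = times_ms["applyTo:from:to:"] / times_ms["to:do:"]
print(round(apply_vs_todo, 2))    # 1.67
```

So the measured gain of the primitive over #do: on this run is about 36%, with the predicted factor of 2 still depending on the pending optimizations.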
Where is the other half of the changes? What you sent isn't enough to
generate a functioning VM. I'm particularly curious how you've dealt with the loop inlining; it seems there is some code missing which does that.

Cheers,
  - Andreas

Klaus D. Witzel wrote:
> Attached is #primitiveApplyToFromTo as it compiles and runs using the
> win32 Squeak 3.7.1 build environment here; I'm testing it with the 3.8
> and 3.9 images.
> [...]
> Any questions?
>
> /Klaus
In reply to this post by Klaus D. Witzel
The previously missing #returnReceiver case is now solved and the new .cs
posted to http://bugs.impara.de/view.php?id=4925. The Mantis report also includes the overrides for #commonSend and #commonReturn.

Andreas, #commonSend and #commonReturn are the same as what I emailed you last night.

It would be interesting to compare the performance figures for platforms other than win32.

/Klaus
In reply to this post by Klaus D. Witzel
At the moment no primitives are re-entered. We don't have primitive contexts. Changing that is a large design change even if the code change is small. Normal primitives can use shadowing because there is no way to capture their execution half way through. In this case that is not true. Shadowing does not fully hide the primitive, as it may need to be re-entered later.

This primitive will slow down sends and returns on some architectures, but probably not all. Out of order execution is great at hiding cost behind other delays. I wouldn't be surprised if there is no cost on either a PPC or an x86, but there will be on some chips. Any optimisation of the common case send code will increase the performance loss caused by this primitive. (1)

It could crash if you save an image on a VM with the primitive then load it on a VM without the primitive. This will only happen if one of the primitives is in an active context. Looking at internalActivateNewMethod, the PC will be set to its initial value, but that will cause the loop to begin again, which could be problematic too.

What happens if you step back into the primitiveApplyToFromTo method from a debugger? So execution entered via the interpreter and used the primitive, then the debugger (or any tool that simulates bytecode execution) re-enters the primitive method.

There is maintenance risk if the shadow implementation and the real implementation get out of sync, because the bugs may only occur when switching from running with the primitive to running without the primitive. This will definitely happen if you move an image to an older VM, but could also happen if you improve the primitive so it can be re-entered as bytecode.

Also, a primitive failing does not necessarily mean that it can be replaced by the back-up code. In many cases the method code after a primitive handles a different set of conditions. Have a look at Object>>size. In general, no execution engine can assume that it can ignore a primitive.

Bryce

(1) Have a look at commonReturn. The simple case, when a method or block is returning directly to its caller, can be simplified. The general case needs to handle any unwind blocks that might be walked over while exiting, which the common case will not do. Also, the common case could be coded without the loops which risk branch mispredicts when exiting.
Hi Bryce,
reading your comments below I'm under the impression that you have not understood the novel concept (which is, BTW, old) but, of course, I may be mistaken. And I appreciate your comments and thoughts very much :) In the below I offer discussing some issues off list but I'd happily discuss all issues here.

On Fri, 15 Sep 2006 23:44:47 +0200, you wrote:

> At the moment no primitives are re-entered.

primitiveApplyToFromTo is always called from the *beginning* with the same context. No pc is maintained. No sp is maintained. Of course both quantities are defined for the context, but not touched in any way.

> We don't have primitive contexts.

This is not true; all primitives have their arguments in the context and also leave their result there, since the Blue Book.

> Changing that is a large design change even if the code change is
> small.

There was no design change: either executeNewMethod is called, which in turn calls activateNewMethod then newActiveContext, or primitiveApplyToFromTo is called between activateNewMethod and newActiveContext. The flow of control is 100% pure[tm] Blue Book and untouched.

> Normal primitives can use shadowing because there is no way to capture
> their execution half way through.

No, there is no half way through. primitiveApplyToFromTo is always full way through. It is not so that, say, 50% of its code is covered by the first invocation and the rest by the next invocation. The coverage is always 100%.

> In this case that is not true. Shadowing does not fully hide the
> primitive as it may need to be re-entered later.

Absolutely not. primitiveApplyToFromTo can send, say, the first 10 values to the block and then may fail. Thereafter its shadow can, if it wants, send, say, another 10 values to the block.

Your comment shows me that the code of primitiveApplyToFromTo is not easy to understand. This must be my fault; we can discuss this off list to any level you want.
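The hand-over behaviour Klaus describes — the primitive can fail at any iteration and the shadow simply continues, because the loop index lives in the context's temps — can be modelled in a few lines. This is a hypothetical Python sketch, not the actual Slang code; the function names and the `budget` failure trigger are invented for illustration:

```python
def primitive_apply(seq, block, ctx, budget):
    """Model of primitiveApplyToFromTo: apply block to seq[index..limit]
    (1-based, inclusive). The loop index is kept in the context dict,
    as the real primitive keeps it in the method's temps, so a failure
    partway through loses no progress. `budget` is an invented knob
    that forces a primitiveFail after that many iterations."""
    while ctx["index"] <= ctx["limit"]:
        if budget == 0:
            return False                 # primitiveFail -> shadow code runs
        block(seq[ctx["index"] - 1])
        ctx["index"] += 1                # like: thisContext tempAt: 2 put: index + 1
        budget -= 1
    return True

def shadow_apply(seq, block, ctx):
    """The bytecode shadow: the identical loop, picking up wherever the
    primitive left the index."""
    while ctx["index"] <= ctx["limit"]:
        block(seq[ctx["index"] - 1])
        ctx["index"] += 1

# The "primitive" handles the first 10 elements, fails, and the shadow
# finishes; the observable effect equals one uninterrupted enumeration.
total = []
ctx = {"index": 1, "limit": 20}
if not primitive_apply(list(range(1, 21)), total.append, ctx, budget=10):
    shadow_apply(list(range(1, 21)), total.append, ctx)
print(total == list(range(1, 21)))   # True
```

The point of the model is that there is no saved pc or sp: the only state shared between primitive and shadow is the index temp, which is why the primitive is "always full way through" on each invocation.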
> This primitive will slow down sends and returns on some architectures
> but probably not all.

On which, Bryce?

> Out of order execution is great at hiding cost behind other delays.

primitiveApplyToFromTo concatenates (the primitive implementation of) Stream>>#next and BlockContext>>#value:; what is out of order with this?

> I wouldn't be surprised if there is no cost on either a PPC or an x86
> but there will be on some chips.

Which chips, Bryce?

> Any optimisation of the common case send code will increase the
> performance loss caused by this primitive. (1)

Until now, you have not shown any performance loss caused by this primitive. So what's this about?

> It could crash if you save an image on a VM with the primitive then
> load it on a VM without the primitive.

It cannot crash. It can start on VM-a *with* primitiveApplyToFromTo support, then, before its block ends, the image can be written to a snapshot, then resumed on VM-b *without* primitiveApplyToFromTo support, and vice versa. Let's walk through this off list.

> This will only happen if one of the primitives is in an active
> context. Looking at internalActivateNewMethod the PC will be set to
> its initial value but that will cause the loop to begin again which
> could be problematic too.

Every loop begins again because of the long jump backwards. But in primitiveApplyToFromTo there is no jump backwards, no loop. And there is nothing which causes the loop to begin again. As I wrote earlier, it is

	(*marker*) [index < limit] whileTrue: ...

and there exists no jump back to the (*marker*) position. I repeat: there is no jump.

> What happens if you step back into the primitiveApplyToFromTo method
> from a debugger?

The same as what happens when the block returns (I mean: indistinguishable, invariant).

> So execution entered via the interpreter and used the primitive then
> the debugger (or any tool that simulates bytecode execution) re-enters
> the primitive method.

This is supported.
All the debugger must know is that it cannot step through the primitive (easy). But it can step through the shadow, and then, when in the shadow (or in the block) one does "proceed", it is again the same as when the block returns (indistinguishable, invariant).

> There is maintenance risk if the shadow implementation and the real
> implementation get out of sync because the bugs may only occur when
> switching from running with the primitive to running without the
> primitive.

You mean like with the shadow of any other primitive? Sure, this always was so and will always be so. Two pieces of software which implement the same function, but one transcending the other, have always this problem.

> This will definitely happen if you move an image to an older VM

No, see the case with VM-a and VM-b above.

> but could also happen if you improve the primitive so it can be
> re-entered as bytecode.

Huh? The primitive is not bytecode and it is not planned to make it such.

> Also a primitive failing does not necessarily mean that it can be
> replaced by the back-up code.

No, primitives must be replaceable by their shadow, or else every other existing shadowed primitive could crash the VM.

> In many cases the method code after a primitive handles a different
> set of conditions.

Why do you want me to handle conditions in primitiveApplyToFromTo which are not handled in the shadow? That would be a bit too much, if not crazy.

> Have a look at Object>>size.

Have a look at all the primitives that expect Integer indices but are passed Floats. This is business as usual. Back to our case here: if someone passes a Float to primitiveApplyToFromTo, then the shadow will attempt to index the receiver with a Float, like in (#(1) at: 0.5), so what?

> In general, no execution engine can assume that it can ignore a
> primitive.

Not so fast and not so general: primitiveApplyToFromTo can be ignored by every VM which was compiled *with* it and by every VM which was compiled *without* it.
Thank you very, very much Bryce :-)

/Klaus

> Bryce

I understand the following as a general comment on the current implementation of #commonReturn.

> 1) Have a look at commonReturn. The simple case when a method or
> block is returning directly to its caller can be simplified. The
> general case needs to handle any unwind blocks that might be walked
> over while exiting which the common case will not do. Also the common
> case could be coded without the loops which risk branch mispredicts
> when exiting.
What I missed until now was the "thisContext tempAt: 2 put: index + 1" in the shadow code. I still maintain that your version is overly clever and very likely to cause mass confusion and maintenance issues. do: loops are something that should be fully understandable by any Smalltalk programmer. Everyone is going to be seeing this in any walkback that involves a do:.

Normally, arguments are immutable in Squeak. Preserving that matters to me when reading code. Exupery doesn't care, as the bytecodes are identical. I do care, as a programmer, that normal invariants are held, especially in code that I'm likely to be glancing at continually when developing.

Now, the way to optimise an expression that uses thisContext on a system with Exupery, or without your VM patch, is to remove the use of thisContext. The shadow code is going to be much slower on systems that do not have the VM mod than the current implementation. My policy for use of thisContext and tempAt:put: is that neither will be compiled by Exupery; tempAt:put: should cause Exupery to de-optimise the context and then drop back into the interpreter (it doesn't yet). I'm trying to optimise the common case.

Re-calling a primitive from a return is close enough to re-entering it. That is something that hasn't been done. It is a major design change.

Your VM mod adds work to both common send and common return, which every send is going to have to execute. Do you have any measurements to show that any realistic code is spending enough time in do: loop overhead to justify slowing down all sends?

On my machine, I can do an interpreted message send every 291 clocks. You are adding at least 11 instructions to the send/return sequence, which would take four clocks to execute at the peak execution rate. The worst case time is about 42 clocks, which is the full latency costs plus two branch mispredicts (at 15 clocks each). The time estimates are based on an Athlon 64, though they will be very similar for other modern desktop CPUs (nuclear reactors in silicon), except the Pentium 4, where a mispredict costs 30 clocks.

The performance effects of the VM mod will depend on the architecture, the compiler, and how well the branch predictor manages on those two extra branches. Unfortunately, to prove there is not a speed loss, you are really going to need to test on many architectures and compilers under many different loads. Two branch mispredicts alone could cost 10% in send performance. Just on the x86, the costs are likely to differ between the Pentium 3, Pentium 4, Pentium M, Intel's Core, and the Athlon (where the XP may be different from the 64).

An out of order CPU may be able to hide the cost of the extra instructions behind the current flabby send and return code. An in order CPU will not be able to do this. So expect a greater performance loss on slower machines such as ARMs and other chips aimed at the hand-held and embedded market. Also, the risks of a speed drop on a Pentium M are greater than those on an Athlon; the Pentium M manages to execute more instructions per clock when interpreting.

I calculated the clocks for a send from the clock speed (2.2 GHz) and the sends/sec from tinyBenchmarks:

	232,515,894 bytecodes/sec; 7,563,509 sends/sec

Exupery's tiny benchmark numbers are:

	1,151,856,017 bytecodes/sec; 16,731,576 sends/sec

So for Exupery a common case send costs 132 clocks. There is still plenty of room to remove waste from that. At 300 clocks, the cost is about 15% worst case and 1% best case, without out of order execution being able to hide costs behind other delays. At 132 clocks, the numbers are much worse. There is a good chance that with more tuning Exupery's sends may be reduced to about 60 clocks without inlining. VisualWorks sends cost 30 clocks. Optimise sends, and the best optimisation will be to remove primitiveApplyToFromTo, if it's not a net loss in performance now.

I vote strongly that this patch is not included in the VM. I've used IBM Smalltalk and enjoy working on a system where do: is easy to understand. Please don't trade simplicity for an optimisation that risks slowing down more code than it speeds up. A do: that is trivial to understand and is free from magic is worth a lot.

Bryce
Hi Bryce,
on Sat, 16 Sep 2006 14:08:09 +0200, you wrote:

> What I missed until now was the "thisContext tempAt: 2 put: index + 1"
> in the shadow code.

Some clever Squeaker will present alternatives, I'm sure ;-)

> I still maintain that your version is overly clever and very likely to
> cause mass confusion and maintenance issues.

I personally know of no software developer who ever participated in a mass confusion, Bryce, and I can say that for the last 30 years. Your prediction is not believable.

Would you say that
- http://en.wikipedia.org/wiki/Duff's_device
caused mass confusion? It is overly clever as well.

> do: loops are something that should be fully understandable by any
> Smalltalk programmer. Everyone is going to be seeing this in any
> walkback that involves a do:.

Yes, that is a novelty; perhaps it should be called an innovation. But I cannot claim that I have invented it, this was done some time ago.

> Normally, arguments are immutable in Squeak.

Not at all! Pass an argument to another method which does a #become: on the argument, Bryce. This is an illusion; we are talking about Smalltalk.

How do you handle this situation in Exupery?

primitiveApplyToFromTo, for example, is robust: it does not do anything when an argument is #become:'ed behind its back to something else.

> Now, the way to optimise an expression that uses thisContext on a
> system with Exupery or without your VM patch is to remove the use of
> thisContext. The shadow code is going to be much slower on systems
> that do not have the VM mod than the current implementation.

This is the case for all primitive vs. shadow code comparisons. They must be slower. Have you ever seen the opposite?

> My policy for use of thisContext and tempAt:put: is that neither will
> be compiled by Exupery,

But #tempAt:put: is used by the debugger and friends (through clients)! Why blame the debugger that it *must* use #tempAt:put: - what would the Smalltalker do without it, have Java?
</grin>

> Re-calling a primitive from a return is close enough to re-entering
> it. That is something that hasn't been done. It is a major design
> change.

When you look at #commonReturn, all you see is that another context is declared to be activeContext. This is irrelevant of my change. This happens as-is since the Blue Book. But okay, you want it to be a design change; I can live with compliments ;-)

> Your VM mod adds work to both common send and common return which
> every send is going to have to execute.

Right. That's the price to pay. What do you expect when adding new functionality to the VM, that the system runs faster? If so, you must pay the price.

> Do you have any measurements to show that any realistic code is
> spending enough time in do: loop overhead to justify slowing down
> all sends?

No, but since you are insisting on this all the time, I expect that you post that.

> On my machine, I can do an interpreted message send every 291
> clocks. You are adding at least 11 instructions to the send return
> sequence which would take four clocks to execute at the peak execution
> rate ...

yes, I agree it has a price.

> The performance effects of the VM mod will depend on the architecture,
> the compiler, and how well the branch predictor manages on those two
> extra branches. Unfortunately, to prove there is not a speed loss, you
> are really going to need to test on many architectures and compilers
> under many different loads.

Since the VM is used on so many platforms, I do not see any problem getting this feedback.

> An out of order CPU may be able to hide the cost of the extra
> instructions behind the current flabby send and return code. An in
> order CPU will not be able to do this. So expect a greater performance
> loss on slower machines such as ARMs and other chips aimed at the
> hand-held and embedded market.
> Also the risks of a speed drop on a Pentium-M are greater than those
> on an Athlon; the Pentium-M manages to execute more instructions per
> clock when interpreting.

Bryce, aren't you overexaggerating when you blame young, innocent primitiveApplyToFromTo for performance loss out of all these technical reasons? A simple ABC analysis reveals that A bytecode routines, B interpreter primitives and C message sends have frequency A>>B>>C, with >> the usual much-greater-than.

So if you want to save the VM's performance, get rid of performance lost in bytecode routines, thereafter in interpreter primitives and thereafter in message sends - not first C, then B, then A. I think that this is what you aim for with Exupery?

> I calculated the clocks for a send from the clock speed 2.2 GHz and
> the sends/sec from tinyBenchmarks.
>
> 232,515,894 bytecodes/sec; 7,563,509 sends/sec
>
> Exupery's tiny benchmark numbers are:
>
> 1,151,856,017 bytecodes/sec; 16,731,576 sends/sec

Fascinating. Is this with or without primitiveApplyToFromTo compiled into the VM?

> So for Exupery a common case send costs 132 clocks. There is still
> plenty of room to remove waste from that. At 300 clocks, the cost is
> about 15% worst case and 1% best case without out of order execution
> being able to hide costs behind other delays. At 132 clocks, the
> numbers are much worse. There is a good chance that with more tuning
> Exupery's sends may be reduced to about 60 clocks without
> inlining. VisualWorks sends cost 30 clocks. Optimise sends and the
> best optimisation will be to remove primitiveApplyToFromTo if it's
> not a net loss in performance now.

Will be to remove, Bryce?

> I vote strongly that this patch is not included in the VM.

This was foreseeable ;-)

Thanks again, Bryce, it was a pleasure.

/Klaus
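The A>>C part of Klaus's frequency ordering can be loosely sanity-checked against the tinyBenchmarks figures quoted in the thread. Note that tinyBenchmarks measures bytecode and send throughput on different workloads, so treating their quotient as "bytecodes per send" is only a crude proxy; the interpretation here is mine, not Klaus's:

```python
# tinyBenchmarks figures quoted in the thread
interp_bytecodes_per_sec  = 232_515_894
interp_sends_per_sec      = 7_563_509
exupery_bytecodes_per_sec = 1_151_856_017
exupery_sends_per_sec     = 16_731_576

# Crude proxy for "bytecodes executed per send": if this ratio is large,
# bytecode dispatch (A) dominates message sends (C) in frequency.
interp_ratio  = interp_bytecodes_per_sec / interp_sends_per_sec      # ~31
exupery_ratio = exupery_bytecodes_per_sec / exupery_sends_per_sec    # ~69
print(round(interp_ratio), round(exupery_ratio))
```

Even as a rough proxy, both ratios are well above 1, which is the direction Klaus's A>>C claim needs; it says nothing about B (interpreter primitives), for which neither side posts numbers.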
Klaus D. Witzel writes:
> Hi Bryce,
>
> on Sat, 16 Sep 2006 14:08:09 +0200, you wrote:
>
> > What I missed until now was the "thisContext tempAt: 2 put: index + 1"
> > in the shadow code.
>
> Some clever Squeaker will present alternatives, I'm sure ;-)

The issue is that arguments are immutable. The performance issues from the #tempAt:put: could be got around by using a custom bytecode compiler that allows assignment to arguments. But that does nothing to make your implementation simpler to understand or verify.

> > I still maintain that your version is overly clever and very likely
> > to cause mass confusion and maintenance issues.
>
> I personally know of no software developer who ever participated in a
> mass confusion, Bryce, and I can say that for the last 30 years. Your
> prediction is not believable.
>
> Would you say that
> - http://en.wikipedia.org/wiki/Duff's_device
> caused mass confusion? It is overly clever as well.

Yes, it is overly clever for many uses, and I think Duff said so originally when he announced it. In this case, IBM's version of this primitive did cause me a large amount of confusion many years ago. It's not until this argument that I'm starting to understand what they did and why.

> > do: loops are something that should be fully understandable by any
> > Smalltalk programmer. Everyone is going to be seeing this in any
> > walkback that involves a do:.
>
> Yes, that is a novelty; perhaps it should be called an innovation. But
> I cannot claim that I have invented it, this was done some time ago.
>
> > Normally, arguments are immutable in Squeak.
>
> Not at all! Pass an argument to another method which does a #become:
> on the argument, Bryce. This is an illusion; we are talking about
> Smalltalk.
>
> How do you handle this situation in Exupery?
>
> primitiveApplyToFromTo, for example, is robust: it does not do
> anything when an argument is #become:'ed behind its back to something
> else.

Arguments are immutable. Try compiling "selector: a a := 1".
We allow deep access and the possibility to extend the language. This is a good thing. However, you have stepped into language modification from normal programming. Having the power to do that easily is a wonderful thing; using it is almost always mistaken.

Exupery needs to bail out if you modify the context too much. It makes no assumptions when execution is outside of the method, except that neither the PC nor the stack pointer are touched. When it is executing, it owns the context. Exupery is safe because it can always drop back to the interpreter.

The issue with maintainability is how likely the implementation is to remain correct and how hard it is to verify that it is correct. It was only this morning that I was tolerably confident that it is correct ignoring interrupts. I have not done the work required to have any confidence that it is correct if an interrupt occurs; it may be.

> > Now, the way to optimise an expression that uses thisContext on a
> > system with Exupery or without your VM patch is to remove the use of
> > thisContext. The shadow code is going to be much slower on systems
> > that do not have the VM mod than the current implementation.
>
> This is the case for all primitive vs. shadow code comparisons. They
> must be slower. Have you ever seen the opposite?

Your shadow do: will be much slower than a regular do:. It will definitely be much slower after compilation.

Now, there is no guarantee that the code after a primitive is shadow code. It is not always. Sometimes it handles cases that the primitive doesn't. The only way to know for sure is to study both carefully and fully understand what both do in all cases where they're used.

Your modified VM will be slower executing sends on some architectures and some compilers executing some loads. It does more work. From my calculations, the slow down should be between 1% and 15%. It is possible that the magic of modern hardware will hide the cost in some cases, but not all.
You are doing more work in the common case to speed up a special case.

> > My policy for use of thisContext and tempAt:put: is that neither
> > will be compiled by Exupery,
>
> But #tempAt:put: is used by the debugger and friends (through
> clients)! Why blame the debugger that it *must* use #tempAt:put: -
> what would the Smalltalker do without it, have Java? </grin>

I never said we should remove #tempAt:put:. It is a fine and useful tool. I just object to it being abused to break a language rule in code that will be read by many people. I also object to it because it is much slower than the equivalent bytecodes. Your shadow code will allow your implementation to run, but it will slow down images when running on VMs without your primitive.

> > Re-calling a primitive from a return is close enough to re-entering
> > it. That is something that hasn't been done. It is a major design
> > change.
>
> When you look at #commonReturn, all you see is that another context is
> declared to be activeContext. This is irrelevant of my change. This
> happens as-is since the Blue Book.

What does the Blue Book have to do with this?

> But okay, you want it to be a design change; I can live with
> compliments ;-)
>
> > Your VM mod adds work to both common send and common return which
> > every send is going to have to execute.
>
> Right. That's the price to pay. What do you expect when adding new
> functionality to the VM, that the system runs faster? If so, you must
> pay the price.

You are not adding new functionality. This change is purely an optimisation. Therefore it must speed up the system, not just part of it.

> > Do you have any measurements to show that any realistic code is
> > spending enough time in do: loop overhead to justify slowing down
> > all sends?
>
> No, but since you are insisting on this all the time, I expect that
> you post that.

I'm not proposing changing the VM or do:. I'm arguing for the status quo. The burden of proof is on you.
You're also proposing an optimisation with high maintenance costs that replaces simple code with very clever code, and that adds cost to message sends, which are a very common operation. Such changes should be considered guilty until proven innocent beyond any doubt.

> > On my machine, I can do an interpreted message send every 291
> > clocks. You are adding at least 11 instructions to the send return
> > sequence which would take four clocks to execute at the peak
> > execution rate
>
> ... yes, I agree it has a price.
>
> > The performance effects of the VM mod will depend on the
> > architecture, the compiler, and how well the branch predictor
> > manages on those two extra branches. Unfortunately, to prove there
> > is not a speed loss, you are really going to need to test on many
> > architectures and compilers under many different loads.
>
> Since the VM is used on so many platforms, I do not see any problem
> getting this feedback.

The VM is also a mature, slow moving piece of software that many people rely on. VM bugs are painful. VM changes are a very conservative thing. It will take several years for a new VM to enter normal use after it's been released.

What are you proposing here? That we release this change, then discover if it's good or not afterwards? That we add yet another optimisation that may have no noticeable benefit and which adds a high maintenance risk? For this change it's worse: it adds work to message sends. All high level code has to bear that.

We have too many optimisations that provide negligible gain already. The ifNotNil: bug earlier was a perfect example. There were two implementations; they got out of sync. And in a normal system both will be used. And while we're at it, class should be a standard primitive, not a bytecode, so that people can override it if they wish. The VM change for class is tiny: just reimplement the bytecode to do a send, then let that execute the primitive.
> > An out of order CPU may be able to hide the cost of the extra instructions behind the current flabby send and return code. An in-order CPU will not be able to do this. So expect a greater performance loss on slower machines such as ARMs and other chips aimed at the hand-held and embedded market. Also the risks of a speed drop on a Pentium-M are greater than those on an Athlon; the Pentium-M manages to execute more instructions per clock when interpreting.
>
> Bryce, aren't you exaggerating when you blame young, innocent primitiveApplyToFromTo for performance loss out of all these technical reasons. A simple ABC analysis reveals that, A bytecode routines, B interpreter primitives and C message sends have frequency A>>B>>C with >> the usual much greater than.

Where are the numbers? Where is the analysis? You are not optimising primitive execution. You are optimising do:. You can only gain noticeably if most of the time is spent in do: overhead, not in the work that is done, and not waiting on memory. primitiveApplyToFromTo is more subtle than most of the primitives in the VM. The only thing I can think of that is more subtle is exception handling. That provides useful functionality; primitiveApplyToFromTo does not. All primitiveApplyToFromTo can provide is performance. So all its costs, including performance costs, must be justified by performance arguments.

> So if you want to save the VM's performance, get rid of performance lost in bytecode routines, thereafter in interpreter primitives and thereafter in message sends - not first C then B then A. I think that this is what you aim for with Exupery?
>
> > I calculated the clocks for a send from the clock speed 2.2 GHz and the sends/sec from tinyBenchmarks.
> >
> > 232,515,894 bytecodes/sec; 7,563,509 sends/sec
> >
> > Exupery's tiny benchmark numbers are:
> >
> > 1,151,856,017 bytecodes/sec; 16,731,576 sends/sec
>
> Fascinating.
> Is this with or without primitiveApplyToFromTo compiled into the VM.

Without primitiveApplyToFromTo applied.

> > So for Exupery a common case send costs 132 clocks. There is still plenty of room to remove waste from that. At 300 clocks, the cost is about 15% worst case and 1% best case without out-of-order execution being able to hide costs behind other delays. At 132 clocks, the numbers are much worse. There is a good chance that with more tuning Exupery's sends may be reduced to about 60 clocks without inlining. VisualWorks sends cost 30 clocks. Optimise sends and the best optimisation will be to remove primitiveApplyToFromTo if it's not a net loss in performance now.
>
> Will be to remove, Bryce?

I don't understand what you're asking here. Personally, I'd solve the original problem by either leaving the current implementation of occurrencesOf: or just using count: as it stands. I'm concerned about making the core more complex, both in the image and in the VM, for no real gains and possibly a loss in performance.

Bryce
Hi Bryce,
I hope we both are now tired enough from your comparison of Exupery to the proposed primitiveApplyToFromTo. You have not shown a convincing argument, in the sense that any other primitive has to be (and always was) treated the same way by the VM and undergoes the same performance penalties under the various CPUs. If we were to believe you, only primitiveApplyToFromTo would slow down the CPU; the existing primitives wouldn't do that.

Of course you are absolutely correct in pointing your finger at the additional instructions performed in #commonSend and #commonReturn, and at the apparent incompatibility of the proposed primitiveApplyToFromTo with Exupery's compiling technique (I took your words for granted; I'm not working with Exupery but with the standard VMMaker classes, with which primitiveApplyToFromTo is not incompatible).

Your arguing for the status quo is way too conservative for a living community. There was a reason why people asked for speeding up enumeration of collections, and also why it was suggested to consider the IBM approach #apply:from:to:. Keeping the performance of #do: and friends at the unoptimized level was just the opposite of what the initiators of the original discussion were talking about; the question was for better solutions.

Honestly, I wish you good luck with Exupery. Always keep cool 8-) For my part, I have experienced the beauty, power and elegance of #commonSend and #commonReturn, the two most underestimated routines in a Smalltalk message-sending VM :)

/Klaus
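[For readers following the thread: the point of the IBM-style approach is that the standard enumeration methods could all funnel through the one primitive-backed method shown at the top of the thread. A rough sketch only - selector and primitive number taken from the earlier post, not tested against any particular image:

    do: aBlock
        "Sketch: SequenceableCollection>>do: delegating to the proposed
         primitive-backed enumeration method from the earlier post."
        self applyTo: aBlock from: 1 to: self size

collect:, select: and friends could be rebuilt on top of #applyTo:from:to: in the same way, which is what is meant above by giving all the enumeration methods (more or less) identical performance.]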
Klaus D. Witzel writes:
> Hi Bryce,
>
> I hope we both are now tired enough from your comparison of Exupery to the proposed primitiveApplyToFromTo.

I am tired of this argument but I would appreciate it if you would try to respond to my arguments. My main issue is that your change makes the system more complex in a way visible to everybody: both to normal programmers looking at their calls to do: in walkbacks, and to people working in the VM.

I am not comparing your primitiveApplyToFromTo primitive to Exupery. The 1% to 15% extra cost to sends was calculated against the interpreter now, not Exupery now, and definitely not what I think it could be. I did not provide the numbers for how much those additions to common send would cost Exupery now, or potentially in the future, but they should be easy to work out (hint: multiply by 2 to 4).

I even told you what it would take to avoid both Exupery and an unpatched VM incurring a performance cost when executing your shadow over a standard shadow. That cost is a language change. The code change is trivial: just find the check that stops the bytecode compiler from compiling an assignment to an argument and delete it. The restriction against changing an argument is not in either the VM or in Exupery. It is a language choice implemented at a higher level.

Yes, I do end up chasing bugs across both commonSend and commonReturn code too regularly. Yes, I have found a few of my own bugs due to subtleties in it involving both GCs and Squeak interrupts. The additions to commonSend and commonReturn add extra complexity to an already dangerously complex area.

> You have not shown a convincing argument, in the sense that any other primitive has to be (and always was) treated the same way by the VM and undergoes the same performance penalties under the various CPUs. If we were to believe you, only primitiveApplyToFromTo would slow down the CPU, the existing primitives wouldn't do that.
Do you feel that simplicity and personal mastery are not important for a system like Squeak? Your change reduces both. I am asking you to justify the loss of clarity and simplicity inside the image that will be seen by normal non-VM programmers. I am also asking you to provide a case for an optimisation - an optimisation that adds overhead to a critical part of the system: message sends and all returns. To justify the optimisation you need to prove that it gains more than it loses.

Now, maybe you consider it unreasonable to be asked to prove that your optimisation does not lead to a loss of performance for real programs. I however feel that optimisations must lead to a net improvement that is worth more than their ongoing costs. Proving that an optimisation will not lead to a net loss in real use, without Exupery, cannot be an unreasonable request. I am not asking you to prove that this primitive is worth the cost of development - merely that it provides enough of a speed improvement to cover the costs of living with it.

> Of course you are absolutely correct in pointing your finger at the additional instructions performed in #commonSend and #commonReturn, and at the apparent incompatibility of the proposed primitiveApplyToFromTo with Exupery's compiling technique (I took your words for granted; I'm not working with Exupery but with the standard VMMaker classes, with which primitiveApplyToFromTo is not incompatible).

Exupery will survive in a system with primitiveApplyToFromTo; it will just not compile your shadow method. But this is irrelevant to why this primitive is inappropriate to add to the VM. The numbers I chose to use were for interpreted code, not native compiled code. OK, I did give Exupery's numbers as well, and the figures for VisualWorks.
> Your arguing for the status quo is way too conservative for a living community, there was a reason why people asked for speeding up enumeration of collections and also why it was suggested to consider the IBM approach #apply:from:to:.

It was an idle discussion asking for reasons. A newbie rightfully wanted to know why it was implemented as it was. At the moment we don't know. We don't know that a count: implementation of occurrencesOf: would be too slow.

There are very good reasons for being conservative with the VM. Many people depend on it who are not VM hackers. It has an awkward release schedule. VM bugs are a right pain.

Bryce

P.S. And yes, I could outline where Exupery is weak for executing do:. But improving Exupery's do: performance does not look like it will improve either of the large benchmarks I'm working with. Improving #do: performance does not currently look like it will improve Exupery's practicality.

P.P.S. If non-VM hackers would like to contribute either to Exupery's development or to Squeak's performance, then working on developing a better benchmark suite would be useful. Both benchmarks and the argument for why they matter are important.
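[For concreteness, the count:-based occurrencesOf: being weighed here would presumably be no more than the following - a sketch, not benchmarked; #count: is the standard Collection enumeration:

    occurrencesOf: anObject
        "Answer how many of the receiver's elements are equal to anObject,
         routed through the generic #count: enumeration."
        ^self count: [:each | each = anObject]

The open question in the thread is only whether routing this through the generic enumeration machinery is too slow, not whether it is correct.]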
In reply to this post by Bryce Kampjes
On 16 Sep 2006, at 14:48, Bryce Kampjes wrote:
> I'm not proposing changing the VM or do:. I'm arguing for the status quo. The burden of proof is on you. You're also proposing an optimisation with high maintenance costs that replaces simple code with very clever code, that adds cost to message sends which are a very common operation. Such changes should be considered guilty until proven innocent beyond any doubt.

+1

- Bert -
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:
> Your arguing for the status quo is way too conservative for a living community, there was a reason why people asked for speeding up enumeration of collections and also why it was suggested to consider the IBM approach #apply:from:to:.

I don't think it's conservative at all, just practical. Changing the VM, by adding a new primitive, will affect all VM development from that point forward. IMHO, such a change requires a rock-solid case to support it. At this point it seems hard to justify the complexity and fragility (i.e. depending on primitive failure code running) of the proposed performance optimization. Introducing a new VM primitive has got to be way low down on the list of ways to meet a performance requirement.
In reply to this post by Bryce Kampjes
Hi Bryce,
earlier you wrote:

> Your shadow code will allow your implementation to run but it will slow down images when running on VMs without your primitive.

This is a good reason for objection, I agree.

On Sun, 17 Sep 2006 00:10:00 +0200, you wrote:

> Klaus D. Witzel writes:
> > I hope we both are now tired enough from your comparison of Exupery to the proposed primitiveApplyToFromTo.
>
> I am tired of this argument but I would appreciate it if you would try to respond to my arguments.

I have neither the time nor the patience to go into your Exupery realm when "just" discussing the suggested primitiveApplyToFromTo. There is not that much experience with Exupery. In particular, I'm not interested in discussing in this thread that Exupery won't support use of thisContext and peeking and poking of contexts. Clients for such objections are, for example:

- http://www.shaffer-consulting.com/david/Seaside/dandelionOutput/Seaside-Continuations/Continuation.html

Don't misunderstand, I just do not want to discuss these particular objections of yours in this thread. Another example: suddenly #class "should be a standard primitive not a bytecode". I do not want to discuss this in this thread, even if it continues with

> so that people can override it [#class] if they wish. The VM change for class is tiny, just reimplement the bytecode to do a send then let that execute the primitive.

I do not want to argue for or against the performance loss of such a change, nor for or against

> [th]is your change makes the system more complex in a visible way to everybody

My last two examples of what I do not want to discuss in this thread are

> Also the common case [of existing #commonReturn] could be coded without the loops which risk branch mispredict when exiting.

and

> That cost is a language change.

-----------

Sorry for no better news :(

/Klaus
In reply to this post by Yanni Chiu
Hi Yanni,
on Sun, 17 Sep 2006 04:48:12 +0200, you wrote:

> Klaus D. Witzel wrote:
>> Your arguing for the status quo is way too conservative for a living community, there was a reason why people asked for speeding up enumeration of collections and also why it was suggested to consider the IBM approach #apply:from:to:.
>
> I don't think it's conservative at all, just practical. Changing the VM, by adding a new primitive, will affect all VM development from that point forward. IMHO, such a change requires a rock-solid case to support it. At this point it seems hard to justify the complexity and fragility (i.e. depending on primitive failure code running) of the proposed performance optimization. Introducing a new VM primitive has got to be way low down on the list of ways to meet a performance requirement.

I fully disagree :) I'm used to planning software change for years to come, usually the next 3-5 releases (equiv. 3-5 years). Now, there will be a time when older VMs will no longer run the then-current .image, for example when 4.0 comes out. Not making use of the time in between is, literally speaking, the often-cited waste of time, viewed from a more pragmatic perspective.

Poor young primitiveApplyToFromTo cannot be adapted to in a minute. There is much work to be done and much feedback to be collected from all the various platforms, before it can be concluded that "this was a good idea, it performs as expected". This has to happen in the time before 4.0, for sure. Not doing anything but waiting for Godot has nothing to do with innovation, only with stagnation, IMO.

Thanks for your comment.

/Klaus
In reply to this post by Klaus D. Witzel
Let me summarise my arguments as you have obviously not understood them or refuse to respond to them.

1) Your change complicates the system for everyone. Adding that complexity will make it harder to make any future changes in the areas affected. One of those areas is the VM's message sending code, which is critical.

2) Your changes make the system more complex to all Squeak programmers when they are just doing normal development. Having nice readable walkbacks is important.

3) Your VM changes add costs to message sends. Message sends are more common than do: loops. Thus with both your image changes and your VM changes the system may be slower. This is an optimisation; to have any value your change must speed up the system, not slow it down. Based on my analysis the costs will be in the range of 1% to 15% of send speed.

4) Your image changes will slow #do: down for all VMs that do not have your primitive. I have told you how to avoid the performance costs here, but the problem is you're trying to sidestep a deliberate language restriction against changing arguments.

You are proposing an optimisation that you refuse to demonstrate does not slow the system down. To have a case you need to demonstrate that the gains to #do: will be greater than the losses on message sends. Optimisations must provide enough speed improvement to justify the cost of living with them, and preferably the cost of developing them. No-one has yet demonstrated that there is a practical performance problem with either our current occurrencesOf: or even an implementation that used count:.

Bryce
Hi Bryce,
on Sun, 17 Sep 2006 12:12:36 +0200, you wrote:

> Let me summarise my arguments as you have obviously not understood them or refuse to respond to them.

Hey, this is it, I appreciate this summary of yours :)

> 1) Your change complicates the system for everyone. Adding that complexity will make it harder to make any future changes in the areas affected. One of those areas is the VM's message sending code, which is critical.

Agreed, this is the price to pay. Perhaps the components of send/return, especially #executeNewMethod with its many #ifTrue:ifFalse: on primitiveIndex, and return's kangaroo, could use a brush-up ;-)

> 2) Your changes make the system more complex to all Squeak programmers when they are just doing normal development. Having nice readable walkbacks is important.

Agreed, the debugger needs change.

> 3) Your VM changes add costs to message sends. Message sends are more common than do: loops. Thus with both your image changes and your VM changes the system may be slower.

Agreed - would you say that we have as yet not considered amortization?

> This is an optimisation; to have any value your change must speed up the system, not slow it down. Based on my analysis the costs will be in the range of 1% to 15% of send speed.

I posted a comparison in the first message, but of course it didn't include the cost of all other message sends. I appreciate your analysis of 1% to 15%, but this is just a static, dry-run view. I do not expect the cost to come close to 1% on average. Amortization is required.

> 4) Your image changes will slow #do: down for all VMs that do not have your primitive.

I do not expect primitiveApplyToFromTo to be possible before 4.0, and I think that older VMs will not run 4.x images for other reasons anyway.

> I have told you how to avoid the performance costs here, but the problem is you're trying to sidestep a deliberate language restriction against changing arguments.

You perhaps misunderstood.
In Smalltalk every argument value can be changed behind your back, whether you like it or not. I do not see this as a language restriction, that's correct. Everything is an object.

> You are proposing an optimisation that you refuse to demonstrate does not slow the system down.

This has nothing to do with unwillingness. I wrote earlier:

> It would be interesting to compare the performance figures for platforms other than win32.

And I am considering how to measure the performance cost of #commonSend and #commonReturn.

> To have a case you need to demonstrate that the gains to #do: will be greater than the losses on message sends.

Sure, that's the idea.

> Optimisations must provide enough speed improvement to justify the cost of living with them and preferably the cost of developing them. No-one has yet demonstrated that there is a practical performance problem with either our current occurrencesOf: or even an implementation that used count:.

That would be ignoring the simple benchmarks of Jon's, with my follow-up, and the figures I posted in this thread.

> Bryce

Heh, 'twas the first dialog without the E...y word (since our first meeting in the #squeak channel ;-)

/Klaus
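[One crude image-side probe for raw send cost, in the spirit of the tinyBenchmarks figures quoted earlier - a sketch only; #yourself stands in for any cheap unary send, #to:do: is compiler-inlined so the loop itself is cheap, and absolute numbers will vary with VM and hardware:

    | n empty sends |
    n := 10000000.
    Smalltalk garbageCollect.
    empty := Time millisecondsToRun: [1 to: n do: [:i | ]].
    sends := Time millisecondsToRun: [1 to: n do: [:i | i yourself. i yourself]].
    sends - empty "milliseconds attributable to roughly 2*n unary sends"

Running this before and after patching #commonSend/#commonReturn would give a first estimate of the per-send overhead the patch adds.]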
In reply to this post by Klaus D. Witzel
On 17 Sep 2006, at 03:37, Klaus D. Witzel wrote:
> Not doing anything but waiting for Godot has nothing to do with innovation, only with stagnation, IMO.

Keep on discussing, implementing, benchmarking, etc. :) This might even convince people who would rather err on the conservative side. For example, I'd strongly oppose anything that reduces general send performance, whereas requiring a new VM for a new image to run at full speed is fine by me.

- Bert -
Hi Bert,
on Sun, 17 Sep 2006 16:17:10 +0200, you wrote:

> On 17 Sep 2006, at 03:37, Klaus D. Witzel wrote:
>
>> Not doing anything but waiting for Godot has nothing to do with innovation, only with stagnation, IMO.
>
> Keep on discussing, implementing, benchmarking, etc. :) This might even convince people who would rather err on the conservative side.

:)

> For example, I'd strongly oppose anything that reduces general send performance,

I am thinking about moving the glue (in #commonSend, #commonReturn) out of the way, to a place where performance cannot suffer, where the price is already paid: 0%.

> whereas requiring a new VM for a new image to run at full speed is fine by me.

That's perhaps the best way to distribute it: as an optional .mcz package (the user just recompiles the VM), together with the changes to the Collection hierarchy. So the burden is not on the VM maintainers (and other maintainers). Applying KISS.

/Klaus

> - Bert -
In reply to this post by Klaus D. Witzel
Klaus D. Witzel writes:
> Heh, 'twas the first dialog without the E...y word (since our first meeting in the #squeak channel ;-)

Exupery is not a swear word. If you're planning for something to be added with a 4.0 then Exupery should definitely be considered. It does exist. It does beat VisualWorks for the bytecode benchmark. It is probably as fast as Strongtalk for bytecode performance as well now (based on comparing the numbers Dan gave with the ones I got; not apples to apples on the same hardware). Exupery is relevant if we're talking about Squeak's performance over the next few years. Exupery is Squeak's and Smalltalk's best current hope of beating both Strongtalk's and Self's pioneering work on high-performance implementations.

The cost to Exupery should not be ignored if the community does want Exupery. I have not argued against your change because of this cost, though I do object to your statement earlier that Exupery should be ignored for now. You ask that this change be given a chance. I have invested over 2 days into analysing it and discussing it. Your claim that something I have worked on for about 4 years should be ignored is not pleasant. My investment in your change is not that much less than yours.

I have not argued that your change should be discounted solely because it will negatively affect performance when running with Exupery. I've even told you how the shadow can be implemented so it will not negatively impact performance on either the interpreter or with Exupery: just use bytecodes to assign to the argument, not tempAt:put:. Neither Exupery nor the interpreter knows the difference between an argument and a temporary; that distinction, while a language one, is made at a higher level.

If you want to discuss where there are opportunities to speed up Squeak that will not mess up the language or its implementation, consider this: Anthony Hannan's send performance work was rejected even though it did speed up sends and came with proper block closures.
Proper block closures are something we really should have. My work on Exupery provided me with the knowledge, and also the experience, of evaluating optimisations, but it is my experience with GemStone, VisualWorks, and IBM Smalltalk that provides my dislike of the costs of living with such optimisations and cleverness.

Bryce
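[To make the suggestion above concrete: if the compiler's check against storing into an argument were deleted (the language change Bryce describes), the shadow method from the original post could drop the thisContext tempAt:put: send and use an ordinary store bytecode. A sketch - it does not compile in a stock image precisely because of that check; primitive number as in the earlier post:

    applyTo: aBlock from: index to: limit
        <primitive: 164>
        "Shadow for the proposed primitive. Assigning to the argument
         index directly replaces the slow thisContext tempAt:put: send,
         so an unpatched VM (or Exupery) pays no extra cost."
        [index <= limit] whileTrue:
            [aBlock value: (self basicAt: index).
             index := index + 1]

The interpreter executes the same store bytecode either way; only the compile-time restriction is in the way.]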
In reply to this post by Klaus D. Witzel
Klaus D. Witzel writes:
> > whereas requiring a new VM for a new image to run at full speed is fine by me.
>
> That's perhaps the best way to distribute it, as an optional .mcz package (the user just recompiles the VM), together with the changes to the Collection hierarchy. So the burden is not on the VM maintainers (and other maintainers). Applying KISS.

Distributing it as an optional .mcz package is sensible.

Bryce