Floating point performance again

David Faught
Andreas Raab wrote:
>David Faught wrote:
>> I was expecting good things as a result of this, but was rather
>> disappointed.  The before and after tally results are below.  They
>> show that the B3DVector3(FloatArray) *, -, and + operation times went
>> away (as expected) with pretty big increases in the primitives (this
>> is just shifted from the original operations) and
>> B3DVector3Array>>at:put: times, which was not expected.  What
>> happened, especially with the at:put: times?

>Two possibilities: First, it may be a sampling error since you are only
>using 16 msecs intervals which is extremely coarse and should not be
>used for such a quantitative comparison. Get down to 1 msec instead.
>Second, I have occasionally seen primitive tallies to be assigned to
>other than their containing methods. I'm not sure why this happens but
>the only way to find out for sure is to convert all of the primitives
>into "real sends" (e.g., put extra methods in which call the primitives).

I ran a number of tallies with a 1 msec interval before and after
this optimization, with approximately the same results as before.  I'm
guessing that your second possibility is closer to what happens: the
time tallied against * in the "before" run is now attributed to the
primitives in the "after" run, and the time tallied against + and -
is now attributed to at:put:.

>> I could see a shift like this in the percentages, but the actual
>> measured times went way up too, with the overall total time being not
>> very much less for the "optimized" version.  Any ideas?

>As an overall, the difference isn't too surprising. Assuming this was
>running with the same parameters, you get some six seconds overall
>speedup which is a roughly 12% speedup. That's not bad and you should
>continue along those lines (like actually measuring the "fast square
>root" since I'm not convinced that it's either correct or faster than a
>straightforward sqrt).

With the 1 msec sample interval and some "background noise"
stabilized, I pretty consistently got about 13% savings from this
little experiment, not nearly what I was hoping for.  I'm again not
sure that doing the core of this process as a plugin will yield the
200% to 300% or better result that I desire.
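
To put the 13% figure next to the 200% to 300% goal, here is a rough
Amdahl's law sketch in Python.  It assumes that "200% to 300%" means
running 2x to 3x faster, which is my reading and not something measured,
and the function names are made up for illustration.  It only shows how
large a share of the total run time the optimized part would have to be
for a plugin to reach that goal.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of the run time
    is made `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def required_fraction(target_speedup):
    """Smallest share of run time that must be accelerated to reach
    `target_speedup`, even if the accelerated part became free."""
    return 1.0 - 1.0 / target_speedup


# A 13% saving in run time, expressed as a speedup factor.
observed_saving = 0.13
observed_speedup = 1.0 / (1.0 - observed_saving)

if __name__ == "__main__":
    print("observed speedup: %.3fx" % observed_speedup)
    for target in (2.0, 3.0):
        print("a %.0fx speedup needs at least %.1f%% of run time in the accelerated code"
              % (target, 100.0 * required_fraction(target)))
```

Under that reading, a 2x result needs at least half of the run time to
sit in code the plugin replaces, and a 3x result needs at least two
thirds, no matter how fast the plugin itself is.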

The fast square root approximation is from Thomas Jakobsen's "Advanced
Character Physics" paper (which this whole simulation is based on),
which is aimed squarely at speed and stability for game simulations,
not particularly at accuracy.  Based on his paper, I have so far been
unsuccessful at converting this calculation back to use sqrt for
comparison.
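
For anyone who wants to try the comparison, here is a minimal Python
sketch of the two forms of the distance constraint step as I understand
them from the paper.  It assumes the approximation in question is the
first-order Taylor expansion of sqrt around the squared rest length,
sqrt(a) ~ (r*r + a) / (2*r); the function names and the plain-list
vectors are hypothetical and are not the B3DVector3 code from the
simulation.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


def relax_exact(x1, x2, rest_length):
    """One constraint step using a real sqrt.
    Assumes x1 and x2 do not coincide (length would be zero)."""
    delta = [q - p for p, q in zip(x1, x2)]
    length = math.sqrt(sum(d * d for d in delta))
    diff = (length - rest_length) / length
    new_x1 = [p + 0.5 * diff * d for p, d in zip(x1, delta)]
    new_x2 = [q - 0.5 * diff * d for q, d in zip(x2, delta)]
    return new_x1, new_x2


def relax_approx(x1, x2, rest_length):
    """The same step with sqrt replaced by its first-order Taylor
    expansion around rest_length squared (assumed form, see above)."""
    delta = [q - p for p, q in zip(x1, x2)]
    r2 = rest_length * rest_length
    scale = r2 / (sum(d * d for d in delta) + r2) - 0.5
    new_x1 = [p - scale * d for p, d in zip(x1, delta)]
    new_x2 = [q + scale * d for q, d in zip(x2, delta)]
    return new_x1, new_x2


if __name__ == "__main__":
    a, b = [0.0, 0.0, 0.0], [1.1, 0.0, 0.0]
    print("exact: ", distance(*relax_exact(a, b, 1.0)))
    print("approx:", distance(*relax_approx(a, b, 1.0)))
```

The exact form restores the rest length in one step, while the
approximate form only moves toward it and relies on repeated iterations
to converge.  Which one is faster in Squeak would still have to be
measured.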

This little experiment was just a preliminary step in this project to
see what the potential was, and I will probably (someday) just move on
to the next step, which is a more general rigid body physics package.
I still like Thomas Jakobsen's approach for its speed and stability.

Thanks for everyone's comments.  Further observations or suggestions
are welcome anytime.