Hi Paolo,
on the hype of micro-benchmarking I started to look at GST (mostly to understand the VM, not so much to make it faster). It appears that when COMMON/UNCOMMON are defined to nothing we get a speedup on tinyBenchmark. So either some of the COMMON/UNCOMMON hints are placed incorrectly, or tinyBenchmark is not what we are optimizing for and the hints are better for other workloads.

Would you be interested in knowing where this difference comes from? I could build with gprof and then use lcov and its branch visualization.

holger

Before:
198180762 bytecodes/sec; 6070565 sends/sec

After:
220286113 bytecodes/sec; 6342298 sends/sec

PS: stable-3.2 as of yesterday, on a Fedora 14/i686 system.

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
On 02/12/2011 01:28 PM, Holger Hans Peter Freyther wrote:
> Hi Paolo,
>
> Would you be interested in knowing where this difference comes from? I
> could build with gprof and then use lcov and its branch visualization.

Hi again,

the next speedup comes from inlining the unwind_context method. I also played with fastcall for oop_put_spec, which gives a small change. I have not looked at the size of libgst before and after the inlining.

holger
On 02/12/2011 01:28 PM, Holger Hans Peter Freyther wrote:
> Hi Paolo,
>
> on the hype of micro-benchmarking I started to look at GST (mostly to
> understand the VM, not so much to make it faster). It appears that when
> COMMON/UNCOMMON are defined to nothing we get a speedup on
> tinyBenchmark. So either some of the COMMON/UNCOMMON hints are placed
> incorrectly, or tinyBenchmark is not what we are optimizing for and the
> hints are better for other workloads.
>
> Would you be interested in knowing where this difference comes from? I
> could build with gprof and then use lcov and its branch visualization.
>
> holger
>
> Before:
> 198180762 bytecodes/sec; 6070565 sends/sec
>
> After:
> 220286113 bytecodes/sec; 6342298 sends/sec

Using master (not a big difference) I'm seeing a marked decrease in performance from removing COMMON/UNCOMMON only from libgst/interp.c:

without => 383233532 bytecodes/sec; 10456973 sends/sec
with    => 309693028 bytecodes/sec; 9462521 sends/sec

Adding always_inline to unwind_context is faster in the sends benchmark (~5%) and doesn't affect the bytecodes benchmark (which hardly executes those bytecodes). I'm applying this:

diff --git a/libgst/interp.c b/libgst/interp.c
index 88080b3..f2be4e2 100644
--- a/libgst/interp.c
+++ b/libgst/interp.c
@@ -472,7 +472,7 @@ static inline void prepare_context (gst_context_part context,
 
 /* Return from the current context and restore the virtual machine's
    status (ip, sp, _gst_this_method, _gst_self, ...).  */
-static void unwind_context (void);
+static void __attribute__ ((__always_inline__)) unwind_context (void);
 
 /* Check whether it is true that sending SENDSELECTOR to RECEIVER
    accepts NUMARGS arguments.  Note that the RECEIVER is only used to

(F14, 32-bit on 64-bit kernel).

Paolo