Posted by Stéphane Ducasse on Feb 17, 2011; 5:20pm
URL: https://forum.world.st/Cog-VM-Thanks-and-Performance-Optimization-Questions-tp3310828p3311213.html
Hi John,
Have a look at the MessageNode class-side methods; you will see the list of messages that are inlined.
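
For example, in a workspace (a quick sketch against a stock Squeak image: the standard compiler keeps the list in a MacroSelectors class variable, but the variable name may differ in your version):

	"List the selectors the bytecode compiler transforms inline instead of sending."
	(MessageNode classPool at: #MacroSelectors) inspect
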
Stef
On Feb 17, 2011, at 3:21 PM, John B Thiel wrote:
> Cog VM -- Thanks and Performance / Optimization Questions
>
>
> To everyone, thanks for your great work on Pharo and Squeak, and to
> Eliot Miranda, Ian Piumarta, and all the VM/JIT gurus, especially thanks
> for the Squeak Cog VM and its precursors, which I had been keenly
> anticipating for a decade or so, and which is really hitting its stride
> with the latest builds.
>
> I like to code with awareness of performance issues. Can you give me, or
> point me to, some performance and efficiency tips for Cog and the
> Squeak compiler -- details on which methods are inlined, which of several
> alternatives is best, etc.? For example, I understand #to:do: is inlined --
> what about #to:by:do:, #timesRepeat:, and #repeat? Basically, I
> would like to read a full overview of which core methods are specially
> optimized (or planned to be).
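
You can check a particular selector yourself by compiling a small method and looking at its bytecodes: an inlined construct compiles to jumps rather than a send. A quick sketch (the method name #inlineDemo is just for illustration):

	| bytecodes |
	"Compile a throwaway method, print its symbolic bytecodes, then clean up."
	Object compile: 'inlineDemo  1 to: 3 do: [:i | i yourself]'.
	bytecodes := (Object >> #inlineDemo) symbolic.
	Object removeSelector: #inlineDemo.
	bytecodes

The printout shows no #to:do: send, only jump bytecodes around the block body; swap the loop for 3 timesRepeat: [...] and a real send of #timesRepeat: appears instead.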
>
> I know about the list of no-lookup primitives, as per Object
> class>>howToModifyPrimitives -- assuming that is still valid?
>
> What do you think is a reasonable speed factor for number-crunching
> Squeak code vs. C? I am seeing about 20x slower at the semi-large
> scale, which surprised me a bit because I got about 10x on smaller
> tests, and a simple fib: with beautiful Cog is now about 3x (wow!).
> That range -- 3x for a tiny tight loop, up to 20x for general multi-class
> computation -- seems a bit wide; is that about what's expected?
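
For a rough calibration on your own machine, the tiny benchmarks that ship with the image are handy; absolute numbers of course depend on the machine and the VM build:

	"Classic send-heavy recursive fib, plus the stock bytecode/send benchmark."
	Time millisecondsToRun: [26 benchFib].
	0 tinyBenchmarks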
>
> My profiling does not reveal any hotspots as such -- it's basically
> 2, 3, 5% scattered around, so I figure this is just the general
> VM/JIT overhead as you scale up: referencing distant objects, slots,
> dispatch lookups, more cache misses, etc. But maybe I am generally
> using some backwater loop/control methods, techniques, etc. that could
> be tuned up. E.g. I seem to recall a trace at some point showing
> #timesRepeat: taking 10% of the time (?!). Also, I recall reading
> about an anomaly with BlockClosures -- something like being rebuilt
> every time through the loop -- has that been fixed? Any other gotchas to
> watch for currently?
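
If you have not already, MessageTally gives you a call-tree profile that makes this kind of scattered overhead easier to attribute (a sketch; replace the block body with your own workload):

	"Profile a block and show a tree of where the time went."
	MessageTally spyOn: [1 to: 1000000 do: [:i | i printString]]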
>
> (Also, any notoriously slow subsystems? For example, Transcript
> writing is glacial.)
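
One common workaround is to build the output on a stream first and push it to the Transcript in one go, rather than sending many small writes (a sketch):

	| buffer |
	buffer := WriteStream on: String new.
	1 to: 1000 do: [:i | buffer print: i; space].
	Transcript show: buffer contents; cr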
>
> The Squeak bytecode compiler looks fairly straightforward and
> non-optimizing -- just statement-by-statement translation. So it misses,
> e.g., chances to store and reuse values instead of popping them; I see
> lots of redundant sequences emitted. Are those kinds of things now
> optimized out by Cog, or would tighter bytecode be another potential
> optimization path? (Is that what the Opal project is targeting?)
>
> -- jbthiel
>