good questions.
On Thu, Feb 17, 2011 at 6:21 AM, John B Thiel
<[hidden email]> wrote:
Cog VM -- Thanks and Performance / Optimization Questions
To Everyone, thanks for your great work on Pharo and Squeak, and to
Eliot Miranda, Ian Piumarta, and all VM/JIT gurus, especially thanks
for the Squeak VM Cog and its precursors, which I was keenly
anticipating for a decade or so, and is really going into stride with
the latest builds.
I like to code with awareness of performance issues. Can you tell or
point me to some performance and efficiency tips for Cog and the
Squeak compiler -- detail on which methods are inlined, best among
alternatives, etc. For example, I understand #to:do: is inlined --
what about #to:do:by: and #timesRepeat and #repeat ? Basically, I
would like to read a full overview of which core methods are specially
optimized (or planned).
The bytecode compiler inlines a set of selectors if the arguments are suitable (typically literal blocks). The standard compiler's list is MessageNode classPool at: #MacroSelectors, e.g.
#ifTrue: #ifFalse: #ifTrue:ifFalse: #ifFalse:ifTrue: #and: #or: #whileFalse: #whileTrue: #whileFalse #whileTrue #to:do: #to:by:do: #caseOf: #caseOf:otherwise: #ifNil: #ifNotNil: #ifNil:ifNotNil: #ifNotNil:ifNil:
Note that Nicolas Cellier has just added support for inlining #repeat and #timesRepeat: in Squeak trunk.
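To make the "suitable arguments" condition concrete, here is a minimal workspace sketch, assuming a Squeak or Pharo image of this era. The temp names sum and body are made up for illustration. The first loop passes a literal block, so the compiler can inline it; the second passes a block held in a variable, so an ordinary send remains.

```smalltalk
| sum body |

"Literal block argument: #to:do: is one of the macro selectors, so the
 loop is compiled inline as a counter plus jump bytecodes."
sum := 0.
1 to: 10 do: [:i | sum := sum + i].

"Block held in a temp: the argument is not a literal block, so the
 compiler emits a real send of #to:do: and every iteration evaluates
 the block via #value:."
sum := 0.
body := [:i | sum := sum + i].
1 to: 10 do: body.

"The list quoted above can be inspected in the image
 (expression taken from the text)."
MessageNode classPool at: #MacroSelectors.
```

Both loops compute the same sum; only the generated bytecode differs, which can be checked by browsing the compiled method's bytecodes in the image.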
I know about the list of NoLookup primitives, as per Object
class>>howToModifyPrimitives, supposing that is still valid?
Not for Cog. While #== and #class are inlined, all other primitives are looked up.
What do you think is a reasonable speed factor for number-crunching
Squeak code vs C ? I am seeing about 20x slower in the semi-large
scale, which surprised me a bit because I got about 10x on smaller
tests, and a simple fib: with beautiful Cog is now about 3x (wow!).
That range, 3x tiny tight loop, to 20x for general multi-class
computation, seems a bit wide -- is it about expected?
Are you saying that you have a macro benchmark that is 20 times faster in C than in Cog? Cog, while faster than the interpreter, is still a non-inlining, non-globally-optimizing system, so its performance is certainly expected to be worse than C. But 20x sounds a little high, so your benchmark could be useful. If you can post it, please do.
The current state of Cog is that the new code generator gives a significant speed-up but that the object model and garbage collector remain substantially the same as the Squeak interpreter. The GC is slow and badly needs replacing. The object model is both slow, especially for class access, which slows down all sends a little, and over-complex, which means that several performance-critical primitives have yet to be implemented in machine-code, especially at:put:, basicNew, basicNew:, and closure creation, all of which currently require expensive calls into C instead of using inline machine code. I would expect that improving all these could add at least another factor of 33%. I'm trying to find funding to work on these two issues ASAP.
My profiling does not reveal any hotspots, as such -- it's basically
2, 3, 5% scattered around, so I envision this is just the general
vm/jit overhead as you scale up -- referencing distant objects, slots,
dispatch lookups, more cache misses, etc. But maybe I am generally
using some backwater loop/control methods, techniques, etc. that could
be tuned up. e.g. I seem to recall a trace at some point showing
#timesRepeat taking 10% of the time (?!). Also, I recall reading
about an anomaly with BlockClosures -- something like being rebuilt
every time thru the loop - has that been fixed? Any other gotchas to
watch for currently?
BlockClosures for non-inlined blocks are still created when mentioned. So if you do have a loop which contains a block creation, consider pulling the block out into a temp variable.
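A minimal sketch of that advice, assuming a Squeak or Pharo workspace. The names data, sorted and ascending are hypothetical. The sort block is an argument to a non-inlined message, so in the first loop a new closure is created on every pass.

```smalltalk
| data sorted ascending |
data := #(3 1 2).

"Before: the sort block is mentioned inside the loop body, so a fresh
 BlockClosure is created on each of the 1000 iterations."
1 to: 1000 do: [:i |
    sorted := data asSortedCollection: [:a :b | a <= b]].

"After: the closure is created once, outside the loop, and reused."
ascending := [:a :b | a <= b].
1 to: 1000 do: [:i |
    sorted := data asSortedCollection: ascending].
```

This only works when the block does not need to capture anything that changes per iteration (here it refers only to its own arguments). A block that reads the loop index, for instance, cannot simply be hoisted.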
(Also, any notoriously slow subsystems? For example, Transcript
writing is glacial.)
Someone should replace the Transcript's reliance on (I think) some kind of FormMorph which moves huge numbers of bits on each write. But this is not a VM issue. It's a Smalltalk issue. Whoever did this would instantly become a hero.
The Squeak bytecode compiler looks fairly straightforward and
non-optimizing - just statement by statement translation. So it
misses e.g. chances to store and reuse, instead of pop, etc. I see
lots of redundant sequences emitted. Are those kinds of things now
optimized out by Cog, or would tighter bytecode be another potential
optimization path? (Is that what the Opal project is targeting?)
There is some limited constant folding in the StackToRegisterMappingCogit. For example, the pushTrue jumpFalse sequences generated in inlined and: and or: statements are eliminated. Also, constant SmallInteger arithmetic is folded iff the receiver is a literal. The JIT doesn't have any type information so it can't fold var + 1 + 2, but it can and does fold 1 + 2 + var into 3 + var.
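The two forms mentioned above, written out as a small workspace sketch (the temps var, a and b are hypothetical). Binary messages parse left to right, which is why the position of the literals matters.

```smalltalk
| var a b |
var := 10.

"Parsed as (1 + 2) + var. The first send has a literal receiver, so per
 the description above the JIT can fold it, leaving 3 + var."
a := 1 + 2 + var.

"Parsed as (var + 1) + 2. The receiver of the first send is a variable
 of unknown type, so per the description above nothing is folded."
b := var + 1 + 2.
```

For SmallInteger values both expressions give the same result, so putting the literals first is a safe rewrite there. It is a micro-optimization, and it assumes var really is an integer; for other receiver classes, + need not be associative or commutative.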
Marcus Denker and I are working as I write on the infrastructure for an adaptive-optimizer/speculative-inliner that will initially operate at the bytecode level, deriving type information from the JIT's inline caches and using this to direct bytecode-to-bytecode optimization that will inline blocks, inline methods, etc. We hope eventually to target floating-point and other performance-critical code. Marcus posted a presentation I did a while back, "Bytecode-to-bytecode adaptive optimization for Smalltalk" (hosted on ClubSmalltalk), that covers the essential ideas. This project could completely close the gap to C if augmented by a good quality code generator. At least, that's a goal.
-- jbthiel