
About the new compiler, Part II

Marcus Denker

Part II: Examples
-----------------------

I was made aware that the first part put far too much emphasis on
performance of compilation. I think
it's true that it's fairly unimportant. I did that because in an
earlier thread we got a comment that
suggested that the compiler has to be bad just because of that. This
is of course not the
case: performance of compilation is not all that important,
especially considering today's machines.
And neither SmaCC nor the rest of the Compiler was ever optimized for
speed of compilation...

What counts is flexibility and good design, so that the code is
maintainable, reusable and enables
experiments. So I hope this part emphasizes the right points better.

Having a framework with the right abstractions simplifies
everything... "Modeling is cheating".

So the question was if the modular design with all these visitors, the
use of SmaCC instead
of a hand-written parser and the IR at the end really are that
interesting to have... I think they
are, but the only way to prove it is to explain a bit what we used the
framework for in the past.
For all these, the architecture proved to be quite useful.

All the things mentioned in the following are *not* part of the
NewCompiler. They have been
built using it, and while building them, we fixed bugs and generalized
the framework a little.

So what did we do with the NewCompiler Framework?

1) Language experiments. Some time ago, I did a small experiment for
Impara with a different syntax
for Squeak. The stated goal was to see how little is needed to
have Python- or JS-like syntax
in Squeak.
(Of course, the result was that just having the syntax is not
enough: it's completely unclear
where the similarity to the other language breaks down, and thus
it's unusable... people want to
pick up a book and just type in the code without even understanding it
completely... the semantics
are where it starts to get interesting and *a lot* of work).

But it's a cool demo... and easy to do: a grammar of JS in SmaCC,
AST nodes for all constructs, then
a visitor that calls the IRBuilder to generate code. No dealing
with raw bytecode, but nevertheless
the complete freedom of code generation at the bytecode level of abstraction.

Slides (Squeak image with all code): http://www.iam.unibe.ch/~denker/talks/BabelTalk.zip
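
The grammar -> AST -> visitor -> IR builder pipeline described in 1) can be
sketched as follows. This is a minimal illustration in Python, not NewCompiler
code: the node classes Num and Add, the ToyIRBuilder and its method names, and
the tuple-based instruction format are all assumptions made up for this
example. It only shows how a visitor can emit stack-level instructions
without ever dealing with concrete bytecodes.

```python
from dataclasses import dataclass

# Hypothetical AST nodes for a toy expression language (not NewCompiler classes).
@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

class ToyIRBuilder:
    """Collects abstract stack instructions; a stand-in for an IR builder."""
    def __init__(self):
        self.instructions = []

    def push_literal(self, value):
        self.instructions.append(("push_literal", value))

    def send(self, selector, num_args):
        # One generic send instruction, whatever the concrete bytecode would be.
        self.instructions.append(("send", selector, num_args))

    def return_top(self):
        self.instructions.append(("return_top",))

class CodeGenVisitor:
    """Walks the AST and drives the builder; it never touches raw bytecode."""
    def __init__(self, builder):
        self.builder = builder

    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_Add for an Add node.
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Num(self, node):
        self.builder.push_literal(node.value)

    def visit_Add(self, node):
        self.visit(node.left)       # receiver
        self.visit(node.right)      # argument
        self.builder.send("+", 1)   # binary message send with one argument

def compile_expression(ast_root):
    builder = ToyIRBuilder()
    CodeGenVisitor(builder).visit(ast_root)
    builder.return_top()
    return builder.instructions

# 1 + (2 + 3)
ir = compile_expression(Add(Num(1), Add(Num(2), Num(3))))
```

Supporting another surface syntax would then mean writing a new grammar and
new AST nodes plus visitor methods; the builder side stays the same.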

2) Bytecode Transformation.

I got interested in Behavioral Reflection some time ago, and
wanted to implement the Reflex model
of partial behavioral Reflection [1] with a student (David
Roethlisberger). For that, we needed
bytecode transformation (at least at that time we thought so...).

So we looked at Javassist [2] and, inspired by that, built
ByteSurgeon. The idea here is that
we want to insert (or replace) code at any bytecode instruction.
Of course, we don't want to write
the to-be-inlined code as bytecode itself, and we do not want to
deal with the very low level view
of bytecode where there are many different send-bytecodes, for
example.

When you look at the NewCompiler framework now, two
things are immediately visible:
1) The IR is exactly on the right level of abstraction.
2) Implementing a small compiler that generates the to-be-inlined
code as IR is trivial with
the modular design.

So this is what we did... we added transformation (adding/deleting
nodes) to the IR and wrote a compiler
as a simple subclass of the standard SmaCC-based compiler that
generates IR (extended with
special syntax to be able to access e.g. the receiver and arguments
of a send). With that, the bytecode
inlining framework is a simple thing.

As an example, here is the code that would annotate the class
Example to log the
receiver objects of all message sends:

Example instrumentSend: [ :sendInstr |
    sendInstr insertBefore: 'Logger logSendTo: <meta: #receiver> '
].
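
To illustrate why the IR level is convenient for this kind of
transformation, here is a rough Python sketch (not ByteSurgeon itself). The
tuple-based instruction format, the function instrument_sends and the
"log_send" instruction are hypothetical. Unlike the example above, the sketch
logs the selector rather than the receiver, because it has no runtime to
fetch the receiver from. The point is only that all sends look alike in the
IR, so one loop can find them and splice code in front of each.

```python
def instrument_sends(ir, make_hook):
    """Return a new instruction list with make_hook(instr) spliced in
    before every send instruction. The input list is left untouched."""
    result = []
    for instr in ir:
        if instr[0] == "send":
            result.extend(make_hook(instr))
        result.append(instr)
    return result

# Toy IR for "(3 + 4) printString"; tuples are (opcode, *operands).
method_ir = [
    ("push_literal", 3),
    ("push_literal", 4),
    ("send", "+", 1),
    ("send", "printString", 0),
    ("return_top",),
]

def log_hook(send_instr):
    # Hypothetical hook: emit one made-up logging instruction per send.
    _, selector, _ = send_instr
    return [("log_send", selector)]

instrumented = instrument_sends(method_ir, log_hook)
```

In a real system the hook would be compiled from source to IR, as described
above, instead of being assembled by hand.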

More information:
Slides: http://www.iam.unibe.ch/~denker/talks/ByteSurgeon-slides.pdf
Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Denk06a

ByteSurgeon was then used as the basis for some things:
-> first Geppetto (Unanticipated Partial Behavioral Reflection)
Slides: http://www.iam.unibe.ch/~denker/misc/GeppettoESUG2006.pdf
Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Roet08a
-> a proof-of-concept implementation of an Omniscient Debugger
similar to Bill Lewis' work
for Java:
Slides: http://www.iam.unibe.ch/~denker/talks/06NODE/UnstuckNode06.pdf
Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Hofe06a
-> It's one of the backends of the test coverage tool Christo, done
by Stefan Reichhart
http://smallwiki.unibe.ch/stefanreichhart/codecoverage/

3) Compiler hack: Global variables as message sends.
For ChangeBoxes, Pascal Zumkehr needed globals not to be hard-coded,
but to be accessed via
message sends. For this, he changed the NewCompiler. It's easy to
do, and he did it after a short
introduction to the NewCompiler. The old compiler is quite arcane
for all these things. (But for
sure, as soon as you get used to the patterns it's not
impossible... but I think it's an odd way of
coding.)

Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Denk07c
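
The idea behind 3) can be sketched as a small tree rewrite. The Python below
is an illustration only, not Pascal's change to the NewCompiler: the node
classes, the EnvironmentRef node and the selector globalAt: are assumptions
invented for this example. It shows how a reference to a global can be
replaced by a message send to some object that resolves globals at run time.

```python
from dataclasses import dataclass

@dataclass
class GlobalRead:        # hypothetical node: hard-coded reference to a global
    name: str

@dataclass
class Literal:           # hypothetical node: a literal value
    value: object

@dataclass
class EnvironmentRef:    # hypothetical node: the object that resolves globals
    pass

@dataclass
class Send:              # hypothetical node: receiver, selector, arguments
    receiver: object
    selector: str
    args: tuple = ()

def globals_as_sends(node):
    """Rewrite every GlobalRead into a send of the (made-up) selector
    globalAt: to the environment; everything else is copied as is."""
    if isinstance(node, GlobalRead):
        return Send(EnvironmentRef(), "globalAt:", (Literal(node.name),))
    if isinstance(node, Send):
        return Send(
            globals_as_sends(node.receiver),
            node.selector,
            tuple(globals_as_sends(arg) for arg in node.args),
        )
    return node

# Transcript show: 'hi'
before = Send(GlobalRead("Transcript"), "show:", (Literal("hi"),))
after = globals_as_sends(before)
```

Because the lookup is now an ordinary send, what a global name resolves to
can depend on the receiver of that send instead of being fixed at compile
time.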

4) Sub-Method Reflection
Joint work with Philippe Marschall. This uses the SmaCC/RB-AST part
of the NewCompiler to
generate "Reflective" Methods that use an extended AST instead of
bytecodes, and it provides
a small in-image "JIT" that generates bytecode on demand, which is
based on the standard NewCompiler
backend.
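
The "AST as the primary representation, code generated on demand" idea can
be illustrated with Python's own ast module. This is only an analogy and not
how Persephone is implemented; the class ReflectiveMethod and its methods are
hypothetical names for this sketch. The tree is what gets edited, and
executable code is just a cache that is rebuilt lazily after a change.

```python
import ast

class ReflectiveMethod:
    """Keeps the AST as the method's real representation; the compiled
    code object is only a cache that is rebuilt on demand."""

    def __init__(self, source):
        self.tree = ast.parse(source, mode="eval")
        self._code = None
        self.compile_count = 0

    def invalidate(self):
        # Call after editing self.tree so the next run recompiles.
        self._code = None

    def run(self, **variables):
        if self._code is None:
            # Lazy compilation: happens on the first run after a change.
            ast.fix_missing_locations(self.tree)
            self._code = compile(self.tree, "<reflective>", "eval")
            self.compile_count += 1
        return eval(self._code, {}, variables)

method = ReflectiveMethod("x + 1")
first = method.run(x=1)     # compiles, then evaluates
second = method.run(x=2)    # reuses the cached code

# Edit the AST (replace the constant 1 by 10) and drop the cached code.
method.tree.body.right = ast.Constant(value=10)
method.invalidate()
third = method.run(x=1)     # recompiles from the changed tree
```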

Slides: http://www.iam.unibe.ch/~denker/talks/07TOOLS/07PersephoneTOOLS.pdf
Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Denk07b

This was used e.g. for Adrian Lienhard's work on first class aliases
and Object-Flow Analysis
http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Lien07a
http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Lien07c

5) Textual annotations on every language construct.
Philippe provided textual annotations for all language
constructs in the Persephone system.
This was realized as its own extended Smalltalk compiler (based on
the SmaCC grammar).

Nik Haldimann used this to build a pluggable type system for Squeak.
Slides: http://www.iam.unibe.ch/~denker/talks/07ESUG/07TypePlugESUG.pdf
Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Hald07b

6) Reflectivity. This merges sub-method reflection with partial
behavioral reflection.

Homepage: http://www.iam.unibe.ch/~scg/Research/Reflectivity/index.html
Slides: http://www.iam.unibe.ch/~denker/talks/07DYLA/07ReflectivityDylan.pdf

This was used e.g.
-> for Dynamic Analysis http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi?query=Denk07d&abstract=yes
-> for Transactional Memory (Lukas Renggli): http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Reng07b
-> HistOOry, by Frederic Pluquet: http://decomp.ulb.ac.be/frdricpluquet/researchactivities/histoory/

So, all in all I am quite convinced that an open, reusable compiler
infrastructure provides *huge* benefits for building
experiments and tools and thus exploring the future.

Next part:
-> Closures and Performance of Closure code. (this may take some
days... busy)

In addition, I will try to answer the questions that came up and give
a status report soon.

Marcus

(I am not subscribed to Squeak-dev anymore, so please CC: me)

References
==========

[1] Reflex: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Tant03a
[2] Javassist: http://www.csg.is.titech.ac.jp/paper/chiba-gpce03.pdf

--
Marcus Denker -- [hidden email]
http://www.iam.unibe.ch/~denker