A Smalltalk implemented as Self-optimizing Interpreter


A Smalltalk implemented as Self-optimizing Interpreter

Stefan Marr-5
Hi:

Since we recently published a little article on first experiments with self-optimizing interpreters, I thought I might share it here as well:

It is a first look at RPython and Truffle as foundations for language implementations:
http://stefan-marr.de/2014/09/are-we-there-yet/?utm_source=ggroups&utm_medium=list&utm_campaign=self
(The paywalled version is available here: http://dx.doi.org/10.1109/MS.2014.98)

In the article, we briefly compare the AST-based approach of self-optimizing interpreters with the classic bytecode-based approach.

Since then, we have made further progress. Some of it can be seen in the performance numbers here: http://som-st.github.io/#performance
Implementing a Smalltalk in barely 10k lines of code while landing easily within 1x-10x of Java's performance looks pretty good, at least on paper :)

Hope you'll find it interesting
Stefan


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
Stefan,

Great stuff!

-- Jecel


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
Back in 1984 I came up with an AST based VM for Smalltalk
as an alternative to bytecodes:

http://merlintec.com/lsi/st84.txt

In the mid 1990s I had some thoughts about how to implement
that directly in hardware (with a focus on massive parallelism):

http://merlintec.com/swiki/hardware/19.html

These are all just vague musings, but I still feel they were
good ideas and intend to explore this direction more in the
future.

-- Jecel


Re: A Smalltalk implemented as Self-optimizing Interpreter

Stefan Marr-5
Hi Jecel:

On Tuesday, September 23, 2014 11:14:59 PM UTC+2, Jecel wrote:
> Back in 1984 I came up with an AST based VM for Smalltalk
> as an alternative to bytecodes:
>
> http://merlintec.com/lsi/st84.txt

Skimming through the code, it looks a little incomplete. Or does it work in principle, as a meta-circular implementation?

For comparison, you might have a look at the SOM implementations.
Not sure what the best entry point for code reading is, perhaps the parser:

https://github.com/SOM-st/TruffleSOM/blob/master/src/som/compiler/Parser.java#L546
https://github.com/SOM-st/RTruffleSOM/blob/master/src/som/compiler/parser.py#L390
Either the Java or the RPython version; I guess it depends on taste. The RPython version is less complete when it comes to optimizations.

(The older bytecode-based versions are here https://github.com/SOM-st/RPySOM and here https://github.com/SOM-st/som-java)
 
> In the mid 1990s I had some thoughts about how to implement
> that directly in hardware (with a focus on massive parallelism):
>
> http://merlintec.com/swiki/hardware/19.html

> These are all just vague musings, but I still feel they were
> good ideas and intend to explore this direction more in the
> future.

Hm, I wonder, is the idea of supporting objects in hardware still something that would be interesting?
With the JIT compilation technology we have today, I feel that many optimizations are better done in software, and constraining hardware by making it object-aware could potentially restrict the possible optimizations?

That also applies to the idea of using AST-based program representations.
The reason the AST-based interpreters are fast is that the compiler either traces through them (RPython) and generates close-to-ideal native code, or a partial evaluator optimizes the interpreter code by taking concrete ASTs into account; those ASTs contain profiling data from actual program execution and determine specializations at runtime. Thus, the compiler gets plenty of information to generate efficient code. The generated code, however, neither represents ASTs anymore nor does it necessarily contain any 'objects'. All objects local to a compilation unit, i.e., non-escaping ones, are usually optimized out. Objects are usually only reified at the borders of compilation units.
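
To give a rough idea of what such a specialization looks like, here is a minimal hand-written sketch in Java (illustrative only; it is not TruffleSOM code and leaves out the Truffle DSL and the real node-rewriting machinery) of an addition node that profiles the operand types it actually sees and narrows itself to an integer-only fast path:

// Minimal sketch of a self-optimizing AST node; illustrative only, not TruffleSOM code.
// Real systems rewrite nodes in the tree (e.g. Truffle's replace()); here a simple
// state field plays that role to keep the example self-contained.
abstract class ExprNode {
    abstract Object execute();
}

final class IntLiteralNode extends ExprNode {
    private final long value;
    IntLiteralNode(long value) { this.value = value; }
    @Override Object execute() { return value; }
}

final class AddNode extends ExprNode {
    private enum State { UNINITIALIZED, LONG, GENERIC }

    private State state = State.UNINITIALIZED;
    private final ExprNode left, right;

    AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }

    @Override Object execute() {
        Object l = left.execute();
        Object r = right.execute();
        if (state == State.UNINITIALIZED) {
            // First execution: remember what we actually saw (the profiling step).
            state = (l instanceof Long && r instanceof Long) ? State.LONG : State.GENERIC;
        }
        if (state == State.LONG) {
            if (l instanceof Long && r instanceof Long) {
                // Fast path: effectively a machine add plus an overflow check
                // (a real SOM would fall back to large integers on overflow).
                return Math.addExact((Long) l, (Long) r);
            }
            state = State.GENERIC;  // speculation failed: widen, never narrow again
        }
        return genericAdd(l, r);
    }

    private Object genericAdd(Object l, Object r) {
        // Stand-in for a full Smalltalk '+' message send.
        return ((Number) l).doubleValue() + ((Number) r).doubleValue();
    }
}

Once the 'state' field has stabilized after warm-up, the partial evaluator (or RPython's tracing JIT) can treat it as a constant and emit only the integer fast path, guarded by a check that falls back to the interpreter when the speculation fails.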

After wrapping up the work on 'getting things reasonably fast', I also intend to get back to the concurrency part of the story.
But that might only happen later this year. There's already a branch with thread support for TruffleSOM, but it still needs much more work. And in the end, I would like to experiment with various different models, for instance actors.
 
Best regards
Stefan


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
Stefan,

> > [1984 AST Smalltalk VM]
> Skimming through the code, it looks a little incomplete. Or does it
> work in principle, as a meta-circular implementation?

It is just a sketch on paper. To be an actual VM it would need
an object memory system and primitives at the very least. At that
time I was very interested in dataflow architectures and was
wondering if a Smalltalk VM could be like that instead of a stack
machine.

> [SOM implementations]

Thanks for the references. I have been keeping track of the SOM project
but haven't read the code in detail yet.

> Hm, I wonder, is the idea of supporting objects in hardware still
> something that would be interesting?

There is the famous ECOOP 95 paper by Urs Hölzle and David Ungar
that takes a look at what had changed since the SOAR project 10 years
earlier; their conclusion was "no":

https://www.cs.ucsb.edu/~urs/oocsb/papers/oo-hardware.html

I don't agree, though I think their arguments are very good. But there
is a bit of circular reasoning: they tried to make the best possible
compiler for a machine that was designed to run C, then they compared
the output of their compiler with the output of the C compiler, and the
two looked the same. I would not have expected anything else.

> With the JIT compilation technology we have today, I feel that many
> optimizations are better done in software, and constraining hardware
> by making it object-aware could potentially restrict the possible
> optimizations?

This has been a real problem for many projects. Somebody designs
a fancy MMU chip to sit between the processor and memory to do
garbage collection faster and somebody else comes up with a new
algorithm that makes a normal processor even faster.

But I don't think this is as likely to happen now as 20 years ago since
the state of the art has been stalled for so long. Software implementations
aren't trying to make things even faster but instead the focus is on
making things reasonably fast with a lot less effort.

> [Objects are usually only reified at the borders of compilation units.]

Indeed! I have learned the hard way that "the compiler can know the
future but the hardware is limited to the past". Back in 1990 I was playing
with some exotic architectures trying to get message passing down to
one or two clock cycles. Then the Self group started publishing their papers
and they got message passing to be a negative number of clock cycles!
Yes, this is an accounting trick but in practice it is a fair comparison. So
I changed my project to a slightly modified Sparc processor (which was,
unfortunately, killed by my boss for not being academic enough).

After I had a lot more experience with adaptive compilation (the title of the
PhD thesis I am working on is "Adaptive Compilation for an Object-Oriented,
Parallel and Reconfigurable Architecture", for example) I decided that
mixing that with special hardware would be a very interesting research
path.

-- Jecel


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jan Vrany
Hi there,

On Wed, 2014-09-24 at 13:09 -0700, Jecel wrote:

> [...]
>
> This has been a real problem for many projects. Somebody designs
> a fancy MMU chip to sit between the processor and memory to do
> garbage collection faster and somebody else comes up with a new
> algorithm that makes a normal processor even faster.

I don't think it has to be really fancy. Having HW support for tagged
pointers and tagged arithmetic could help, no? I mean having an "add two
tagged integers, or jump if one is not a tagged integer" kind of instruction.
Also having support for a write barrier could help. SPARC used to have it
in silicon, IIRC...
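
Roughly, in software terms (purely an illustrative sketch, assuming the
common convention that SmallIntegers have the low bit set), the fast path
of something like this is what a hardware tagged add would collapse into
a single instruction:

// Illustrative sketch of tagged small-integer addition as Smalltalk VMs emulate
// it in software.  Convention assumed: a word with the low bit set encodes a
// SmallInteger (value n stored as 2*n + 1); otherwise it is a heap pointer.
final class TaggedAdd {

    static long tag(long n)             { return (n << 1) | 1L; }
    static long untag(long oop)         { return oop >> 1; }
    static boolean isSmallInt(long oop) { return (oop & 1L) == 1L; }

    // What a hardware "tagged add" would do in one instruction:
    // add both operands and branch (or trap) when either tag check fails.
    static long addTagged(long a, long b) {
        if (isSmallInt(a) && isSmallInt(b)) {
            // Fast path: both low bits are 1, so (a + b - 1) is again a correctly
            // tagged SmallInteger. One add, no untag/retag needed. (A real VM would
            // also check for overflow and fall back to LargeInteger arithmetic.)
            return a + b - 1;
        }
        // Slow path: at least one operand is not a SmallInteger, so the VM
        // performs an ordinary '+' message send instead.
        return slowPathSend(a, b);
    }

    private static long slowPathSend(long a, long b) {
        throw new UnsupportedOperationException("message send not sketched here");
    }
}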

Jan






Re: A Smalltalk implemented as Self-optimizing Interpreter

Andres Valloud-4
See Alan Knight's hardware article here.

http://www.macqueen.us/smalltalkReport/ST/91_95/SMAL0502.PDF

On 9/24/14 14:10 , Jan Vrany wrote:

> [...]
>
> I don't think it has to be really fancy. Having HW support for tagged
> pointers and tagged arithmetic could help, no? I mean having an "add two
> tagged integers, or jump if one is not a tagged integer" kind of instruction.
> Also having support for a write barrier could help. SPARC used to have it
> in silicon, IIRC...
>
> Jan


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
In reply to this post by Jan Vrany
Jan,

> I don't think it has to be really fancy. Having HW support for tagged
> pointers and tagged arithmetic could help, no? I mean having an "add two
> tagged integers, or jump if one is not a tagged integer" kind of instruction.
> Also having support for a write barrier could help. SPARC used to have it
> in silicon, IIRC...

SOAR had tagged versions of all instructions. When they measured their
benchmarks they found that only the tagged adds were making any
difference. So SPARC kept the tagged add (and tagged subtract for
symmetry), but when Urs measured their impact on his Self compiler
he decided not to use these instructions. The problem was that when the
tag check failed, an exception was raised and it took over a thousand
clock cycles before SunOS started running the actual exception handler
code.

Obviously a tagged add instruction that takes zero clock cycles if the
check works and only as much as a jump or call when it fails would have
been far more useful.

Another simple hardware feature that can help quite a bit is virtually
addressed caches (as opposed to the more common physically addressed
ones) like in the Mushroom Smalltalk computer. Mario Wolczko tried
to get Sun to include this in their processors but they never did. The
idea is that you only have to do an object table lookup on cache misses
and not on every memory access. So you get the performance of
direct pointers with the flexibility of object tables.
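
As a purely software analogy (just a sketch; this is not how Mushroom
implemented it): every access through an object table normally pays an
extra indirection, but if recently resolved handle-to-address translations
are kept in a small cache indexed by the handle itself, the table is only
consulted on misses. Indexing the hardware cache virtually buys you
essentially the same thing for free:

// Software analogy (sketch only) for object-table indirection with a small cache of
// recent handle-to-address translations.  In Mushroom the "cache" was the virtually
// addressed hardware cache; here it is a tiny direct-mapped array, but the effect is
// the same: the table lookup is only paid on misses, so hits cost no more than a
// direct pointer.
final class ObjectTable {
    private static final int CACHE_SIZE = 256;      // must be a power of two

    private final long[] table;                     // handle -> current heap address
    private final int[]  cachedHandle = new int[CACHE_SIZE];
    private final long[] cachedAddress = new long[CACHE_SIZE];

    ObjectTable(int capacity) {
        table = new long[capacity];
        java.util.Arrays.fill(cachedHandle, -1);    // mark all cache slots empty
    }

    // Resolve a handle to a heap address, paying the table indirection only on a miss.
    long resolve(int handle) {
        int slot = handle & (CACHE_SIZE - 1);
        if (cachedHandle[slot] == handle) {
            return cachedAddress[slot];             // hit: as cheap as a direct pointer
        }
        long addr = table[handle];                  // miss: one extra indirection
        cachedHandle[slot] = handle;
        cachedAddress[slot] = addr;
        return addr;
    }

    // The collector can move an object by updating one table entry,
    // plus invalidating any cached translation for that handle.
    void moveObject(int handle, long newAddress) {
        table[handle] = newAddress;
        int slot = handle & (CACHE_SIZE - 1);
        if (cachedHandle[slot] == handle) {
            cachedHandle[slot] = -1;
        }
    }
}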

-- Jecel


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
In reply to this post by Andres Valloud-4
Andres,

> See Alan Knight's hardware article here.
>
> http://www.macqueen.us/smalltalkReport/ST/91_95/SMAL0502.PDF

Thanks for the reference! In the end he is mostly citing Urs' article
that I mentioned, but the first part is the traditional economic
argument against language-specific processors.

During the 1990s I think these ideas were very valid and I stopped
my own Smalltalk hardware projects and worked on software for PCs
instead. Things had changed by the end of that decade, however, and
we saw no lack of Forth and Java processors from then on.

-- Jecel


re: A Smalltalk implemented as Self-optimizing Interpreter

ccrraaiigg

Hi Jecel--

> In the end he is mostly citing Urs' article that I mentioned but the
> first part was the traditional economic argument against language
> specific processors.
>
> During the 1990s I think these ideas were very valid and I stopped
> my own Smalltalk hardware projects and work on software for PCs
> instead. Things had changed by the end of that decade, however, and
> we saw no lack of Forth and Java processors from then on.

     I'd also be interested to hear how the situation changed (or
didn't) in the era of cheaper and more powerful FPGAs.


-C

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)


Re: A Smalltalk implemented as Self-optimizing Interpreter

Andres Valloud-4
In reply to this post by Jecel Assumpcao Jr
Strange.  Even "today", Azul went from their custom hardware with very
specialized needs to commodity x86.

On 9/25/14 15:06 , Jecel wrote:

> [...]
>
> During the 1990s I think these ideas were very valid and I stopped
> my own Smalltalk hardware projects and worked on software for PCs
> instead. Things had changed by the end of that decade, however, and
> we saw no lack of Forth and Java processors from then on.
>
> -- Jecel


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
Andres,
> Strange.  Even "today", Azul went from their custom hardware with very
> specialized needs to commodity x86.

That had been my impression as well, but looking at their site it seems
that for now they are offering both options. It is extremely unlikely that
they will make any newer versions of their own chips, however, and this
is the classic argument against language-specific processors: they don't
have the volumes to ride Moore's Law like their competitors.

I am cautious about reading too much into a single data point. For
example: while it is true that Lisp Machines died, so did every single
other minicomputer architecture that has ever existed. Was it the
lispyness or the mininess that killed them?

In the same way, I see the Azul architecture competing more against
the likes of Tilera than against Intel or ARM. They had a very neat 64
bit RISC with a nice memory model, but it was less of a Java processor
than an ARM with Jazelle 1 (though slightly more than Jazelle 2), for
example.

Sometimes these business decisions reflect more the result of
executive shuffle than any keen insights into the future of the market.
Like the new guy at ParcPlace-Digitalk feeling it would be cool to
do Java like everyone else instead of this odd Smalltalk, or the new
guy at IBM thinking it would be best to drop their own OS/2 and
build WinTel PCs like everyone else. It is scary to be different, and if
you bring an MBA from the food industry to run your company he
will probably convince you to be like everyone else.

-- Jecel


Re: A Smalltalk implemented as Self-optimizing Interpreter

Jecel Assumpcao Jr
In reply to this post by ccrraaiigg
Craig,
>     I'd also be interested to hear how the situation changed (or
> didn't) in the era of cheaper and more powerful FPGAs.

The FPGA market has a lot of variation. You can get small ones
for just a few dollars that are only a few times more powerful than
the very first ones introduced in the mid 1980s. You can get
some extremely large ones that cost ten thousand dollars
apiece (I wouldn't want to make a mistake when soldering one
of these babies on a board!). I will ignore these extremes and just
focus on the meat of the market between $10 and $100 per FPGA.

In the early 1990s FPGAs were too small to implement a whole
processor, but the Mushroom Smalltalk computer used a bunch of
them to make it easier to experiment with design changes.
Only one machine was built so the cost of the chips wasn't an
important issue. By the late 1990s it was possible to put a complete
simple RISC processor in an FPGA and by 2003 or so it was
possible to put a reasonably complex processor or a bunch of
simpler ones in a single FPGA. A decade later we have even more
resources to play with.

It is important to have realistic expectations, however. If you
implement something like an ARM processor with a bunch of
peripherals in a $10 FPGA it will probably run at less than 100 MHz
and take up half of the area in the chip. Considering that you can
probably buy an ARM just like that for $0.50 or less and you can
buy a 700MHz dual core ARM for the $5 that your FPGA version
is costing you, it doesn't seem like such a good idea.

The main reason to do something like this is to fill the other half
of your $10 FPGA with something that your application needs
and that the ARMs you can buy off the shelf don't have. What this
is depends on your application. An example from my master's
project: I was able to get real time, full VGA resolution optical
flow calculations from a custom 16MHz circuit running on an
FPGA that doesn't even have a heat sink. You would need a
very high end PC to get even close to this result.

An interesting alternative is to play with your own processor
designs. As Alan Kay likes to point out, people used to do this
all the time. Commercial microprocessors killed this, but FPGAs
give us the opportunity to restart research in this area. For this
to be interesting as research, you only have to compare with
other processors implemented in the same FPGAs and not with
commercial ASIC processors.

One thing that FPGAs can do that ASICs can't is have the
circuit they implement be changed at runtime. We still don't
have many results in this area, but objects can be switched
from software to hardware without any changes to the rest of
the system since they only interact via message passing. So
it might be interesting to extend an adaptive compilation
scheme so that a quick-and-dirty compiler would generate code
the first time a method was called, a better compiler would
be invoked when a hotspot was detected, and for the extreme
hotspots you could compile the object to hardware and place
it in a spare area in the FPGA.
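
A very rough sketch of that tiering policy (all the names and thresholds
below are made up for illustration), where the last tier hands the object
off to hardware synthesis instead of yet another software compiler:

// Purely hypothetical sketch of the tiering policy described above: count
// invocations, escalate from a quick-and-dirty compile to an optimizing compile,
// and for extreme hotspots hand the object off to a "compile to FPGA fabric" step.
// All names and thresholds are invented for illustration.
final class TieredMethod {
    private enum Tier { BASELINE, OPTIMIZED, HARDWARE }

    private Tier tier = Tier.BASELINE;
    private long invocations = 0;

    private static final long OPTIMIZE_THRESHOLD = 1_000;
    private static final long HARDWARE_THRESHOLD = 1_000_000;

    Object invoke(Object receiver, Object[] args) {
        invocations++;
        if (tier == Tier.BASELINE && invocations > OPTIMIZE_THRESHOLD) {
            tier = Tier.OPTIMIZED;           // hotspot detected: run the better compiler
            recompileOptimized();
        } else if (tier == Tier.OPTIMIZED && invocations > HARDWARE_THRESHOLD) {
            tier = Tier.HARDWARE;            // extreme hotspot: synthesize to spare FPGA area
            synthesizeToFpga();
        }
        return dispatch(receiver, args);     // run whatever the current tier produced
    }

    private void recompileOptimized() { /* sketched: invoke the optimizing compiler */ }
    private void synthesizeToFpga()   { /* sketched: place a hardware version and route
                                           messages to it instead of the software one */ }
    private Object dispatch(Object receiver, Object[] args) {
        return null;                         // sketched: execute the current tier's code
    }
}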

I plan to work on this in the future, but for now have been
doing an extremely limited version instead (using a fixed
design coprocessor instead of compilation to hardware).

-- Jecel
