Re: [Pharo-dev] Parsing Pharo syntax to C/C++


Re: [Pharo-dev] Parsing Pharo syntax to C/C++

Eliot Miranda-2
 
Hi All,

On Mon, Sep 15, 2014 at 6:01 AM, Thierry Goubier <[hidden email]> wrote:


2014-09-15 14:39 GMT+02:00 Clément Bera <[hidden email]>:
Hello,

Note that Slang is a subset of Smalltalk. The Slang compiler does not compile arbitrary Smalltalk to C; it compiles a Smalltalk restricted in its message sends and classes to C.

Yes, I am aware of that. I remember that from the very beginnings of Squeak.

Wasn't Smalltalk/X the one that had a more complete version of that C translation? I did an internship at a French company that had a Smalltalk-to-C translator written for them a long time ago.
 

2014-09-15 13:28 GMT+02:00 Thierry Goubier <[hidden email]>:
Hi Phil,

Thanks for the update on Slang to C. Always good to have that.

Two open questions:

- would a Slang-to-x86-asm backend via NativeBoost be doable / a nice target?

Yes, it would be interesting. However, with a Slang-to-C compiler we are platform-independent: we can compile the C code for x86, x86_64 and ARM quite easily (some parts of the VM are already processor-dependent, but not many). Targeting machine code directly implies evolving the Slang compiler for each new assembly language we support. That sounds like a lot of engineering work relative to our resources and the gain.

It would allow JIT-style compilation experiments that a Slang-to-C chain isn't designed for :) With a lot more work doing the various NB ports, of course.
 

- would targeting LLVM-IR be of interest?

If you compile the C code with Clang instead of gcc, which is increasingly the case given the lack of gcc support in the latest Mac OS X, you are already going through LLVM IR, because Clang uses it internally. And since the VM uses GNU C extensions to improve performance, I do not think that targeting LLVM IR directly would gain much. So it sounds like quite some engineering work for no gain.

I would not suggest replacing C with LLVM-IR for VM work, in part because LLVM-IR is not what I would call a readable source format... But I do know that even when doing C-to-C rewriting for embedded compilation, there is some low-level code that you can't write in C.

I find this whole discussion depressing.  It seems people would rather put their energy into chasing quick fixes or other technologies instead of contributing to the work that is being done in the existing VM.  People discuss using LLVM as if the code generation capabilities inside Cog were somehow poor or had no chance of competing.  Spur is around twice as fast as the current memory manager and has much better support for the FFI.  Clément and I, now with help from Ronie, are making excellent progress towards an adaptive optimizer/speculative inliner that will give us performance similar to V8 (the Google JavaScript VM, led by Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al.  We are trying to get person-power for a high-quality FFI and have a prototype for a non-blocking VM.  When we succeed, C won't be any better, and so it won't be an interesting target.  One will be able to program entirely in Smalltalk and get excellent performance.  But we need effort.  Collaboration.

Personally I feel so discouraged when people talk about using LLVM or libffi or whatever instead of having the courage and energy to make our system world-class.  I have confidence in our ability to compete with the best and am saddened that people in the community don't value the technology we already have and can't show faith in our ability to improve it further.  Show some confidence, express support, and above all get involved.



However, I think Ronie was interested in doing such work. If he succeeds and reports a performance improvement, then we can consider using his compiler to compile the VM.

Keep us posted!

Thierry

--
in hope,
Eliot

re: Parsing Pharo syntax to C/C++

ccrraaiigg
 

     Hear hear!


-C

[1] http://tinyurl.com/m66fx8y (original message)

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)


re: Parsing Pharo syntax to C/C++

Ronie Salgado
 
Hello,

I am dividing this mail into several sections.

---------------------------------------------------------------
- On Lowcode and Cog

For the last week I have been working on the Cog VM, implementing the Lowcode instructions in Cog.

Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a C-like language compiler.
- Making FFI calls.

I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.

These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, direct and indirect.
- The atomic compare-and-swap operation.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.

Currently I have implemented the following backends:
- A C interpreter plugin.
- An LLVM-based backend.

Currently I am working on getting this running with the Cog code generator. So far I am generating code for int32/pointer/float32/float64, and I am starting to generate C function calls and object boxing/unboxing.

During this work I learned a lot about Cog. In particular, Cog is missing a better Slang generator, one that would allow forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog: people who perform code reviews and write documentation of Cog.

After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.

---------------------------------------------------------------
- Smalltalk -> LLVM ?

As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a C compiler, designed to optimize things such as loops with a lot of arithmetic; it is designed to optimize large sections of code. In Smalltalk, most of our code is composed of message sends, and LLVM cannot optimize a message send.

To optimize a message send, you have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common-subexpression elimination, dead-branch elimination, loop unrolling, and so on.

Because the language itself carries no information (e.g. static types à la C/C++/Java/C#) telling us which method a message send will actually invoke, we have the following alternatives for determining it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.

In other words, our best bet is the work of Clément on Sista. The only problem with this bet is real-time applications.

Real-time applications require an upper-bound guarantee on their response time. In some cases, the lack of this guarantee is just an annoyance, as in video games. In mission-critical applications, the results can be catastrophic if the time constraint is not met. Examples of mission-critical systems are the flight controls of an airplane, or the cooling system of a nuclear reactor.

For these applications, it is not possible to rely on an adaptive optimizer that is only triggered sometimes. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize these critical sections.
- Use another language for the whole project.

And of course, you have to do a lot of profiling.

Greetings,
Ronie

2014-09-15 16:38 GMT-03:00 Craig Latta <[hidden email]>:


     Hear hear!


-C

[1] http://tinyurl.com/m66fx8y (original message)

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+1 415 287 3547 (no SMS)



re: Parsing Pharo syntax to C/C++

ccrraaiigg
 

Hoi Ronie--

     Nice summary. Thanks!


-C

--
Craig Latta
netjam.org
+31   6 2757 7177 (SMS ok)
+ 1 415  287 3547 (no SMS)


re: Parsing Pharo syntax to C/C++

Eliot Miranda-2
In reply to this post by Ronie Salgado
 
Hi Ronie,

On Mon, Sep 15, 2014 at 2:37 PM, Ronie Salgado <[hidden email]> wrote:
 
Hello,

I am dividing this mail into several sections.

---------------------------------------------------------------
- On Lowcode and Cog

For the last week I have been working on the Cog VM, implementing the Lowcode instructions in Cog.

Remember to send me code for integration.  I'm eagerly waiting to use your code!

Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a C-like language compiler.
- Making FFI calls.

I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.

These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, direct and indirect.
- The atomic compare-and-swap operation.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.

Currently I have implemented the following backends:
- A C interpreter plugin.
- An LLVM-based backend.

Currently I am working on getting this running with the Cog code generator. So far I am generating code for int32/pointer/float32/float64, and I am starting to generate C function calls and object boxing/unboxing.

During this work I learned a lot about Cog. In particular, Cog is missing a better Slang generator, one that would allow forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog: people who perform code reviews and write documentation of Cog.

After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.

Yes, and that's difficult because it's a moving target and I have been lazy, not writing tests, instead using the Cog VM as "the test". 

I am so happy to have your involvement.  You and Clément bring such strength and competence.

---------------------------------------------------------------
- Smalltalk -> LLVM ?

As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a C compiler, designed to optimize things such as loops with a lot of arithmetic; it is designed to optimize large sections of code. In Smalltalk, most of our code is composed of message sends, and LLVM cannot optimize a message send.

To optimize a message send, you have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common-subexpression elimination, dead-branch elimination, loop unrolling, and so on.

Because the language itself carries no information (e.g. static types à la C/C++/Java/C#) telling us which method a message send will actually invoke, we have the following alternatives for determining it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.

In other words, our best bet is the work of Clément on Sista. The only problem with this bet is real-time applications.

Ah!  But!  Sista has an advantage that other adaptive optimizers don't.  Because it optimizes from bytecode to bytecode it can be used during a training phase and then switched off.

Real-time applications require an upper-bound guarantee on their response time. In some cases, the lack of this guarantee is just an annoyance, as in video games. In mission-critical applications, the results can be catastrophic if the time constraint is not met. Examples of mission-critical systems are the flight controls of an airplane, or the cooling system of a nuclear reactor.

For these applications, it is not possible to rely on an adaptive optimizer that is only triggered sometimes. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize these critical sections.
- Use another language for the whole project.

The additional option is to "train" the optimizer by running the application before deploying and capturing the optimised methods.  Discuss this with Clément and he'll explain how straightforward it should be.  This still leaves the latency in the Cogit when it compiles from bytecode to machine code.  But

a) I've yet to see anybody raise JIT latency as an issue in Cog
b) it would be easy to extend the VM to cause the Cogit to precompile specified methods.  We could easily provide a "lock-down" facility that would prevent Cog from discarding specific machine code methods.

And of course, you have to do a lot of profiling.

Early and often :-).

Because we can have complete control over the optimizer, and because Sista is bytecode-to-bytecode and can hence store its results in the image in the form of optimized methods, I believe that Sista is well-positioned for real-time since it can be used before deployment.  In fact we should emphasise this in the papers we write on Sista.

Greetings,
Ronie

2014-09-15 16:38 GMT-03:00 Craig Latta <[hidden email]>:


     Hear hear!


-C

[1] http://tinyurl.com/m66fx8y (original message)

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+1 415 287 3547 (no SMS)






--
best,
Eliot

Re: [Pharo-dev] Parsing Pharo syntax to C/C++

Göran Krampe
In reply to this post by Eliot Miranda-2
 
Hi Eliot and all!

Since I work with Ron at 3DICC and Cog is vital to us, I wanted to chime
in here.

On 09/15/2014 06:23 PM, Eliot Miranda wrote:

> I find this whole discussion depressing.  It seems people would rather
> put their energy in chasing quick fixes or other technologies instead of
> contributing to the work that is being done in the existing VM.  People
> discuss using LLVM as if the code generation capabilities inside Cog
> were somehow poor or have no chance of competing.  Spur is around twice
> as fast as the current memory manager, has much better support for the
> FFI.  Clément and I, now with help from Ronie, are making excellent
> progress towards an adaptive optimizer/speculative inliner that will
> give us similar performance to V8 (the Google JavaScript VM, led by
> Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al.

One thing you need to understand, Eliot, is that most of us don't have
the mind power or time to contribute at that level.

But still, a lot of us are tickled by ideas at the low level - and thus
ideas like reusing LLVM, reusing some other base VM, cross-compilation
etc. pop up.

Don't read too much into it - I am always toying with similar ideas in
my head for "fun"; it doesn't mean we don't also see/know that *real* VM
work like Cog is the main road.

>   We are trying to get person-power for a high-quality FFI and have a
> prototype for a non-blocking VM.  When we succeed C won't be any better
> and so it won't be an interesting target.  One will be able to program
> entirely in Smalltalk and get excellent performance.  But we need
> effort.  Collaboration.

Let me just mention LuaJIT2 - besides very good performance, among other
things it sports a *very* good FFI. In fact, Lua in general has several
FFIs and tons of C++ binding tools too - so IMHO anyone doing work in
that area should take a sneak peek at LuaJIT2.

And this has been a truly "sore" area in Smalltalk since forever. If we
had something as solid as the stuff in the Lua community - then Cog and
Smalltalk could go places where they haven't been before, I suspect.

If we look at the codebase we have at 3DICC - a very large part consists
of complicated plugin code for external libraries and the accompanying
complicated Smalltalk glue.

Also, if we compare the Lua community with the Squeak/Pharo community,
it is quite obvious that the lack of really good FFI solutions leads us
to "reinvent" stuff over and over, often poorly, while the Lua people
simply wrap high quality external libraries and that's it. Done.

Of course this also stems from the very different backgrounds and
motives behind the two languages and their respective domains, but still.

> Personally I feel so discouraged when people talk about using LLVM or
> libffi or whatever instead of having the courage and energy to make our
> system world-class.

Don't feel discouraged - it's just that 99% of the community can't help
you. :) Instead we should feel blessed that we have 1 Eliot, 1 Clément,
1 Igor and 1 Ronie. Do we have more?

> I have the confidence in our abilities to compete
> with the best and am saddened that people in the community don't value
> the technology we already have and can't show faith in our abilities to
> improve it further.  Show some confidence and express support and above
> all get involved.

Let me then make sure you know that 3DICC values *all* work in Cog
*tremendously*.

As soon as you have something stable on the Linux side - we would start
trying it. Just let me know, on Linux (server) we run your upstream Cog
"as is". In fact, I should probably update what we use at the moment :)

Every bit of performance makes a big impact for us - but to be honest,
what we would value even more than performance would be ... robustness.
I mean, *really* robust. As in a freaking ROCK.

An example deployment: More than 3000 users running the client on
private laptops (all Windows variants and hw you can imagine, plus some
macs) and the server side running on a SLEW of FAT EC2 servers. We are
talking about a whole BUNCH of Cogs running 24x7 on a bunch of servers.

We experience VM blow-ups on the client side, both Win32 and OS X. The
OS X ones may be due to our current VM being built with clang, but I am
not sure. Our Win32 VM is old; we need to rebuild it ASAP. It is hard to
know whether these are Cog-related or more likely 3DICC-plugin-related,
but still.

But the client side is still not the "painful" part - we also experience
Linux server-side Cogs going berserk (100% CPU, no response), just
locking up, or suddenly failing to resolve localhost :) etc. I suspect
the networking code in probably all these cases. Here we do NOT have
special 3DICC plugins, so here we blame Cog or, more likely, the Socket
plugin. Often? No, but "sometimes" is often enough to be a big problem.
In fact, a whole new networking layer would make sense to me.

Also... we need to be able to use more RAM. We are deploying to cloud
servers more and more - and using instances with 16 GB RAM or more is
normal. But our Cogs can't utilize it. I am not up to speed on what Spur
gives us, or whether we in fact need to go 64-bit for that.

regards, Göran

re: Parsing Pharo syntax to C/C++

Clément Béra
In reply to this post by Eliot Miranda-2
 


2014-09-16 1:46 GMT+02:00 Eliot Miranda <[hidden email]>:
 
Hi Ronie,

On Mon, Sep 15, 2014 at 2:37 PM, Ronie Salgado <[hidden email]> wrote:
 
Hello,

I am dividing this mail into several sections.

---------------------------------------------------------------
- On Lowcode and Cog

For the last week I have been working on the Cog VM, implementing the Lowcode instructions in Cog.

Remember to send me code for integration.  I'm eagerly waiting to use your code!

Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a C-like language compiler.
- Making FFI calls.

I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.

These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, direct and indirect.
- The atomic compare-and-swap operation.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.

Currently I have implemented the following backends:
- A C interpreter plugin.
- An LLVM-based backend.

Currently I am working on getting this running with the Cog code generator. So far I am generating code for int32/pointer/float32/float64, and I am starting to generate C function calls and object boxing/unboxing.

During this work I learned a lot about Cog. In particular, Cog is missing a better Slang generator, one that would allow forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog: people who perform code reviews and write documentation of Cog.

After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.

Lack of documentation?

About Cog there is this documentation:
About Spur: summary and object format
And many useful class and method comments that taught me a lot.

When I try to work with Pharo frameworks, even recent ones, it is very rare that I see as much documentation as exists for Cog. Some frameworks are documented in the Pharo books and a few others, such as Zinc, have good documentation, but in general there is little documentation and even fewer people writing it. The website about Cog has existed for over 6 years now. I think Cog is far from the worst-documented part of Pharo.


Yes, and that's difficult because it's a moving target and I have been lazy, not writing tests, instead using the Cog VM as "the test". 

It's also difficult because the first tests to write are the hardest to write.

I am so happy to have your involvement.  You and Clément bring such strength and competence.

---------------------------------------------------------------
- Smalltalk -> LLVM ?

As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a C compiler, designed to optimize things such as loops with a lot of arithmetic; it is designed to optimize large sections of code. In Smalltalk, most of our code is composed of message sends, and LLVM cannot optimize a message send.

To optimize a message send, you have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common-subexpression elimination, dead-branch elimination, loop unrolling, and so on.

Because the language itself carries no information (e.g. static types à la C/C++/Java/C#) telling us which method a message send will actually invoke, we have the following alternatives for determining it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.

In other words, our best bet is the work of Clément on Sista. The only problem with this bet is real-time applications.

Ah!  But!  Sista has an advantage that other adaptive optimizers don't.  Because it optimizes from bytecode to bytecode it can be used during a training phase and then switched off.

Real-time applications require an upper-bound guarantee on their response time. In some cases, the lack of this guarantee is just an annoyance, as in video games. In mission-critical applications, the results can be catastrophic if the time constraint is not met. Examples of mission-critical systems are the flight controls of an airplane, or the cooling system of a nuclear reactor.

For these applications, it is not possible to rely on an adaptive optimizer that is only triggered sometimes. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize these critical sections.
- Use another language for the whole project.

The additional option is to "train" the optimizer by running the application before deploying and capturing the optimised methods.  Discuss this with Clément and he'll explain how straightforward it should be.  This still leaves the latency in the Cogit when it compiles from bytecode to machine code.  But

a) I've yet to see anybody raise JIT latency as an issue in Cog
b) it would be easy to extend the VM to cause the Cogit to precompile specified methods.  We could easily provide a "lock-down" facility that would prevent Cog from discarding specific machine code methods.

And of course, you have to do a lot of profiling.

Early and often :-).

Because we can have complete control over the optimizer, and because Sista is bytecode-to-bytecode and can hence store its results in the image in the form of optimized methods, I believe that Sista is well-positioned for real-time since it can be used before deployment.  In fact we should emphasise this in the papers we write on Sista.

Eliot's solution makes sense.
To write a paper about that, I need benchmarks showing results on real-time applications.
So there's quite some work to do first.

Greetings,
Ronie

2014-09-15 16:38 GMT-03:00 Craig Latta <[hidden email]>:


     Hear hear!


-C

[1] http://tinyurl.com/m66fx8y (original message)

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+1 415 287 3547 (no SMS)






--
best,
Eliot



Re: [Pharo-dev] [Vm-dev] re: Parsing Pharo syntax to C/C++

philippeback
 
What would be valuable is a reading list / path to VM enlightenment.

The Blue Book is useful.
Then a tour of the Object Engine by Tim.
Then the plugin articles + Slang.
The bytecode set.
Primitives...
Context-to-stack mapping.
Blocks.
Non-local returns.
Display/Sensor/event loop/timer implementation (like in the porting document).
And only then would one move on to more advanced topics.

I saw that Clément had a set of VM-related books on his desk at INRIA; posting that list would be great!

All the best,
Phil



On Tue, Sep 16, 2014 at 11:48 AM, Clément Bera <[hidden email]> wrote:


2014-09-16 1:46 GMT+02:00 Eliot Miranda <[hidden email]>:
 
Hi Ronie,

On Mon, Sep 15, 2014 at 2:37 PM, Ronie Salgado <[hidden email]> wrote:
 
Hello,

I am dividing this mail into several sections.

---------------------------------------------------------------
- On Lowcode and Cog

For the last week I have been working on the Cog VM, implementing the Lowcode instructions in Cog.

Remember to send me code for integration.  I'm eagerly waiting to use your code!

Lowcode is currently a spec of new bytecode instructions. These instructions can be used for:
- Implementing a C-like language compiler.
- Making FFI calls.

I am implementing these instructions using a feature of the new bytecode set for SistaV1, which is called "inline primitives". Because of this, these new instructions can be mixed freely with the standard VM bytecode set. This also allows the Sista adaptive optimizer to inline FFI calls.

These instructions provide features for:
- Int32 and Int64 integer arithmetic without type checking.
- Pointers, with pointer arithmetic.
- Memory access and memory manipulation.
- Single- and double-precision floating-point arithmetic.
- Conversion between primitive types.
- Boxing and unboxing of primitive types.
- Unchecked comparisons.
- Native function calls, direct and indirect.
- The atomic compare-and-swap operation.
- Object pin/unpin (requires Spur).
- VM releasing and grabbing for threaded FFI.

Currently I have implemented the following backends:
- A C interpreter plugin.
- An LLVM-based backend.

Currently I am working on getting this running with the Cog code generator. So far I am generating code for int32/pointer/float32/float64, and I am starting to generate C function calls and object boxing/unboxing.

During this work I learned a lot about Cog. In particular, Cog is missing a better Slang generator, one that would allow forcing better inlining, and more code reviews. There is a lot of code duplication in Cog that can be attributed to limitations of Slang. In my opinion, if we could use Slang for more than just building the VM, we would end up with a better code generator. In addition, we need more people working on Cog: people who perform code reviews and write documentation of Cog.

After these weeks, I learned that working on the Cogit is not that hard. Our biggest problem is lack of documentation. Our second problem could be the lack of documentation about Slang.

Lack of documentation?

About Cog there is this documentation:
About Spur: summary and object format
And many useful class and method comments that taught me a lot.

When I try to work with Pharo frameworks, even recent ones, it is very rare that I see as much documentation as exists for Cog. Some frameworks are documented in the Pharo books and a few others, such as Zinc, have good documentation, but in general there is little documentation and even fewer people writing it. The website about Cog has existed for over 6 years now. I think Cog is far from the worst-documented part of Pharo.


Yes, and that's difficult because it's a moving target and I have been lazy, not writing tests, instead using the Cog VM as "the test". 

It's also difficult because the first tests to write are the hardest to write.

I am so happy to have your involvement.  You and Clément bring such strength and competence.

---------------------------------------------------------------
- Smalltalk -> LLVM ?

As for having a Smalltalk -> LLVM code generator: the truth is that we would not gain anything. LLVM is a C-oriented compiler infrastructure, designed to optimize things such as arithmetic-heavy loops and large sections of straight-line code. In Smalltalk, most of our code is composed of message sends, and LLVM cannot optimize a message send.

To optimize a message send, you first have to determine which method is going to respond to the message. Then you have to inline that method. Only then can you start performing the actual optimizations, such as constant folding, common subexpression elimination, dead-branch elimination, loop unrolling, and so on.
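The shape of that optimization can be sketched in C. This is a hand-written illustration of the general technique (a class guard in front of a speculatively inlined method body), not Cog's actual generated code; all type and function names here are hypothetical.

```c
/* Hypothetical object model, for illustration only. */
typedef struct { int id; } VMClass;
typedef struct { VMClass *klass; long value; } Obj;

/* Slow path: in a real VM this would perform a full method lookup
   and dispatch; here it just computes the same answer. */
long generic_send_double(Obj *receiver) {
    return 2 * receiver->value;
}

/* What a speculative inliner conceptually emits for a send of
   "double": if the receiver's class matches the one observed at
   runtime, run the inlined body (which further optimizations can
   now fold); otherwise fall back to a normal message send. */
long send_double_optimized(Obj *receiver, VMClass *expected_class) {
    if (receiver->klass == expected_class) {
        return receiver->value * 2;       /* inlined, optimizable body */
    }
    return generic_send_double(receiver); /* guard failed: generic send */
}
```

The guard is cheap (one pointer comparison), which is why inlining pays off once the receiver's class is known; when the guard fails, execution simply takes the unoptimized path.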

Because the language itself gives us no information (e.g. static types a la C/C++/Java/C#) that tells us which method a message send will actually invoke, we have the following alternatives for determining it:
- Don't optimize anything.
- Perform a costly static global analysis of the whole program.
- Measure at runtime and hope for the best.
- Extend the language.

In other words, our best bet is Clément's work on Sista. The only problem with this bet is real-time applications.

Ah!  But!  Sista has an advantage that other adaptive optimizers don't.  Because it optimizes from bytecode to bytecode it can be used during a training phase and then switched off.

Real-time applications require a guaranteed upper bound on their response time. In some cases the lack of this guarantee is just an annoyance, as in video games. In mission-critical applications the consequences can be severe if the time constraint is not met; examples of such systems are the flight controls of an airplane or the cooling system of a nuclear reactor.

For these applications, it is not possible to rely on an adaptive optimizer that may kick in at unpredictable times. In these applications you have to either:
- Extend the language to hand-optimize some performance-critical sections of code.
- Use another language to optimize these critical sections.
- Use another language for the whole project.

An additional option is to "train" the optimizer by running the application before deploying and capturing the optimized methods.  Discuss this with Clément and he'll explain how straightforward it should be.  This still leaves the latency in the Cogit when it compiles from bytecode to machine code.  But

a) I've yet to see anybody raise JIT latency as an issue in Cog
b) it would be easy to extend the VM to cause the Cogit to precompile specified methods.  We could easily provide a "lock-down" facility that would prevent Cog from discarding specific machine code methods.

And of course, you have to perform a lot of profiling.

Early and often :-).

Because we can have complete control over the optimizer, and because Sista is bytecode-to-bytecode and can hence store its results in the image in the form of optimized methods, I believe that Sista is well-positioned for real-time since it can be used before deployment.  In fact we should emphasise this in the papers we write on Sista.

The solution of Eliot makes sense.
To write a paper about that I need benchmarks showing results on real-time applications.
So there's quite some work to do first.

Greetings,
Ronie

2014-09-15 16:38 GMT-03:00 Craig Latta <[hidden email]>:


     Hear hear!


-C

[1] http://tinyurl.com/m66fx8y (original message)

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)






--
best,
Eliot




Re: [Pharo-dev] [Vm-dev] re: Parsing Pharo syntax to C/C++

philippeback
In reply to this post by Ronie Salgado
 
On Tue, Sep 16, 2014 at 1:48 PM, Thierry Goubier <[hidden email]> wrote:

>
>
>
> 2014-09-16 13:14 GMT+02:00 Ben Coman <[hidden email]>:
>>
>>
>> Don't worry/don't bother with those: you will never use Smalltalk or a VM :) It will never be certified by authorities, and the industry will never accept it.
>>
>>
>> You are probably right for those two examples, but there are other not-so-regulated domains where real-time is useful - e.g. industrial automation and robotics.
>
>
> Real-time is useful there, yes. But Smalltalk and Cog will never get there, except as a DSL / code generator tool (which means an MDE approach, more or less).
>
> (And code generation is where Pharo to C or LLVM-IR gets us interested)
>
> Dynamic optimisations and lack of static typing: they will laugh you out of the room in any of those fields.
>
> Even if their developers use Python behind their back.
>
http://www.altreonic.com/ has OpenComRTOS, which looks like the kind of thing we would like to target then.

I know the creator of OpenComRTOS; in fact, he lives close to my place.


They happen to have a "VM"


"The ultra small target independent Virtual Machine"

Applications:
 Remote diagnostics.
 Fail safe and fault tolerant control.
 Processor independent programming. 
Thanks to the use of OpenComRTOS, SafeVM tasks can operate system wide across 
all nodes in the network. The user can also put several SafeVM tasks on the same 
node. The natively running OpenComRTOS itself acts as a virtual machine for the SVM 
tasks, isolating them from the underlying hardware details while providing full access.
Safe Virtual Machine for C 

So, looks like we aren't in such a black and white situation.

Phil
 
>
> Thierry

Re: [Pharo-dev] [Vm-dev] re: Parsing Pharo syntax to C/C++

Clément Béra
In reply to this post by philippeback
 


2014-09-16 14:55 GMT+02:00 [hidden email] <[hidden email]>:
 
What would be valuable is a reading list / path to VM enlightenment.

Bluebook is useful
Then a tour of the Object Engine by Tim
Then plugin articles + Slang
The bytecode set
Primitive...
Context to stack mapping
Blocks
Non local returns
Display/Sensor/event loop/timer implementation (like in the porting document).
and only then one would move to more advanced topics.

I saw that Clement had a set of VM related books on his desk at INRIA, maybe posting the list would be great!

The book that best explains how, and why, to implement a high-performance VM for Smalltalk is Urs Hölzle's PhD thesis.

Other relevant books in my office focus on specific topics, such as Advanced Compiler Design and Implementation by Steven Muchnick for optimizing compilers, or The Garbage Collection Handbook by Richard Jones, Antony Hosking and Eliot Moss.

All the best,
Phil






Re: [Pharo-dev] Parsing Pharo syntax to C/C++

Eliot Miranda-2
In reply to this post by Göran Krampe
 


On Tue, Sep 16, 2014 at 12:56 AM, Göran Krampe <[hidden email]> wrote:

Hi Eliot and all!

Since I work with Ron at 3DICC and Cog is vital to us, I wanted to chime in here.

On 09/15/2014 06:23 PM, Eliot Miranda wrote:
I find this whole discussion depressing.  It seems people would rather
put their energy in chasing quick fixes or other technologies instead of
contributing to the work that is being done in the existing VM.  People
discuss using LLVM as if the code generation capabilities inside Cog
were somehow poor or have no chance of competing.  Spur is around twice
as fast as the current memory manager, has much better support for the
FFI.  Clément and I, now with help from Ronie, are making excellent
progress towards an adaptive optimizer/speculative inliner that will
give us similar performance to V8 (the Google JavaScript VM, lead by
Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al.

One thing you need to understand Eliot is that most of us don't have the mind power or time to be able to contribute on that level.

Time is the issue.  I'm no brighter than anyone here, but I have my passion.  And one can learn: Doug McPherson just contributed the ThreadedARMPlugin having never read the ABI before he started the project (because he had never needed to).

But still, a lot of us are tickled by ideas on the low level - and thus ideas like reusing LLVM, reusing some other base VM, cross compilation etc - pop up.

Don't put too much into it - I am always toying with similar ideas in my head for "fun", it doesn't mean we don't also see/know that *real* VM work like Cog is the main road.

  We are trying to get person-power for a high-quality FFI and have a
prototype for a non-blocking VM.  When we succeed C won't be any better
and so it won't be an interesting target.  One will be able to program
entirely in Smalltalk and get excellent performance.  But we need
effort.  Collaboration.

Let me just mention LuaJIT2 - besides very good performance, it sports, among other things, a *very* good FFI. In fact Lua in general has several FFIs and tons of C++ binding tools too - so IMHO anyone doing work in that area should take a sneak peek at LuaJIT2.

And this is a truly "sore" area in Smalltalk since eternity. If we had something as solid as the stuff in the Lua community, then Cog and Smalltalk could go places where they haven't been before, I suspect.

If we look at the codebase we have at 3DICC - a very large part consists of complicated plugin code to external libraries and accompanying complicated Smalltalk glue.

Also, if we compare the Lua community with the Squeak/Pharo community, it is quite obvious that the lack of really good FFI solutions leads us to "reinvent" stuff over and over, often poorly, while the Lua people simply wrap high quality external libraries and that's it. Done.

Well, I hear you, and I think the FFI is extremely important.  That's why I implemented proper callbacks for Squeak, why Spur supports pinning, why I did the MT prototype, and why it is one of the main areas the Pharo team is working on.


Of course this also stems from the very different backgrounds and motives behind the two languages and their respective domains, but still.

Personally I feel so discouraged when people talk about using LLVM or
libffi or whatever instead of having the courage and energy to make our
system world-class.

Don't feel discouraged - it's just that 99% of the community can't help you. :) Instead we should feel blessed that we have 1 Eliot, 1 Clément, 1 Igor and 1 Ronie. Do we have more?


I have the confidence in our abilities to compete
with the best and am saddened that people in the community don't value
the technology we already have and can't show faith in our abilities to
improve it further.  Show some confidence and express support and above
all get involved.

Let me then make sure you know that 3DICC values *all* work in Cog *tremendously*.

As soon as you have something stable on the Linux side - we would start trying it. Just let me know, on Linux (server) we run your upstream Cog "as is". In fact, I should probably update what we use at the moment :)

Every bit of performance makes a big impact for us - but to be honest, what we would value even more than performance would be ... robustness. I mean, *really* robust. As in a freaking ROCK.

An example deployment: More than 3000 users running the client on private laptops (all Windows variants and hw you can imagine, plus some macs) and the server side running on a SLEW of FAT EC2 servers. We are talking about a whole BUNCH of Cogs running 24x7 on a bunch of servers.

Without error reports - in fact, without an ability to debug in place (run the assert VM, for example, using the -blockonerror switch to freeze it when an assert fails) - there's not a lot I can do.  We use a CI server to run regressions at Cadence, and my boss makes sure I fix VM bugs promptly when the CI system shows them.  We deploy on Linux, and so reliability thereon is important to us.  So perhaps we can discuss how to debug your server issues.

We experience VM blow-ups on the client side, both Win32 and OSX. OSX may be due to our current VM being built by clang, but I am not sure. Our Win32 VM is old; we need to rebuild it ASAP. It's hard to know whether these are Cog-related or, more likely, 3DICC-plugin-related, but still.

There are ways of finding out.

But the client side is still not the "painful" part: we also experience Linux server-side Cogs going berserk (100% CPU, no response), just locking up, or suddenly failing to resolve localhost :) etc. I suspect the networking code in probably all of these cases. Here we do NOT have special 3DICC plugins, so here we blame Cog or, more likely, the Socket plugin. Often? No, but "sometimes" is often enough to be a big problem. In fact, a whole new networking layer would make sense to me.

So we should talk.

Also... we need to be able to use more RAM. We are deploying to cloud servers more and more, and using instances with 16 GB of RAM or more is normal. But our Cogs can't utilize it. I am not up to speed on what Spur gives us, or whether we in fact need to go 64-bit for that.

Yes.  32-bit Spur will allow you to use a little more memory than 32-bit Cog, but by tens of percent, not large factors.  You'll need to go to 64-bit Spur to be able to access more than 2, or perhaps 3, GB at the outside.
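The arithmetic behind that ceiling can be made concrete with a back-of-the-envelope sketch (mine, nothing Spur-specific; the function name is hypothetical): a 32-bit pointer can address at most 4 GiB in theory, and the kernel's reserved ranges plus object headers leave an image the 2-3 GB mentioned above.

```c
#include <stdint.h>

/* Theoretical upper bound, in GiB, on the address space reachable
   through a pointer of the given width in bytes. Real processes get
   less: the OS reserves part of the range, and object headers and
   alignment eat into what remains. */
uint64_t addressable_gib(unsigned pointer_bytes) {
    const uint64_t gib = 1024ull * 1024 * 1024;
    unsigned bits = pointer_bytes * 8u;
    if (bits >= 64)                 /* avoid an out-of-range shift */
        return UINT64_MAX / gib;    /* effectively unbounded today */
    return (1ull << bits) / gib;
}
```

So a 32-bit VM is boxed in at 4 GiB no matter how much RAM the server has; only a 64-bit VM can see the rest of a 16 GB instance.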
 
regards, Göran

--
best,
Eliot

Re: [Pharo-dev] Parsing Pharo syntax to C/C++

stepharo
 
Hi Goran


> Also, if we compare the Lua community with the Squeak/Pharo community,
> it is quite obvious that the lack of really good FFI solutions leads
> us to "reinvent" stuff over and over, often poorly, while the Lua
> people simply wrap high quality external libraries and that's it. Done.

With Pharo, ***every*** single day we improve the system. We have had Clément working with Eliot for more than a year.
If people understood that we created a consortium so that we can put more forces on the VM parts, including the FFI, then it would have an impact.
Now, comparing us with Lua, which was designed from the start to interact with C, is not really fair - but we will get there.

We are attracting smart guys to the VM now because the spirit of the VM team has CHANGED. I remember, not so long ago, Mariano being told to do his homework.
Mariano, as well as all the smart guys on our team, was shocked. How could we expect smart guys to join and help? Now this period is over, and this is good.
We are already seeing the difference: Clément, Ronie, and others will follow.

I hope that we will be able to edit a book based on Clément's blog posts and other information, but this is taking time.

RMoD invested in the build infrastructure, and in making sure everybody can compile a VM, to attract people too.
We offered to help with the server infrastructure for commit validation, and we will see what can be done.


> Every bit of performance makes a big impact for us - but to be honest,
> what we would value even more than performance would be ...
> robustness. I mean, *really* robust. As in a freaking ROCK.

This is why I would like to push more regression testing.
Göran, do you have a regression system for your deployment?
I also want to look at the work of Jan Vrany, which he proposed to us more than a year ago.

> Here we do NOT have special 3DICC plugins so no, here we blame Cog or
> more likely, Socket plugin. Often? No, but "sometimes" is often enough
> to be a big problem. In fact, a whole new networking layer would make
> sense to me.
>
For me, it seems normal that jumping over the dirt catches up with you after a while: this is a law of nature. Now the point is how we can reverse the tendency, as we have started to do.

Do you have money to put on the table for that? Or do you just pray hard enough to see it happen magically :) Noury and Luc were so fed up with this code that they started to rewrite and test it, but they got exhausted after a while, because testing a network layer is hard.
These are typical points that we want to discuss within the Pharo consortium.
Esteban will work on the 64-bit port; this is on his official (Inria) roadmap. But again, we will do it with the people who want to play.
1000/2000 euros to be in the consortium is not even the cost of a trip to the US or Germany.

Stef





Re: [Pharo-dev] [Vm-dev] re: Parsing Pharo syntax to C/C++

Martin McClure-2
In reply to this post by Clément Béra
 
On 09/16/2014 06:34 AM, Clément Bera wrote:
> The book that explains the best how to implement a high performance VM
> for Smalltalk and why is Urs Holzle phd
> <http://www.cs.ucsb.edu/~urs/oocsb/self/papers/urs-thesis.html>.

Agreed. This is good (almost required) reading for anyone who wants to
understand how to implement dynamic languages in a way that is not slow,
and to understand why performance of dynamic languages does not need to
be much slower than that of statically-typed languages.

After reading this paper, it's also good to think about the fact that it
describes work that was done over 20 years ago, and that hardware has
changed a great deal in the interim, and think hard about what
improvements might be made today over the techniques that Urs and the
Self team came up with back then.

Regards,

-Martin

Re: [Pharo-dev] Parsing Pharo syntax to C/C++

Alain Rastoul-2
In reply to this post by Eliot Miranda-2
 
On 15/09/2014 18:23, Eliot Miranda wrote:

> I find this whole discussion depressing.  It seems people would rather
> put their energy in chasing quick fixes or other technologies instead of
> contributing to the work that is being done in the existing VM.  People
> discuss using LLVM as if the code generation capabilities inside Cog
> were somehow poor or have no chance of competing.  Spur is around twice
> as fast as the current memory manager, has much better support for the
> FFI.  Clément and I, now with help from Ronie, are making excellent
> progress towards an adaptive optimizer/speculative inliner that will
> give us similar performance to V8 (the Google JavaScript VM, lead by
> Lars Bak, who implemented the HotSpot VM (Smalltalk and Java)) et al.
>   We are trying to get person-power for a high-quality FFI and have a
> prototype for a non-blocking VM.  When we succeed C won't be any better
> and so it won't be an interesting target.  One will be able to program
> entirely in Smalltalk and get excellent performance.  But we need
> effort.  Collaboration.

Hi Eliot,

Not everybody has the necessary skills to help and contribute to your work; my assembly skills are really far away and outdated now (... a little frustration here :( ...), but IMHO your work is invaluable to the Pharo and Smalltalk community.
Just to mention it: I recently noticed a 30 to 50% gain with the latest Spur VM in a small benchmark I wrote for fun (a very dumb chess pawn-move generator).
I was shocked :)
64 bits + 2x performance + a non-blocking (or multi-threaded?) VM are giant steps forward that make it possible for Pharo Smalltalk to compete with mainstream technologies.

Regards,

Alain