Pony for the Pharo VM


Pony for the Pharo VM

Shaping1
 

> I have to admit that that language looks quite interesting, but it is not appropriate for writing an actual VM.

 

The point of the Pony language and concurrency model is to be able to write any program, even a VM, at scale and use all of the cores, with minimal programming effort (compared to what the developer must endure when explicitly managing synchronization issues).

 

> It is one thing to have the actor object model for writing an application that scales, where it can be a very desirable property; it is another thing to write a virtual machine, where the just-in-time compiler works under time constraints on a single CPU in order to reduce initialization time.

 

A VM is a program.  You can write any program you want with Pony.  The heaps are already actor-centric and usually very tiny, just the way we need them for high-performance, real-time apps:

 

Orca: GC and Type System Co-Design for Actor Languages:

 

https://www.ponylang.io/media/papers/orca_gc_and_type_system_co-design_for_actor_languages.pdf

 

Can you be more specific about which program construct cannot be written in Pony or cannot be written to run fast enough?

 

You can write as much asynchronous and synchronous code as you wish, where and when you need it.  The balance point is yours, as the developer, to choose.

 

> In fact, most of the methods are actually interpreted, and they are only compiled after being executed several times, to reduce this startup time.

 

Of course.  But this is not a problem peculiar to Pony (or C or Rust or…).  It’s just another programming task needed in the context of VM design.   Note the comment about using both JIT and AOT, as desired, during development.  In any case, if you use Pony, use a state-machine (which we all should do anyway).  If the developer cannot or will not develop the discipline needed to build state-machines, systematically, Pony or any highly concurrent, actor-based programming model will only thwart and frustrate him.  We can’t efficiently use all these cores without such a programming model. 

 

I think I would start by using the Pony compiler from a Smalltalk browser (mod the browser to accommodate the actors too).  Source code in files, searching, and scrolling are too inefficient. 

 

> In this problem domain, the actor model and the thread safety it guarantees do not help you at all, and deadlocks can always be produced.

 

This statement is incorrect in the case of Pony (but we have a terminology problem; see below), and this was the main reason for the post.  Here is the gist again:  Deadlocks and data-races are not possible in a Pony program that compiles.  This is mathematically guaranteed.  You can glean this fact from the videos or study the details in Pony papers.

> I did have deadlock problems with synchronous messages in Erlang

 

Pony is not Erlang.  It is like Erlang in many ways, but vastly improved.  My qualified (“in the round, for starters”) comparison was a mistake.  I should have omitted the whole comment.  The Erlang/Pony comparison never goes well.  Pony is a different animal.  And it’s seriously powerful.  Forget about Erlang.  Really.  Just forget about it.  Use what was learnt there, and implement it in Pony.

 

> which forced me to switch to asynchronous messages, but if you do not model your domain state machine you could also end up with a deadlock when using asynchronous messages or, even worse, an inconsistent state such as an incomplete credit card transaction in a highly distributed system!

 

We are talking about different, but related things:  you can write a Pony program whose domain-level state-machine is wrong or unfinished and therefore not working correctly.  This is called livelocking.  It’s a domain-level problem, not a system problem (for Pony).  Livelocking is the developer’s problem, not Pony’s.  Compiled Pony code cannot deadlock/data-race.  It’s not possible.  Dealing with livelocking (programming a state machine thoroughly and correctly so that you get the behavior you want and describe in your code) is the subject of a special grammar and tool I’m working on for state-machine based programming.  This would supplement or replace the system browser.  Right now my approach to state-machine creating is a grammatical discipline that works very well for me.  I use it in Smalltalk, increasingly, and tend not to code without it anymore.  I don’t want to be limited to green threads, however, and definitely don’t want explicit concurrency management.  I don’t have that much time to waste, and hope everyone reading this has been burnt badly enough by concurrency bugs to have a similar view. 

 

> You even need to model network and power failures in your state machine. So the programming language may help you a lot with your concurrent, distributed, and fault-tolerant system programming, but it is not a silver bullet that guarantees that your system is actually going to be correct.

 

See above concerning the difference between livelock and deadlock. 
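To make the domain-level point concrete, here is a minimal sketch of an explicit state machine for the credit-card example mentioned above. The states, events, and transition table are hypothetical and chosen only for illustration; this is not the grammar or tool described in this post. The idea is that every allowed (state, event) pair is listed, a failure event such as a timeout is modeled like any other event, and anything not listed is rejected rather than silently ignored.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AUTHORIZING = auto()
    CAPTURED = auto()
    FAILED = auto()


class Event(Enum):
    SUBMIT = auto()
    APPROVED = auto()
    DECLINED = auto()
    TIMEOUT = auto()  # assumption: stands in for a network or power failure


# Every (state, event) pair that is allowed; anything else is rejected loudly.
TRANSITIONS = {
    (State.IDLE, Event.SUBMIT): State.AUTHORIZING,
    (State.AUTHORIZING, Event.APPROVED): State.CAPTURED,
    (State.AUTHORIZING, Event.DECLINED): State.FAILED,
    (State.AUTHORIZING, Event.TIMEOUT): State.FAILED,
}


def step(state, event):
    """Return the next state, or raise if the pair was never modeled."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"unhandled event {event.name} in state {state.name}"
        ) from None


def unhandled_pairs(terminal=(State.CAPTURED, State.FAILED)):
    """List the (state, event) pairs with no transition, ignoring terminal states."""
    return [
        (s, e)
        for s in State
        for e in Event
        if s not in terminal and (s, e) not in TRANSITIONS
    ]
```

Calling `unhandled_pairs()` gives a quick list of the cases the table does not yet cover, which is one way to see how far an unfinished state machine is from complete.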

 

> Going back to the task of developing a VM, you also need to be able to perform dangerous memory access operations for at least the following three tasks:
>
> 1. Implementing the garbage collector.

 

Actually no; there is nothing dangerous here.  In Pony, a separate heap exists for each actor.  These are generally tiny, and come and go quickly.   If they are not tiny or at least very simple/uniform in structure, they should be made so by refactoring actor scope, until they are.  Smallness and clarity of purpose are the main criteria for determining whether you’ve written an actor well.  Those two properties also greatly ease debugging of the actor.  If you have a big actor not factored as a network of smaller ones, you’ve done something wrong, or you’ve just started your state machine, and have some factoring yet to do.  You still have classes, but these are an organizational tool for synchronous code used by Actors and their asynchronous behaviors.

 

> 2. Direct access to object slots for implementing the bytecode interpreter.

 

Not a problem.  It’s just a program feature.   So we write it as it needs to be.   

 

> 3. Copying compiled machine code into executable memory and performing position-dependent relocations.

 

Doable.  The Pony FFI works well even at this early stage.

 

> The machine code generation and installation can be separated into two stages (the current VM just generates the code directly into the executable memory). In fact, having separate stages for compilation and installation is a requirement on operating systems that enforce W^X page-level permissions, especially if you want a concurrent VM.

 

Then do that. 

 

> You need to install the executable code atomically, so you need to suspend the threads while changing the page permission from executable to writable. You may get around this restriction if you are allowed to map the same physical memory into different virtual address ranges with different permissions (one writable, and one executable).

 

…as needed.

 

Pony is changing quickly:  https://ponylang.zulipchat.com/.

 

It’s being improved weekly.  The Pony group are also working on a security model (to deal with attacks via FFI and other sources), but this will be some time coming.  Feel free to contribute.  The language is highly moldable, especially at this stage.  I think the version is 0.33.  If you need a feature or convenience not present, request it.  The group is very responsive, and eager to improve the tool.

If there were no Smalltalk, I would certainly use Pony before C, Rust, or Go, even at this early pre-1.0 stage.

 

> For these reasons, you need an unsafe language such as C/C++ for at least these tasks, or a language that allows you to turn off the type and memory safety net. I have heard that Rust has unsafe pointers that you could also use for these purposes.

 

The Pony FFI will work here.  C libs are sometimes needed, and operations in C code are of course not guaranteed to be safe, in any case. 

 

> The Pharo team is going to stay with the existing virtual machine for quite a while. You cannot just replace something that is not well documented with something new, especially when you do not have that many resources for making a new VM.

 

One of the main drivers in the choice of Pony is to reduce the resources needed to create a highly performant, simple VM.  One notable Pharo/Smalltalk-related problem for high-speed apps is stop-the-world GCing in the one large system heap.  This won’t work for high-performance, highly concurrent, highly scalable apps, especially not for real-time ones, and must go away if near-deterministic latencies are to be achieved.  Pony has already solved this problem with per-actor heaps.  That design feature is very interesting because it represents much hard work that need not be done.

 

 

> You first need to document the existing one completely: the semantics of the bytecodes, how they are implemented, and the same for the primitives.

 

Agreed.  I’m not claiming that VM development is easy or trivial, generally or via Pony.  VMs are arguably one of the most complicated things that humans create (not a compliment).  But Pony solves more concurrency-related problems at compile-time than any other available tool.

 

How complete is the documentation on the current Pharo VM? 

 

> As for myself, I am putting my bets on another language that I am developing (Sysmel: https://github.com/ronsaldo/sysmel ), and on full ahead-of-time compilation, but my problem domain is video game programming, low-level operating system and driver development, and embedded programming, where I actually want to have control of the machine.

 

I’ve read some on Sysmel.  It appears to be an outstanding tool.

 

Does Sysmel implement multicore concurrency, and guarantee no deadlocks/data-races on compile?

Pony, the language and concurrency model, will not stop us from doing anything that can be done with the machine and OS.  The Pony language is not as interesting to me as its concurrency model.  Don’t like Pony syntax (and I don’t)?  Then fork and change it.  You have all the source.  (I’d prefer to see keyword selectors everywhere, even without the usual attendant polymorphism.  Seriously.) 

 

> Since I have written my compiler in Pharo, I can just reuse the Opal Compiler for doing AST-to-AST translation and just compile Pharo (with some limitations, for example no thisContext) into my runtime environment. If I want a more dynamic environment, I can also serialize a Pharo CompiledMethod, send it through a socket and then interpret it on the Sysmel side: https://github.com/ronsaldo/sysmel/blob/master/module-sources/Sysmel.Core/Smalltalk-Bootstrap/InterpretedMethod.sysmel by just reusing the existing language semantics. Currently I support only Linux; Windows support is coming in a couple of weeks, after I get a proper module system working to reduce compilation times. For the backend I am using LLVM; wasm is not yet supported because I am generating some IR constructs that are not supported by the wasm backend

 

Wasm and WASI will be good when they are ready.

 

> (vtable layouts, and some intrinsics that I am using for non-local returns), but they should not be that complicated to fix.

 

Can Sysmel manage many threads on many cores (running actor threads in parallel, not just green-thread-concurrently on one core), whilst guaranteeing no data-races, and switch automatically between actor-threads, without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that now.

 

If Sysmel cannot do those operations, or cannot do them as fast, can you add the abilities cost-effectively?  They already work in Pony, and a very active team drives the effort.

 

 

Best,

 

Shaping

 

 

 

 

On Mon, 6 Apr 2020 at 6:05, Shaping (<[hidden email]>) wrote:

I should have initially posted this to the Pharo-dev list, as well.

 

From: Pharo-users [mailto:[hidden email]] On Behalf Of Shaping
Sent: Friday, 3 April, 2020 14:05
To: 'Any question about pharo is welcome' <[hidden email]>
Subject: Re: [Pharo-users] Latest PharoJS Success Story; Wasm/WASI; very keen on Pony for the Pharo VM

 

All:

> Brain Treats got stuck during launch on my LG.

>

Which Android version are you using?

 

The phone is old and this is likely the problem.

 

Android version:  4.4.2

Kernel version:  3.4.0

 

> Is there a plan to move PharoJS to Wasm/WASI?

>

Dave and I talked about it a long time ago. This sounds like a good idea.

Actually, Dave has a very ambitious idea: turn PharoJS into Pharo*, where * can be different targets.

But, there's a lot to do before reaching this goal. So, don't expect it any time soon.

 

Not to change the topic too much, but the following is related and I often think of it…

 

Consider writing the Pharo VM in Wasm or, better, with Pony (which can emit Wasm, as needed).  Pony’s reference-capability-based (ref-cap) concurrency model provably guarantees that no data-races or deadlocks can happen if the code compiles; this solves a very large class of extremely ugly concurrency problems that no one ever wants to face.

 

Pony gives high-performance concurrency (5 to 15 ns actor-thread switching time, depending on platform), and solves the most difficult class of synchronization problems at compile time.  It runs as fast as C.  It runs faster than C, as concurrency scales.  You can’t scale a highly concurrent app efficiently in C, and really shouldn’t try if you wish to remain happy and mentally healthy.

 

Pony is still pre-1.0, but the group is very active and competent.  I think we should consider using it to build the VM.  Have a look.  Some videos for your amusement and information:

 

 

https://www.youtube.com/watch?v=ODBd9S1jV2s

https://www.youtube.com/watch?v=u1JfYa413fY

https://www.youtube.com/watch?v=fNdnr1MUXp8

https://www.ponylang.io/

 

There are many others.  I mentioned the Pony concurrency architecture around the holidays, but there was no interest from the list—not a good time perhaps.

 

The tentative plan is to do what Google does with Flutter:  have the JIT in support of the usual dynamicity a Smalltalker needs for rapid development; and have AOT, fully optimized compiling for production or speed-related reality checks, presumably needed less often during development.  There are other possibilities. 

 

Anyone interested?   

 

I have some ideas for simplifying use of the six ref caps in the context of Pharo/Smalltalk.  If this path is chosen, one must commit to strict state-machine-based algorithm development, without exception.  This should have happened anyway by now, broadly in the programming space, but didn’t.  I’m working on a graphical programming tool and associated grammar (in VW) that make state-machine development easy and attractive.  This, besides efficient use of machine resources, is the other reason for pushing in this direction.

 

A Pony program is built from a net of asynchronously communicating actors.  You change the state of your program with asynchronous messaging between actors.  There is no blocking (no mutexes or semaphores), and therefore no wasted CPU cycles or mushrooming program complexity as you try to use mutexes in a fine-grained way (a very bad idea).  And as mentioned, there are never deadlocks or data-races.  All cores on all CPUs stay busy, always, until the program goes idle or exits.  The Pony group is also working on extending the model to the network level, so that all machine nodes in the network stay busy.  In the round, as a start, think of Pony as Erlang/OTP, but much faster, with no legacy bugs, and provably non-deadlocking on compile.

 

The asynchronous actor model is the programming pattern that Kay had in mind when he said “object-oriented.”  It’s the one I want to implement in Pharo.  The green threads are light, but don’t efficiently use the cores, and a net of VMs with their respective images still communicate too slowly.

 

If your time permits, please study Pony for a bit before rejecting the idea as too big a change in direction or too complicated.  Using Pony looks like the ideal VM simplification strategy, if our aim is efficient use of networks of machines, each with at least one CPU (often more), each, in turn, with many cores (whose numbers are still increasing).  This pattern in hardware probably won’t be changing much, now that speeds are topping out.  Winning the performance game is therefore about efficiently using many cores at once, without burdening the programmer.  I don’t see a better way to do this now than with Pony.

 

Thoughts and suggestions are welcome.

 

 

Shaping

 

 

 

 

> -----Original Message-----

> From: Pharo-users [[hidden email]] On

> Behalf Of N. Bouraqadi

> Sent: Tuesday, 28 January, 2020 12:18

> To: Any question about pharo is welcome <[hidden email]>

> Subject: [Pharo-users] Latest PharoJS Success Story

>

> The latest PharoJS-powered smartphone app is now live.

> Development has been made using Pharo.

> Then, javascript code is generated using PharoJS.

> Last, the app is built to target both iOS and Android thanks to Apache

> Cordova.

>

> Learn more and Download at

> https://nootrix.com/projects/brain-treats-app/

>

> Noury

>

>

 

 


Re: Pony for the Pharo VM

Ronie Salgado
 
Dear Shaping,

You seem to misunderstand the problem. This is not a technical problem, but a political one. The language seems very cool, but it still has to be proven in production by a large company, one that shows that it is actually being used, that it is maintained, and that it will keep being maintained for decades. That can be said about C/C++ (many companies) and Rust (Mozilla); Go (Google) is out of the question because of its mandatory garbage collector. All of the people already working on the VM have their own political and technical agendas, and they will continue working on those. Mathematical and safety proofs are not enough. The use of Slang is also bad in this regard, but with the difference that there is already an existing open-source VM that works (and is used by several companies), and making a new VM is a high-risk endeavor from a technical point of view (and from a political and business one as well).

This means that if you really want to use Pony for making a Pharo or Squeak VM, then you are going to have to make it yourself, or pay someone else willing to do it. You will not be able to convince the existing people to adopt a new language unless you have something running in that language that actually proves that it is worth changing.

> How complete is the documentation on the current Pharo VM?

 Very incomplete. The best documentation is the code itself, then you have old wiki articles in the Squeak wikis, and some technical articles in Eliot Miranda and Clément Bera blogs. There are some additional ongoing efforts on documenting the VM from the Pharo people: https://github.com/SquareBracketAssociates/Booklet-PharoVirtualMachine and here: https://github.com/pharo-open-documentation/pharo-wiki/tree/master/PharoVirtualMachine

I also wrote a very simplistic VM in pure C that can load a 64-bit Pharo image and interpret the bytecodes: https://github.com/ronsaldo/crankvm . It cannot be used for actually running Pharo because most of the primitives are not yet implemented. I wrote it mostly to teach myself how things work, and to get there I had to read several parts of the existing VM sources and the Blue Book ( http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf ).

> We are talking about different, but related things:  you can write a Pony program whose domain-level state-machine is wrong or unfinished and therefore not working correctly.  This is called livelocking.  It’s a domain-level problem, not a system problem (for Pony).  Livelocking is the developer’s problem, not Pony’s.  Compiled Pony code cannot deadlock/data-race.  It’s not possible.  Dealing with livelocking (programming a state machine thoroughly and correctly so that you get the behavior you want and describe in your code) is the subject of a special grammar and tool I’m working on for state-machine based programming.  This would supplement or replace the system browser.  Right now my approach to state-machine creating is a grammatical discipline that works very well for me.  I use it in Smalltalk, increasingly, and tend not to code without it anymore.  I don’t want to be limited to green threads, however, and definitely don’t want explicit concurrency management.  I don’t have that much time to waste, and hope everyone reading this has been burnt badly enough by concurrency bugs to have a similar view.

 

> You even need to model network and power failures in your state machine. So the programming language may help you a lot with your concurrent, distributed, and fault-tolerant system programming, but it is not a silver bullet that guarantees that your system is actually going to be correct.

 

> See above concerning the difference between livelock and deadlock.

 
Thanks for the correct terminology. But from the point of view of the user they are the same. And a livelock seems to be far more dangerous than a traditional deadlock, for which you already have some debugging tools that are relatively easy to use. You can create a pthread_mutex with deadlock detection for finding bugs, or run your program through Valgrind to detect race conditions and deadlocks. The point is that the language safety net is not enough, and in fact it could be dangerous because it can instill a false sense of safety. That false sense of safety can be very dangerous in real-world projects with deadlines that have to be met, and with non-technical managers who will underestimate the schedule even more because of it. And since in the real world deadlines have to be met, bugs and incomplete solutions are simply shipped.
 

> Can Sysmel manage many threads on many cores (running actor threads in parallel, not just green-thread-concurrently on one core), whilst guaranteeing no data-races, and switch automatically between actor-threads, without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that now.

 

> If Sysmel cannot do those operations, or cannot do them as fast, can you add the abilities cost-effectively?  They already work in Pony, and a very active team drives the effort.


I am developing the language, and if I want or need to, I can create a minimal VM suitable for my purpose and adapt it to my needs, which are currently only being fulfilled by C and C++ (and perhaps Rust, but I found it too complicated). (I am not even trying to convince existing people to use it for now; I will when I have the next version of Woden running on it, and they will be able to just use a Pharo subset for scripting.) I would just use C++ if it had runtime reflection and easy, reliable, and robust live programming support, but it does not have these crucial features. You can implement live programming in C++ by serializing your program state, swapping the DLL that contains your code, and then deserializing your program state. Program state serialization is not easy.

Real multi-threading in Sysmel is already supported when you use it without garbage collection (the language is strongly modeled after C++11 when used in this way). When you enable the GC for Smalltalk semantics, it currently uses the Boehm conservative GC (I am using it to get things running), so concurrency is limited by the stop-the-world semantics of the GC. When I implement a proper GC, I will be able to dissociate threads that only use the native runtime from the GC. But for "many threads" and high parallelism, what I use is the GPU, where Sysmel can also be used as a shading language by generating SPIR-V for Vulkan consumption (the LLVM backend is not used for this; it does not expose all of the required semantics, unless they changed it recently). As for the actor model and non-blocking-by-default semantics, I just do not care about them (for now), because I know they bring their own problems that can have an impact on other parts of the system (e.g. non-blocking IO everywhere and by default, on operating systems that do not support it very well). Instead I prefer explicit semantics, and if the user wants actors, they can be implemented as just another library in the system.

BTW: how do you implement an actor? With a message queue. And how do you implement a message queue? With a mutex and a condition variable. What about deadlock? You are only taking a single mutex on the queue, so you cannot have a deadlock, since a deadlock needs at least two mutexes that are taken in different orders. If you have N mutexes, to avoid a deadlock you only have to take these N mutexes in the same order everywhere. That sounds easy, but in practice it is not: you would have to sort the actual addresses of the mutexes, and in many cases you will realize that you need to take an additional mutex after having already taken other mutexes (I know it from practice). The mutex of the queue is always the one taken last, and that is why it is safe to use the queue for message passing.
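A minimal sketch of that construction, written in Python rather than C to keep it short. The names `Mailbox`, `Actor`, `acquire_in_order`, and `release_all` are hypothetical; the mailbox uses exactly one mutex and one condition variable, and the helper shows the "always take the locks in the same order" rule, using `id()` as a stand-in for sorting by address.

```python
import threading
from collections import deque


class Mailbox:
    """Unbounded FIFO queue guarded by one mutex and one condition variable."""

    def __init__(self):
        self._lock = threading.Lock()  # the single mutex
        self._not_empty = threading.Condition(self._lock)
        self._items = deque()

    def put(self, message):
        with self._lock:  # no other lock is acquired while this one is held
            self._items.append(message)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:  # loop guards against spurious wakeups
                self._not_empty.wait()
            return self._items.popleft()


class Actor(threading.Thread):
    """Hypothetical actor: one thread handling its mailbox one message at a time."""

    _STOP = object()  # private sentinel that ends the message loop

    def __init__(self, handler):
        super().__init__(daemon=True)
        self._mailbox = Mailbox()
        self._handler = handler

    def send(self, message):
        self._mailbox.put(message)  # asynchronous: returns immediately

    def stop(self):
        self._mailbox.put(self._STOP)

    def run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self._handler(message)


def acquire_in_order(*locks):
    """Acquire several locks in one global order (by id()) and return that order."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release locks acquired by acquire_in_order, in reverse order."""
    for lock in reversed(ordered):
        lock.release()
```

A sender only ever touches the mailbox lock, and only for the duration of an append, which is the property the paragraph above relies on.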

So, what is so special about these actor languages? They use an M:N threading model, where green threads multiplex multiple cooperative threads onto the different cores of the CPU. They only need to create N native threads that are pinned to the different cores of the CPU (sched_setaffinity on Linux), and the green threads are created by allocating stack memory and having a simple trampoline that stores all of the callee-saved registers on the stack, switches the stack pointer, restores these same registers and returns. BTW, the operating-system-level context switching machinery is implemented in the same way, but it may have some additional instructions for returning to unprivileged code.
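The cooperative half of that model can be sketched with Python generators standing in for green threads. This is only an illustration of cooperative multiplexing on one native thread: there is no real stack switching, register saving, or core pinning here, and `run_round_robin` and `worker` are hypothetical names.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks cooperatively; return the values they yielded."""
    ready = deque(tasks)  # the ready queue of runnable tasks
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; do not requeue it
        ready.append(task)  # still runnable: back of the ready queue
    return trace


def worker(name, steps):
    """A task that yields (a voluntary switch point) once per step."""
    for i in range(steps):
        yield f"{name}{i}"
```

For example, `run_round_robin([worker("a", 2), worker("b", 3)])` interleaves the two tasks and returns `["a0", "b0", "a1", "b1", "b2"]`.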

These 5 to 15 ns figures typically come from switching the stack. With 64-bit addresses, the simplest way to allocate the stack memory is to reserve a large uncommitted region for each stack, or to use some fancier scheme (see here: https://blog.cloudflare.com/how-stacks-are-handled-in-go/ ). You would also need a task queue (or process queue in OS terminology) so as not to block the different actors when they are waiting for messages. There are several papers that discuss how to implement this task queue, which tends to be the main bottleneck of these systems. In userspace, since the OS is not aware of your green threads, you also need asynchronous IO for everything, so that an IO operation does not block all of the tasks that are running on a single core (I believe I read about a different, more flexible mechanism in a paper about Go, but I have forgotten the details).

If you do not actually need a coroutine, and you model your actors in terms of a single function application that receives a single message and processes it, then you only need a single stack (fetch a message from the queue, apply the function, done), and you do not even have that 5-15 ns overhead for stack switching (you may have another overhead from pushing the pending actor onto the ready queue, which could outweigh the performance advantage). This is similar to a traditional asynchronous job system, for which you only need a thread pool. And this can also be implemented in C/C++.
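A minimal sketch of that run-to-completion style on top of a thread pool, again in Python for brevity. `PooledActor` is a hypothetical name, and the behavior is assumed to be a plain function from (state, message) to the new state. An actor is submitted to the pool only when it has pending messages and is not already scheduled, so at most one worker runs a given actor at any time.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class PooledActor:
    """Hypothetical actor whose messages are handled one at a time by a pool."""

    def __init__(self, pool, behavior, state):
        self._pool = pool
        self._behavior = behavior  # function (state, message) -> new state
        self.state = state
        self._queue = deque()
        self._lock = threading.Lock()
        self._scheduled = False  # True while queued on, or running in, the pool

    def send(self, message):
        with self._lock:
            self._queue.append(message)
            if self._scheduled:
                return  # a pool task will pick this message up
            self._scheduled = True
        self._pool.submit(self._run_one)

    def _run_one(self):
        with self._lock:
            message = self._queue.popleft()
        # Only one task per actor is in flight, so the state needs no lock here.
        self.state = self._behavior(self.state, message)
        with self._lock:
            if not self._queue:
                self._scheduled = False
                return
        self._pool.submit(self._run_one)  # more work: requeue the actor
```

Handling one message per pool task keeps a busy actor from monopolizing a worker; the cost is the extra trip through the pool's queue that the paragraph above mentions.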

Best regards,
Ronie

On Thu, 9 Apr 2020 at 7:43, Shaping (<[hidden email]>) wrote:
 

I have to admit that that language looks quite interesting, but it is not appropriate for writing an actual VM.

 

The point of the Pony language and concurrency model is to be able to write any program, even a VM, at scale and use all of the cores, with minimal programming effort (compared to what the developer must endure when explicitly managing synchronization issues).

 

One thing is having the actor object model for writing an application that scales where it can be a very desirable property, but another thing is writing a  virtual machine where you have time constraints for the just in time compiler in a single CPU to reduce initialization time.

 

A VM is a program.  You can write any program you want with Pony.  The heaps are already actor-centric and usually very tiny, just the way we need them for high-performance, real-time apps:

 

Orca: GC and Type System Co-Design for Actor Languages:

 

https://www.ponylang.io/media/papers/orca_gc_and_type_system_co-design_for_actor_languages.pdf

 

Can you be more specific about which program construct cannot be written in Pony or cannot be written to run fast enough?

 

You can write as much asynchronous and synchronous code as you wish, where and when you needed it.  The balance point is yours, as the developer, to choose. 

 

In fact, most of the methods are actually interpreted and they are only compiled after being executed several times for reducing this startup time.

 

Of course.  But this is not a problem peculiar to Pony (or C or Rust or…).  It’s just another programming task needed in the context of VM design.   Note the comment about using both JIT and AOT, as desired, during development.  In any case, if you use Pony, use a state-machine (which we all should do anyway).  If the developer cannot or will not develop the discipline needed to build state-machines, systematically, Pony or any highly concurrent, actor-based programming model will only thwart and frustrate him.  We can’t efficiently use all these cores without such a programming model. 

 

I think I would start by using the Pony compiler from a Smalltalk browser (mod the browser to accommodate the actors too).  Source code in files, searching, and scrolling are too inefficient. 

 

In this problem domain, the actor model and the thread safeness guaranteed by it does not help you at all, and deadlocks can always be produced.

 

This statement is incorrect in the case of Pony (but we have a terminology problem; see below), and this was the main reason for the post.  Here is the gist again:  Deadlocks and data-races are not possible in a Pony program that compiles.  This is mathematically guaranteed.  You can glean this fact from the videos or study the details in Pony papers.

 I did have deadlocks problems with synchronous messages in Erlang

 

Pony is not Erlang.  It is like Erlang is many ways, but vastly improved.  My qualified (“in the round, for starters”) comparison was a mistake.  I should have omitted the whole comment.  The Erlang/Pony comparison never goes well.  Pony is a different animal.  And it’s seriously powerful. Forget about Erlang.  Really.  Just forget about it.  Use what was learnt there, and implement it in Pony. 

 

which forced me to go into using asynchronous messages, but if you do not model your domain state machine you could also end with a deadlock by using asynchronous messages, or even worse, an inconsistent state such as an incomplete credit card transaction in a highly distributed system!

 

We are talking about different, but related things:  you can write a Pony program whose domain-level state-machine is wrong or unfinished and therefore not working correctly.  This is called livelocking.  It’s a domain-level problem, not a system problem (for Pony).  Livelocking is the developer’s problem, not Pony’s.  Compiled Pony code cannot deadlock/data-race.  It’s not possible.  Dealing with livelocking (programming a state machine thoroughly and correctly so that you get the behavior you want and describe in your code) is the subject of a special grammar and tool I’m working on for state-machine based programming.  This would supplement or replace the system browser.  Right now my approach to state-machine creating is a grammatical discipline that works very well for me.  I use it in Smalltalk, increasingly, and tend not to code without it anymore.  I don’t want to be limited to green threads, however, and definitely don’t want explicit concurrency management.  I don’t have that much time to waste, and hope everyone reading this has been burnt badly enough by concurrency bugs to have a similar view. 

 

You even need to model network and power failures in your state machine. So the programming language may help you a lot with your concurrent, distributed and fault-tolerant system programming, but it is not a silver bullet that guarantees that your system is actually going to be correct.

 

See above concerning the difference between livelock and deadlock. 
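
The point about modelling the domain state machine explicitly, with failures as ordinary events, can be sketched as follows. This is only a minimal Python illustration, not Pony and not code from anyone in this thread; the states, the events, and the `Transaction` class are hypothetical names chosen to match the credit-card example above.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AUTHORIZING = auto()
    CAPTURED = auto()
    FAILED = auto()


class Event(Enum):
    SUBMIT = auto()
    APPROVED = auto()
    DECLINED = auto()
    NETWORK_DOWN = auto()  # a failure is just another event in the model


# Every legal (state, event) pair is listed; anything else is a modelling gap.
TRANSITIONS = {
    (State.IDLE, Event.SUBMIT): State.AUTHORIZING,
    (State.AUTHORIZING, Event.APPROVED): State.CAPTURED,
    (State.AUTHORIZING, Event.DECLINED): State.FAILED,
    (State.AUTHORIZING, Event.NETWORK_DOWN): State.FAILED,
}


class Transaction:
    """Hypothetical domain object driven only through handle()."""

    def __init__(self):
        self.state = State.IDLE

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            # Surface the unmodelled case instead of silently stalling.
            raise ValueError(
                f"unhandled {event.name} in state {self.state.name}"
            )
        self.state = TRANSITIONS[key]
        return self.state
```

A transaction that receives `SUBMIT` and then `NETWORK_DOWN` ends in `FAILED` rather than hanging half-finished, and any event the table does not cover raises immediately, which is where an incomplete state machine shows up during testing.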

 

Going back to the task of developing a VM, you also need to be able to perform dangerous memory accessing operations for at least the following three tasks:

1. Implementing the garbage collector.

 

Actually no; there is nothing dangerous here.  In Pony, a separate heap exists for each actor.  These are generally tiny, and come and go quickly.   If they are not tiny or at least very simple/uniform in structure, they should be made so by refactoring actor scope, until they are.  Smallness and clarity of purpose are the main criteria for determining whether you’ve written an actor well.  Those two properties also greatly ease debugging of the actor.  If you have a big actor not factored as a network of smaller ones, you’ve done something wrong, or you’ve just started your state machine, and have some factoring yet to do.  You still have classes, but these are an organizational tool for synchronous code used by Actors and their asynchronous behaviors.

 

2. Direct access to object slots for implementing the bytecode interpreter.

 

Not a problem.  It’s just a program feature.   So we write it as it needs to be.   

 

3. Copying compiled machine code into executable memory and performing position dependent relocations.

 

Doable.  The Pony FFI works well even at this early stage.

 

The machine code generation and installation can be separated into two stages (the current VM just generates the code directly into the executable memory), and in fact having these two separate stages for compilation and installation is a requirement for operating systems that enforce W^X page-level permissions, especially if you want a concurrent VM.

 

Then do that. 

 

You need to install the executable code atomically, so you need to suspend the threads while changing the page permission from executable to writable. You may get around this restriction if you are allowed to map the same physical memory into different virtual address ranges with different permissions (one writable, and one executable).

 

…as needed.

 

Pony is changing quickly:  https://ponylang.zulipchat.com/.

 

It’s being improved weekly.  The Pony group are also working on a security model (to deal with attacks via FFI and other sources), but this will be some time coming.  Feel free to contribute.  The language is highly moldable, especially at this stage.  I think the version is 0.33.  If you need a feature or convenience not present, request it.  The group is very responsive, and eager to improve the tool.

If there were no Smalltalk, I would certainly use Pony before C, Rust, or Go, even at this early pre-1.0 stage.

 

For these reasons, you need an unsafe language such as C/C++ for at least these tasks, or a language that allows you to turn off the type and memory safety net. I heard that Rust has an unsafe pointer that you could also use for these purposes.

 

The Pony FFI will work here.  C libs are sometimes needed, and operations in C code are of course not guaranteed to be safe, in any case. 

 

The Pharo team is going to stay with the existing virtual machine for quite a while. You cannot just replace something that is not well documented with something new, especially when you do not have that many resources for making a new VM.

 

One of the main drivers in the choice of Pony is to reduce the resources needed to create a highly performant, simple VM.  One notable, Pharo/Smalltalk-related problem for high-speed apps is stop-the-world GCing in the one large system heap.  This won’t work for high-performance, highly concurrent, highly scalable apps, especially not for real-time ones, and must go away if latencies approaching deterministic are to be achieved.  Pony has already solved this problem with per-actor heaps.  That design feature is very interesting because it represents much hard work that need not be done.

 

 

You first need to document completely the existing one, the semantics of the bytecodes, how they are implemented, and also the same with the primitives.

 

Agreed.  I’m not claiming that VM development is easy or trivial, generally or via Pony.  VMs are arguably one of the most complicated things that humans create (not a compliment).  But Pony solves more concurrency-related problems at compile-time than any other available tool.

 

How complete is the documentation on the current Pharo VM? 

 

As for myself, I am putting my bets on another language that I am developing (Sysmel: https://github.com/ronsaldo/sysmel ), and in full ahead-of-time compilation, but my problem domain is video game programming, low-level operating system, driver development and embedded programming where I actually want to have control of the machine.

 

I’ve read some on Sysmel.  It appears to be an outstanding tool.

 

Does Sysmel implement multicore concurrency, and guarantee no deadlocks/data-races on compile?

Pony, the language and concurrency model, will not stop us from doing anything that can be done with the machine and OS.  The Pony language is not as interesting to me as its concurrency model.  Don’t like Pony syntax (and I don’t)?  Then fork and change it.  You have all the source.  (I’d prefer to see keyword selectors everywhere, even without the usual attendant polymorphism.  Seriously.) 

 

Since I have written my compiler in Pharo, I can just reuse the Opal Compiler for doing AST to AST translation and just compile Pharo (with some limitations, for example no thisContext) into my runtime environment. If I want a more dynamic environment, I can also serialize a Pharo CompiledMethod, send it through a socket and then interpret it on the Sysmel side: https://github.com/ronsaldo/sysmel/blob/master/module-sources/Sysmel.Core/Smalltalk-Bootstrap/InterpretedMethod.sysmel by just reusing the existing language semantics. Currently I am supporting only Linux; Windows support is coming in a couple of weeks, after I get a proper module system working to reduce compilation times. For the backend I am using LLVM; wasm is not yet supported because I am generating some IR constructs that are not supported by the wasm backend

 

Wasm and WASI will be good when they are ready.

 

(vtable layouts, and some intrinsics that I am using for non-local returns), but they should not be that complicated to fix.

 

Can Sysmel manage many threads on many cores (running actor threads in parallel, not just green-thread-concurrently on one core), whilst guaranteeing no data-races, and switch automatically between actor-threads, without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that now.

 

If Sysmel cannot do those operations, or cannot do them as fast, can you add the abilities cost-effectively?  They already work in Pony, and a very active team drives the effort.

 

 

Best,

 

Shaping

 

 

 

 

On Mon, 6 Apr 2020 at 6:05, Shaping (<[hidden email]>) wrote:

I should have initially posted this to the Pharo-dev list, as well.

 

From: Pharo-users [mailto:[hidden email]] On Behalf Of Shaping
Sent: Friday, 3 April, 2020 14:05
To: 'Any question about pharo is welcome' <[hidden email]>
Subject: Re: [Pharo-users] Latest PharoJS Success Story; Wasm/WASI; very keen on Pony for the Pharo VM

 

All:

> Brain Treats got stuck during launch on my LG.

>

Which android version are you using ?

 

The phone is old and this is likely the problem.

 

Android version:  4.4.2

Kernel version:  3.4.0

 

> Is there a plan to move PharoJS to Wasm/WASI?

>

Dave and I talked about it a long time ago. This sounds like a good idea.

Actually, Dave has a very ambitious idea: turn PharoJS into Pharo*, where * can be different targets.

But, there's a lot to do before reaching this goal. So, don't expect it any time soon.

 

Not to change the topic too much, but the following is related and I often think of it…

 

Consider writing the pharo VM in Wasm or, better, with Pony (which can emit Wasm, as needed).  Pony’s reference-capability-based (ref-cap) concurrency-model guarantees provably that no data-races or deadlocks can happen if the code compiles; this solves a very large class of extremely ugly concurrency problems that no one ever wants to face.

 

Pony gives high-performance concurrency (5 to 15 ns actor-thread switching time, depending on platform), and solves the most difficult class of synchronization problems at compile time.  It runs as fast as C.  It runs faster than C, as concurrency scales.  You can’t scale a highly concurrent app efficiently in C, and really shouldn’t try if you wish to remain happy and mentally healthy.

 

Pony is still pre-1.0, but the group is very active and competent.  I think we should consider using it to build the VM.  Have a look.  Some videos for your amusement and information:

 

 

https://www.youtube.com/watch?v=ODBd9S1jV2s

https://www.youtube.com/watch?v=u1JfYa413fY

https://www.youtube.com/watch?v=fNdnr1MUXp8

https://www.ponylang.io/

 

There are many others.  I mentioned the Pony concurrency architecture around the holidays, but there was no interest from the list—not a good time perhaps.

 

The tentative plan is to do what Google does with Flutter:  have the JIT in support of the usual dynamicity a Smalltalker needs for rapid development; and have AOT, fully optimized compiling for production or speed-related reality checks, presumably needed less often during development.  There are other possibilities. 

 

Anyone interested?   

 

I have some ideas for simplifying use of the six ref caps in the context of Pharo/Smalltalk.  If this path is chosen, one must commit to strict state-machine-based algorithm development, without exception.  This should have happened anyway by now, broadly in the programming space, but didn’t.  I’m working on a graphical programming tool and associated grammar (in VW) that make state-machine development easy and attractive.  This, besides efficient use of machine resources, is the other reason for pushing in this direction.

 

A Pony program is built from a net of asynchronously communicating actors.  You change the state of your program with asynchronous messaging between actors.  There is no blocking (no mutexes or semaphores) and therefore no wasted CPU cycles or mushrooming program complexity, as you try to use mutexes in a fine-grained way (a very bad idea).  And as mentioned, there are never deadlocks or data-races.  All cores on all CPUs stay busy, always, until the program goes idle or exits.  The Pony group is also working on extending the model to the network level, so that all machine nodes in the network stay busy.  In the round, as a start, think of Pony as Erlang/OTP, but much faster, with no legacy bugs, and provably deadlock-free once the code compiles.

 

The asynchronous actor model is the programming pattern that Kay had in mind when he said “object-oriented.”  It’s the one I want to implement in Pharo.  The green threads are light, but don’t efficiently use the cores, and a net of VMs with their respective images still communicate too slowly.

 

If your time permits, please study Pony for a bit, before rejecting the idea as too big a change in direction or too complicated.  Using Pony looks like the ideal VM simplification strategy, if our aim is efficient use of networks of machines, each with at least one CPU (often more), each, in turn, with many cores (whose numbers are still increasing).  This pattern in hardware probably won’t be changing much, now that speeds are topping out.  Winning the performance game is therefore about efficiently using many cores at once, without burdening the programmer.  I don’t see a better way to do this now than with Pony.

 

Thoughts and suggestions are welcome.

 

 

Shaping

 

 

 

 

> -----Original Message-----

> From: Pharo-users [[hidden email]] On

> Behalf Of N. Bouraqadi

> Sent: Tuesday, 28 January, 2020 12:18

> To: Any question about pharo is welcome <[hidden email]>

> Subject: [Pharo-users] Latest PharoJS Success Story

>

> The latest PharoJS-powered smartphone app is now live.

> Development has been made using Pharo.

> Then, javascript code is generated using PharoJS.

> Last, the app is built to target both iOS and Android thanks to Apache

> Cordova.

>

> Learn more and Download at

> https://nootrix.com/projects/brain-treats-app/

>

> Noury

>

>

 

 


Re: Pony for the Pharo VM

Robert Withers-2
 


On 4/9/20 5:59 AM, Ronie Salgado wrote:
The point of this is that the language safety net is not enough, and in fact it could actually be dangerous because it can instill a false sense of safeness.

I totally agree! This is precisely the foundation of language politics in the Smalltalk world, if I may attempt to speak for everyone! ;) What proven program safety requires is sufficient unit tests. There is no way to detect the errors that matter other than making test assertions. As you said, "the language safety net is not enough", whereas robust unit testing can prove the program works correctly. This means that early-binding type systems are an onerous requirement and a complete waste of time as far as ensuring program safety is concerned. As such, the readability of the code is enhanced through proper naming, not through draconian type specifications and proofs. This late-binding feature, really unique to Smalltalk, in combination with a live image makes for a completely different design experience. I specify design because that is where serious errors, domain errors, are primarily introduced. Smalltalk empowers the designer to address those errors. Perhaps others could share what about late binding is a positive for them in Squeak?

Getting a #doesNotUnderstand: exception is sufficient to detect a type protocol issue.

Another option for you is perhaps you can come up with a Pony bytecode encoder set and run Pony on the Squeak Cog Spur vm...

-- 
Kindly,
Robert

Re: [Pharo-dev] Pony for the Pharo VM

Shaping1
In reply to this post by Ronie Salgado
 

You seem to misunderstand the problem. This is not a technical problem, but a political problem.

 

Your last e-mail seemed very technical. 

 

I seek a technical solution to a technical problem.  The solution is to write, with as much leverage as possible, a simple, fast, easy to maintain VM (that one person can maintain), using a well-maintained (by other people, mostly) tool (Pony, currently) that provably guarantees type safety and integral multicore, multiCPU, multinode (coming) concurrency, so that we don’t have to trouble ourselves with such potentially taxing matters.  I’ll do it without community assistance if I must, but figure the community would like to be aware of this option, and perhaps learn some new skills and participate.  The Pharo VM crew seem quite fluent and capable with their current problem space, trajectory, and tool set, but have been very busy with a very complicated VM for a very long time.

 

I submit for consideration the idea that conscious, concerted complexity reduction—not making the program work and not making it run fast—is the priority.  The Pharo VM has been going on for a while now, as you mentioned.  Perhaps a change is needed.  Taking some time to study the features and merits of Pony might prove fruitful for the overall VM effort.  Yes, some new skills will be needed.

 

The language seems very cool,

 

Indeed, but I care little about ‘cool’ anymore.  Pony is wicked-powerful compared to the competition (C, Rust, Go).  It has extreme utility, especially for writing a VM that must efficiently use all cores.  I care about that.

 

but it still has to be proven in production by a large company that shows that it is actually being used,

 

It has already happened.

 

Production use of Pony:  https://blog.wallaroolabs.com/2017/10/why-we-used-pony-to-write-wallaroo/

 

 

it is maintained and will keep being maintained for decades, which is something that can be said about C/C++ (many companies) and Rust (Mozilla) (Go (Google) is out of the question because of its mandatory garbage collector). All of the people already working on the VM have their own political and technical agendas, and they will continue working on those. Mathematical and safeness proofs are not enough. The usage of Slang is also bad in this regard, but with the difference that there is already an existing open source VM that works (and is used by several companies), and making a new VM is a high-risk endeavor from a technical point of view (and from a political and business one as well).

 

This means that if you really want to use Pony for making a Pharo or Squeak VM, then you are going to have to make it yourself, or pay someone else willing to do it. You will not be able to convince the existing people to adopt a new language

 

Those here in the Pharo lists are likely sharp enough to evaluate Pony on their own, and determine its usefulness in the design of a leaner and faster VM, with little or no help and inspiration from me. 

 

https://www.ponylang.io/

and

https://ponylang.zulipchat.com/

 

are the best places to start learning about Pony.  There are many YT presentations, too. 

 

unless you have something running in that language that actually proves that is worth it to change.

 

 

How complete is the documentation on the current Pharo VM? 

 Very incomplete. The best documentation is the code itself, then you have old wiki articles in the Squeak wikis, and some technical articles in Eliot Miranda and Clément Bera blogs. There are some additional ongoing efforts on documenting the VM from the Pharo people: https://github.com/SquareBracketAssociates/Booklet-PharoVirtualMachine and here: https://github.com/pharo-open-documentation/pharo-wiki/tree/master/PharoVirtualMachine

 

Thanks for the links.

 

I also wrote a very simplistic VM in pure C that can load a Pharo 64-bit image and interpret the bytecodes; it is here: https://github.com/ronsaldo/crankvm .

 

That’s useful.  Thank you.  I’ll have a look.  It seems like a good starting point, if the current Pharo VM proves too opaque/crufty.

 

This cannot be used for actually running Pharo because most of the primitives are not yet implemented. I wrote this mostly for documenting myself on how things work,

 

That would be my approach as well.   Can you say more about what CrankVM can and can’t do?  I suppose there is no GUI.  This is a CLI only?  Debugging?  I suppose not.  Is there any doc on it or how to use it?

 

and for getting into that I had to read several parts of the existing vm sources, and the blue book ( http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf ).

 

We are talking about different, but related things:  you can write a Pony program whose domain-level state-machine is wrong or unfinished and therefore not working correctly.  This is called livelocking.  It’s a domain-level problem, not a system problem (for Pony).  Livelocking is the developer’s problem, not Pony’s.  Compiled Pony code cannot deadlock/data-race.  It’s not possible.  Dealing with livelocking (programming a state machine thoroughly and correctly so that you get the behavior you want and describe in your code) is the subject of a special grammar and tool I’m working on for state-machine based programming.  This would supplement or replace the system browser.  Right now my approach to state-machine creation is a grammatical discipline that works very well for me.  I use it in Smalltalk, increasingly, and won’t code without it anymore.  I don’t want to be limited to green threads, however, and definitely don’t want explicit concurrency management.  I don’t have that much time to waste, and hope everyone reading this has been burnt badly enough by concurrency bugs to have a similar view. 

 

You even need to model network and power failures in your state machine. So the programming language may help you a lot with your concurrent, distributed and fault-tolerant system programming, but it is not a silver bullet that guarantees that your system is actually going to be correct.

 

See above concerning the difference between livelock and deadlock.

 

Thanks for the correct terminology. But from the point of view of the user they are the same. And a livelock seems to be far more dangerous than a traditional deadlock, for which you already have some debugging tools that are relatively easy to use.

 

The causes of data-races can be hard to debug because their symptoms can be hard, sometimes impossible, to reproduce reliably.  I do not want to debug such problems routinely, along with my application logic, no matter what debugging tools I have, especially when the compiler (Pony) provides provable guarantees.  I don’t want to do unneeded work merely because I know how to do the work and have tools to help me do it.  I’m trying to factor out and eliminate large blocks of work, categorically.  Pony takes care of synchronization issues at compile-time.  That is one very big problem solved.  Pony has raw speed comparable to C.  That is a second very big problem solved.  Pony runs faster than C as it scales to more cores, and no significant additional effort is needed to make this happen (you are programming with actors, and must learn to do that, anyway).  That is a third big problem solved, as long as you are willing to hone your state-machine writing skills, and learn to use actors.

 

You can create a pthread_mutex

 

Pony doesn’t use mutexes because they aren’t needed.  They waste bandwidth and increase complexity/maintenance cost, depending on how fine-grained your usage is.  Don’t go there. It’s a mistake, and the practice doesn’t scale, neither for the machine nor in the mind of a human programmer.

 

with deadlock detection for detecting bugs, or run your program through valgrind for detecting race conditions and deadlocks. The point of this is that the language safety net is not enough, and in fact it could actually be dangerous because it can instill a false sense of safeness.

 

Pony is far from dangerous (it is designed for safety and speed), based on my own limited use (still learning how to use it) and the reports of others who know it much better and use it regularly, like those at Wallaroo Labs, who use it in production (link above).

 

Pony’s type safety and concurrency model work as claimed.  You still need to write your program as a state-machine, correctly.  Consider studying the ref-caps to see how they provide the stated guarantees (https://www.ponylang.io/).  There is no false sense of anything with Pony.  It is able to do what it claims.   

 

 That false sense of safeness can be very dangerous in real-world projects with deadlines that have to be met, and with non-technical managers who are going to underestimate the deadline far more because of this false sense of safeness. And since in the real world deadlines have to be met, bugs and incomplete solutions are simply shipped.

 

Can Sysmel manage many threads on many cores (running actor threads in parallel, not just green-thread-concurrently on one core), whilst guaranteeing no data-races, and switch automatically between actor-threads, without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that now.

 

If Sysmel cannot do those operations, or cannot do them as fast, can you add the abilities cost-effectively?  They already work in Pony, and a very active team drives the effort.

 

I am developing the language, and if I want or need I can create a minimal vm suitable for my purpose (I am not even trying to convince existing people to use it for now, I will when I have the next version of Woden

 

Woden (and Vulkan) is actually my main reason for wanting a much faster VM.  Woden is why I found Pony.  Most of my work is simulation.  I’m not willing to live through another quarter-century of slow Smalltalks.  The split dev paradigm used by Google’s Flutter should work for a Smalltalk on a Pony-based VM:  use JIT mode most of the time for highest dev-time speeds; use AOT mode some of the time for highest run-time speeds and reality checks. 

 

running on it, and they will be able to just use a Pharo subset for scripting), for which I can adapt into my purposes and needs, which are currently only being fulfilled by C and C++ (and perhaps Rust, but I found it too complicated). I would just use C++ if it had runtime reflection, and easy to do, reliable and robust live programming support but it does not have these crucial features. You can implement live programming in C++ by serializing your program state, changing the dll with your code, and then deserialize your program state. Program state serialization is not easy.

 

Real multi-threading on Sysmel is already supported when you use it without garbage collection (the language is strongly modeled after C++11 when used in this way). When you enable the GC for Smalltalk semantics, it is currently using the Boehm conservative GC (I am using it for getting things running), so the concurrency will be limited by the stop the world semantics of the GC. When I implement a proper GC, I will be able to dissociate threads that only use the native runtime from the GC.

 

Are you saying that Sysmel will have per-actor/thread heaps?

 

But for "many threads", and high parallelism, what I use is the GPU where Sysmel can also be used as a shading language by generating Spir-V for Vulkan

 

Yes, Vulkan is very interesting.  I’m glad to see someone working in this area.

 

Many thanks for your contributions to Pharo graphics. 

 

3D simulation is my main concern, and is why I’m focused on high-performance programming tools like Pony.

 

Sysmel would still need to be able to use all cores on all CPUs, all of the time, for general application logic that cannot easily use the GPU.  Is that planned?

 

consumption (for this the llvm backend is not used; it does not expose all of the required semantics, unless they changed it recently). As for the actor model and by-default non-blocking semantics, I just do not care about them (for now) because I know they bring their own problems that can have an impact on other parts of the system (e.g. non-blocking IO everywhere and by default, on operating systems that do not support it very well). Instead I prefer explicit semantics, and if the user wants actors, they can be implemented as just another library in the system.

 

BTW: how do you implement an actor?

 

See the Pony pages below.

 

with a message queue, and how do you implement a message queue? with a mutex and a condition variable. What about deadlock?

 

No deadlocks can happen.  You are, however, free to screw up your app’s state-machine logic as much as you wish before you finally make it work correctly.  This programming problem is culture-wide, no matter what language(s) you like, and can be fixed with a new tool and a different way of looking at what a program is and how it behaves.  Editing text in a method pane doesn’t cover all that needs to happen during the building of a state-machine, not if you want your state-machine to work the first time you test it.  The system browser is mostly just that--a way of browsing.  I don’t consider it an adequate programming tool if efficient building of state-machines is your aim—and this must be your aim if you want to use multicore CPUs, efficiently.

 

you are only taking a single mutex on the queue, so you cannot have a deadlock, since a deadlock needs at least two mutexes that are taken in different orders. If you have N mutexes, to avoid a deadlock you only have to take these N mutexes in the same order.
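
The lock-ordering rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not something from the thread: `with_locks`, `worker`, and the shared `counter` are hypothetical names, and `id()` stands in for "the address of the mutex" (in CPython it happens to be the object's address).

```python
import threading


def with_locks(locks, action):
    """Run action() while holding every lock, always acquired in one global order."""
    # Sorting by id() gives all threads the same acquisition order,
    # whatever order the caller listed the locks in.
    ordered = sorted(set(locks), key=id)
    for lock in ordered:
        lock.acquire()
    try:
        return action()
    finally:
        for lock in reversed(ordered):
            lock.release()


counter = {"value": 0}
lock_a, lock_b = threading.Lock(), threading.Lock()


def bump():
    counter["value"] += 1


def worker(first, second, rounds):
    for _ in range(rounds):
        with_locks([first, second], bump)


# The two threads name the locks in opposite orders, the classic deadlock recipe
# if each simply acquired "first" and then "second".
threads = [
    threading.Thread(target=worker, args=(lock_a, lock_b, 1000)),
    threading.Thread(target=worker, args=(lock_b, lock_a, 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both workers end up acquiring the locks in the same sorted order, neither can hold one lock while waiting for the other in the opposite order.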

 

Pony has no mutexes because they aren’t needed in state-machines whose state-changes are caused by actors asynchronously sending messages to each other. 

 

That sounds easy, but in practice it is not: you would have to sort the actual addresses of the mutexes, and in many cases you will realize that you need to take an additional mutex after having already taken other mutexes (I know it from practice). The mutex of the queue is always the one taken last, and that is why it is safe to use the queue for message passing.
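
A message queue built from one mutex and one condition variable, as described in the quoted text, might look like the sketch below. It is a Python illustration only and makes no claim about how the Pony runtime implements its queues; `Mailbox` and `Actor` are hypothetical names. The queue lock is the last lock taken and is released before the behaviour runs, which is the property the quoted argument relies on.

```python
import threading
from collections import deque


class Mailbox:
    """FIFO queue guarded by a single mutex plus a condition variable."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()  # wraps one lock

    def send(self, message):
        with self._cond:
            self._items.append(message)
            self._cond.notify()

    def receive(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()


class Actor:
    """One thread per actor; state is touched only by that thread."""

    _STOP = object()

    def __init__(self, behaviour):
        self._mailbox = Mailbox()
        self._behaviour = behaviour
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        self._mailbox.send(message)

    def stop(self):
        self._mailbox.send(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.receive()
            if message is self._STOP:
                return
            # No lock is held here, so the behaviour may send to other actors.
            self._behaviour(message)


seen = []
collector = Actor(seen.append)
for n in range(3):
    collector.send(n)
collector.stop()
```

After `stop()` returns, `seen` holds the messages in the order they were sent, since a single thread drains the mailbox.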

 

So, what is so special about these actor languages?

 

A nice overview of the interesting/unique parts of Pony:  https://www.ponylang.io/discover/.

 

The following page has some meaty bits.  Just skip the code examples and read the prose: https://www.ponylang.io/reference/pony-performance-cheatsheet/.

 

 

they actually use an M:N threading model, where green threads are used to multiplex multiple cooperative threads onto the different cores of the CPU. They only need to create N native threads that are pinned to the different cores of the CPU (sched_setaffinity in Linux), and the green threads are created by allocating stack memory and having a simple trampoline that stores all of the caller saved registers in the stack, switches the stack pointer, restores these same registers and returns. BTW the operating system level context switching machinery is also implemented in the same way, but it could have some additional instructions for returning into unprivileged code.
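
The cooperative multiplexing half of that description can be sketched with generators standing in for green threads. This is a loose Python analogy, not a stack-switching trampoline: each `yield` plays the role of the voluntary switch, `run_round_robin` and `counter` are hypothetical names, and an M:N runtime would run one such loop on each of its N pinned native threads.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks cooperatively; return the interleaved trace."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # resume the task until its next yield
        except StopIteration:
            continue                  # task finished; drop it
        ready.append(task)            # still alive: back of the ready queue
    return trace


def counter(name, steps):
    for i in range(steps):
        yield f"{name}{i}"            # voluntary switch point


trace = run_round_robin([counter("a", 2), counter("b", 3)])
# trace == ["a0", "b0", "a1", "b1", "b2"]
```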

 

These 5 to 15 ns figures typically come from switching the stack. And with 64-bit addresses the simplest way to allocate the stack memory is to just reserve a large uncommitted memory region for it, or to use some other fancier scheme (see here: https://blog.cloudflare.com/how-stacks-are-handled-in-go/ ). You would also need a task queue (or process queue in OS terminology) so as not to block the different actors when they are waiting for messages. There are several papers that discuss how to implement this task queue, which tends to be the main bottleneck of these systems. In the case of userspace, since the OS is not aware of your green threads, you also need asynchronous IO for everything so that an IO operation does not block all of the tasks that are running on a single core (I believe that I read about a different, more flexible mechanism in a paper on Go, but I forgot the details).

 

If you do not actually need a coroutine, and model your actors in terms of a single function application that receives a single message and processes it, then you only need a single stack (fetch message from queue, apply the function, done), and you do not even have that 5-15 ns overhead for stack switching (you may have another overhead on pushing the pending actor on the ready queue that could outweigh the performance advantage). This is similar to a traditional asynchronous job system for which you only need a thread pool. And this can also be implemented in C/C++.
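
The single-stack variant (fetch a message, apply the function, done, with pending actors parked on a ready queue served by a thread pool) can be sketched like this. It is a Python illustration under stated assumptions, not a description of any existing runtime; `FunctionActor`, `Scheduler`, `drain`, and `stop` are hypothetical names.

```python
import queue
import threading
from collections import deque


class FunctionActor:
    """An actor that is just a handler function plus a mailbox; no own stack."""

    def __init__(self, scheduler, handler):
        self._scheduler = scheduler
        self._handler = handler
        self._mailbox = deque()
        self._lock = threading.Lock()
        self._scheduled = False  # True while queued or being run by a worker

    def send(self, message):
        with self._lock:
            self._mailbox.append(message)
            if self._scheduled:
                return
            self._scheduled = True
        self._scheduler.ready.put(self)

    def run_one(self):
        with self._lock:
            message = self._mailbox.popleft()
        self._handler(message)  # runs on the worker thread's stack
        with self._lock:
            if not self._mailbox:
                self._scheduled = False
                return
        self._scheduler.ready.put(self)  # more mail: back on the ready queue


class Scheduler:
    """A fixed pool of worker threads pulling actors off one ready queue."""

    def __init__(self, workers=2):
        self.ready = queue.Queue()
        self._threads = [
            threading.Thread(target=self._work) for _ in range(workers)
        ]
        for t in self._threads:
            t.start()

    def _work(self):
        while True:
            actor = self.ready.get()
            if actor is None:  # shutdown sentinel
                return
            actor.run_one()
            self.ready.task_done()

    def drain(self):
        self.ready.join()  # wait until no actor is queued or running

    def stop(self):
        for _ in self._threads:
            self.ready.put(None)
        for t in self._threads:
            t.join()


results = []
scheduler = Scheduler(workers=2)
adder = FunctionActor(scheduler, results.append)
for n in range(100):
    adder.send(n)
scheduler.drain()
scheduler.stop()
```

The `_scheduled` flag keeps an actor on the ready queue at most once, so only one worker runs a given actor at a time and its handler needs no locking of its own. The extra push onto the ready queue after each message is the overhead the quoted text warns about.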

 

I’m happy with the 5 to 15 ns time to switch interleaved actor threads on one core.   This is orders of magnitude faster than what I can accomplish with communicating Smalltalk images.

 

Here again are the main traction points:

 

Pony now gives us:

 

1. Raw speed comparable to C.

2. Fast FFI (no marshalling).

3. Strict type safety.

4. Guaranteed no deadlocks/data-races.

5. Graceful, low programming-effort scaling-up across more cores, more CPUs, and eventually more machine nodes when the Pony group implement the networking extension of Pony.

6. 5 to 15 ns actor-thread switching time when two or more actors are running on one core.

(There is more, but these are the main points for now.)

 

I need at least these abilities in any programming tool I use to make a VM that will use all cores reliably and smoothly.

 

Can Sysmel do those six things now?  In six months?  In a year?

You seem to be working hard on these abilities.  If, as you contemplate the amount of work needed to realize the above, you cringe, moan, groan, or feel your heart sink from your chest to where your gonads are, then the workload and anxiety are too great.  If that is so, consider using Pony so that you needn’t do so much of the work by yourself.  Maybe you can just re-implement Sysmel with Pony to take advantage of its unique abilities.

 

Shaping

 

 

El jue., 9 abr. 2020 a las 7:43, Shaping (<[hidden email]>) escribió:

 

I have to admit that that language looks quite interesting, but it is not appropriate for writing an actual VM.

 

The point of the Pony language and concurrency model is to be able to write any program, even a VM, at scale and use all of the cores, with minimal programming effort (compared to what the developer must endure when explicitly managing synchronization issues).

 

One thing is having the actor object model for writing an application that scales where it can be a very desirable property, but another thing is writing a  virtual machine where you have time constraints for the just in time compiler in a single CPU to reduce initialization time.

 

A VM is a program.  You can write any program you want with Pony.  The heaps are already actor-centric and usually very tiny, just the way we need them for high-performance, real-time apps:

 

Orca: GC and Type System Co-Design for Actor Languages:

 

https://www.ponylang.io/media/papers/orca_gc_and_type_system_co-design_for_actor_languages.pdf

 

Can you be more specific about which program construct cannot be written in Pony or cannot be written to run fast enough?

 

You can write as much asynchronous and synchronous code as you wish, where and when you need it.  The balance point is yours, as the developer, to choose.

 

In fact, most of the methods are actually interpreted and they are only compiled after being executed several times for reducing this startup time.

 

Of course.  But this is not a problem peculiar to Pony (or C or Rust or…).  It’s just another programming task needed in the context of VM design.   Note the comment about using both JIT and AOT, as desired, during development.  In any case, if you use Pony, use a state-machine (which we all should do anyway).  If the developer cannot or will not develop the discipline needed to build state-machines, systematically, Pony or any highly concurrent, actor-based programming model will only thwart and frustrate him.  We can’t efficiently use all these cores without such a programming model. 
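The tiering mentioned in the quoted remark (interpret first, compile only after a method has run several times) can be sketched in a few lines. This is an illustration under loud assumptions, not how the Pharo VM or Pony does it: the four-opcode bytecode, the `TieredMethod` class, and the threshold of 3 are all hypothetical, and "compiling" here means translating to a Python lambda rather than to machine code.

```python
def interpret(code, args):
    """Slow tier: walk the bytecode on an explicit operand stack."""
    stack = []
    for op, *operand in code:
        if op == "arg":
            stack.append(args[operand[0]])
        elif op == "const":
            stack.append(operand[0])
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def compile_to_python(code):
    """Fast tier: translate the bytecode once into a single Python expression."""
    stack = []
    for op, *operand in code:
        if op == "arg":
            stack.append(f"args[{operand[0]}]")
        elif op == "const":
            stack.append(repr(operand[0]))
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sign = "+" if op == "add" else "*"
            stack.append(f"({a} {sign} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda *args: {stack.pop()}")


class TieredMethod:
    """Interpret until the call counter reaches the threshold, then compile."""

    def __init__(self, code, threshold=3):
        self.code = code
        self.threshold = threshold
        self.calls = 0          # counts interpreted calls only
        self.compiled = None

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(*args)
        self.calls += 1
        if self.calls >= self.threshold:
            self.compiled = compile_to_python(self.code)  # hot: pay the cost once
        return interpret(self.code, args)


# Bytecode for: args[0] * 2 + 1
DOUBLE_PLUS_ONE = [("arg", 0), ("const", 2), ("mul",), ("const", 1), ("add",)]
```

Methods that run only once or twice never pay the compilation cost, which is the startup-time argument made in the quote.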

 

I think I would start by using the Pony compiler from a Smalltalk browser (mod the browser to accommodate the actors too).  Source code in files, searching, and scrolling are too inefficient. 

 

In this problem domain, the actor model and the thread safety it guarantees do not help you at all, and deadlocks can always be produced.

 

This statement is incorrect in the case of Pony (but we have a terminology problem; see below), and this was the main reason for the post.  Here is the gist again:  Deadlocks and data-races are not possible in a Pony program that compiles.  This is mathematically guaranteed.  You can glean this fact from the videos or study the details in Pony papers.

 I did have deadlock problems with synchronous messages in Erlang

 

Pony is not Erlang.  It is like Erlang in many ways, but vastly improved.  My qualified (“in the round, for starters”) comparison was a mistake.  I should have omitted the whole comment.  The Erlang/Pony comparison never goes well.  Pony is a different animal.  And it’s seriously powerful. Forget about Erlang.  Really.  Just forget about it.  Use what was learnt there, and implement it in Pony.

 

which forced me to go into using asynchronous messages, but if you do not model your domain state machine you could also end with a deadlock by using asynchronous messages, or even worse, an inconsistent state such as an incomplete credit card transaction in a highly distributed system!

 

We are talking about different, but related things:  you can write a Pony program whose domain-level state-machine is wrong or unfinished and therefore not working correctly.  This is called livelocking.  It’s a domain-level problem, not a system problem (for Pony).  Livelocking is the developer’s problem, not Pony’s.  Compiled Pony code cannot deadlock/data-race.  It’s not possible.  Dealing with livelocking (programming a state machine thoroughly and correctly so that you get the behavior you want and describe in your code) is the subject of a special grammar and tool I’m working on for state-machine based programming.  This would supplement or replace the system browser.  Right now my approach to state-machine creating is a grammatical discipline that works very well for me.  I use it in Smalltalk, increasingly, and tend not to code without it anymore.  I don’t want to be limited to green threads, however, and definitely don’t want explicit concurrency management.  I don’t have that much time to waste, and hope everyone reading this has been burnt badly enough by concurrency bugs to have a similar view. 
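As a generic illustration of what "programming a state machine thoroughly" can mean (this is not the grammar or tool described above, and every name in it is hypothetical), the sketch below writes the domain state machine as an explicit transition table and lists every (state, event) pair that nobody has decided how to handle. It borrows the credit card transaction from the quoted example.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AUTHORIZING = auto()
    CAPTURED = auto()
    FAILED = auto()


class Event(Enum):
    SUBMIT = auto()
    APPROVED = auto()
    DECLINED = auto()
    TIMEOUT = auto()    # e.g. a network failure while waiting for the bank


# Every transition the developer has consciously decided on.
TRANSITIONS = {
    (State.IDLE, Event.SUBMIT): State.AUTHORIZING,
    (State.AUTHORIZING, Event.APPROVED): State.CAPTURED,
    (State.AUTHORIZING, Event.DECLINED): State.FAILED,
    (State.AUTHORIZING, Event.TIMEOUT): State.FAILED,
}
TERMINAL = {State.CAPTURED, State.FAILED}


def unhandled_pairs(transitions, terminal):
    """Return each (state, event) pair of a non-terminal state with no transition."""
    return [
        (state, event)
        for state in State
        if state not in terminal
        for event in Event
        if (state, event) not in transitions
    ]


def step(state, event):
    """Apply one event; an undecided pair is an error, never a silent no-op."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {event.name} in {state.name}") from None
```

Running `unhandled_pairs(TRANSITIONS, TERMINAL)` on this table reports four gaps, for instance a second `SUBMIT` arriving while authorization is in flight. Closing such gaps is the domain-level work that no compiler guarantee covers.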

 

You even need to model network and power failures in your state machine. So the programming language may help you a lot with your concurrent, distributed and fault tolerant system programming, but it is not a silver bullet that guarantees that your system is actually going to be correct.

 

See above concerning the difference between livelock and deadlock. 

 

Going back to the task of developing a VM, you also need to be able to perform dangerous memory accessing operations for at least the following three tasks:

1. Implementing the garbage collector.

 

Actually no; there is nothing dangerous here.  In Pony, a separate heap exists for each actor.  These are generally tiny, and come and go quickly.   If they are not tiny or at least very simple/uniform in structure, they should be made so by refactoring actor scope, until they are.  Smallness and clarity of purpose are the main criteria for determining whether you’ve written an actor well.  Those two properties also greatly ease debugging of the actor.  If you have a big actor not factored as a network of smaller ones, you’ve done something wrong, or you’ve just started your state machine, and have some factoring yet to do.  You still have classes, but these are an organizational tool for synchronous code used by Actors and their asynchronous behaviors.

 

2. Direct access to object slots for implementing the bytecode interpreter.

 

Not a problem.  It’s just a program feature.   So we write it as it needs to be.   

 

3. Copying compiled machine code into executable memory and performing position dependent relocations.

 

Doable.  The Pony FFI works well even at this early stage.

 

The machine code generation and installation can be separated into two stages (the current VM just generates the code directly into the executable memory), and in fact having these two separate stages for compilation and installation is a requirement for operating systems that enforce W^X page-level permissions, especially if you want a concurrent VM.

 

Then do that. 

 

You need to install the executable code in an atomic way, so you need to suspend the threads while changing the executable permission into the writable permission. You may get around this restriction if you are allowed to map the same physical memory into different virtual address ranges with different permissions (one writable, and one executable).

 

…as needed.
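The dual-mapping trick in the quoted paragraph can be shown with Python's `mmap` module on a Unix system. This is a sketch under stated assumptions: the helper names `dual_map` and `install` are hypothetical, a temporary file stands in for the shared physical memory, and the second view is mapped read-only as a stand-in for an executable view so that nothing here actually executes machine code. A real VM would request `PROT_READ | PROT_EXEC` for that view, which some systems refuse for file-backed or temporary mappings.

```python
import mmap
import tempfile


def dual_map(size=mmap.PAGESIZE):
    """Map the same backing pages twice with different permissions (Unix only)."""
    backing = tempfile.TemporaryFile()      # stand-in for shared physical memory
    backing.truncate(size)
    fd = backing.fileno()
    # View 1: the compiler thread writes generated code through this mapping.
    writable = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    # View 2: other threads only fetch from this mapping. A real VM would use
    # PROT_READ | PROT_EXEC here; PROT_READ keeps the sketch safe to run.
    fetch_view = mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    return backing, writable, fetch_view


def install(writable, offset, code):
    """Copy finished code bytes into place through the writable view."""
    writable[offset:offset + len(code)] = code
```

Because both views share the same pages, bytes written with `install` are visible through `fetch_view` without any page ever being writable and executable at the same address, so no permission flip (and no thread suspension for that flip) is needed. Making the installation itself appear atomic to running threads is a separate problem that this sketch does not address.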

 

Pony is changing quickly:  https://ponylang.zulipchat.com/.

 

It’s being improved weekly.  The Pony group are also working on a security model (to deal with attacks via FFI and other sources), but this will be some time coming.  Feel free to contribute.  The language is highly moldable, especially at this stage.  I think the version is 0.33.  If you need a feature or convenience not present, request it.  The group is very responsive, and eager to improve the tool.

If there were no Smalltalk, I would certainly use Pony before C, Rust, or Go, even at this early pre-1.0 stage.

 

For these reasons, you need an unsafe language such as C/C++ for at least these tasks, or a language that allows you to turn off the type and memory safety net. I heard that Rust has an unsafe pointer that you could also use for these purposes.

 

The Pony FFI will work here.  C libs are sometimes needed, and operations in C code are of course not guaranteed to be safe, in any case. 

 

The Pharo team is going with the existing virtual machine for quite a while. You cannot just replace something that is not well documented by something new, especially when you do not have that many resources for making a new VM.

 

One of the main drivers in the choice of Pony is to reduce the resources needed to create a highly performant, simple VM.  One notable, Pharo/Smalltalk-related problem for high-speed apps is stop-the-world GCing in the one large system heap.  This won’t work for high-performance, highly concurrent, highly scalable apps, especially not for real-time ones, and must go away, if latencies approaching deterministic are to be achieved.  Pony has already solved this problem with per-actor heaps.  That design feature is very interesting because it represents much hard work that need not be done.

 

 

You first need to document completely the existing one, the semantics of the bytecodes, how they are implemented, and also the same with the primitives.

 

Agreed.  I’m not claiming that VM development is easy or trivial, generally or via Pony.  VMs are arguably one of the most complicated things that humans create (not a compliment).  But Pony solves more concurrency-related problems at compile-time than any other available tool.

 

How complete is the documentation on the current Pharo VM? 

 

As for myself, I am putting my bets on another language that I am developing (Sysmel: https://github.com/ronsaldo/sysmel ), and in full ahead-of-time compilation, but my problem domain is video game programming, low-level operating system, driver development and embedded programming where I actually want to have control of the machine.

 

I’ve read some on Sysmel.  It appears to be an outstanding tool.

 

Does Sysmel implement multicore concurrency, and guarantee no deadlocks/data-races on compile?

Pony, the language and concurrency model, will not stop us from doing anything that can be done with the machine and OS.  The Pony language is not as interesting to me as its concurrency model.  Don’t like Pony syntax (and I don’t)?  Then fork and change it.  You have all the source.  (I’d prefer to see keyword selectors everywhere, even without the usual attendant polymorphism.  Seriously.) 

 

Since I have written my compiler in Pharo, I can just reuse the Opal Compiler for doing AST to AST translation and just compile Pharo (with some limitations, for example no thisContext) into my runtime environment. If I want a more dynamic environment, I can also serialize a Pharo CompiledMethod, send it through a socket and then interpret it on the Sysmel side: https://github.com/ronsaldo/sysmel/blob/master/module-sources/Sysmel.Core/Smalltalk-Bootstrap/InterpretedMethod.sysmel by just reusing the existing language semantics. Currently I am supporting only Linux; Windows support is coming in a couple of weeks, after I get a proper module system working to reduce compilation times. For the backend I am using LLVM; wasm is not yet supported because I am generating some IR constructs that are not supported by the wasm backend

 

Wasm and WASI will be good when they are ready.

 

(vtable layouts, and some intrinsics that I am using for non-local returns), but they should not be that complicated to fix.

 

Can Sysmel manage many threads on many cores (running actor threads in parallel, not just green-thread-concurrently on one core), whilst guaranteeing no data-races, and switch automatically between actor-threads, without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that now.

 

If Sysmel cannot do those operations, or cannot do them as fast, can you add the abilities cost-effectively?  They already work in Pony, and a very active team drives the effort.

 

 

Best,

 

Shaping

 

 

 

 

El lun., 6 abr. 2020 a las 6:05, Shaping (<[hidden email]>) escribió:

I should have initially posted this to the Pharo-dev list, as well.

 

From: Pharo-users [mailto:[hidden email]] On Behalf Of Shaping
Sent: Friday, 3 April, 2020 14:05
To: 'Any question about pharo is welcome' <[hidden email]>
Subject: Re: [Pharo-users] Latest PharoJS Success Story; Wasm/WASI; very keen on Pony for the Pharo VM

 

All:

> Brain Treats got stuck during launch on my LG.

>

Which android version are you using ?

 

The phone is old and this is likely the problem.

 

Android version:  4.4.2

Kernel version:  3.4.0

 

> Is there a plan to move PharoJS to Wasm/WASI?

>

Dave and I talked about it a long time ago. This sounds like a good idea.

Actually, Dave has a very ambitious idea = turn PharoJS into Pharo* where * can be different targets.

But, there's a lot to do before reaching this goal. So, don't expect it any time soon.

 

Not to change the topic too much, but the following is related and I often think of it…

 

Consider writing the Pharo VM in Wasm or, better, with Pony (which can emit Wasm, as needed).  Pony’s reference-capability-based (ref-cap) concurrency-model guarantees provably that no data-races or deadlocks can happen if the code compiles; this solves a very large class of extremely ugly concurrency problems that no one ever wants to face.

 

Pony gives high-performance concurrency (5 to 15 ns actor-thread switching time, depending on platform), and solves the most difficult class of synchronization problems at compile time.  It runs as fast as C.  It runs faster than C, as concurrency scales.  You can’t scale a highly concurrent app efficiently in C, and really shouldn’t try if you wish to remain happy and mentally healthy.

 

Pony is still pre-1.0, but the group is very active and competent.  I think we should consider using it to build the VM.  Have a look.  Some videos for your amusement and information:

 

 

https://www.youtube.com/watch?v=ODBd9S1jV2s

https://www.youtube.com/watch?v=u1JfYa413fY

https://www.youtube.com/watch?v=fNdnr1MUXp8

https://www.ponylang.io/

 

There are many others.  I mentioned the Pony concurrency architecture around the holidays, but there was no interest from the list—not a good time perhaps.

 

The tentative plan is to do what Google does with Flutter:  have the JIT in support of the usual dynamicity a Smalltalker needs for rapid development; and have AOT, fully optimized compiling for production or speed-related reality checks, presumably needed less often during development.  There are other possibilities. 

 

Anyone interested?   

 

I have some ideas for simplifying use of the six ref caps in the context of Pharo/Smalltalk.  If this path is chosen, one must commit to strict state-machine-based algorithm development, without exception.  This should have happened anyway by now, broadly in the programming space, but didn’t.  I’m working on a graphical programming tool and associated grammar (in VW) that make state-machine development easy and attractive.  This, besides efficient use of machine resources, is the other reason for pushing in this direction.

 

A Pony program is built from a net of asynchronously communicating actors.   You change the state of your program with asynchronous messaging between actors.  There is no blocking--no mutexes or semaphores—and therefore no wasted CPU cycles or mushrooming program complexity, as you try to use mutexes in a fine-grained way (a very bad idea).  And as mentioned, there are never deadlocks or data-races.  All cores on all CPUs stay busy, always, until the program goes idle or exits.  The Pony group is also working on extending the model to the network level, so that all machine nodes in the network stay busy.  In the round, as a start, think of Pony as Erlang/OTP, but much faster, with no legacy bugs, and provably no-deadlocking on compile.

 

The asynchronous actor model is the programming pattern that Kay had in mind when he said “object-oriented.”  It’s the one I want to implement in Pharo.  The green threads are light, but don’t efficiently use the cores, and a net of VMs with their respective images still communicate too slowly.

 

If your time permits, please study Pony for a bit, before rejecting the idea as too big a change in direction or too complicated.  Using Pony looks like the ideal VM simplification strategy, if our aim is efficient use of networks of machines, each with at least one CPU (often more), each, in turn, with many cores (whose numbers are still increasing).  This pattern in hardware probably won’t be changing much, now that speeds are topping out.  Winning the performance game is therefore about efficiently using many cores at once, without burdening the programmer.  I don’t see a better way to do this now than with Pony.

 

Thoughts and suggestions are welcome.

 

 

Shaping

 

 

 

 

> -----Original Message-----

> From: Pharo-users [[hidden email]] On

> Behalf Of N. Bouraqadi

> Sent: Tuesday, 28 January, 2020 12:18

> To: Any question about pharo is welcome <[hidden email]>

> Subject: [Pharo-users] Latest PharoJS Success Story

>

> The latest PharoJS-powered smartphone app is now live.

> Development has been made using Pharo.

> Then, javascript code is generated using PharoJS.

> Last, the app is built to target both iOS and Android thanks to Apache

> Cordova.

>

> Learn more and Download at

> https://nootrix.com/projects/brain-treats-app/

>

> Noury

>

>