Concurrent Futures


Re: Thoughts on a concurrent Squeak VM

Andreas.Raab
Igor Stasenko wrote:

> There are already some steps done in this direction. The sources for
> the RISC architecture generate a 'foo' struct which holds all
> interpreter globals.
> Also, I made some changes in Exupery to create a single struct of all
> VM globals (not only variables, but functions too).
> This was done to make it easier to get the address of any global
> symbol that Exupery needs.
> I also experimented with replacing all direct function calls with
> indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This
> caused about ~1% speed degradation in tinyBenchmarks :)

Ah, indeed, I forgot about that.

> Also, moving forward on this renders an InterpreterProxy struct
> useless, because we can just pass the address of our 'foo' struct to
> plugins, since it already contains everything a plugin can reach.

But this isn't quite true. One of the reasons for the proxy is to
abstract from the actual implementation, since C doesn't do proper name
lookup for struct members but rather uses fixed offsets. And so, if you
happen to add or remove an entry from that struct, your plugins will be
screwed ;-)
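To make this concrete, here is a minimal sketch of the problem (the struct and member names are invented for illustration, not the actual InterpreterProxy layout): a plugin reaches members of such a struct by position, so inserting or removing an entry silently shifts everything behind it.

/* Illustration only -- not the real InterpreterProxy definition. */
struct VMFunctions_v1 {
    long (*stackValue)(long offset);
    long (*pop)(long count);
};

struct VMFunctions_v2 {
    long (*stackValue)(long offset);
    long (*fetchInteger)(long index, long oop);  /* new entry inserted here... */
    long (*pop)(long count);                     /* ...so pop moved down a slot */
};

/* A plugin compiled against v1 reaches pop through the second slot. Handed
   a v2 struct, that slot is now fetchInteger: the plugin calls the wrong
   function without any warning. The proxy's explicit (major) version number
   exists to catch exactly this kind of mismatch. */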

>> The above takes care of the interpreter but there are still
>> primitives and plugins that need to be dealt with. What I would do here
>> is define operations like ioLock(struct VM) and ioUnlock(struct VM) that
>> are the effective equivalent of Python's GIL (global interpreter lock)
>> and allow exclusive access to primitives that have not been converted to
>> multi-threading yet. How exactly this conversion should happen is
>> deliberately left open here; maybe changing the VM's major proxy version
>> is the right thing to do to indicate the changed semantics. In any case,
>> the GIL allows us to readily reuse all existing plugins without having
>> to worry about conversion early on.
>>
> Or, as I proposed in earlier posts, the other way could be to schedule
> all primitive calls which currently don't support multi-threading onto
> a single 'main' thread.
> Then we don't need the GIL.

I had missed that. Yes, that would work just as well.

Cheers,
   - Andreas


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Rob Withers
In reply to this post by Andreas.Raab
Andreas,

What about using C++?  There would be some degradation of performance.
However, there would be the benefit of structuring the VM classes, of not
having to add VM as an argument everywhere, and it may even be possible to
subclass Thread so we know where the thread-local storage is.
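For what it's worth, knowing "where the thread-local storage is" would not strictly require C++; a rough plain-C sketch using POSIX thread-specific data could look like the following (struct VM and the function names are assumptions for illustration, not existing VM code):

#include <pthread.h>

struct VM;   /* the per-interpreter state discussed in this thread */

static pthread_key_t  vmKey;
static pthread_once_t vmKeyOnce = PTHREAD_ONCE_INIT;

static void createVMKey(void) { pthread_key_create(&vmKey, NULL); }

/* Each interpreter thread registers its own struct VM once... */
void setVMForCurrentThread(struct VM *vm) {
    pthread_once(&vmKeyOnce, createVMKey);
    pthread_setspecific(vmKey, vm);
}

/* ...and primitives could then recover it without an explicit argument. */
struct VM *vmForCurrentThread(void) {
    pthread_once(&vmKeyOnce, createVMKey);
    return (struct VM *)pthread_getspecific(vmKey);
}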

Rob

----- Original Message -----
From: "Andreas Raab" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Tuesday, October 30, 2007 8:53 PM
Subject: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)


> Igor Stasenko wrote:
>> If you have any ideas how such VM would look like i'm glad to hear.
>
> Okay, so Josh convinced me to write up the ideas. The main problem as I
> see it with a *practical* solution to the problem is that all of the
> solutions so far require huge leaps and can't be implemented step-by-step
> (which almost certainly dooms them to failure).
>
> So what do we know and what do we actually all pretty much agree on? It's
> that we need to be able to utilize multiple cores and that we need a
> practical way to get there (if you disagree with the latter this message
> is not meant for you ;-) Running multiple processes is one option but it
> is not always sufficient. For example, some OSes would have trouble firing
> off a couple of thousand processes whereas the same OS may have no problem
> at all with a couple of thousand threads in one process. To give an
> example, starting a thread on Windows costs somewhere in the range of a
> millisecond which is admittedly slow, but still orders of magnitude faster
> than creating a new process. Then there are issues with resource sharing
> (like file handles) which are practically guaranteed not to work across
> process boundaries etc. So while there are perfectly good reasons to run
> multiple processes, there are reasons just as good for wanting to run
> multiple threads in one process.
>
> The question then is, can we find an easy way to extend the Squeak VM to
> run multiple threads and if so how? Given the simplistic nature of the
> Squeak interpreter, there is actually very little global state that is not
> encapsulated in objects on the Squeak heap - basically all the variables
> in class Interpreter. So if we put them into state that is local to
> each thread, we could trivially run multiple instances of the byte code
> interpreter in the same VM. This gets us to the two major questions:
>
> * How do we encapsulate the interpreter state?
> * How do we deal with primitives and plugins?
>
> Let's start with the first one. Obviously, the answer is "make it an
> object". The way I would go about it is by modifying the CCodeGenerator
> such that it generates all functions with an argument of type "struct VM",
> that variable accesses are prefixed properly, and that all function
> calls pass the extra argument along. In short, what used to be translated
> as:
>
> sqInt primitiveAdd(void) {
>   integerResult = stackIntegerValue(1) + stackIntegerValue(0);
>   /* etc. */
> }
>
> will then become something like this:
>
> sqInt primitiveAdd(struct VM *vm) {
>   integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0);
>   /* etc. */
> }
>
> This is a *purely* mechanical step that can be done independently of
> anything else. It should be possible to generate code that is entirely
> equivalent to today's code, and with a bit of tweaking it should be
> possible to make that code roughly as fast as what we have today (not
> that I think it matters, but understanding the speed difference between
> this and the default interpreter is important for judging relative
> speed improvements later).
>
> The above takes care of the interpreter but there are still primitives
> and plugins that need to be dealt with. What I would do here is define
> operations like ioLock(struct VM) and ioUnlock(struct VM) that are the
> effective equivalent of Python's GIL (global interpreter lock) and allow
> exclusive access to primitives that have not been converted to
> multi-threading yet. How exactly this conversion should happen is
> deliberately left open here; maybe changing the VM's major proxy version is
> the right thing to do to indicate the changed semantics. In any case, the
> GIL allows us to readily reuse all existing plugins without having to
> worry about conversion early on.
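A minimal sketch of the ioLock/ioUnlock idea just described, assuming a single pthread mutex (sqInt stands in for the VM's integer type, and the wrapper name is made up):

#include <pthread.h>

typedef long sqInt;                        /* stand-in for the VM's sqInt */
struct VM;                                 /* per-thread interpreter state */

static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;

void ioLock(struct VM *vm)   { (void)vm; pthread_mutex_lock(&gil); }
void ioUnlock(struct VM *vm) { (void)vm; pthread_mutex_unlock(&gil); }

/* Hypothetical call site for a primitive that has not been converted yet:
   hold the global lock for the duration of the call, so legacy plugins
   never see two interpreter threads at once. */
sqInt callUnconvertedPrimitive(struct VM *vm, sqInt (*prim)(struct VM *)) {
    sqInt result;
    ioLock(vm);
    result = prim(vm);
    ioUnlock(vm);
    return result;
}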
>
> So now we've taken care of the two major parts of Squeak: We have the
> ability to run new interpreters and we have the ability to use primitives.
> This is when the fun begins, because at this point we have options:
>
> For example, if you are into shared-state concurrency, you might implement
> a primitive that forks a new instance of the interpreter running in the
> same object memory that your previous interpreter is running in.
>
> Or, and that would be the path that I would take, implement a primitive
> that loads an image into a new object memory (I can explain in more detail
> how memory allocation needs to work for that; it is a fairly
> straightforward scheme but a little too long for this message) and run
> that interpreter.
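A sketch of what such a primitive might boil down to at the C level (the helper names and thread entry point are assumptions for illustration; the memory-allocation scheme alluded to above is not shown):

#include <pthread.h>

struct VM;   /* one interpreter's complete state, per the CCodeGenerator change above */

/* Assumed helpers: allocate a fresh struct VM with its own object memory,
   load an image file into it, and run the bytecode interpreter loop on it. */
extern struct VM *newVMWithImage(const char *imagePath);
extern void       interpretLoop(struct VM *vm);

static void *interpreterThread(void *arg) {
    interpretLoop((struct VM *)arg);
    return NULL;
}

/* Core of the hypothetical primitive: fork a new interpreter over a separate
   object memory, leaving the calling interpreter untouched. */
int spawnImage(const char *imagePath) {
    pthread_t tid;
    struct VM *vm = newVMWithImage(imagePath);
    if (vm == NULL) return -1;
    return pthread_create(&tid, NULL, interpreterThread, vm);
}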
>
> And at this point, the *real* fun begins because we can now start to
> define the communication patterns we'd like to use (initially sockets,
> later shared memory or event queues or whatever else). We can have tiny
> worker images that only do minimal stuff but we can also do a Spoon-like
> thing where we have a "master image" that contains all the code possibly
> needed and fire off micro-images that (via imprinting) swap in just the
> code they need to run.
>
> [Whoa! I just got interrupted by a little 5.6 quake some 50 miles away]
>
> Sorry but I lost my train of thought here. Happens at 5.6 Richter ;-)
> Anyway, the main thing I'm trying to say in the above is that for a
> *practical* solution to the problem there are some steps that are pretty
> much required whichever way you look at it. And I think that regardless of
> your interest in shared state or message passing concurrency we may be
> able to define a road that leads to interesting experiments without
> sacrificing the practical artifact. A VM built as described above
> would be strictly a superset of the current VM so it would be able to run
> any current images and leave room for further experiments.
>
> Cheers,
>   - Andreas
>
>
>



Re: Thoughts on a concurrent Squeak VM

Andreas.Raab
Rob Withers wrote:
> What about using C++?  There would be some degradation of performance.
> However, there would be the benefit of structuring the VM classes, of
> not having to add VM as an argument everywhere, and it may even be
> possible to subclass Thread so we know where the thread-local storage is.

For the VM internally, I don't really care. Since this is generated code
there is really no difference to me. For plugins it is not feasible to
use C++ since name mangling is not standardized, so you can't link
reliably to C++ APIs.

Cheers,
   - Andreas


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Igor Stasenko
In reply to this post by Rob Withers
On 31/10/2007, Rob Withers <[hidden email]> wrote:
> Andreas,
>
> What about using C++?  There would be some degradation of performance.
> However, there would be the benefit of structuring the VM classes, of not
> having to add VM as an argument everywhere, and it may even be possible to
> subclass Thread so we know where the thread-local storage is.
>
I'd rather prefer to modify Slang to be able to generate VM sources for
any target language/platform, and to keep platform-dependent code in the
image instead of in separate file(s). This is all to simplify the build
process and to keep everything together.




--
Best regards,
Igor Stasenko AKA sig.


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Rob Withers

----- Original Message -----
From: "Igor Stasenko" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Wednesday, October 31, 2007 9:39 AM
Subject: Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent
Futures)


> On 31/10/2007, Rob Withers <[hidden email]> wrote:
>> Andreas,
>>
>> What about using C++?  There would be some degradation of performance.
>> However, there would be the benefit of structuring the VM classes, of not
>> having to add VM as an argument everywhere, and it may even be possible
>> to
>> subclass Thread so we know where the thread-local storage is.
>>
> I'd rather prefer to make modifications to slang to be able to
> generate VM sources for any target language/platform and keep platform
> dependent code in image instead in separate file(s). This all to
> simplify build process and to keep all things together.

You mean subclassing a Thread class?  Is that platform dependent?  If so, I
didn't know that and I agree with you - it should be kept out in a separate
file, if used at all.

cheers,
Rob



Re: Thoughts on a concurrent Squeak VM

Rob Withers
In reply to this post by Andreas.Raab

----- Original Message -----
From: "Andreas Raab" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Wednesday, October 31, 2007 9:37 AM
Subject: Re: Thoughts on a concurrent Squeak VM


> Rob Withers wrote:
>> What about using C++?  There would be some degradation of performance.
>> However, there would be the benefit of structuring the VM classes, of not
>> having to add VM as an argument everywhere, and it may even be possible
>> to subclass Thread so we know where the thread-local storage is.
>
> For the VM internally, I don't really care. Since this is generated code
> there is really no difference to me. For plugins it is not feasible to use
> C++ since name mangling not standardized so you can't link reliably to C++
> APIs.

That's true that it's internal to the VM so it shouldn't matter.  I suppose
the benefit of structuring the classes was more of an in-image issue for me.
Even using C, we could separate off the primitives into a Primitives class
and compile with ObjectMemory, Interpreter, and Primitives so they are all
generated in the same file.  Then we would just need to make sure the
InterpreterSimulator knew about the Primitives class.  The same issue
would apply if ObjectMemory and Interpreter were no longer part of the same
hierarchy.

It makes sense that primitives would have a problem with name mangling, so
named primitives can't be in C++ classes... indexed ones could be, though,
as long as the primitive table were initialized with the mangled names.
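A small sketch of that distinction (illustrative shapes only, not the actual VM tables): indexed primitives are dispatched through function pointers, so the linker-level symbol name never matters, while named primitives are resolved by their exported name string, which is exactly what C++ mangling makes unpredictable.

typedef long sqInt;
typedef sqInt (*PrimitiveFn)(void);

/* Indexed primitives: dispatch goes through a function pointer, so it does
   not matter what the symbol behind it is called (mangled or not). */
extern sqInt primitiveAdd(void);
static PrimitiveFn primitiveTable[] = { primitiveAdd /* , ... */ };

/* Named primitives: the VM asks the dynamic loader for a function by its
   exact exported name string (lookupExportedFunction is a stand-in for
   whatever dlsym-style call the platform code uses), so the symbol has to
   be predictable -- which plain C linkage guarantees and C++ does not. */
extern void *lookupExportedFunction(void *moduleHandle, const char *name);

PrimitiveFn findNamedPrimitive(void *moduleHandle, const char *name) {
    return (PrimitiveFn)lookupExportedFunction(moduleHandle, name);
}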

cheers,
Rob



Re: Concurrent Futures

Joshua Gargus-2
In reply to this post by Igor Stasenko

On Oct 31, 2007, at 6:09 AM, Igor Stasenko wrote:
>
> If we look at the multi-core problem as a networking problem then, as I
> see it, shared memory helps us minimize traffic between cores.

Shared memory is an abstraction that pretends that there is no  
traffic between cores, but of course there really is.  Letting  
hardware threads access objects "at random" (i.e. with no regard to  
their location in memory) will certainly not help us minimize traffic  
between cores; why do you think it will?

> Because we don't need to spend time serializing data and transferring
> it between cores if it's located in shared memory and can be easily
> accessed from both ends.
> But the share-nothing model proposes not to use shared memory, which in
> turn means that there will be much higher traffic between cores
> compared to a model which uses shared memory.

It implies nothing of the sort.  The shared-nothing model gives you  
control over this traffic.  The model that you propose gives you no  
control; I think it will probably give degenerate results in  
practice, with lots of needless cache overhead.  Do you think that  
the performance will scale linearly w/ each processor added?  It  
seems unlikely to me.  If you disagree, please explain why.

BTW the time spent serializing data is completely irrelevant when
considering traffic between cores.  Also, I think it will be a small
overhead on overall performance.  The reason is that, in practice, the
amount of data sent between cores/images will be small.  It will be
trivial for the application programmer to measure the number and size
of messages sent between images, and to design the computation so that
the overhead is low (i.e. lots of computation happens in-image for
each message between images).

> So, a balance should be found between network load and using
> shared resources. We can't win if we choose one of the opposite sides,
> only something in the middle.

There are some cases where it doesn't make sense to serialize data  
into a message.  If I have a large video "file" in a ByteArray in one  
image, and I want to play it (decode, upload to OpenGL, etc.), I  
don't want to serialize the whole thing.  It would be much more  
efficient to ensure that GC won't move it, and then just pass a  
pointer to the data.  I don't think that this sort of thing should be  
disallowed.

I think we agree on this point.

Thanks,
Josh

> Am I still wrong here?
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



Re: Concurrent Futures

Joshua Gargus-2
In reply to this post by Jason Johnson-5
Thanks for this interesting list of your relevant work.  I look  
forward to any other thoughts that you could add to this thread, in  
particular to provide a reality-check where your real-world  
experience disagrees with my theoretical understanding :-)

Best,
Josh


On Oct 30, 2007, at 4:24 PM, Jecel Assumpcao Jr wrote:

> I would like to mention some of my previous work in this area:
>
> - tinySelf 1 (1996)
> http://www.lsi.usp.br/~jecel/tiny.html#rel1
>
> This was a Self interpreter written in Self which implemented the one
> thread per object model. All messages were future messages but since
> sending a message to an unresolved future would block, you would have
> deadlock on any recursion (direct or indirect). This problem was  
> solved
> by detecting the cycles and preempting the blocked message with the
> one
> it depends on. This results in interleaved execution, but since the
> semantics are exactly the same as in a sequential execution of the
> recursive code any bugs that appear won't be due to concurrency.
>
> I was able to test simple expressions and was very happy with how much
> parallelism I was able to extract from seemingly sequential code,  
> but I
> made the mistake of introducing a significant optimization (tail send
> elimination) that made debugging so much harder that I was unable to
> finish in the two weeks that I was able to dedicate to this project.
>
> - 64 node Smalltalk machine (1992)
> http://www.lsi.usp.br/~jecel/ms8702.html
>
> The most interesting result in this project was the notion that most
> objects in the system are immutable at any given time and that a
> security system might be used to detect this. For example, just  
> because
> you can edit some font today doesn't mean that you will do it. And if
> you and everyone currently logged on the local system only have read
> permission for that font then it is effectively immutable. Only  
> when the
> font's owner logs in is this assumption invalid.
>
> The advantage of knowing that an object is immutable is that you can
> replicate it and you can allow multiple threads to access it at the  
> same
> time.
>
> The only paper in English from this project describes how adaptive
> compilation could be used to trim away excessive concurrency by
> transforming future message passing into sequential message passing  
> (the
> semantics allow this) and then inlining them away. So if a machine has
> 64 processors and the application initially starts out with 10  
> thousand
> threads, the compiler will eventually change this into code with  
> 200 or
> so threads (some are blocked at any given instant, so going down to 64
> threads would not be good).
> http://www.lsi.usp.br/~jecel/jabs1.html
>
> - operating system in an Objective-C like language (1988)
> http://www.lsi.usp.br/~jecel/atos.html (this page has download  
> links but
> the text still hasn't been written)
>
> This operating system for a 286 machine used the virtual memory of that
> hardware to isolate groups of objects, with one thread per group. This
> would be similar to the vat/island model. All messages were sent in
> exactly the same way and if the receiver was a local object then it  
> was
> just a fancy subroutine call but for remote objects you got a "segment
> not present" fault and the message was packed up and sent to the other
> task (possibly over the network). All messages were synchronous  
> since I
> was not aware of futures at that time.
>
> -- current model --
>
> I moved back to the one thread per object group model since I feel  
> that
> makes it easier for programmers to control things without having to
> worry to much about details most of the time. Since my target is
> children this is particularly important. An alternative that I
> experimented with was having a separation between active and passive
> objects. A passive object could be known only to a single active one,
> but it is just too hard to program without ever accidentally letting
> references to passive objects "leak". With the group/vat/island model
> there is just one kind of object and things are simpler for the
> programmer (but more complicated for the implementor). I have a
> limitation that you can only create new objects in your own group  
> or in
> an entirely new group - I think forcing some other random group to
> create an object for you is rude, though of course you can always ask
> for an object there to please do it.
>
> Some of the loaded groups are read/write but many are read-only. The
> latter don't actually have their own threads but instead their code
> executes in the thread of the calling group. I have hardware  
> support for
> this.
>
> Speaking of hardware, I would like to stress how fantastically slow
> (relatively speaking) main memory is these days. If I have a good
> network connecting processor cores in a single chip then I can  
> probably
> send a message from one to another, get a reply, send a second message
> and get another reply in the time that it takes to read a byte from
> external RAM. So we should start thinking of DDR SDRAM as a really  
> fast
> disk to swap objects to/from and not as a shared memory. We should  
> start
> to take message passing seriously.
>
> -- Jecel
>



Re: Thoughts on a concurrent Squeak VM

Bert Freudenberg
In reply to this post by Rob Withers
On Oct 31, 2007, at 17:57 , Rob Withers wrote:

> [...]
>
> It makes sense that primitives would have a problem with name
> mangling, so named primitives can't be in C++ classes... indexed
> could be, though, as long as the primitive table were initialized
> with the mangled names.

I don't see any point in switching to C++.

- Bert -




Re: Thoughts on a concurrent Squeak VM

Rob Withers

----- Original Message -----
From: "Bert Freudenberg" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Wednesday, October 31, 2007 10:16 AM
Subject: Re: Thoughts on a concurrent Squeak VM


> I don't see any point in switching to C++.

I'm convinced.  It was a little hard to let go since I like an OO
representation, but as Andreas observed, the VM being generated means I
don't really need to look at it too closely.  For me it is more about the
class representation of the VM in the image.  Interpreter is a busy class
and some of its methods could be broken out into separate Squeak classes.

cheers,
Rob



Preliminary new Yaxo version (was: Re: Question about YAXO-XML and possible bug)

Michael Rueger-4
In reply to this post by Michael Rueger-4
Hi all,

there is a new version of Yaxo up at
http://source.impara.de/infrastructure/XML-Parser-mir.10.mcz

You need the two attached 3.8.2/3.10 fixes for the new package to work.

Please test the new version. For now I only tried to verify against some
examples I had readily available.

Once declared stable I'll officially release it on SqueakSource and also
push the 3.8.2 changes and new release images.

Michael

----------

Fixed a number of issues (see below) and converted _ to :=.

There are two major changes in this version:
whitespace handling and the unification of elements and contents.
For backward compatibility elements and contents methods preserve their
semantics.
elementsAndContents and elementsAndContentsDo: access the new unified
collection

Some of the fixes rely on fixes in 3.8.2 or 3.10, most prominently
String class>>findFirstInString:inSet:startingAt:

http://bugs.squeak.org/view.php?id=32
http://bugs.squeak.org/view.php?id=33
http://bugs.squeak.org/view.php?id=34
http://bugs.squeak.org/view.php?id=547
http://bugs.squeak.org/view.php?id=888
http://bugs.squeak.org/view.php?id=928
http://bugs.squeak.org/view.php?id=3082
http://bugs.squeak.org/view.php?id=3083
http://bugs.squeak.org/view.php?id=6746




Attachment: 6750stringAndCharFixes-mir.zip (3K)

Re: Preliminary new Yaxo version

Michael Rueger-4
Michael Rueger wrote:
> Hi all,
>
> there is a new version of Yaxo up at

Kudos to the people who submitted bug reports and fixes!!

Michael


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Igor Stasenko
In reply to this post by Rob Withers
On 31/10/2007, Rob Withers <[hidden email]> wrote:

> You mean subclassing a Thread class?  Is that platform dependent?  If so, I
> didn't know that and I agree with you - it's should be out in a separate
> file, if used at all.
>
No, I mean to keep ALL plugin code in the corresponding methods, and
never use external sources.
For example, SocketPlugin can have subclasses Win32SocketPlugin and
UnixSocketPlugin, and in these subclasses we keep the code for the
different platforms - but not in .c sources.



--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Jason Johnson-5
In reply to this post by Igor Stasenko
If this is the case, then I wonder how far we could get by just making
all I/O async.

On 10/31/07, Igor Stasenko <[hidden email]> wrote:

> On 31/10/2007, Jason Johnson <[hidden email]> wrote:
> > On 10/30/07, Igor Stasenko <[hidden email]> wrote:
> > >
> > > Most of the reasons why the CPU is not utilized at 100% come down to
> > > blocking I/O calls. Then the simplest solution is to not use them and,
> > > instead of blowing up the number of threads, use asynchronous I/O. Most
> > > major platforms support asynchronous I/O and there are many libraries
> > > which support async data handling in almost every area we need. We just
> > > need to build on top of them.
> >
> > Good point.  How many kinds of I/O in Squeak are currently blocking?  I
> > think I heard networking blocks, what about disk?
> >
> All socket/file I/O primitives use blocking calls.
> From what I see, there is only one set of async primitives -
> AsyncFilePlugin. But I'm not sure if it is used in the first place (i.e.
> replaces the FilePlugin).
> I think Andreas could answer this more precisely.
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>
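To give a flavour of what "async" means at the OS level, here is a minimal POSIX sketch (illustrative only; Squeak's actual platform code is organised differently): switch the descriptor to non-blocking mode and check readiness with select(), so no VM thread ever parks inside a read.

#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>

/* Put a descriptor into non-blocking mode: reads return immediately
   with EAGAIN instead of parking the calling thread. */
int makeNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return (flags < 0) ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Poll once: returns 1 if fd is readable right now, 0 if not, -1 on error.
   A VM could signal a Squeak Semaphore here instead of blocking. */
int readableNow(int fd) {
    fd_set readSet;
    struct timeval zero = {0, 0};
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    return select(fd + 1, &readSet, NULL, NULL, &zero);
}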


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Jason Johnson-5
In reply to this post by Igor Stasenko
I agree with Igor.  Slang is a powerful concept that has helped Squeak a lot.

On 10/31/07, Igor Stasenko <[hidden email]> wrote:

> > > I'd rather prefer to make modifications to slang to be able to
> > > generate VM sources for any target language/platform and keep platform
> > > dependent code in image instead in separate file(s). This all to
> > > simplify build process and to keep all things together.
> >
> > You mean subclassing a Thread class?  Is that platform dependent?  If so, I
> > didn't know that and I agree with you - it's should be out in a separate
> > file, if used at all.
> >
> No, i mean to keep ALL plugins code in corresponding methods, and
> never use external sources.
> For example, a SocketPlugin can have subclasses Win32SocketPlugin,
> UnixSocketPlugin
> and in these subclasses we should keep a code for different platforms.
> But not in .c sources.


Re: Thoughts on a concurrent Squeak VM

Igor Stasenko
In reply to this post by Andreas.Raab
On 31/10/2007, Andreas Raab <[hidden email]> wrote:

> Igor Stasenko wrote:
> > There are already some steps done in this direction. A sources for
> > RISC architecture generate a foo struct , which holds all interpreter
> > globals.
> > Also, i did some changes in Exupery to create a single struct of all
> > VM globals (not only variables, but functions too).
> > This was done to make it easier to get address of any global symbol
> > what Exupery needs.
> > I'm also experimented to replace all direct calls to function to
> > indirect (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This caused
> > about ~1% of speed degradation in tinyBenchmarks :)
>
> Ah, indeed, I forgot about that.
>
> > Also, moving forward on this renders an InterpreterProxy struct
> > useless, because we can just pass an address to our 'foo' struct to
> > plugins which already contains everything what plugin can reach.
>
> But this isn't quite true. One of the reasons for the proxy is to
> abstract from the actual implementation since C doesn't do proper name
> lookup for names but rather uses indexes. And so, if you happen to add
> or remove a method from that struct, your plugins will be screwed ;-)
>

You mean for dynamically linked plugins? Yes, that can be a problem.
But again, the generated struct contains not only the address of each
variable, but its name too (as a string literal).

A single entry for a function:
        accessibleObjectAfter, "accessibleObjectAfter:", "sqInt (*accessibleObjectAfter)(sqInt oop)",

A single entry for a variable:
        &activeContext, "activeContext", "<var>".

So there is enough info to get everything you need even with dynamic linkage.
And even without linkage, you can parse the function prototypes and use
some FFI to call them :)
All you need is a pointer to that struct and the number of entries.
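A sketch of that idea (the entry layout mirrors the examples above; the struct and function names are made up): with the name strings in the table, a plugin or Exupery can resolve a symbol at run time instead of depending on a fixed slot position.

#include <stddef.h>
#include <string.h>

/* Shape of one generated entry: address, symbol name, prototype string. */
typedef struct {
    void       *address;
    const char *name;
    const char *prototype;
} VMGlobalEntry;

/* Resolve by name rather than by position, so adding or removing entries
   in the generated struct cannot silently break a dynamically linked plugin. */
void *lookupVMGlobal(const VMGlobalEntry *entries, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(entries[i].name, name) == 0)
            return entries[i].address;
    return NULL;
}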

--
Best regards,
Igor Stasenko AKA sig.


Re: Concurrent Futures

Jecel Assumpcao Jr
In reply to this post by Joshua Gargus-2
Joshua Gargus wrote:
> Thanks for this interesting list of your relevant work.  I look  
> forward to any other thoughts that you could add to this thread, in  
> particular to provide a reality-check where your real-world  
> experience disagrees with my theoretical understanding :-)

Sadly, my practical experience is far more limited than it should be,
given all the years I have spent on this, since often a project would be
abandoned after only preliminary results in favor of a "much better"
one.

One thing that I haven't tried yet is making messages to futures return
new futures instead of blocking (which I understood from this discussion
to be the E way). I had thought about it but imagined it might lead to
ever growing graphs of pending messages with no actual work being done.
I see now that in practice the overhead might be comparable to what my
deadlock detector had. This would probably also make my "tail send"
optimization mostly useless, which is a good thing.

In all my projects I took a very conservative path regarding blocks: I
simply defined "." as a kind of barrier where all previous instructions
must finish before any of the following instructions can be started.
Since I was getting a lot of parallelism even with this I didn't worry
too much about it and this allowed code like this to work just fine:

| a |
a := 1.
1 to: 20 do: [ :i | a := a + i ].
a := a - 1.
1 to: 20 do: [ :x | a := a - x ].
^ a

Having "." as a barrier is unecessary in the code below, but at least
the results will be correct even if at a much reduced performance:

| a b |
a := (1 to: 20) collect: [ :i | i * i ].
b := (1 to: 20) collect: [ :x | x + 7 ].
^ a + b

It isn't very hard for the compiler to know which is the case for each
example.

-- Jecel


Re: Concurrent Futures

David T. Lewis
In reply to this post by Igor Stasenko
On Wed, Oct 31, 2007 at 03:44:10PM +0200, Igor Stasenko wrote:
> All socket/file I/O primitives use blocking calls.
> From what I see, there is only one set of async primitives -
> AsyncFilePlugin. But I'm not sure if it is used in the first place (i.e.
> replaces the FilePlugin).
> I think Andreas could answer this more precisely.

The SocketPlugin implements asynchronous I/O for all platforms, so
socket operations are nonblocking.

OSProcessPlugin also provides nonblocking I/O, but only on unix/mac
platforms at the moment. AioPlugin implements the aio interface to
enable notification of a Squeak semaphore on data availability.
These are used in OSProcess and CommandShell for nonblocking I/O
on files and pipes, especially for interprocess communication using
OS pipes.

Dave



Re: Thoughts on a concurrent Squeak VM

Bryce Kampjes
In reply to this post by Andreas.Raab
Andreas Raab writes:
 > Ralph Johnson wrote:
 > > That is a very interesting plan, Andreas.  However, I don't see
 > > garbage collection on the list.  Won't you have to make a concurrent
 > > garbage collecter?
 >
 > I don't think so. First, you don't need a new garbage collector for the
 > direction that I would take. Since the threads don't operate on the same
 > object memory, no change to the collector is needed. And for shared
 > state concurrency, simple solutions (like a gc request flag per thread
 > which is checked on each send) can be used to ensure atomic operation of
 > the collector.

You'd need to serialise object creation and accessing the root table
in the write barrier. That may be possible without too much work but
there's likely to be some overhead.

Providing a parallel object memory as part of a garbage collector
rewrite that speeds up single-CPU code should be possible. The major
design change would be changing the write barrier from a remembered
set to card marking. That unfortunately might make it necessary to
separate pointer object space from byte storage space.
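For readers who haven't seen one, a card-marking write barrier is tiny; a generic sketch (constants and names invented here, not Squeak's actual memory code):

#define CARD_SHIFT 9   /* one card covers 512 bytes of heap */

static unsigned char *cardTable;   /* one byte per card, set up at startup */
static char          *heapBase;

/* Executed on every pointer store into an object: mark the card holding the
   updated slot dirty. The collector later rescans only the dirty cards,
   instead of maintaining a remembered set of individual objects. */
static void markCardFor(void *slotAddress) {
    cardTable[((char *)slotAddress - heapBase) >> CARD_SHIFT] = 1;
}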

From the reading I did when tuning Exupery's memory access, it looks
like a mostly parallel old space collector should be about the same
amount of work as writing an incremental collector. The trick is to
only run the big mark phase and the big sweep phase in parallel with
the interpreter then stop the interpreter to do the final marks.

That said, shared-nothing scales to multiple computers. If you
really need CPU power it's often cheaper to buy many smaller
boxes than a few big ones.

Bryce


Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

David T. Lewis
In reply to this post by Igor Stasenko
On Wed, Oct 31, 2007 at 08:10:25PM +0200, Igor Stasenko wrote:

> On 31/10/2007, Rob Withers <[hidden email]> wrote:
> >
> > ----- Original Message -----
> > From: "Igor Stasenko" <[hidden email]>
> > > I'd rather prefer to make modifications to slang to be able to
> > > generate VM sources for any target language/platform and keep platform
> > > dependent code in image instead in separate file(s). This all to
> > > simplify build process and to keep all things together.
> >
> > You mean subclassing a Thread class?  Is that platform dependent?  If so, I
> > didn't know that and I agree with you - it's should be out in a separate
> > file, if used at all.
> >
> No, i mean to keep ALL plugins code in corresponding methods, and
> never use external sources.
> For example, a SocketPlugin can have subclasses Win32SocketPlugin,
> UnixSocketPlugin
> and in these subclasses we should keep a code for different platforms.
> But not in .c sources.

OSProcessPlugin is organized like this, primarily because I did not
want to have external file dependencies and platform code for OSPP.
The approach works well in the case where all or most of the code
can be done in Slang (there is no external C support code for OSPP).
I don't know that it helps in the case where the intent of the plugin
is to wrap some external library.

There are certainly other areas of the VM and plugins where external
support code could be moved back into Slang, although I think this is
largely a matter of preference for the folks doing the platform code
support.

Dave
 
