Igor Stasenko wrote:
> There are already some steps done in this direction. The sources for
> RISC architectures generate a foo struct, which holds all interpreter
> globals.
> Also, I made some changes in Exupery to create a single struct of all
> VM globals (not only variables, but functions too).
> This was done to make it easier to get the address of any global symbol,
> which Exupery needs.
> I also experimented with replacing all direct function calls with
> indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This caused
> about a 1% speed degradation in tinyBenchmarks :)

Ah, indeed, I forgot about that.

> Also, moving forward on this renders the InterpreterProxy struct
> useless, because we can just pass the address of our 'foo' struct to
> plugins, which already contains everything a plugin can reach.

But this isn't quite true. One of the reasons for the proxy is to abstract from the actual implementation, since C doesn't do proper lookup by name but rather uses indexes. And so, if you happen to add or remove a method from that struct, your plugins will be screwed ;-)

>> The above takes care of the interpreter but there are still
>> primitives and plugins that need to be dealt with. What I would do here
>> is define operations like ioLock(struct VM) and ioUnlock(struct VM) that
>> are the effective equivalent of Python's GIL (global interpreter lock)
>> and allow exclusive access to primitives that have not been converted to
>> multi-threading yet. How exactly this conversion should happen is
>> deliberately left open here; maybe changing the VM's major proxy version
>> is the right thing to do to indicate the changed semantics. In any case,
>> the GIL allows us to readily reuse all existing plugins without having
>> to worry about conversion early on.
>>
> Or, as I proposed in earlier posts, the other way could be to schedule
> all primitive calls which currently don't support multi-threading to a
> single 'main' thread.
> Then we don't need the GIL.

I had missed that. Yes, that would work just as well.

Cheers,
  - Andreas
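Andreas' point about indexes can be illustrated with a small sketch. It assumes two hypothetical layouts of a VM function table (VMTableV1 and VMTableV2, names invented here, not taken from the Squeak sources) and shows what a plugin compiled against the old layout would end up calling after an entry is inserted in the middle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*prim_fn)(int, int);

/* Hypothetical table layout that an existing plugin was compiled against. */
struct VMTableV1 {
    prim_fn primAdd;
    prim_fn primSub;
};

/* Hypothetical newer layout: an entry was inserted before primSub. */
struct VMTableV2 {
    prim_fn primAdd;
    prim_fn primMul;   /* newly added */
    prim_fn primSub;
};

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Simulates an old plugin asking a new VM for "x - y".
   The plugin only knows the V1 offset of primSub, so it fetches whatever
   function pointer lives at that offset in the V2 table. */
int old_plugin_sub_on_new_vm(int x, int y)
{
    struct VMTableV2 vm = { add, mul, sub };
    prim_fn fn;

    assert(offsetof(struct VMTableV1, primSub) + sizeof fn <= sizeof vm);
    memcpy(&fn, (const char *)&vm + offsetof(struct VMTableV1, primSub),
           sizeof fn);
    return fn(x, y);   /* silently calls primMul, not primSub */
}
```

With this layout, old_plugin_sub_on_new_vm(7, 3) multiplies instead of subtracting. A proxy whose existing entries never move (new entries are only appended, and the version number is bumped) avoids this kind of silent mismatch.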
In reply to this post by Andreas.Raab
Andreas,
What about using C++? There would be some degradation of performance. However, there would be the benefit of structuring the VM classes, of not having to add VM as an argument everywhere, and it may even be possible to subclass Thread so we know where the thread-local storage is.

Rob

----- Original Message -----
From: "Andreas Raab" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Tuesday, October 30, 2007 8:53 PM
Subject: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

> Igor Stasenko wrote:
>> If you have any ideas how such VM would look like i'm glad to hear.
>
> Okay, so Josh convinced me to write up the ideas. The main problem as I
> see it with a *practical* solution to the problem is that all of the
> solutions so far require huge leaps and can't be implemented step-by-step
> (which almost certainly dooms them to failure).
>
> So what do we know and what do we actually all pretty much agree on? It's
> that we need to be able to utilize multiple cores and that we need a
> practical way to get there (if you disagree with the latter this message
> is not meant for you ;-) Running multiple processes is one option but it
> is not always sufficient. For example, some OSes would have trouble firing
> off a couple of thousand processes whereas the same OS may have no problem
> at all with a couple of thousand threads in one process. To give an
> example, starting a thread on Windows costs somewhere in the range of a
> millisecond, which is admittedly slow, but still orders of magnitude faster
> than creating a new process. Then there are issues with resource sharing
> (like file handles) which are practically guaranteed not to work across
> process boundaries etc. So while there are perfectly good reasons to run
> multiple processes, there are reasons just as good for wanting to run
> multiple threads in one process.
>
> The question then is, can we find an easy way to extend the Squeak VM to
> run multiple threads and if so how? Given the simplistic nature of the
> Squeak interpreter, there is actually very little global state that is not
> encapsulated in objects on the Squeak heap - basically all the variables
> in class Interpreter. So if we would put them into state that is local to
> each thread, we could trivially run multiple instances of the byte code
> interpreter in the same VM. This gets us to the two major questions:
>
> * How do we encapsulate the interpreter state?
> * How do we deal with primitives and plugins?
>
> Let's start with the first one. Obviously, the answer is "make it an
> object". The way I would go about it is by modifying the CCodeGenerator
> such that it generates all functions with an argument of type "struct VM",
> that variable accesses prefix things properly, and that all function
> calls pass the extra argument along. In short, what used to be translated
> as:
>
>   sqInt primitiveAdd(void) {
>     integerResult = stackIntegerValue(1) + stackIntegerValue(0);
>     /* etc. */
>   }
>
> will then become something like this:
>
>   sqInt primitiveAdd(struct VM *vm) {
>     integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0);
>     /* etc. */
>   }
>
> This is a *purely* mechanical step that can be done independent of
> anything else. It should be possible to generate code that is entirely
> equivalent to today's code and with a bit of tweaking it should be possible
> to make that code roughly as fast as we have today (not that I think it
> matters but understanding the speed difference between this and the
> default interpreter is important for judging relative speed improvements
> later).
>
> The above takes care of the interpreter but there are still primitives
> and plugins that need to be dealt with. What I would do here is define
> operations like ioLock(struct VM) and ioUnlock(struct VM) that are the
> effective equivalent of Python's GIL (global interpreter lock) and allow
> exclusive access to primitives that have not been converted to
> multi-threading yet. How exactly this conversion should happen is
> deliberately left open here; maybe changing the VM's major proxy version is
> the right thing to do to indicate the changed semantics. In any case, the
> GIL allows us to readily reuse all existing plugins without having to
> worry about conversion early on.
>
> So now we've taken care of the two major parts of Squeak: We have the
> ability to run new interpreters and we have the ability to use primitives.
> This is when the fun begins, because at this point we have options:
>
> For example, if you are into shared-state concurrency, you might implement
> a primitive that forks a new instance of the interpreter running in the
> same object memory that your previous interpreter is running in.
>
> Or, and that would be the path that I would take, implement a primitive
> that loads an image into a new object memory (I can explain in more detail
> how memory allocation needs to work for that; it is a fairly
> straightforward scheme but a little too long for this message) and run
> that interpreter.
>
> And at this point, the *real* fun begins because we can now start to
> define the communication patterns we'd like to use (initially sockets,
> later shared memory or event queues or whatever else). We can have tiny
> worker images that only do minimal stuff but we can also do a Spoon-like
> thing where we have a "master image" that contains all the code possibly
> needed and fire off micro-images that (via imprinting) swap in just the
> code they need to run.
>
> [Whoa! I just got interrupted by a little 5.6 quake some 50 miles away]
>
> Sorry but I lost my train of thought here. Happens at 5.6 Richter ;-)
> Anyway, the main thing I'm trying to say in the above is that for a
> *practical* solution to the problem there are some steps that are pretty
> much required whichever way you look at it. And I think that regardless of
> your interest in shared state or message passing concurrency we may be
> able to define a road that leads to interesting experiments without
> sacrificing the practical artifact. A VM built as described above
> would be strictly a superset of the current VM so it would be able to run
> any current images and leave room for further experiments.
>
> Cheers,
>   - Andreas
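The "struct VM" transformation in the quoted message can be sketched in a self-contained form. This is only an illustration under stated assumptions: the field names, the fixed-size stack, the helpers vmInit and push, and the plain-integer stackIntegerValue are simplified stand-ins invented here, not the code the CCodeGenerator would actually emit.

```c
#include <assert.h>

typedef long sqInt;   /* stand-in; the real sqInt is defined by the VM headers */

#define STACK_SIZE 16

/* Hypothetical per-interpreter state. In the proposal this would hold what
   are currently the global variables generated from class Interpreter. */
struct VM {
    sqInt stack[STACK_SIZE];
    int   stackPointer;    /* index of the top element, -1 when empty */
    sqInt integerResult;
};

void vmInit(struct VM *vm)
{
    vm->stackPointer = -1;
    vm->integerResult = 0;
}

void push(struct VM *vm, sqInt value)
{
    assert(vm->stackPointer + 1 < STACK_SIZE);
    vm->stack[++vm->stackPointer] = value;
}

/* Value `offset` slots below the top of this interpreter's own stack. */
sqInt stackIntegerValue(struct VM *vm, int offset)
{
    assert(offset >= 0 && offset <= vm->stackPointer);
    return vm->stack[vm->stackPointer - offset];
}

/* Every function takes the interpreter state explicitly, so nothing
   here touches a global variable. */
sqInt primitiveAdd(struct VM *vm)
{
    vm->integerResult = stackIntegerValue(vm, 1) + stackIntegerValue(vm, 0);
    vm->stackPointer -= 2;            /* pop both arguments */
    push(vm, vm->integerResult);      /* push the result */
    return vm->integerResult;
}
```

Because all state lives in the struct, two interpreters can be initialized side by side and run primitiveAdd without interfering with each other, which is the property the proposal relies on before any threading is introduced.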
Rob Withers wrote:
> What about using C++? There would be some degradation of performance.
> However, there would be the benefit of structuring the VM classes, of
> not having to add VM as an argument everywhere, and it may even be
> possible to subclass Thread so we know where the thread-local storage is.

For the VM internally, I don't really care. Since this is generated code there is really no difference to me. For plugins it is not feasible to use C++, since name mangling is not standardized, so you can't link reliably to C++ APIs.

Cheers,
  - Andreas
In reply to this post by Rob Withers
On 31/10/2007, Rob Withers <[hidden email]> wrote:
> Andreas,
>
> What about using C++? There would be some degradation of performance.
> However, there would be the benefit of structuring the VM classes, of not
> having to add VM as an argument everywhere, and it may even be possible to
> subclass Thread so we know where the thread-local storage is.
>

I'd rather prefer to make modifications to slang to be able to generate VM sources for any target language/platform and keep platform dependent code in image instead in separate file(s). This all to simplify build process and to keep all things together.

> Rob

--
Best regards,
Igor Stasenko AKA sig.
----- Original Message -----
From: "Igor Stasenko" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Wednesday, October 31, 2007 9:39 AM
Subject: Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

> On 31/10/2007, Rob Withers <[hidden email]> wrote:
>> Andreas,
>>
>> What about using C++? There would be some degradation of performance.
>> However, there would be the benefit of structuring the VM classes, of not
>> having to add VM as an argument everywhere, and it may even be possible
>> to subclass Thread so we know where the thread-local storage is.
>>
> I'd rather prefer to make modifications to slang to be able to
> generate VM sources for any target language/platform and keep platform
> dependent code in image instead in separate file(s). This all to
> simplify build process and to keep all things together.

You mean subclassing a Thread class? Is that platform dependent? If so, I didn't know that and I agree with you - it should be out in a separate file, if used at all.

cheers,
Rob
In reply to this post by Andreas.Raab
----- Original Message -----
From: "Andreas Raab" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Wednesday, October 31, 2007 9:37 AM
Subject: Re: Thoughts on a concurrent Squeak VM

> Rob Withers wrote:
>> What about using C++? There would be some degradation of performance.
>> However, there would be the benefit of structuring the VM classes, of not
>> having to add VM as an argument everywhere, and it may even be possible
>> to subclass Thread so we know where the thread-local storage is.
>
> For the VM internally, I don't really care. Since this is generated code
> there is really no difference to me. For plugins it is not feasible to use
> C++ since name mangling is not standardized so you can't link reliably to
> C++ APIs.

It's true that it's internal to the VM, so it shouldn't matter. I suppose the benefit of structuring the classes was more of an in-image issue for me. Even using C, we could separate off the primitives into a Primitives class and compile with ObjectMemory, Interpreter, and Primitives so they are all generated in the same file. Then we would just need to make sure the InterpreterSimulator knew about the Primitives class. The same issue would apply if ObjectMemory and Interpreter were no longer part of the same hierarchy.

It makes sense that primitives would have a problem with name mangling, so named primitives can't be in C++ classes... indexed could be, though, as long as the primitive table were initialized with the mangled names.

cheers,
Rob
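The distinction Rob draws between indexed and named primitives can be made concrete with a small sketch in plain C. The names below (PrimitiveFn, primAnswerOne, lookupIndexed, lookupNamed) are hypothetical and the lookup is deliberately simplified: a real VM resolves named primitives through the platform's symbol lookup, which is the step where C++ name mangling becomes a problem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long sqInt;                 /* stand-in for the VM's integer type */
struct VM { sqInt result; };        /* hypothetical minimal interpreter state */

typedef sqInt (*PrimitiveFn)(struct VM *);

static sqInt primAnswerOne(struct VM *vm) { return vm->result = 1; }
static sqInt primAnswerTwo(struct VM *vm) { return vm->result = 2; }

/* Indexed primitives: the table is filled in at compile time, so only the
   function's address matters, never its (possibly mangled) symbol name. */
static const PrimitiveFn primitiveTable[] = { primAnswerOne, primAnswerTwo };

/* Named primitives: found by comparing strings at run time, so the exported
   name has to be exactly what the caller asks for. */
struct NamedPrimitive { const char *name; PrimitiveFn fn; };
static const struct NamedPrimitive namedPrimitives[] = {
    { "primAnswerOne", primAnswerOne },
    { "primAnswerTwo", primAnswerTwo },
};

PrimitiveFn lookupIndexed(size_t index)
{
    size_t count = sizeof primitiveTable / sizeof primitiveTable[0];
    return index < count ? primitiveTable[index] : NULL;
}

PrimitiveFn lookupNamed(const char *name)
{
    size_t count = sizeof namedPrimitives / sizeof namedPrimitives[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(namedPrimitives[i].name, name) == 0)
            return namedPrimitives[i].fn;
    }
    return NULL;   /* unknown name: the primitive fails to load */
}
```

In this sketch the indexed path keeps working no matter how the functions are named in the object file, while the named path fails as soon as the requested string and the exported name differ.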
In reply to this post by Igor Stasenko
On Oct 31, 2007, at 6:09 AM, Igor Stasenko wrote:

> If we look at multi-core problem as networking problem then, to what i
> see, a shared memory helps us in minimizing traffic between cores.

Shared memory is an abstraction that pretends that there is no traffic between cores, but of course there really is. Letting hardware threads access objects "at random" (i.e. with no regard to their location in memory) will certainly not help us minimize traffic between cores; why do you think it will?

> Because we don't need to spend time of serializing data and transfer
> it between cores if its located in shared memory and can be easily
> accessed from both ends.
> But share-nothing model proposes to not use shared memory, which in
> own turn means that there will be a much higher traffic between cores
> comparing to model which uses shared memory.

It implies nothing of the sort. The shared-nothing model gives you control over this traffic. The model that you propose gives you no control; I think it will probably give degenerate results in practice, with lots of needless cache overhead. Do you think that the performance will scale linearly with each processor added? It seems unlikely to me. If you disagree, please explain why.

BTW the time spent serializing data is completely irrelevant when considering traffic between cores. Also, I think it will be a small overhead on overall performance. The reason is that, in practice, the amount of data sent between cores/images will be small. It will be trivial for the application programmer to measure the number and size of messages sent between images, and to design the computation so that the overhead is low (i.e. lots of computation happens in-image for each message between images).

> So, there a balance should be found between network load and using
> shared resources. We can't win if we choose one of opposite sides,
> only something in the middle.

There are some cases where it doesn't make sense to serialize data into a message. If I have a large video "file" in a ByteArray in one image, and I want to play it (decode, upload to OpenGL, etc.), I don't want to serialize the whole thing. It would be much more efficient to ensure that GC won't move it, and then just pass a pointer to the data. I don't think that this sort of thing should be disallowed. I think we agree on this point.

Thanks,
Josh

> I am still wrong here?
>
> --
> Best regards,
> Igor Stasenko AKA sig.
In reply to this post by Jason Johnson-5
Thanks for this interesting list of your relevant work. I look forward to any other thoughts that you could add to this thread, in particular to provide a reality-check where your real-world experience disagrees with my theoretical understanding :-)

Best,
Josh

On Oct 30, 2007, at 4:24 PM, Jecel Assumpcao Jr wrote:

> I would like to mention some of my previous work in this area:
>
> - tinySelf 1 (1996)
>   http://www.lsi.usp.br/~jecel/tiny.html#rel1
>
> This was a Self interpreter written in Self which implemented the one
> thread per object model. All messages were future messages but since
> sending a message to an unresolved future would block, you would have
> deadlock on any recursion (direct or indirect). This problem was solved
> by detecting the cycles and preempting the blocked message with the one
> it depends on. This results in interleaved execution, but since the
> semantics are exactly the same as in a sequential execution of the
> recursive code any bugs that appear won't be due to concurrency.
>
> I was able to test simple expressions and was very happy with how much
> parallelism I was able to extract from seemingly sequential code, but I
> made the mistake of introducing a significant optimization (tail send
> elimination) that made debugging so much harder that I was unable to
> finish in the two weeks that I was able to dedicate to this project.
>
> - 64 node Smalltalk machine (1992)
>   http://www.lsi.usp.br/~jecel/ms8702.html
>
> The most interesting result in this project was the notion that most
> objects in the system are immutable at any given time and that a
> security system might be used to detect this. For example, just because
> you can edit some font today doesn't mean that you will do it. And if
> you and everyone currently logged on the local system only have read
> permission for that font then it is effectively immutable. Only when the
> font's owner logs in is this assumption invalid.
>
> The advantage of knowing that an object is immutable is that you can
> replicate it and you can allow multiple threads to access it at the same
> time.
>
> The only paper in English from this project describes how adaptive
> compilation could be used to trim away excessive concurrency by
> transforming future message passing into sequential message passing (the
> semantics allow this) and then inlining them away. So if a machine has
> 64 processors and the application initially starts out with 10 thousand
> threads, the compiler will eventually change this into code with 200 or
> so threads (some are blocked at any given instant, so going down to 64
> threads would not be good).
> http://www.lsi.usp.br/~jecel/jabs1.html
>
> - operating system in an Objective-C like language (1988)
>   http://www.lsi.usp.br/~jecel/atos.html (this page has download links but
>   the text still hasn't been written)
>
> This operating system for 286 machines used the virtual memory of that
> hardware to isolate groups of objects, with one thread per group. This
> would be similar to the vat/island model. All messages were sent in
> exactly the same way and if the receiver was a local object then it was
> just a fancy subroutine call but for remote objects you got a "segment
> not present" fault and the message was packed up and sent to the other
> task (possibly over the network). All messages were synchronous since I
> was not aware of futures at that time.
>
> -- current model --
>
> I moved back to the one thread per object group model since I feel that
> makes it easier for programmers to control things without having to
> worry too much about details most of the time. Since my target is
> children this is particularly important. An alternative that I
> experimented with was having a separation between active and passive
> objects. A passive object could be known only to a single active one,
> but it is just too hard to program without ever accidentally letting
> references to passive objects "leak". With the group/vat/island model
> there is just one kind of object and things are simpler for the
> programmer (but more complicated for the implementor). I have a
> limitation that you can only create new objects in your own group or in
> an entirely new group - I think forcing some other random group to
> create an object for you is rude, though of course you can always ask
> for an object there to please do it.
>
> Some of the loaded groups are read/write but many are read-only. The
> latter don't actually have their own threads but instead their code
> executes in the thread of the calling group. I have hardware support for
> this.
>
> Speaking of hardware, I would like to stress how fantastically slow
> (relatively speaking) main memory is these days. If I have a good
> network connecting processor cores in a single chip then I can probably
> send a message from one to another, get a reply, send a second message
> and get another reply in the time that it takes to read a byte from
> external RAM. So we should start thinking of DDR SDRAM as a really fast
> disk to swap objects to/from and not as a shared memory. We should start
> to take message passing seriously.
>
> -- Jecel
In reply to this post by Rob Withers
On Oct 31, 2007, at 17:57 , Rob Withers wrote:
> ----- Original Message -----
> From: "Andreas Raab" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Sent: Wednesday, October 31, 2007 9:37 AM
> Subject: Re: Thoughts on a concurrent Squeak VM
>
>> Rob Withers wrote:
>>> What about using C++? There would be some degradation of
>>> performance. However, there would be the benefit of structuring
>>> the VM classes, of not having to add VM as an argument
>>> everywhere, and it may even be possible to subclass Thread so we
>>> know where the thread-local storage is.
>>
>> For the VM internally, I don't really care. Since this is
>> generated code there is really no difference to me. For plugins it
>> is not feasible to use C++ since name mangling is not standardized so
>> you can't link reliably to C++ APIs.
>
> It's true that it's internal to the VM, so it shouldn't matter. I
> suppose the benefit of structuring the classes was more of an in-image
> issue for me. Even using C, we could separate off the
> primitives into a Primitives class and compile with ObjectMemory,
> Interpreter, and Primitives so they are all generated in the same
> file. Then we would just need to make sure the
> InterpreterSimulator knew about the Primitives class. The same
> issue would apply if ObjectMemory and Interpreter were no longer
> part of the same hierarchy.
>
> It makes sense that primitives would have a problem with name
> mangling, so named primitives can't be in C++ classes... indexed
> could be, though, as long as the primitive table were initialized
> with the mangled names.

I don't see any point in switching to C++.

- Bert -
----- Original Message -----
From: "Bert Freudenberg" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Wednesday, October 31, 2007 10:16 AM
Subject: Re: Thoughts on a concurrent Squeak VM

> On Oct 31, 2007, at 17:57 , Rob Withers wrote:
>
>> ----- Original Message -----
>> From: "Andreas Raab" <[hidden email]>
>> To: "The general-purpose Squeak developers list" <[hidden email]>
>> Sent: Wednesday, October 31, 2007 9:37 AM
>> Subject: Re: Thoughts on a concurrent Squeak VM
>>
>>> Rob Withers wrote:
>>>> What about using C++? There would be some degradation of performance.
>>>> However, there would be the benefit of structuring the VM classes, of
>>>> not having to add VM as an argument everywhere, and it may even be
>>>> possible to subclass Thread so we know where the thread-local storage
>>>> is.
>>>
>>> For the VM internally, I don't really care. Since this is generated
>>> code there is really no difference to me. For plugins it is not
>>> feasible to use C++ since name mangling is not standardized so you can't
>>> link reliably to C++ APIs.
>>
>> It's true that it's internal to the VM, so it shouldn't matter. I
>> suppose the benefit of structuring the classes was more of an in-image
>> issue for me. Even using C, we could separate off the primitives into a
>> Primitives class and compile with ObjectMemory, Interpreter, and
>> Primitives so they are all generated in the same file. Then we would
>> just need to make sure the InterpreterSimulator knew about the
>> Primitives class. The same issue would apply if ObjectMemory and
>> Interpreter were no longer part of the same hierarchy.
>>
>> It makes sense that primitives would have a problem with name mangling,
>> so named primitives can't be in C++ classes... indexed could be, though,
>> as long as the primitive table were initialized with the mangled names.
>
> I don't see any point in switching to C++.

I'm convinced. It was a little hard to let go since I like an OO representation, but as Andreas observed, the VM being generated means I don't really need to look at it too closely. For me it is more about the class representation of the VM in the image. Interpreter is a busy class and some of its methods could be broken out into separate Squeak classes.

cheers,
Rob
In reply to this post by Michael Rueger-4
Hi all,
there is a new version of Yaxo up at
http://source.impara.de/infrastructure/XML-Parser-mir.10.mcz

You need the two attached 3.8.2/3.10 fixes for the new package to work.

Please test the new version. For now I only tried to verify against some examples I had readily available. Once declared stable I'll officially release it on SqueakSource and also push the 3.8.2 changes and new release images.

Michael

----------

Fixed a number of issues (see below) and converted _ to :=.

There are two major changes in this version: whitespace handling and the unification of elements and contents. For backward compatibility elements and contents methods preserve their semantics.
elementsAndContents and elementsAndContentsDo: access the new unified collection.

Some of the fixes rely on fixes in 3.8.2 or 3.10, most prominently
String class>>findFirstInString:inSet:startingAt:

http://bugs.squeak.org/view.php?id=32
http://bugs.squeak.org/view.php?id=33
http://bugs.squeak.org/view.php?id=34
http://bugs.squeak.org/view.php?id=547
http://bugs.squeak.org/view.php?id=888
http://bugs.squeak.org/view.php?id=928
http://bugs.squeak.org/view.php?id=3082
http://bugs.squeak.org/view.php?id=3083
http://bugs.squeak.org/view.php?id=6746

6750stringAndCharFixes-mir.zip (3K)
Michael Rueger wrote:
> Hi all,
>
> there is a new version of Yaxo up at

Kudos to the people who submitted bug reports and fixes!!

Michael
In reply to this post by Rob Withers
On 31/10/2007, Rob Withers <[hidden email]> wrote:
>
> ----- Original Message -----
> From: "Igor Stasenko" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Sent: Wednesday, October 31, 2007 9:39 AM
> Subject: Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)
>
> > On 31/10/2007, Rob Withers <[hidden email]> wrote:
> >> Andreas,
> >>
> >> What about using C++? There would be some degradation of performance.
> >> However, there would be the benefit of structuring the VM classes, of not
> >> having to add VM as an argument everywhere, and it may even be possible
> >> to subclass Thread so we know where the thread-local storage is.
> >>
> > I'd rather prefer to make modifications to slang to be able to
> > generate VM sources for any target language/platform and keep platform
> > dependent code in image instead in separate file(s). This all to
> > simplify build process and to keep all things together.
>
> You mean subclassing a Thread class? Is that platform dependent? If so, I
> didn't know that and I agree with you - it's should be out in a separate
> file, if used at all.
>

No, I mean to keep ALL plugin code in corresponding methods, and never use external sources. For example, a SocketPlugin can have subclasses Win32SocketPlugin and UnixSocketPlugin, and in these subclasses we should keep the code for the different platforms. But not in .c sources.

> cheers,
> Rob

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Igor Stasenko
If this is the case, then I wonder how far we could get by just making all I/O async.

On 10/31/07, Igor Stasenko <[hidden email]> wrote:
> On 31/10/2007, Jason Johnson <[hidden email]> wrote:
> > On 10/30/07, Igor Stasenko <[hidden email]> wrote:
> > >
> > > Most of reasons why CPU not utilized at 100% is using a blocking I/O
> > > calls. Then a simplest solution to not use them and instead of blowing
> > > up the number of threads use asynchronous I/O . Most major platforms
> > > support asynchronous I/O and there are many libraries which support
> > > async data handling almost in each area we need for. We just need to
> > > build on top of them.
> >
> > Good point. How many kinds of I/O in Squeak is currently blocking? I
> > think I heard networking blocks, what about disk?
> >
> All socket/file IO primitives using blocking calls.
> To what i see, there is only one set of async primitives -
> AsyncFilePlugin. But i'm not sure if it used at first place (i.e.
> replaces a FilePlugin).
> I think Andreas could answer on this more precisely.
>
> --
> Best regards,
> Igor Stasenko AKA sig.
In reply to this post by Igor Stasenko
I agree with Igor. Slang is a powerful concept that has helped Squeak a lot.
On 10/31/07, Igor Stasenko <[hidden email]> wrote:
> On 31/10/2007, Rob Withers <[hidden email]> wrote:
> >
> > ----- Original Message -----
> > From: "Igor Stasenko" <[hidden email]>
> > To: "The general-purpose Squeak developers list" <[hidden email]>
> > Sent: Wednesday, October 31, 2007 9:39 AM
> > Subject: Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)
> >
> > > On 31/10/2007, Rob Withers <[hidden email]> wrote:
> > >> Andreas,
> > >>
> > >> What about using C++? There would be some degradation of performance.
> > >> However, there would be the benefit of structuring the VM classes, of not
> > >> having to add VM as an argument everywhere, and it may even be possible to
> > >> subclass Thread so we know where the thread-local storage is.
> > >>
> > > I'd rather prefer to make modifications to slang to be able to
> > > generate VM sources for any target language/platform and keep platform
> > > dependent code in image instead in separate file(s). This all to
> > > simplify build process and to keep all things together.
> >
> > You mean subclassing a Thread class? Is that platform dependent? If so, I
> > didn't know that and I agree with you - it's should be out in a separate
> > file, if used at all.
> >
> No, i mean to keep ALL plugins code in corresponding methods, and
> never use external sources.
> For example, a SocketPlugin can have subclasses Win32SocketPlugin,
> UnixSocketPlugin
> and in these subclasses we should keep a code for different platforms.
> But not in .c sources.
>
> > cheers,
> > Rob
>
> --
> Best regards,
> Igor Stasenko AKA sig.
In reply to this post by Andreas.Raab
On 31/10/2007, Andreas Raab <[hidden email]> wrote:
> Igor Stasenko wrote:
> > There are already some steps done in this direction. A sources for
> > RISC architecture generate a foo struct , which holds all interpreter
> > globals.
> > Also, i did some changes in Exupery to create a single struct of all
> > VM globals (not only variables, but functions too).
> > This was done to make it easier to get address of any global symbol
> > what Exupery needs.
> > I'm also experimented to replace all direct calls to function to
> > indirect (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This caused
> > about ~1% of speed degradation in tinyBenchmarks :)
>
> Ah, indeed, I forgot about that.
>
> > Also, moving forward on this renders an InterpreterProxy struct
> > useless, because we can just pass an address to our 'foo' struct to
> > plugins which already contains everything what plugin can reach.
>
> But this isn't quite true. One of the reasons for the proxy is to
> abstract from the actual implementation since C doesn't do proper name
> lookup for names but rather uses indexes. And so, if you happen to add
> or remove a method from that struct, your plugins will be screwed ;-)
>

You mean for dynamically linked plugins? Yes, that can be a problem. But again, the generated struct contains not only the address of each symbol, but its name too (as a string literal).

A single entry for a function:

    accessibleObjectAfter, "accessibleObjectAfter:", "sqInt (*accessibleObjectAfter)(sqInt oop)",

A single entry for a var:

    &activeContext, "activeContext", "<var>"

So there is enough info to get everything you need even with dynamic linkage. And even without linkage, you can parse the function prototypes and use some FFI to call them :)
All you need is a pointer to that struct and the number of entries.

--
Best regards,
Igor Stasenko AKA sig.
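[Editor's sketch] The name-carrying entries Igor describes can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the struct that Exupery or the VM generator actually emits: the `VMSymbol` layout, the `vm_lookup` helper, and the stub body of `accessibleObjectAfter` are all hypothetical. It shows why lookup by name answers Andreas's concern: a plugin that resolves symbols by string does not depend on where an entry sits in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long sqInt;               /* stand-in for the VM's oop/integer type */
typedef void (*GenericFn)(void);  /* generic function pointer for storage */

/* One exported symbol: name, prototype string ("<var>" marks a variable),
   and either a variable address or a function address. Hypothetical layout. */
typedef struct {
    const char *name;
    const char *prototype;
    void       *var;  /* non-NULL for variables */
    GenericFn   fn;   /* non-NULL for functions */
} VMSymbol;

/* Stub globals, present only so the table has something to point at. */
static sqInt activeContext = 0;
static sqInt accessibleObjectAfter(sqInt oop) { return oop + 1; }

static const VMSymbol vmSymbols[] = {
    { "accessibleObjectAfter:", "sqInt (*accessibleObjectAfter)(sqInt oop)",
      NULL, (GenericFn)accessibleObjectAfter },
    { "activeContext", "<var>", &activeContext, NULL },
};
#define VM_SYMBOL_COUNT (sizeof vmSymbols / sizeof vmSymbols[0])

/* Resolve a symbol by name; the entry's index in the table is irrelevant. */
static const VMSymbol *vm_lookup(const VMSymbol *table, size_t count,
                                 const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) return &table[i];
    }
    return NULL;
}

/* What a plugin might do: look the function up once, cast it back to its
   real type, and call it. */
static sqInt call_accessible_object_after(const VMSymbol *table, size_t count,
                                          sqInt oop) {
    const VMSymbol *s = vm_lookup(table, count, "accessibleObjectAfter:");
    assert(s != NULL && s->fn != NULL);
    return ((sqInt (*)(sqInt))s->fn)(oop);
}
```

A plugin would typically resolve the names it needs once at load time and cache the resulting pointers, so the string comparison is not paid on every call.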
In reply to this post by Joshua Gargus-2
Joshua Gargus wrote:
> Thanks for this interesting list of your relevant work. I look
> forward to any other thoughts that you could add to this thread, in
> particular to provide a reality-check where your real-world
> experience disagrees with my theoretical understanding :-)

Sadly, my practical experience is far more limited than it should be, given all the years I have spent on this, since often a project would be abandoned after only preliminary results in favor of a "much better" one.

One thing that I haven't tried yet is making messages to futures return new futures instead of blocking (which I understood from this discussion to be the E way). I had thought about it but imagined it might lead to ever growing graphs of pending messages with no actual work being done. I see now that in practice the overhead might be comparable to what my deadlock detector had. This would probably also make my "tail send" optimization mostly useless, which is a good thing.

In all my projects I took a very conservative path regarding blocks: I simply defined "." as a kind of barrier where all previous instructions must finish before any of the following instructions can be started. Since I was getting a lot of parallelism even with this I didn't worry too much about it, and this allowed code like this to work just fine:

    | a |
    a := 1.
    1 to: 20 do: [ :i | a := a + i ].
    a := a - 1.
    1 to: 20 do: [ :x | a := a - x ].
    ^ a

Having "." as a barrier is unnecessary in the code below, but at least the results will be correct, even if at a much reduced performance:

    | a b |
    a := (1 to: 20) collect: [ :i | i * i ].
    b := (1 to: 20) collect: [ :x | x + 7 ].
    ^ a + b

It isn't very hard for the compiler to know which is the case for each example.

-- Jecel
In reply to this post by Igor Stasenko
On Wed, Oct 31, 2007 at 03:44:10PM +0200, Igor Stasenko wrote:
> All socket/file IO primitives using blocking calls.
> To what i see, there is only one set of async primitives -
> AsyncFilePlugin. But i'm not sure if it used at first place (i.e.
> replaces a FilePlugin).
> I think Andreas could answer on this more precisely.

The SocketPlugin implements asynchronous I/O for all platforms, so socket operations are nonblocking.

OSProcessPlugin also provides nonblocking I/O, but only on unix/mac platforms at the moment. AioPlugin implements the aio interface to enable notification of a Squeak semaphore on data availability. These are used in OSProcess and CommandShell for nonblocking I/O on files and pipes, especially for interprocess communication using OS pipes.

Dave
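[Editor's sketch] For readers unfamiliar with the pattern Dave describes, here is a minimal POSIX illustration of nonblocking I/O on a descriptor such as a pipe. It is not code from SocketPlugin, OSProcessPlugin, or AioPlugin; the helper names are hypothetical, and where the plugins signal a Squeak semaphore when data arrives, this sketch simply uses `poll` to learn that the descriptor has become readable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read without blocking.
   Returns the byte count, 0 at end of file, -1 on a real error,
   or -2 when no data is available yet. */
static long try_read(int fd, char *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return -2;
    return (long)n;
}

/* Wait up to timeout_ms for the descriptor to become readable.
   Returns 1 if readable (or hung up), 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int r = poll(&p, 1, timeout_ms);
    if (r <= 0) return r;
    return (p.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

The point of the pattern is that `try_read` returns immediately when nothing is available, so the caller (here, the interpreter thread) can keep running other work and come back once it is notified that data is ready.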
In reply to this post by Andreas.Raab
Andreas Raab writes:
> Ralph Johnson wrote:
> > That is a very interesting plan, Andreas. However, I don't see
> > garbage collection on the list. Won't you have to make a concurrent
> > garbage collecter?
>
> I don't think so. First, you don't need a new garbage collector for the
> direction that I would take. Since the threads don't operate on the same
> object memory, no change to the collector is needed. And for shared
> state concurrency, simple solutions (like a gc request flag per thread
> which is checked on each send) can be used to ensure atomic operation of
> the collector.

You'd need to serialise object creation and access to the root table in the write barrier. That may be possible without too much work, but there's likely to be some overhead.

Providing a parallel object memory as part of a garbage collector rewrite that speeds up single CPU code should be possible. The major design change would be changing the write barrier from a remembered set to card marking. That unfortunately might make it necessary to separate pointer object space from byte storage space.

From the reading I did when tuning Exupery's memory access, it looks like a mostly parallel old space collector should be about the same amount of work as writing an incremental collector. The trick is to only run the big mark phase and the big sweep phase in parallel with the interpreter, then stop the interpreter to do the final marks.

That said, share nothing scales to multiple computers. If you really need CPU power it's often cheaper to buy many smaller boxes than a few big ones.

Bryce
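[Editor's sketch] To make the remembered-set versus card-marking distinction concrete, here is a minimal card-marking write barrier. It is a sketch under assumptions, not Squeak's object memory: the heap size, the 512-byte card size, and all names are made up for illustration. The relevant property is that the barrier only sets a byte computed from the slot's address, so there is no shared table to append to, whereas a remembered set such as the root table needs its insertions serialised between threads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, chosen only for illustration. */
enum { HEAP_WORDS = 4096, CARD_SHIFT = 9 };  /* 2^9 = 512-byte cards */
#define CARD_COUNT ((HEAP_WORDS * sizeof(uintptr_t)) >> CARD_SHIFT)

static uintptr_t     heap[HEAP_WORDS];   /* toy pointer-object space */
static unsigned char cards[CARD_COUNT];  /* one dirty byte per card */

/* Map a heap slot to the index of the card that covers it. */
static size_t card_index(const uintptr_t *slot) {
    return (size_t)((const char *)slot - (const char *)heap) >> CARD_SHIFT;
}

/* Write barrier: perform the store, then mark the covering card dirty.
   No table insertion and no lock, only one extra byte store. */
static void store_pointer(uintptr_t *slot, uintptr_t value) {
    assert(slot >= heap && slot < heap + HEAP_WORDS);
    *slot = value;
    cards[card_index(slot)] = 1;
}

/* Collector side: visit dirty cards and clear them. A real collector would
   scan the objects on each dirty card; this sketch only counts the cards. */
static size_t scan_and_clear_dirty_cards(void) {
    size_t dirty = 0;
    for (size_t i = 0; i < CARD_COUNT; i++) {
        if (cards[i]) {
            dirty++;
            cards[i] = 0;
        }
    }
    return dirty;
}
```

Because the collector has to scan whole cards for pointers, every word on a card must be interpretable as a possible pointer, which is one way to read Bryce's remark that pointer objects and byte storage may need to live in separate spaces.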
In reply to this post by Igor Stasenko
On Wed, Oct 31, 2007 at 08:10:25PM +0200, Igor Stasenko wrote:
> On 31/10/2007, Rob Withers <[hidden email]> wrote:
> >
> > ----- Original Message -----
> > From: "Igor Stasenko" <[hidden email]>
> > > I'd rather prefer to make modifications to slang to be able to
> > > generate VM sources for any target language/platform and keep platform
> > > dependent code in image instead in separate file(s). This all to
> > > simplify build process and to keep all things together.
> >
> > You mean subclassing a Thread class? Is that platform dependent? If so, I
> > didn't know that and I agree with you - it's should be out in a separate
> > file, if used at all.
> >
> No, i mean to keep ALL plugins code in corresponding methods, and
> never use external sources.
> For example, a SocketPlugin can have subclasses Win32SocketPlugin,
> UnixSocketPlugin
> and in these subclasses we should keep a code for different platforms.
> But not in .c sources.

OSProcessPlugin is organized like this, primarily because I did not want to have external file dependencies and platform code for OSPP. The approach works well in the case where all or most of the code can be done in Slang (there is no external C support code for OSPP). I don't know that it helps in the case where the intent of the plugin is to wrap some external library.

There are certainly other areas of the VM and plugins where external support code could be moved back into Slang, although I think this is largely a matter of preference for the folks doing the platform code support.

Dave