Hi,
among other less interesting things, I spent some time on the existing PosixSharedMemory project. It is a UFFI binding for the LibC functions that support memory shared between several separate processes. I significantly improved the performance by implementing block access. Writing a 10 MB byte array takes about 1 millisecond, and reading it from another image took about 4 milliseconds. Since serialization with Fuel is very fast, this opens interesting possibilities.

Shared memory without synchronization tools is not very useful, so I wrote a basic UFFI interface for POSIX named semaphores. They are quite easy to use and work nicely with Pharo. The whole VM can wait on the semaphore, or an image thread can check its status periodically. They have two small disadvantages: they require dynamically linking one more library (pthread), and they must be cleaned up manually. I plan to look at the System V alternative in the future.

Now we should write a nice framework for inter-image communication on top of it and/or adopt Seamless for it ;-)

Cheers,
-- Pavel
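For readers who have not used the underlying LibC calls, here is a minimal C sketch of the kind of thing such a binding wraps. It is not the PosixSharedMemory code. The helper names (region_open, region_write, region_read, region_cleanup) are made up for illustration; shm_open, ftruncate, mmap, sem_wait, sem_post, shm_unlink and sem_unlink are the standard POSIX calls. The block copy is a single memcpy guarded by a named semaphore, and the cleanup function reflects the point that named objects have to be removed by hand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* O_CREAT, O_RDWR */
#include <semaphore.h>  /* sem_t, sem_wait, sem_post, sem_close, sem_unlink */
#include <string.h>     /* memcpy */
#include <sys/mman.h>   /* shm_open, shm_unlink, mmap, munmap */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close */

/* Hypothetical helper: map a named shared memory object of `size` bytes,
   creating it if it does not exist yet. Returns NULL on failure. */
static void *region_open(const char *name, size_t size) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) return NULL;
    if (ftruncate(fd, (off_t)size) == -1) { close(fd); return NULL; }
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}

/* Wait on the semaphore, retrying if a signal interrupts the call. */
static int sem_wait_retry(sem_t *sem) {
    int rc;
    do { rc = sem_wait(sem); } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Copy `len` bytes into the region as one block while holding the semaphore. */
static int region_write(void *region, size_t region_size, sem_t *guard,
                        const void *src, size_t len) {
    assert(len <= region_size);
    if (sem_wait_retry(guard) == -1) return -1;
    memcpy(region, src, len);
    return sem_post(guard);
}

/* Copy `len` bytes out of the region as one block while holding the semaphore. */
static int region_read(const void *region, size_t region_size, sem_t *guard,
                       void *dst, size_t len) {
    assert(len <= region_size);
    if (sem_wait_retry(guard) == -1) return -1;
    memcpy(dst, region, len);
    return sem_post(guard);
}

/* Named objects outlive the processes that created them,
   so they have to be removed explicitly. */
static void region_cleanup(void *region, size_t size, const char *shm_name,
                           sem_t *guard, const char *sem_name) {
    munmap(region, size);
    shm_unlink(shm_name);
    sem_close(guard);
    sem_unlink(sem_name);
}
```

A caller would create the guard with sem_open(name, O_CREAT, 0600, 1) and pass it to region_write and region_read; a second process opening the same two names sees the same bytes. On older glibc versions the link step may additionally need -lrt and -pthread.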
Thanks Pavel this looks quite fast :).
Do you have a scenario in mind that could take advantage of this?

Stef
On 14/02/2018 at 20:19, Stephane Ducasse wrote:
> Thanks Pavel this looks quite fast :).
> Do you have a scenario in mind that could take advantage of this?

I'd be very interested to see that used with image segments for object migration between images running in different processes (or coupled with a distributed shared memory implementation like [1]).

If latency is low enough, I think multi-window applications could be developed as multi-process, single-window images (and we could get scalability to thousands of cores if the application design is right; even code synchronisation between images would be easy to do).

Thierry

[1] https://hal.archives-ouvertes.fr/hal-01679052
In reply to this post by Pavel Krivanek-3
Hi Pavel,
This is very cool. Is the code available?

Cheers,
Doru

--
www.tudorgirba.com
www.feenk.com

"From an abstract enough point of view, any two things are similar."
In reply to this post by Thierry Goubier
> On 14 Feb 2018, at 21:34, Thierry Goubier <[hidden email]> wrote:
>
> I'd be very interested to see that used with image segments for object migration between images running in different processes (or coupled with a distributed shared memory implementation like [1]).
>
> If latency is low enough, I think multi-window applications could be developed as multi-process, single-window images (and we could get scalability to thousands of cores if the application design is right; even code synchronisation between images would be easy to do).

This started with a discussion between Pavel and me about the general idea that it would be interesting if objects were actually “really recursive” (and the whole system were such an object, too). And then asking the question: what is the minimal thing we need to explore that direction?

> [1] https://hal.archives-ouvertes.fr/hal-01679052

I will read that.

Marcus
2018-02-15 11:31 GMT+01:00 Marcus Denker <[hidden email]>:
> Yes, that is the direction to explore :-)

When I worked on that sort of system, one of the questions I had was: what about the GC? But yes, in today's context, focusing on process-based concurrency can be interesting.

> This started with me and Pavel discussion about the general idea that it would be interesting if Objects would actually be “really recursive” (and the whole system would be such an Object, too).

I'm not sure I get the concept. Would you be willing to explain?

> And then asking the question: what is the minimal thing we need to explore that direction?

On a meta-level, I can understand that.

>> [1] https://hal.archives-ouvertes.fr/hal-01679052
>
> I will read that.

I talked to Loic about that sort of idea (his S-DSM could do even more interesting things), but also about the fact that we don't have the resources and time to explore it :(

Thierry
> On 15 Feb 2018, at 12:56, Thierry Goubier <[hidden email]> wrote:
>
>> This started with me and Pavel discussion about the general idea that it would be interesting if Objects would actually be “really recursive” (and the whole system would be such an Object, too).
>
> I'm not sure I get the concept. Would you be ready to explain ?

I have started to write something about it (I need to make it clearer for myself, too). When it is in some presentable shape (currently just notes), I will forward it.

Marcus
In reply to this post by Pavel Krivanek-3
Hi Pavel,

I'm looking at PosixSharedMemory again since I have to write a Master's student proposal and I think this could be a good topic. I'm not really an expert on shared memory, so I'm going to share what I have in mind and we can discuss it. You could also co-supervise the student indirectly (though officially the supervisor has to be VUB staff).

Implementation-wise, I was thinking the following could be interesting:
- Making PosixSharedMemory compatible with TaskIt, so that from your image you can create a SharedMemory buffer, spawn another image+VM and attach it to the shared memory, to have multiple threads working on the shared section.
- Adding the #at:if:put: primitive, which writes into the shared memory using a compare-and-swap (CAS) instruction for efficient thread-safe access.
- Adding to SharedMemory all the primitives to read/write native types (int64, double, etc.) to the buffer, with CAS and non-CAS instructions.
- Maybe adding APIs to read/write objects through Fuel to pass them by copy, though this looks difficult in some cases.
- Implementing a lock system or a semaphore system on top of the CAS.
- Implementing lock-free and lock-based algorithms using CAS and non-CAS instructions (I think a first try would be a parallelSort on a 1 GB buffer of int32 with 4 native threads).

What do you think? Do you have ideas? Are you interested? Do you think having a student on this would be nice?

The master's thesis proposal has to include a research question. I am not sure what other languages do regarding shared memory. It's not clear so far what the research question is.

Best,
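To make the CAS items in the list above concrete, here is a small C11 sketch of what an #at:if:put: style primitive could do at the machine level, plus a lock and a lock-free increment built on top of it. This is only an illustration under assumptions: the function names are hypothetical, indices are 0-based as in C (a Pharo primitive would presumably be 1-based), and the buffer is treated as an array of 32-bit atomic slots. For use across processes the buffer would live in a shared mapping and the atomic type would need to be lock-free on the target platform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counterpart of #at:if:put: : store `new_value` at `index`
   only if the slot currently holds `expected`. Returns true if it stored. */
static bool shared_at_if_put(_Atomic int32_t *buffer, size_t index,
                             int32_t expected, int32_t new_value) {
    assert(buffer != NULL);
    return atomic_compare_exchange_strong(&buffer[index], &expected, new_value);
}

/* A spinlock on top of the CAS: slot value 0 means free, 1 means held. */
static void shared_lock(_Atomic int32_t *buffer, size_t index) {
    while (!shared_at_if_put(buffer, index, 0, 1)) {
        /* busy-wait until the holder releases the slot */
    }
}

static void shared_unlock(_Atomic int32_t *buffer, size_t index) {
    atomic_store(&buffer[index], 0);
}

/* A lock-free increment: retry the CAS until no other writer interfered.
   Returns the value the slot held before the increment. */
static int32_t shared_fetch_increment(_Atomic int32_t *buffer, size_t index) {
    int32_t old = atomic_load(&buffer[index]);
    while (!atomic_compare_exchange_weak(&buffer[index], &old, old + 1)) {
        /* on failure `old` now holds the current value; try again */
    }
    return old;
}
```

The spinlock is the simplest possible lock on top of CAS; a real implementation would probably back off or fall back to a semaphore under contention rather than spin.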
2018-04-17 14:03 GMT+02:00 Clément Bera <[hidden email]>:
> The master thesis proposal has to include a research question. I am not sure what other languages do regarding shared memory. It's not clear so far what the research question is.

I can forward that to a researcher here working on distributed shared memory.

The research question could be:
- heterogeneity (x86 + Pi at the same time)
- load balancing between competing images on heterogeneous hardware
...
- object migration (pointer forwarding). Probably not: the state of the art is very advanced there, which means a costly implementation to reach parity.

Thierry
On Tue, Apr 17, 2018 at 2:08 PM, Thierry Goubier <[hidden email]> wrote:
SharedMemory on heterogeneous hardware? Do you mean you need to physically plug the memory into a Raspberry Pi and an x86 computer? Or do you mean exporting the RAM of one machine as NFS to the others? I was just thinking of sharing the memory between multiple image+VM pairs on the same machine, to be able to run some multi-threaded algorithm on the shared buffer. I know it's not much, but we need to start somewhere.

The student would have 6 months, including 1 month to write the thesis, so it cannot be too heavy. Something that we can re-use, with a minor research contribution, would be nice.
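As a rough picture of "several workers on one machine running an algorithm on a shared buffer", here is a C sketch in which plain forked processes stand in for image+VM pairs. It is an assumption-laden illustration, not a proposal for the actual design: the function name is hypothetical, there are only two workers, and an anonymous shared mapping is used instead of a named one. Each worker sorts its own half of the buffer in place, so no locking is needed, and the parent merges the two halves.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>     /* qsort */
#include <string.h>     /* memcpy */
#include <sys/mman.h>   /* mmap, munmap */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* fork, _exit */

static int compare_int32(const void *a, const void *b) {
    int32_t x = *(const int32_t *)a, y = *(const int32_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort `values` using two worker processes that each
   sort one half of a shared mapping. Returns 0 on success, -1 on failure. */
static int shared_parallel_sort(int32_t *values, size_t count) {
    assert(values != NULL || count == 0);
    if (count == 0) return 0;

    size_t bytes = count * sizeof(int32_t);
    int32_t *shared = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    memcpy(shared, values, bytes);

    size_t half = count / 2;
    size_t bounds[3] = {0, half, count};
    pid_t pids[2];
    int failed = 0;

    for (int i = 0; i < 2; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            /* worker: sort its own slice of the shared buffer in place */
            qsort(shared + bounds[i], bounds[i + 1] - bounds[i],
                  sizeof(int32_t), compare_int32);
            _exit(0);
        }
        if (pids[i] == -1) failed = 1;
    }
    for (int i = 0; i < 2; i++) {
        int status = 0;
        if (pids[i] > 0 &&
            (waitpid(pids[i], &status, 0) == -1 ||
             !WIFEXITED(status) || WEXITSTATUS(status) != 0)) {
            failed = 1;
        }
    }

    if (!failed) {
        /* parent: merge the two sorted halves back into the caller's array */
        size_t left = 0, right = half, out = 0;
        while (left < half && right < count) {
            values[out++] = shared[left] <= shared[right] ? shared[left++]
                                                          : shared[right++];
        }
        while (left < half) values[out++] = shared[left++];
        while (right < count) values[out++] = shared[right++];
    }
    munmap(shared, bytes);
    return failed ? -1 : 0;
}
```

Because the slices are disjoint, the only synchronization is the parent waiting for both workers; algorithms whose workers touch the same slots would need the CAS or semaphore mechanisms discussed earlier in the thread.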
2018-04-17 14:28 GMT+02:00 Clément Bera <[hidden email]>:
> SharedMemory on heterogeneous hardware? Do you mean you need to physically plug the memory into a Raspberry Pi and an x86 computer? Or do you mean exporting the RAM of one machine as NFS to the others? I was just thinking of sharing the memory between multiple image+VM pairs on the same machine, to be able to run some multi-threaded algorithm on the shared buffer. I know it's not much, but we need to start somewhere.

No, it's having a memory abstraction (memory chunks) handled by a server on a host (x86 or ARM), which sends them over MPI to clients (x86 or ARM). Each client accesses a chunk through the OS shared memory and releases it when done, so that other tasks can work on it (a distributed pipeline).

> The student would have 6 months, including 1 month to write the thesis, so it cannot be too heavy. Something that we can re-use, with a minor research contribution, would be nice.

Given the state of the art in the field (20+ years of shared memory, distributed shared memory, and distributed object stores and migration already done), the only easy one that I know of is heterogeneity. And even there, doing something worthy of a paper is hard.

Thierry