On Mon, Mar 23, 2009 at 12:18 AM, Igor Stasenko <[hidden email]> wrote:

The problem here is that the code above wasn't written with concurrency in mind. I'd be looking at what I'm trying to achieve, and then try to keep each thread's data structures as disentangled as possible.

If, for example, you were trying to allocate an array of arrays concurrently, then I would write (using my namespaces :-) ):

array := Concurrent.Collections.Array new: 1000000000. " We ignore the cost of making this array to begin with :-) "
array withIndexDo: [ :each :i |
    " each will always be nil. "
    array at: i put: (Array new: 10) ].

Concurrent.Collections.Array>>withIndexDo: aBlock
    | nThreads sema |
    nThreads := 1000. " Pull this from a system setting or something. "
    sema := Semaphore withSignals: 0 - nThreads. " Needs implementing: signal it -1000 times. "
    1 to: nThreads do: [ :i |
        [ (nThreads * i) to: (nThreads * i - 1) do: [ :j |
            aBlock value: (self at: j) value: j ] ] fork ].
    ^ self

There are probably bugs; I haven't tried running this. No synchronisation should be needed in this example, making it fast. Currently, this wouldn't work on the Squeak VM because block contexts aren't, er... "reentrant".
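For what it's worth, the two missing pieces in the draft above (the "negative" semaphore and the chunk bounds, which as written give an empty, descending range and never get signalled) can be done with a plain Semaphore: each worker signals once when it finishes, the caller waits nThreads times, and each chunk covers size // nThreads consecutive indices. A minimal sketch along those lines — not Michael's actual code, and assuming the receiver's size divides evenly by nThreads:

Concurrent.Collections.Array>>withIndexDo: aBlock
    | nThreads chunk sema |
    nThreads := 1000. " Pull this from a system setting or something. "
    chunk := self size // nThreads. " elements handled by each worker "
    sema := Semaphore new. " starts with zero excess signals "
    1 to: nThreads do: [ :i |
        [ (chunk * (i - 1) + 1) to: (chunk * i) do: [ :j |
            aBlock value: (self at: j) value: j ].
          sema signal " tell the caller this chunk is done " ] fork ].
    nThreads timesRepeat: [ sema wait ]. " join: block until every worker has signalled "
    ^ self

This still needs real closures (or the block-copying workaround discussed later in the thread) to be safe on the Squeak VM of the time.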
Then... invent a better garbage collector. This might be a difficult problem to solve, but it isn't impossible. http://gulik.pbwiki.com/Block-based+virtual+machine (ideas only)
I plan to find and fix these at some stage. First, I would change the scheduler to get rid of its predictable yielding behaviour, which would expose these bugs. Doing so is required for the implementation of "Dominions" in my SecureSqueak project anyway, and I'll try to feed my changes back into the community. I also plan to make a Squeak equivalent of java.util.concurrent at some stage.
Answered by Anthony, and I agree with his answer.

--
http://gulik.pbwiki.com/
Argh. Stupid GMail sent it prematurely.
On Mon, Mar 23, 2009 at 10:45 AM, Michael van der Gulik <[hidden email]> wrote:
Again, it probably won't just run without some bug fixing. And it won't work on non-closure Squeak anyway.

Gulik.

--
http://gulik.pbwiki.com/
2009/3/22 Michael van der Gulik <[hidden email]>:
> Argh. Stupid GMail sent it prematurely.
>
> On Mon, Mar 23, 2009 at 10:45 AM, Michael van der Gulik <[hidden email]> wrote:
>>
>> If, for example, you were trying to allocate an array of arrays
>> concurrently, then I would write (using my namespaces :-) ):
>>
>> array := Concurrent.Collections.Array new: 1000000000. " We ignore the
>> cost of making this array to begin with :-) "
>> array withIndexDo: [ :each :i |
>>     " each will always be nil. "
>>     array at: i put: (Array new: 10). ].
>>
>> Concurrent.Collections.Array>>withIndexDo: aBlock
>>     | nThreads sema |
>>     nThreads := 1000. " Pull this from a system setting or something. "
>>     sema := Semaphore withSignals: 0-nThreads. " Needs implementing:
>>     signal it -1000 times (or -999 or -1001? Unsure) "
>>     1 to: nThreads do: [ :i |
>>         [ (nThreads*i) to: (nThreads*i - 1) do: [ :j |
>>             aBlock value: (self at: j) value: j ].
>>           sema signal.
>>         ] fork ].
>>
>>     sema wait.
>>     ^ self.
>
> Again, it probably won't just run without some bug fixing. And it won't work
> on non-closure Squeak anyway.

Now consider the overhead of creating a fork versus the actual useful code that runs within each block. I presume this code will run about 10x slower on a single-core processor compared to the version without forks, so you would need 10 cores just to match the computation time of a single core.

I think it's not wise to introduce parallelism at such low levels (in Concurrent.Collections.Array>>withIndexDo:). It's like hammering nails with a microscope :)

That's why I'm saying it's too good to be true. Introducing parallelism at such low levels will be a waste. I'm leaning toward the island model. This is a middle point between sharing nothing, as in Hydra, and sharing everything, as in what you are proposing.

> Gulik.
>
> --
> http://gulik.pbwiki.com/

--
Best regards,
Igor Stasenko AKA sig.
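Whether forking really costs "10x" is easy to put a rough number on from a workspace. A minimal sketch — not from the thread, and figures will depend heavily on VM and image — timing process creation against the same trivial work done inline:

| n |
n := 10000.
Transcript show: 'inline: ',
    (Time millisecondsToRun: [ 1 to: n do: [ :i | i squared ] ]) printString; cr.
Transcript show: 'forked: ',
    (Time millisecondsToRun: [ 1 to: n do: [ :i | [ i squared ] fork ] ]) printString; cr.
" The second figure mostly measures Process creation and scheduling:
  the forked blocks run at the same priority as the caller, so many of
  them only get to run after the timed block has already returned. "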
On Mon, Mar 23, 2009 at 12:34 PM, Igor Stasenko <[hidden email]> wrote:
10 times slower? Sounds like a made-up number to me...

" Using 101 threads: "
c := ConcurrentArray new: 1000001.
Time millisecondsToRun: [c withIndexDo: [ :each :i | c at: i put: i asString. ]].
5711
5626
6074

" Using 11 threads: "
c := ConcurrentArray new: 1000001.
Time millisecondsToRun: [c withIndexDo: [ :each :i | c at: i put: i asString. ]].
3086
3406
3256

" Using 1 thread: "
d := Array new: 1000001.
Time millisecondsToRun: [d withIndexDo: [ :each :i | d at: i put: i asString]].
2426
2610
2599

My implementation is 1/2 to 1/3 the speed of the single-threaded Array. If the blocks did more work, then the overhead would be lower and some benefit would be gained from using multiple cores.

I don't have a good idea of where the overhead is going - maybe it's being lost in the block copying that is needed to work around Squeak's deficiencies? Or maybe it's waiting for the scheduler to do its stuff?

Implementation attached, MIT license if you're picky.

Gulik.

--
http://gulik.pbwiki.com/

ConcurrentArray.st (1K)
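The ConcurrentArray.st attachment isn't reproduced in the archive, but the "block copying" workaround in question usually took one of two forms in pre-closure Squeak: sending fixTemps to a block so it keeps its own copy of the enclosing method's temporaries, or pushing the fork into a helper method so each chunk's bounds live in their own method activation. A rough sketch of the helper-method form (the class name and selector are mine, not taken from the attachment):

ConcurrentArray>>forkChunkFrom: start to: stop with: aBlock signalling: aSemaphore
    " start, stop and aBlock are method arguments here, so each forked
      block works on its own copies even without real closures. "
    ^ [ start to: stop do: [ :j | aBlock value: (self at: j) value: j ].
        aSemaphore signal ] fork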
On Mon, Mar 23, 2009 at 2:21 PM, Michael van der Gulik <[hidden email]> wrote:
I just tried it on VisualWorks as well. I removed the block copying and renamed the method to "keysDo:" (I never thought of an array like that before... keys and values).

d := Array new: 1000001.
Time millisecondsToRun: [d keysDo: [ :i | d at: i put: i printString]].
1180
982
1008

" Using 101 threads "
c := ConcurrentArray new: 1000001.
Time millisecondsToRun: [c keysDo: [ :i | c at: i put: i printString. ]].
1072
1120
962

At this stage, I'm suspicious about the results :-).

Gulik.

--
http://gulik.pbwiki.com/
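If a dialect doesn't already provide keysDo: on arrays, the single-threaded baseline above amounts to a one-line index loop; a hypothetical addition (not Michael's code) would be:

SequenceableCollection>>keysDo: aBlock
    " Evaluate aBlock once with each index (key) of the receiver. "
    1 to: self size do: [ :i | aBlock value: i ]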
Strange, since Andreas was mentioning that Squeak's thread switching costs far less than in VW.

Stef

On Mar 23, 2009, at 4:49 AM, Michael van der Gulik wrote:
> [snip — Michael's benchmark message, quoted in full above]
2009/3/23 Michael van der Gulik <[hidden email]>:
> [snip]
>
> I just tried it on VisualWorks as well. I removed the block copying and
> renamed the method to "keysDo:" (I never thought of an array like that
> before... keys and values).

associating an index as a key to access the values.

> [snip]

--
Best regards,
Igor Stasenko AKA sig.
On Mon, Mar 23, 2009 at 4:49 PM, Michael van der Gulik <[hidden email]> wrote:
I tried it on SmalltalkMT on a dual-core system, but failed. I couldn't work out how to use its Semaphores; they don't behave like those in other Smalltalks and seem to be some MS Windows concoction. In fact, the whole environment feels like C++ with the Windows API. I also failed to get SmalltalkMT to use 100% of both cores on my machine despite forking lots of busy blocks.

Huemul Smalltalk crashed when I tried to fork something.

So... I remain saddened. It appears that Hydra and GemStone are the nearest we have to a multi-core Smalltalk, and they only do coarse-grained parallelism. This is a sad state of affairs.

Gulik.

--
http://gulik.pbwiki.com/
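For contrast, the Semaphore behaviour being taken as the baseline here — the one Squeak, Pharo and VisualWorks share — is plain counting signal/wait, plus a mutual-exclusion convenience. A minimal workspace sketch (not from the thread):

| sema mutex |
sema := Semaphore new. " zero excess signals to start with "
[ Transcript show: 'worker done'; cr. sema signal ] fork.
sema wait. " blocks here until the forked process signals "

mutex := Semaphore forMutualExclusion.
mutex critical: [ Transcript show: 'only one process at a time in here'; cr ].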