Hello,
just out of interest, I tried to compare the speed of my FIFOQueue
implementation against SharedQueue, using Levente's benchmark for stacks:

"The linked Stack implementation"
(1 to: 5) collect: [ :run |
	| stack |
	Smalltalk garbageCollect.
	stack := FIFOQueue new.
	{
		[ 1 to: 1000000 do: [ :each | stack nextPut: each ] ] timeToRun.
		[ 1 to: 1000000 do: [ :each | stack next ] ] timeToRun } ]

#(#(291 69) #(170 65) #(168 66) #(168 65) #(168 65))

Then I changed FIFOQueue to SharedQueue and ran it again.. waited a
minute.. waited a bit more.. went out for a smoke.. and after returning,
it was still running. I interrupted it and inspected the queue size: it
was slightly above 300,000 items.

Of course, SharedQueue is usually not used in scenarios where you need
to push such a large number of items, so this is just a warning.

Btw, here is another comparison (Stack vs. a thread-safe LIFO queue):

(1 to: 5) collect: [ :run |
	| stack |
	Smalltalk garbageCollect.
	stack := Stack new.
	{
		[ 1 to: 1000000 do: [ :each | stack push: each ] ] timeToRun.
		[ 1 to: 1000000 do: [ :each | stack pop ] ] timeToRun } ]

Stack:
#(#(166 94) #(160 90) #(162 91) #(162 92) #(160 92))

LIFOQueue:
#(#(172 250) #(174 248) #(172 250) #(174 252) #(172 250))

Yes, it is slower (mainly for reading). But that is the price of being
thread safe :)

--
Best regards,
Igor Stasenko AKA sig.
Igor, sorry, we were busy running around :)

So could you summarize, with pros and cons, and a potential path for
the next actions? Maybe we should schedule something for 1.3.

Stef

On Oct 16, 2010, at 4:19 AM, Igor Stasenko wrote:
> snip
On 17 October 2010 12:01, Stéphane Ducasse <[hidden email]> wrote:
> Igor, sorry, we were busy running around :)
> So could you summarize, with pros and cons, and a potential path for
> the next actions? Maybe we should schedule something for 1.3.

The two implementations I made in the Atomic-Collections package run
without using any kind of locking (i.e. semaphores). However, we should
distinguish between lock-free and wait-free algorithms.

Inserting items into a queue (both FIFO and LIFO) uses no wait loops
and can be considered an O(1) operation, if we neglect the cost of
allocating a new queue item.

But to extract items, both the FIFO and LIFO queues use a wait loop if
the queue is in the middle of an extraction by another process. This
means that if there is only one process extracting items from a queue,
extraction will always be wait-free. But if you have more than one, you
should expect wait loops to be used.

There is, however, a way to get a quick answer without entering a wait
loop when attempting to extract a single item from the queue. But it
won't be a deterministic answer:
- #nextIfNone: will return nil if there are either no items in the
queue, or the queue is currently in the middle of an extraction by
another process.

I think there should be two distinct methods (protocols): one for
accessing the queue in a wait-free way, which does not always guarantee
deterministic answers, and one for the methods which use wait loops.
Currently, in order to match the SharedQueue protocol, there is a mix
of wait-free methods and methods with wait loops.

Then applications using these queues could have a guarantee that any
access to the queue is wait-free, if they can accept non-deterministic
answers, or use the methods with wait loops to always get deterministic
answers.

> Stef
>
> On Oct 16, 2010, at 4:19 AM, Igor Stasenko wrote:

--
Best regards,
Igor Stasenko AKA sig.
On 17 October 2010 13:42, Igor Stasenko <[hidden email]> wrote:
> On 17 October 2010 12:01, Stéphane Ducasse <[hidden email]> wrote:
>> snip
>
> There is, however, a way to get a quick answer without entering a wait
> loop when attempting to extract a single item from the queue. But it
> won't be a deterministic answer:
> - #nextIfNone: will return nil if there are either no items in the
> queue, or the queue is currently in the middle of an extraction by
> another process.

Oops.. #nextIfNone: will evaluate a block if there are no items in the
queue; it is #nextOrNil which answers nil if there are no items in the
queue.

> snip

--
Best regards,
Igor Stasenko AKA sig.
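For illustration, here is a minimal usage sketch of the three fetch
methods as described in this thread (the #empty placeholder is made up;
FIFOQueue is Igor's Atomic-Collections class):

| queue item |
queue := FIFOQueue new.
queue nextPut: 42.	"insertion is wait-free: no wait loop, O(1)"

"Blocking access: deterministic, but may spin in a wait loop while
another process is in the middle of an extraction."
item := queue next.

"Wait-free access: answers immediately; nil means the queue is empty
*or* another process is currently extracting."
item := queue nextOrNil.

"Wait-free access with an explicit empty case: the block is evaluated
when no item can be taken."
item := queue nextIfNone: [ #empty ].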
On Sat, 16 Oct 2010, Igor Stasenko wrote:
> snip
>
> Of course, SharedQueue is usually not used in scenarios where you need
> to push such a large number of items, so this is just a warning.

SharedQueue's code for "growing" (#makeRoomAtEnd) is crap IMHO. Because
of that, it takes O(n) time to add or remove an element to or from the
queue. SharedQueue2 is a lot better approach, because it doesn't try to
reimplement a dynamic array, but uses an OrderedCollection instead.

Btw, calling a LIFO data structure a queue is strange, because a queue
is always FIFO. Please call it a stack.

Levente

> snip
On Sun, 17 Oct 2010, Levente Uzonyi wrote:
snip

> SharedQueue's code for "growing" (#makeRoomAtEnd) is crap IMHO.
> Because of that, it takes O(n) time to add or remove an element to or
> from the queue. SharedQueue2 is a lot better approach, because it
> doesn't try to reimplement a dynamic array, but uses an
> OrderedCollection instead.

I uploaded a new version of that method to the Trunk. I don't think
it's really useful, because I'm pretty sure we will get rid of both
SharedQueue and SharedQueue2 in the near future.

Anyway, I did some benchmarks using Cog, and SharedQueue became
surprisingly good, though it's still not even close to Igor's new
FIFOQueue or AtomicSharedQueue.

Benchmark #1: Make a large queue

{ SharedQueue. SharedQueue2. FIFOQueue. AtomicSharedQueue } collect: [ :queueClass |
	| queue |
	queue := queueClass new.
	queueClass -> (
		(1 to: 5) collect: [ :run |
			Smalltalk garbageCollect.
			{
				[ 1 to: 1000000 do: [ :each | queue nextPut: each ] ] timeToRun.
				[ 1 to: 1000000 do: [ :each | queue next ] ] timeToRun } ]) ].

SharedQueue->#(#(1028 1615) #(945 1557) #(976 2322) #(492 459) #(489 476)).
SharedQueue2->#(#(1976 2284) #(1318 8298) #(982 692) #(1005 675) #(1002 665)).
FIFOQueue->#(#(180 67) #(184 67) #(182 69) #(181 66) #(177 67)).
AtomicSharedQueue->#(#(208 66) #(207 67) #(209 66) #(213 68) #(209 65)).

This benchmark is similar to what Igor used, but it doesn't create a
new queue between runs. It simply adds 1,000,000 elements, then removes
them, which is a pretty unrealistic scenario for a shared queue. The
effect of GC is pretty high on this benchmark; that's probably what is
responsible for the high spikes.

Benchmark #2: Sequential throughput test

{ SharedQueue. SharedQueue2. FIFOQueue. AtomicSharedQueue } collect: [ :queueClass |
	| queue |
	queue := queueClass new.
	queueClass -> (
		(1 to: 5) collect: [ :run |
			| results |
			Smalltalk garbageCollect.
			results := #(0 0).
			1 to: 1000 do: [ :round |
				results := results + {
					[ 1 to: 1000 do: [ :each | queue nextPut: each ] ] timeToRun.
					[ 1 to: 1000 do: [ :each | queue next ] ] timeToRun } ].
			results ]) ].

SharedQueue->#(#(464 452) #(472 436) #(466 437) #(463 449) #(462 452)).
SharedQueue2->#(#(949 692) #(980 663) #(984 670) #(992 670) #(958 677)).
FIFOQueue->#(#(125 67) #(263 62) #(250 76) #(262 63) #(247 81)).
AtomicSharedQueue->#(#(154 70) #(264 77) #(273 62) #(275 63) #(265 71)).

This is similar to benchmark #1, but instead of adding and removing
1,000,000 elements at once, the work is chunked into 1,000 equal parts.
It's more realistic than benchmark #1. It's interesting that both
FIFOQueue and AtomicSharedQueue performed better in the previous
benchmark, unlike the other two queues, which are better here.

Benchmark #3: Concurrent throughput test

{ SharedQueue. SharedQueue2. FIFOQueue. AtomicSharedQueue } collect: [ :queueClass |
	| queue semaphore |
	queue := queueClass new.
	semaphore := Semaphore new.
	queueClass -> (
		(1 to: 5) collect: [ :run |
			| producers consumers |
			Smalltalk garbageCollect.
			producers := (1 to: 100) collect: [ :each |
				[ 1 to: 10000 do: [ :index | queue nextPut: each ] ] newProcess ].
			consumers := (1 to: 100) collect: [ :each |
				[
					1 to: 10000 do: [ :index | queue next ].
					semaphore signal ] newProcess ].
			[
				consumers do: [ :each | each priority: 39; resume ].
				producers do: [ :each | each priority: 39; resume ].
				100 timesRepeat: [ semaphore wait ] ] timeToRun ]) ].

SharedQueue->#(3143 2977 3034 2949 3021).
SharedQueue2->#(4280 4384 4179 4160 4181).
FIFOQueue->#(245 311 252 254 255).
AtomicSharedQueue->#(277 274 277 280 274)

This benchmark is the real concurrent stress test. 100 processes are
adding 10,000 elements each to the queue while another 100 are reading
from it. It clearly shows that Igor's queues are an order of magnitude
faster. Also, 200 concurrent processes cause much less slowdown for
them compared to the sequential tests.

So, even though SharedQueue is now faster than SharedQueue2, both will
have to go IMHO. :)

Levente

snip
> This benchmark is the real concurrent stress test. 100 processes are
> adding 10,000 elements each to the queue while another 100 are reading
> from it. It clearly shows that Igor's queues are an order of magnitude
> faster. Also, 200 concurrent processes cause much less slowdown for
> them compared to the sequential tests.
>
> So, even though SharedQueue is now faster than SharedQueue2, both will
> have to go IMHO. :)

Hi Levente,

Naively: why? And replace them with what?

Stef
On 18 October 2010 07:00, Levente Uzonyi <[hidden email]> wrote:
> On Sun, 17 Oct 2010, Levente Uzonyi wrote:
> snip
>
> So, even though SharedQueue is now faster than SharedQueue2, both will
> have to go IMHO. :)

Thanks, Levente, for the feedback.

Please feel free to shape my classes into a more complete form (such as
proper naming), to make them ready for inclusion in both the Squeak and
Pharo cores. I propose the following renames:

AtomicQueue (base class) -> AtomicCollection
FIFOQueue -> AtomicQueue
LIFOQueue -> AtomicStack

If you, or anyone else, have better suggestions, speak now :)

In any case, I am open to discussing further details and possible
caveats of using the new classes with anyone interested in using them.

P.S. As a side note, I can now explain (to myself in the first place)
why I intuitively chose to use an atomic swap instead of CAS: it fits
better with Smalltalk language semantics.

A swap operation in Smalltalk is implemented as two assignments:

	x := y. y := z.

An assignment is a basic operation which has nothing to do with the
late-bound nature of the language. Unless we introduce meta-object
protocol(s) which could turn a simple assignment into message sends
under the hood, it will remain a basic, early-bound operation. And even
if we do, it is highly unlikely that we would throw away the old,
simple assignment, which identifies the assignment source and target at
compile time.

In contrast, a CAS operation, written in Smalltalk, looks like:

	(a == b) ifTrue: [ a := c ]

so it contains two message sends (#== and #ifTrue:), and from a strict,
pure language perspective it uses late-bound semantics (message sends).
As with any message send, the result and behavior cannot be predicted
at compile time, so it would be wrong to assume that such a statement
could be an atomic operation. Unless, of course, we introduce a new
language syntax which denotes a CAS operation explicitly.

--
Best regards,
Igor Stasenko AKA sig.
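To make the P.S. concrete, here is a toy sketch of the two idioms as
they might appear inside a queue's extraction code (the variable names
and the #busy marker are made up, and the two-assignment swap is only
atomic if the implementation guarantees it as a primitive operation):

"Swap idiom: take ownership of the head slot by swapping in a marker.
Both lines are plain assignments: early-bound, no message sends."
old := head.
head := #busy.
old == #busy ifTrue: [
	"another process owns the head; retry, or answer nil (wait-free)" ].

"CAS idiom: the same intent, but #== and #ifTrue: are message sends,
so nothing in the syntax itself marks the statement as atomic."
head == expected ifTrue: [ head := newValue ].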
On Mon, 18 Oct 2010, Stéphane Ducasse wrote:
>> So, even though SharedQueue is now faster than SharedQueue2, both
>> will have to go IMHO. :)
>
> Hi Levente,
>
> Naively: why?

Did you see the benchmarks?

> And replace them with what?

By Igor's new collections. Though the name should be kept for
compatibility reasons.

Levente
On Mon, 18 Oct 2010, Igor Stasenko wrote:
snip

> I propose the following renames:
>
> AtomicQueue (base class) -> AtomicCollection
> FIFOQueue -> AtomicQueue
> LIFOQueue -> AtomicStack
>
> If you, or anyone else, have better suggestions, speak now :)

I think these should be the names:

FIFOQueue -> SharedQueue
LIFOQueue -> SharedStack

I don't know a really good name for AtomicQueue; maybe SharedList,
SharedCollection or SharedListStub.

Levente

> snip
On 19 October 2010 02:06, Levente Uzonyi <[hidden email]> wrote:
> On Mon, 18 Oct 2010, Igor Stasenko wrote:
> snip
>
> I think these should be the names:
>
> FIFOQueue -> SharedQueue

That name is already used by the Kernel. So, unless my class fully
replaces it, I see no way I could use this name in a separate package.

Also, I must stress that FIFOQueue's behavior only attempts to closely
resemble SharedQueue's behavior. There are situations (related to how
scheduling works and the use of #yield) where using these queues could
lead to deadlocks. In this regard, AtomicSharedQueue (a subclass of
FIFOQueue) is a much better analogue of the current SharedQueue.

As I noted in another mail, I think we might also provide a separate
wait-free interface. Then we can guarantee that if you use only the
wait-free interface, a queue can never be the cause of a deadlock.

> LIFOQueue -> SharedStack
>
> I don't know a really good name for AtomicQueue; maybe SharedList,
> SharedCollection or SharedListStub.
>
> snip

--
Best regards,
Igor Stasenko AKA sig.
On Tue, 19 Oct 2010, Igor Stasenko wrote:
> On 19 October 2010 02:06, Levente Uzonyi <[hidden email]> wrote:
> snip
>
> That name is already used by the Kernel. So, unless my class fully
> replaces it, I see no way I could use this name in a separate package.

Yes, it would be good to replace the implementation IMHO. The API seems
complete to me (except for copying ;)).

> Also, I must stress that FIFOQueue's behavior only attempts to closely
> resemble SharedQueue's behavior. There are situations (related to how
> scheduling works and the use of #yield) where using these queues could
> lead to deadlocks. In this regard, AtomicSharedQueue (a subclass of
> FIFOQueue) is a much better analogue of the current SharedQueue.

If you mean the case "if process A tries to read from an empty queue,
and later process B tries to do the same, then process A is guaranteed
to read before process B", then that shouldn't be a problem. It would
require an external synchronization step to make use of this feature
with the current implementation, and I doubt anyone has ever written
such code.

> As I noted in another mail, I think we might also provide a separate
> wait-free interface. Then we can guarantee that if you use only the
> wait-free interface, a queue can never be the cause of a deadlock.

That's great, and it can be a future addition even if we push the
current implementation to the Trunk.

Levente

> snip
On 19 October 2010 03:42, Levente Uzonyi <[hidden email]> wrote:
> On Tue, 19 Oct 2010, Igor Stasenko wrote: > >> On 19 October 2010 02:06, Levente Uzonyi <[hidden email]> wrote: >>> >>> On Mon, 18 Oct 2010, Igor Stasenko wrote: >>> >>> snip >>> >>>> >>> >>> Thanks, Levente for giving a feedback. >>> Please, feel free to shape my classes in more complete from (such as >>> proper naming), >>> to make them ready for inclusion in both Squeak's and Pharo cores. >>> I propose the following names: >>> AtomicQueue (base class) -> AtomicCollection >>> FIFOQueue -> AtomicQueue >>> LIFOQueue -> AtomicStack >>> If you, or anyone else having better suggestions, speak now :) >>> >>> >>> >>> I think these should be the names: >>> >>> FIFOQueue -> SharedQueue >> >> this name already used by Kernel. >> So, unless my class will fully replace it, i see no way how i could >> use this name in separate package. > > Yes, it would be good to replace the implementation IMHO. The API seems to > be complete to me (except for copying ;)). > copy ^ self errorDontCopy errorDontCopy "copying a structure, involved in concurrent operations is a bad idea" ^ self error: 'Copying not available' :) See how Squeak's EventSensor doing right thing to make a 'copy': EventSensor>>flushAllButDandDEvents | newQueue oldQueue | newQueue := SharedQueue new. self eventQueue ifNil: [eventQueue := newQueue. ^self]. oldQueue := self eventQueue. [oldQueue size > 0] whileTrue: [| item type | item := oldQueue next. type := item at: 1. type = EventTypeDragDropFiles ifTrue: [ newQueue nextPut: item]]. eventQueue := newQueue. Well, you might be right, that #copy can be implemented as: copy | copy | copy := self class new. [ copy put: (self nextIfNone: [ ^ copy ] ) ] repeat if it makes any sense, to anyone.. Its hard to imagine a situation where one may need to copy existing queue, because he simply can keep using it. >> >> Also, i must stress, that behavior of FIFOQueue only attempts to >> closely resemble the SharedQueue behavior. >> However, there is a situations (related to how scheduling works and >> use of #yield), where using them could lead to deadlocks. >> In this regard, an AtomicSharedQueue (a subclass of FIFOQueue) is much >> better analogy to current SharedQueue. > > If you mean the case: "if process A tries to read from an empty queue, later > process B tries to do the same, then process A is guaranteed to read before > process B", then that shouldn't be a problem. It would require an external > synchronization step to make use of this feature with the current > implementation. I doubt that anyone wrote such code ever. > how #yield primitive works. Here the VM's primitiveYield: primitiveYield "primitively do the equivalent of Process>yield" | activeProc priority processLists processList | activeProc := self fetchPointer: ActiveProcessIndex ofObject: self schedulerPointer. priority := self quickFetchInteger: PriorityIndex ofObject: activeProc. processLists := self fetchPointer: ProcessListsIndex ofObject: self schedulerPointer. processList := self fetchPointer: priority - 1 ofObject: processLists. (self isEmptyList: processList) ifFalse:[ self addLastLink: activeProc toList: processList. self transferTo: self wakeHighestPriority] Note #wakeHighestPriority. So, a fetcher (which using #yield in spin loop) with priority higher than pusher process, will loop infinitely blocking pusher and all lower priority processes from advancing. 
To avoid this problem, one should make sure that process which pushing new items to queue having either higher or same priority as any fetching process(es) using same queue. Or use wait-free access to queue (avoid use #next, use #nextOrNil instead). That's why in subclass - AtomicSharedQueue, i using semaphore to workaround this issue. And potentially, it would be good some day to have a way to say to scheduler: please stop current process and see if you can run anything with lower priority. (Another reason to move scheduling to language side, so we are free to modify it in a way we like ;). >> >> As i noted in another mail, i see that we might also provide a >> separate wait-free interface. So we can guarantee, >> that if you using only wait-free interface, a queue can never be the >> cause of deadlock. > > That's great, and it can be a future addition even if we push the current > implementation to the Trunk. > Yes. I think that for most cases in concurrent environment, a wait-free access is preferable way to work with queues. > > Levente > -- Best regards, Igor Stasenko AKA sig. _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
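A minimal sketch of the starvation scenario described above
(illustrative only; it assumes FIFOQueue>>next spins with #yield, the
priorities are just examples, and evaluating it would spin forever, so
don't run it in an image you care about):

| queue |
queue := FIFOQueue new.

"A consumer forked above the current priority starts running at once
and spins in #next; primitiveYield only transfers control to processes
of the same or higher priority, so everything below 60 is starved..."
[ queue next ] forkAt: 60.

"...and this line is never even reached: the producer that would
unblock the consumer cannot be forked."
[ queue nextPut: 42 ] forkAt: 40.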
On Tue, 19 Oct 2010, Igor Stasenko wrote:
> On 19 October 2010 03:42, Levente Uzonyi <[hidden email]> wrote:
> snip
>
> Well, you might be right that #copy can be implemented as:
> snip
> It's hard to imagine a situation where one would need to copy an
> existing queue, because one can simply keep using it.

I didn't say that I find copying useful, but the API is different.

> snip
>
> That's why in the subclass AtomicSharedQueue I use a semaphore to work
> around this issue.

Okay. So AtomicSharedQueue is the class which can be used to replace
SharedQueue. The names could then be:

LIFOQueue -> AtomicStack
FIFOQueue -> AtomicQueue
AtomicSharedQueue -> SharedQueue

> It would also potentially be good to have a way, some day, to say to
> the scheduler: please suspend the current process and see if you can
> run anything with a lower priority.

That would break the current scheduling policy.

Levente

> snip
On 19 October 2010 06:29, Levente Uzonyi <[hidden email]> wrote:
> On Tue, 19 Oct 2010, Igor Stasenko wrote:
> snip
>
> Okay. So AtomicSharedQueue is the class which can be used to replace
> SharedQueue. The names could then be:
>
> LIFOQueue -> AtomicStack
> FIFOQueue -> AtomicQueue
> AtomicSharedQueue -> SharedQueue

Right. And I think I'll move all non-wait-free methods (like #next)
into SharedQueue, so AtomicQueue will not contain #next in its protocol
and will be a purely wait-free implementation.

>> It would also potentially be good to have a way, some day, to say to
>> the scheduler: please suspend the current process and see if you can
>> run anything with a lower priority.
>
> That would break the current scheduling policy.

I prefer the word 'alter', as in 'alternative' :)

--
Best regards,
Igor Stasenko AKA sig.
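From the client side, the resulting protocol split might look like this
(a sketch under the renames agreed above, not the final API):

| q sq |
"AtomicQueue: wait-free protocol only. With no #next in the protocol,
a client can never enter a wait loop through this class."
q := AtomicQueue new.
q nextPut: 1.
q nextOrNil.	"answers nil when empty or busy; never waits"

"SharedQueue (the renamed AtomicSharedQueue) adds the blocking protocol
on top, built on a semaphore rather than a #yield spin loop, so a
waiting consumer cannot starve a lower-priority producer."
sq := SharedQueue new.
[ Transcript show: sq next printString ] forkAt: 60.	"blocks on the semaphore"
sq nextPut: 42.	"wakes the waiting consumer"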
On Oct 18, 2010, at 10:03 PM, Levente Uzonyi wrote:

> snip
>
>> And replace them with what?
>
> By Igor's new collections. Though the name should be kept for
> compatibility reasons.

Ok, I was confused... too many emails and threads. So yes!

Stef