Hello guys,
I hope you are interested in having rock-solid and safe concurrency in Smalltalk, even though it is green-threaded. A new scheduler implementation will let us improve things and no longer depend on the VM-side scheduling policy.

You can find the changes on Mantis: http://bugs.squeak.org/view.php?id=7345

Please give me feedback on whether you can successfully build a VM with the new changes on all platforms. I used a pretty old VMMaker package version for my VM: VMMaker-dtl.91. I will try to see if it can be built with the latest one available on SqS.

Andreas (and others), do you have any tests/benchmarks to measure the overhead of the new scheduler compared to the old one? Personally, I didn't notice much speed degradation :)

One last thing: I need advice on how to implement #callbackEnter: properly. It is currently left unchanged, using the old VM scheduling policy, so it is unsafe to use plugins with callbacks if you are running under the new scheduler. A callbackEnter remembers the active process, suspends it, and then activates it again when the callback is done. The problem is that Squeak 3.10 doesn't have any image-side code which I could change to play with callbacks :(

A fix for callbacks should be pretty easy:
- at entry, deactivate the active process (which must NOT be the interrupt process) and switch to the interrupt process, so it can schedule the next available process.
- when leaving, switch from the interrupt process back to the previously remembered active process.

So certain preconditions must be met:
- code which calls primitives that use callbacks must never run in interruptProcess
- code which returns from a callback must run in interruptProcess

--
Best regards,
Igor Stasenko AKA sig.
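The enter/leave steps and the two preconditions listed above can be sketched as a toy workspace model. This is only an illustration under stated assumptions: symbols stand in for processes, every name in it is hypothetical, and keeping the remembered processes on a stack (so that nested callbacks unwind in order) is an assumption the post does not spell out. It is not the VM's #callbackEnter: code.

```smalltalk
"Toy model only: symbols stand in for processes, and nothing here touches
 the real scheduler or VM. All names below are hypothetical."
| interruptProcess activeProcess remembered callbackEnter callbackLeave |
interruptProcess := #interruptProcess.
activeProcess := #workerProcess.
remembered := OrderedCollection new.	"assumption: a stack, so nested callbacks unwind in order"

callbackEnter := [
	"Precondition: the caller is not the interrupt process."
	activeProcess == interruptProcess
		ifTrue: [self error: 'callback entered from the interrupt process'].
	remembered addLast: activeProcess.	"remember and deactivate the caller"
	activeProcess := interruptProcess].	"the interrupt process now decides what runs next"

callbackLeave := [
	"Precondition: the return runs in the interrupt process."
	activeProcess == interruptProcess
		ifFalse: [self error: 'callback return outside the interrupt process'].
	activeProcess := remembered removeLast].	"reactivate the remembered process"

callbackEnter value.
callbackLeave value.
activeProcess	"answers #workerProcess again"
```

Evaluating `callbackEnter value` twice in a row trips the first precondition, which is the case the post says must never happen.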
I added a new changeset on Mantis which deals with the callbackEnter: code.
Unfortunately, for callback handling to work correctly, image-side changes are needed as well.

P.S. It is fun to watch how the scheduler handles signals, by inspecting a Process and selecting its 'signals' ivar :)

--
Best regards,
Igor Stasenko AKA sig.
A quick comparison for Delays:

    delay := Delay forMilliseconds: 1.
    bag := Bag new.
    1000 timesRepeat:[bag add: [delay wait] timeToRun].
    bag sortedCounts

On my 4-core box it yields:

- with AdvancedProcessorScheduler install
  a SortedCollection(951->2 49->1)

- with Processor fallbackToOldScheduler
  a SortedCollection(953->2 47->1)

- with old VM
  a SortedCollection(952->2 47->1 1->3)

Not much overhead, huh? :)

But Delay's code can now be refactored, which will simplify it considerably, because it could run in the interrupt process and not have to care about concurrency issues.

--
Best regards,
Igor Stasenko AKA sig.
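For readers unfamiliar with the output: each `count->value` pair from `sortedCounts` says how many of the 1000 runs took that many milliseconds. A minimal sketch, which rebuilds the three reported distributions by hand rather than rerunning the benchmark, turns them into mean wait times (variable names are illustrative only):

```smalltalk
"Rebuild the reported distributions (value in ms, number of occurrences)
 and compute the mean wait per run. This does not run the benchmark itself."
| mean newSched oldSched oldVM |
mean := [:bag | ((bag inject: 0 into: [:sum :each | sum + each]) / bag size) asFloat].

newSched := Bag new.
newSched add: 2 withOccurrences: 951; add: 1 withOccurrences: 49.

oldSched := Bag new.
oldSched add: 2 withOccurrences: 953; add: 1 withOccurrences: 47.

oldVM := Bag new.
oldVM add: 2 withOccurrences: 952; add: 1 withOccurrences: 47; add: 3 withOccurrences: 1.

"Means come out at about 1.951, 1.953 and 1.954 ms respectively."
{mean value: newSched. mean value: oldSched. mean value: oldVM}
```

All three means sit just under 2 ms, so at millisecond resolution this test is dominated by the delay itself rather than by scheduling cost.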
Igor Stasenko wrote:
> not much overhead huh? :)

That's not exactly the kind of benchmark I was looking for (if your process scheduler takes milliseconds to do a switch I think we're not even close to the ballpark ;-) More interesting for comparison is this (requires closures):

    semas := Array new: 10000.
    plist := Array new: 10000.
    1 to: semas size do:[:i| semas at: i put: Semaphore new].
    1 to: plist size-1 do:[:i| plist at: i put: [(semas at: i) wait. (semas at:i+1) signal] fork].
    [semas first signal. semas last wait] timeToRun.

Cheers,
  - Andreas
2009/4/30 Andreas Raab <[hidden email]>:
> That's not exactly the kind of benchmark I was looking for (if your process
> scheduler takes milliseconds to do a switch I think we're not even close to
> the ballpark ;-) More interesting for comparison is this (requires
> closures):
>
> semas := Array new: 10000.
> plist := Array new: 10000.
> 1 to: semas size do:[:i| semas at: i put: Semaphore new].
> 1 to: plist size-1 do:[:i| plist at: i put: [(semas at: i) wait. (semas
> at:i+1) signal] fork].
> [semas first signal. semas last wait] timeToRun.
>

I have not yet tried to apply the changes to a VMMaker with closures.
Do you have a recipe for which base image can load the latest VMMaker without problems? Is 3.10.2 OK for it?

--
Best regards,
Igor Stasenko AKA sig.
Igor Stasenko wrote:
> I not yet tried to apply changes to VMMaker with closures.
> Do you have a recipe, which base image could load latest VMMaker w/o
> problems? Is 3.10.2 ok for it?

Yes, 3.10.2 is fine.

Cheers,
  - Andreas
Running this test with #fixTemps:

    semas := Array new: 10000.
    plist := Array new: 10000.
    1 to: semas size do:[:i| semas at: i put: Semaphore new].
    1 to: plist size-1 do:[:i| plist at: i put: [(semas at: i) wait. (semas at:i+1) signal] fixTemps fork].
    [semas first signal. semas last wait] timeToRun.

New:
63 63 148 65 61

Old:
94 27 92 22 22 23 23

If we don't count the spikes, which seem to be just an intrusion of delays, the difference is about 3 times.
Fantastic results (bytecode-interpreted scheduling works 3 times slower than C code) :)

After changing 2 lines of code (to use #interruptWith:, with no fallbacks):

50 125 50 49 48 48 60

So now it's about a 2 times difference.

--
Best regards,
Igor Stasenko AKA sig.
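To put the chain benchmark's timings on a per-hand-off basis, here is a small workspace sketch. It assumes the 9999 forked processes each contribute one hand-off and picks representative non-spike readings from the runs reported in this thread (63 ms new, 22 ms old, 49 ms after the #interruptWith: change); both choices are assumptions made for illustration.

```smalltalk
"Convert a timeToRun reading (milliseconds for the whole chain) into
 microseconds per hand-off. The inputs are hand-picked typical values."
| hops perHop |
hops := 10000 - 1.	"one forked process per semaphore pair"
perHop := [:ms | (ms * 1000 / hops) asFloat].

"Roughly 6.3, 2.2 and 4.9 microseconds per hand-off."
{perHop value: 63. perHop value: 22. perHop value: 49}
```

The ratios between those figures are the "about 3 times" and "about 2 times" differences mentioned in the message.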
Igor Stasenko wrote:
> Running this test with #fixTemps.
> if we not count a spikes which seems just an intrusion of delays, the
> difference about 3 times.
> Fantastic results (bytecode interpreted scheduling works 3 times
> slower than C code) :)

Indeed. That is quite impressive since every single process switch now requires two process switches (one to the scheduler process and one away from it). That is very encouraging.

Cheers,
  - Andreas
2009/4/30 Andreas Raab <[hidden email]>:
> Indeed. That is quite impressive since every single process switch now
> requires two process switches (one to the scheduler process and one away
> from it). That is very encouraging.
>

And keep in mind, there is room for optimization if we make these changes non-backwards-compatible, especially by wiping a lot of cruft from the VM :)

OK, I will try to apply this to the closure VM.

--
Best regards,
Igor Stasenko AKA sig.
A better benchmark is attached. It allows adjusting the number of processes and the total time the benchmark should run and answers the number of process switches per second. This is our "standard" process switch benchmark. Run it like here:

    100 "processes" benchSwitch: 5. "seconds"

This will switch between 100 processes for 5 seconds.

Cheers,
  - Andreas

Attachment: BenchSwitch.1.cs (2K)
2009/4/30 Andreas Raab <[hidden email]>:
> A better benchmark is attached. It allows adjusting the number of processes
> and the total time the benchmark should run and answers the number of
> process switches per second. This is our "standard" process switch
> benchmark. Run it like here:
>
> 100 "processes" benchSwitch: 5. "seconds"
>

I have built a new VM with closure support (VMMaker.dtl.120).
The changeset can be loaded into VMMaker without changes, except for the Interpreter class redefinition.
All that is needed is to add 3 class vars to Interpreter manually:
ProcessActionIndex InterruptedProcessIndex InterruptProcessIndex

Running squeak-3.10 (no closure compiler).

    100 "processes" benchSwitch: 5. "seconds"

Processor fallbackToOldScheduler

'1,156,135 switches/sec'
'1,144,994 switches/sec'
'1,147,744 switches/sec'

AdvancedProcessorScheduler install

'148,067 switches/sec'
'141,211 switches/sec'
'141,692 switches/sec'
'148,650 switches/sec'

(1156/148) ~~ 7.8 ratio

--
Best regards,
Igor Stasenko AKA sig.
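As a cross-check of the ratio quoted for the benchSwitch runs, the following workspace sketch averages the reported switches/sec figures for each scheduler before dividing, instead of comparing one run from each side. The numbers are copied from the message; the variable names are illustrative.

```smalltalk
"Average the reported runs per scheduler, then compare the averages."
| old new avg |
old := #(1156135 1144994 1147744).	"Processor fallbackToOldScheduler"
new := #(148067 141211 141692 148650).	"AdvancedProcessorScheduler install"
avg := [:runs | (runs inject: 0 into: [:sum :each | sum + each]) / runs size].

"Comes out at about 7.9, close to the 7.8 quoted from single runs."
((avg value: old) / (avg value: new)) asFloat
```

Either way of computing it, the image-side scheduler manages a bit more than one eighth of the old scheduler's switch rate on this benchmark.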
When will this be in the official VMs? And to use or benefit from this improvement, will existing code (especially Seaside, which I'm using) have to be modified?

Thanks
Chun, Sungjin wrote:
> When this will be in the official VMs? And to use or benefit this
> improvement should
> previous code (especially Seaside, I'm using it) be modified?

I think that you read the results wrong... the new code runs the benchmark about 8x slower than the existing code.

Cheers,
Josh
Igor Stasenko wrote:
> I have built a new VM with closure support (VMMaker.dtl.120).
> A changeset can be loaded to VMMaker w/o changes, except changing the
> Interpreter class redefinition.
> All is needed is to add 3 class vars to Interpreter manually:
> ProcessActionIndex InterruptedProcessIndex InterruptProcessIndex

Hm ... I compiled a VM with your changes but I can't recreate your results. My results are more in the range of 15:1 to 20:1 in a before-after comparison. Are you using the code from http://bugs.squeak.org/view.php?id=7345 or something more elaborate by now?

Cheers,
  - Andreas
2009/5/1 Andreas Raab <[hidden email]>:
> Hm ... I compiled a VM with your changes but I can't recreate your results.
> My results are more in the range of 15:1 to 20:1 in a before-after
> comparison. Are you using the code from
> http://bugs.squeak.org/view.php?id=7345 or something more elaborate by now?
>

Improved a bit.
I checked everything thoroughly, and since performance degrades linearly with each introduced message send, I don't think it's because of bogus code.
The overhead is there because we are using the interpreter to switch processes:
- removing a process from a list
- adding a process to a list
etc.

Of course, I could add extra prims to speed things up a little, like doing LinkedList>>removeFirst: LinkedList>>addLast: primitively.
But I wouldn't care about it now. I want to explore the real benefits of having the scheduling logic on the language side, like reimplementing Delays, adding a nicer process termination procedure (see my other thread), etc.

--
Best regards,
Igor Stasenko AKA sig.
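The list operations mentioned above as the source of the overhead can be pictured with plain LinkedList calls. This is a minimal sketch, not the scheduler's actual code: bare Link instances stand in for processes (Process is a Link subclass in Squeak), and a single run queue is assumed.

```smalltalk
"Round-robin hand-off on one run queue, done with ordinary message sends.
 Bare Links stand in for processes; the real scheduler is not shown here."
| runQueue first second active |
runQueue := LinkedList new.
first := Link new.
second := Link new.
runQueue addLast: first; addLast: second.

active := runQueue removeFirst.	"take the process at the head of the queue"
runQueue addLast: active.	"and put it back at the tail when it yields"

runQueue first == second	"true: the other process is now at the head"
```

Each of these sends is interpreted bytecode when scheduling lives in the image, which is why every extra send shows up directly in the switches/sec figures.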