[squeak-dev] [ANN] A new scheduler + VM changes alpha-release

Igor Stasenko

[squeak-dev] [ANN] A new scheduler + VM changes alpha-release

Hello guys,
I hope you are interested in having rock-solid & safe concurrency in
Smalltalk, even though it is green-threaded.

A new scheduler implementation will allow us to improve things and
no longer depend on the VM-side scheduling policy.

You can find the changes on mantis:

http://bugs.squeak.org/view.php?id=7345

Please give me feedback on whether you can successfully build a VM with
the new changes on all platforms.
I used a pretty old VMMaker package version for my VM: VMMaker-dtl.91

I will try to see if it can be built with the latest one available on SqS.

Andreas (& others) - do you have any tests/benchmarks for measuring
the new scheduler's overhead compared to the old one?

Personally, I didn't notice much speed degradation :)

And one last thing: I need advice on how to implement #callbackEnter: properly.
It is currently left unchanged, using the old VM scheduling policy. So
it is unsafe to use plugins with callbacks if you are running under the
new scheduler.
callbackEnter remembers the active process, suspends it, and then
reactivates it when the callback is done.
The problem is that Squeak 3.10 doesn't have any image-side code
which I could change to play with callbacks :(

A fix for callbacks should be pretty easy:
- at entry, deactivate the active process (which should NOT be the
interrupt process) & switch to the interrupt process, so it can schedule
the next available process.
- when leaving, switch from the interrupt process back to the previously
remembered active process.
So, certain preconditions should be met:
- code which calls primitives that use callbacks should never
run in interruptProcess
- code which returns from a callback should run in interruptProcess

--
Best regards,
Igor Stasenko AKA sig.
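
The enter/leave rule proposed in the post above can be modelled in a few lines. This is only a Python sketch of the bookkeeping, not VM or Squeak code: the `Scheduler` class, its field names, and the use of a stack for remembered processes (so that nested callbacks would unwind in order) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy model of the proposed callback rule (hypothetical, not VM code)."""
    interrupt: str = "interruptProcess"      # the process that does the scheduling
    active: str = "interruptProcess"         # the process currently running
    saved: list = field(default_factory=list)  # assumption: a stack of remembered processes

    def callback_enter(self):
        # Precondition 1: primitives using callbacks never run in the interrupt process.
        if self.active == self.interrupt:
            raise RuntimeError("callback entered from the interrupt process")
        self.saved.append(self.active)   # remember the active process
        self.active = self.interrupt     # the interrupt process schedules what runs next

    def callback_leave(self):
        # Precondition 2: returning from a callback happens in the interrupt process.
        if self.active != self.interrupt:
            raise RuntimeError("callback left outside the interrupt process")
        self.active = self.saved.pop()   # switch back to the remembered process


sched = Scheduler(active="worker")
sched.callback_enter()
during = sched.active
sched.callback_leave()
print(during, "->", sched.active)   # interruptProcess -> worker
```

Both preconditions show up as explicit checks, which is roughly what an image-side implementation would have to guarantee by construction.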

Igor Stasenko

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

I added a new changeset on Mantis which deals with the callbackEnter: code.
Unfortunately, for callback handling to work correctly, image-side
changes are needed as well.

P.S. It is fun to watch how the scheduler handles signals, by inspecting
a Process and selecting its 'signals' ivar :)

--
Best regards,
Igor Stasenko AKA sig.

Igor Stasenko

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

A quick comparison for Delays:

delay := Delay forMilliseconds: 1.
bag := Bag new.
1000 timesRepeat:[bag add: [delay wait] timeToRun].
bag sortedCounts

on my 4-core box it yields:

- with AdvancedProcessorScheduler install
a SortedCollection(951->2 49->1)

- with Processor fallbackToOldScheduler
a SortedCollection(953->2 47->1)

- with old VM
a SortedCollection(952->2 47->1 1->3)

not much overhead huh? :)

But the Delay code can now be refactored, which will simplify it
considerably, because it could run in the interrupt process and would
not need to care about concurrency issues.

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

That's not exactly the kind of benchmark I was looking for (if your
process scheduler takes milliseconds to do a switch I think we're not
even close to the ballpark ;-) More interesting for comparison is this
(requires closures):

semas := Array new: 10000.
plist := Array new: 10000.
1 to: semas size do:[:i| semas at: i put: Semaphore new].
1 to: plist size-1 do:[:i| plist at: i put: [(semas at: i) wait.
(semas at:i+1) signal] fork].
[semas first signal. semas last wait] timeToRun.

Cheers,
- Andreas
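
For readers without a Squeak image at hand, the shape of the semaphore-chain benchmark above can be sketched in Python. This is a structural analogue only: it uses OS threads from `threading` rather than Squeak green threads, the function name `chain_time` and the default chain length are assumptions, and the timings it prints are not comparable to the numbers in this thread.

```python
import threading
import time


def chain_time(n=200):
    """Time one signal travelling through a chain of n semaphores."""
    # Each semaphore starts with no signals, like `Semaphore new` in Squeak.
    semas = [threading.Semaphore(0) for _ in range(n)]

    def relay(i):
        semas[i].acquire()       # (semas at: i) wait
        semas[i + 1].release()   # (semas at: i+1) signal

    # Passing i through args gives every thread its own copy of the index.
    workers = [threading.Thread(target=relay, args=(i,)) for i in range(n - 1)]
    for w in workers:
        w.start()

    start = time.perf_counter()
    semas[0].release()           # semas first signal
    semas[-1].acquire()          # semas last wait
    elapsed = time.perf_counter() - start

    for w in workers:
        w.join()
    return elapsed


print(f"{chain_time() * 1000:.2f} ms for a chain of 200 semaphores")
```

The index is handed to each thread as an argument for the same reason the Smalltalk version "requires closures": every block must see its own value of `i`, not whatever the loop variable holds by the time the block runs.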

Igor Stasenko

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

I have not yet tried to apply the changes to a VMMaker with closures.
Do you have a recipe for which base image can load the latest VMMaker
w/o problems? Is 3.10.2 ok for it?

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

Igor Stasenko wrote:
> I not yet tried to apply changes to VMMaker with closures.
> Do you have a recipe, which base image could load latest VMMaker w/o
> problems? Is 3.10.2 ok for it?

Yes, 3.10.2 is fine.

Cheers,
- Andreas

Igor Stasenko

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

Running this test with #fixTemps.

semas := Array new: 10000.
plist := Array new: 10000.
1 to: semas size do:[:i| semas at: i put: Semaphore new].
1 to: plist size-1 do:[:i| plist at: i put: [(semas at: i) wait.
(semas at:i+1) signal] fixTemps fork].
[semas first signal. semas last wait] timeToRun.

New:
63
63
148
65
61

Old:

94
27
92
22
22
23
23

If we don't count the spikes, which seem to be just an intrusion of
delays, the difference is about 3 times.
Fantastic results (bytecode-interpreted scheduling works 3 times
slower than C code) :)

After changing 2 lines of code (to use #interruptWith:, with no fallbacks):

50 125 50 49 48 48 60

so now it's about a 2 times difference.

--
Best regards,
Igor Stasenko AKA sig.
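
The "about 3 times" and "about 2 times" estimates in the post above can be checked with a short Python calculation. Using the median to discount the spikes is an assumption about what "not counting the spikes" means; the post does not say how the estimate was made. The numbers are the #timeToRun results quoted above, in milliseconds.

```python
from statistics import median

new = [63, 63, 148, 65, 61]             # new scheduler
old = [94, 27, 92, 22, 22, 23, 23]      # old scheduler
tuned = [50, 125, 50, 49, 48, 48, 60]   # new scheduler after the 2-line change

# The median ignores the occasional spike (148, 94, 92, 125).
ratio_before = median(new) / median(old)
ratio_after = median(tuned) / median(old)

print(round(ratio_before, 2))   # 2.74
print(round(ratio_after, 2))    # 2.17
```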

Andreas.Raab

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

Igor Stasenko wrote:
> Running this test with #fixTemps.
> if we not count a spikes which seems just an intrusion of delays, the
> difference about 3 times.
> Fantastic results (bytecode interpreted scheduling works 3 times
> slower than C code) :)

Indeed. That is quite impressive since every single process switch now
requires two process switches (one to the scheduler process and one away
from it). That is very encouraging.

Cheers,
- Andreas

Igor Stasenko

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

And keep in mind, there is room for optimization if we make these
changes non-backwards-compatible, especially by wiping a lot of cruft
from the VM :)

Ok, I will try to apply this to the closure VM.

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

A better benchmark is attached. It allows adjusting the number of
processes and the total time the benchmark should run and answers the
number of process switches per second. This is our "standard" process
switch benchmark. Run it like this:

100 "processes" benchSwitch: 5. "seconds"

This will switch between 100 processes for 5 seconds.

Cheers,
- Andreas

Attachment: BenchSwitch.1.cs (2K)

Igor Stasenko

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

I have built a new VM with closure support (VMMaker.dtl.120).
The changeset can be loaded into VMMaker w/o changes, except for the
Interpreter class redefinition.
All that is needed is to add 3 class vars to Interpreter manually:
ProcessActionIndex InterruptedProcessIndex InterruptProcessIndex

Running squeak-3.10 (no closure compiler).

100 "processes" benchSwitch: 5. "seconds"

Processor fallbackToOldScheduler

'1,156,135 switches/sec'
'1,144,994 switches/sec'
'1,147,744 switches/sec'

AdvancedProcessorScheduler install

'148,067 switches/sec'
'141,211 switches/sec'
'141,692 switches/sec'
'148,650 switches/sec'

(1156/148) ~~ 7.8 ratio

--
Best regards,
Igor Stasenko AKA sig.
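
The ratio in the post above comes from one run of each scheduler (1156/148). Averaging all the reported runs gives a slightly higher figure and also the cost of a single switch. The Python snippet below only redoes that arithmetic on the quoted numbers; averaging across runs is a choice made here, not something stated in the post.

```python
from statistics import mean

old = [1_156_135, 1_144_994, 1_147_744]      # Processor fallbackToOldScheduler
new = [148_067, 141_211, 141_692, 148_650]   # AdvancedProcessorScheduler install

ratio = mean(old) / mean(new)   # how many times more switches/sec the old scheduler does
old_us = 1e6 / mean(old)        # microseconds per switch, old scheduler
new_us = 1e6 / mean(new)        # microseconds per switch, new scheduler

print(f"ratio {ratio:.2f}, {old_us:.2f} us vs {new_us:.2f} us per switch")
# ratio 7.93, 0.87 us vs 6.90 us per switch
```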

Chun, Sungjin

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

When will this be in the official VMs? And to use or benefit from this
improvement, should existing code (especially Seaside, which I'm using)
be modified?

Thanks

Joshua Gargus-2

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

Chun, Sungjin wrote:
> When this will be in the official VMs? And to use or benefit this
> improvement should
> previous code (especially Seaside, I'm using it) be modified?

I think that you read the results wrong... the new code runs the
benchmark about 8x slower than the existing code.

Cheers,
Josh

Andreas.Raab

[squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

Igor Stasenko wrote:
> I have built a new VM with closure support (VMMaker.dtl.120).
> A changeset can be loaded to VMMaker w/o changes, except changing the
> Interpreter class redefinition.
> All is needed is to add 3 class vars to Interpreter manually:
> ProcessActionIndex InterruptedProcessIndex InterruptProcessIndex

Hm ... I compiled a VM with your changes but I can't recreate your
results. My results are more in the range of 15:1 to 20:1 in a
before-after comparison. Are you using the code from
http://bugs.squeak.org/view.php?id=7345 or something more elaborate by now?

Cheers,
- Andreas

Igor Stasenko

Re: [squeak-dev] Re: [ANN] A new scheduler + VM changes alpha-release

Well, I made some shortcuts, like removing the fallback handler, and it
improved a bit.
I checked everything thoroughly, and since performance degrades
linearly with each introduced message send,
I don't think it's because of bogus code.
The overhead is because we are using the interpreter to switch processes:
- removing a process from a list
- adding a process to a list
etc.

Of course, I could add some primitives to speed things up a
little, like doing
LinkedList>>removeFirst:
LinkedList>>addLast:
primitively.
But I wouldn't care about it now. I want to explore the real benefits
of having the scheduling logic on the language side, like reimplementing
Delays, adding a nicer process termination procedure (see my other
thread), etc.

--
Best regards,
Igor Stasenko AKA sig.
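
The list work described in the post above (one removal plus one addition per switch) can be sketched as follows. This is a Python toy, not the scheduler from the changeset: the `RunQueues` class, its one-queue-per-priority layout, and the method names are assumptions chosen to mirror the two LinkedList operations the post names.

```python
from collections import deque


class RunQueues:
    """Toy ready-queues: one FIFO per priority level (hypothetical layout)."""

    def __init__(self, levels):
        self.queues = [deque() for _ in range(levels)]

    def add_last(self, process, priority):
        # Counterpart of the addLast: operation named in the post.
        self.queues[priority].append(process)

    def remove_first(self):
        # Counterpart of removeFirst, taken from the highest non-empty level.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None

    def yield_from(self, process, priority):
        # One process switch costs one add plus one remove.
        self.add_last(process, priority)
        return self.remove_first()


ready = RunQueues(levels=3)
ready.add_last("A", 1)
ready.add_last("B", 1)
print(ready.yield_from("C", 1))   # A
```

In the image each of these steps is several interpreted message sends, which fits the observation that the overhead grows linearly with every send added to the switch path.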