Smalltalk › Squeak › Squeak - Dev

ProcessTest>>testSchedulingIsFirstComeFirstServed

Ben Coman

ProcessTest>>testSchedulingIsFirstComeFirstServed

My general understanding of process scheduling is that it is FIFO
within the same priority. Indeed, Pharo has a test for this,
ProcessTest>>testSchedulingIsFirstComeFirstServed, which I thought
was a bit fragile and was looking to revise.
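The expectation itself can be reduced to a smaller workspace sketch. It is hypothetical and is not the Pharo ProcessTest code. Five processes are forked at priority 78 from a priority-79 process, so none of them can start until the forking process finishes. Each one then records when it actually runs. Under strict FIFO scheduling within a priority, the result should be #(1 2 3 4 5). The variable names order and done are illustrative.

```smalltalk
"Hypothetical sketch: record the order in which same-priority processes run."
| order done |
order := OrderedCollection new.
done := Semaphore new.
[ #(1 2 3 4 5) do: [ :i |
    "Each priority-78 process is only queued here; it cannot preempt
     the forking process, which runs at priority 79."
    [ order add: i.
      "The last process to run wakes up the workspace."
      order size = 5 ifTrue: [ done signal ] ] forkAt: 78 ] ] forkAt: 79.
done wait.
order asArray
```

Printing the last expression shows the observed order. Any permutation other than #(1 2 3 4 5) would be the same kind of ordering fault as the ones reported in this thread.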

But now I'm curious: is same-priority-process FIFO scheduling an
expected guarantee?
On both Pharo 5 build 50560 and Squeak 5 build 15113, the following test fails...

[ 1 to: 100000 do: [ :n |
    | ranFirst ranSecond |
    [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1 ] ] forkAt: 78.
      [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2 ] ] forkAt: 78.
    ] forkAt: 79.
    self assert: ranFirst = 1.
    self assert: ranSecond = 2 ]
] fork.

... but only after 20k iterations, which makes it seem like an "almost" guarantee.
Interestingly, it doesn't fail when run without the outer fork.
Also, if I take an alternate approach with fewer message sends...

faults := OrderedCollection new.
done := Semaphore new.
[ 1 to: 10000000 do: [ :n | | ranFirst ranSecond |
    [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1 ] ] forkAt: 78.
      [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2 ] ] forkAt: 78.
    ] forkAt: 79.
    (ranFirst = 1 and: ranSecond = 2) ifFalse: [ faults add: { n . ranFirst . ranSecond } ].
  ].
  done signal.
] fork.
done wait.
faults inspect.

I only get seven failures in 10 million iterations.
"an OrderedCollection(
#( 258535 2 1)
#(1148605 2 1)
#(3502820 2 1)
#(4010935 2 1)
#(4533713 2 1)
#(6301878 2 1)
#(8497001 2 1))"

So what I'm wondering is if there is a bug to hunt down, or if FIFO
scheduling is simply not an expected guarantee?

cheers -ben

Ben Coman

Re: ProcessTest>>testSchedulingIsFirstComeFirstServed

On Sun, Jan 31, 2016 at 6:16 PM, Ben Coman <[hidden email]> wrote:

> So what I'm wondering is if there is a bug to hunt down, or if FIFO
> scheduling is simply not an expected guarantee?
>
> cheers -ben

A while ago in Pharo I added DelayNullScheduler, which stops the
timingPriority (=80) process that schedules delays. After selecting it
via World > System > Settings > System > Delay Scheduler, I cannot
reproduce any of the faults(?) above. (Aside: I just found that
performance is a bit better if DelayNullScheduler>>schedule: first does
a "Processor yield".)

With the usual delay scheduler operating, the fault(?) can be bypassed
by signalling the timingSemaphore...

timingSemaphore := Delay testCaseSupportTimingSemaphore.
[ 1 to: 100000 do: [ :n |
    | ranFirst ranSecond |
    [ [ ranFirst ifNil: [ ranFirst := 1 ]
          ifNotNil: [ ranSecond := 1 ] ] forkAt: 78.
      [ ranFirst ifNil: [ ranFirst := 2 ]
          ifNotNil: [ ranSecond := 2 ] ] forkAt: 78.
    ] forkAt: 79.
    self assert: ranFirst = 1.
    self assert: ranSecond = 2.
    timingSemaphore signal ]
] fork.

The fault(?) can also be bypassed by commenting out the call to
#intercyclePause: in WorldState>>doOneCycleFor:

I'd be glad for any insights.

cheers -ben