[squeak-dev] Process bug introduced in 3.10

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Process bug introduced in 3.10

Bryce Kampjes

Run the following in a 3.10 image:
   100 timesRepeat:
        [processes := (1 to: 10) collect: [:each| [[(MessageTally spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
        (Delay forSeconds: 1) wait.
        processes do: [:each| each terminate].]

It locks up the image, debuggers will start popping up at some stage.
This works on 3.9. I've run the code above in a fresh 3.10 image and
reproduced the problem.

I'll try to find time to investigate further. It's probably the final
showstopper to me upgrading to 3.10. The code above is reworked from
on of Exupery's SUnits which locks up the image intermittently.

Bryce

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Igor Stasenko
2008/4/27  <[hidden email]>:

>
>  Run the following in a 3.10 image:
>    100 timesRepeat:
>         [processes := (1 to: 10) collect: [:each| [[(MessageTally spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
>         (Delay forSeconds: 1) wait.
>         processes do: [:each| each terminate].]
>
>  It locks up the image, debuggers will start popping up at some stage.
>  This works on 3.9. I've run the code above in a fresh 3.10 image and
>  reproduced the problem.
>
>  I'll try to find time to investigate further. It's probably the final
>  showstopper to me upgrading to 3.10. The code above is reworked from
>  on of Exupery's SUnits which locks up the image intermittently.
>

AFAIK, since 3.9 there was some fixes in Delay & Semaphores.
Its hard to say, if its a new bug in Delay or Process & scheduler or
this is remnants of code (in MessageTally) which was not fixed and
relies on old behavior.

As for me, its no wonder, that given code refuses to work correctly.
The implementation of Processes and all what is around them is very
brittle, and given stress test exposing this. As you know there are
different hacks & workarounds to keep things running without modifying
VM.
Thanks for giving another reason why squeak threading model requires
serious review. :)

>  Bryce
>
>



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Igor Stasenko
I removed the MessageTally  and added outer fork , just to not lock ui
when test running. Everything works fine.

| processes |
[
100 timesRepeat:
       [processes := (1 to: 10) collect: [:each|
                [ [ ((Delay forMilliseconds: 500) wait) ] repeat ] fork ].
       (Delay forSeconds: 1) wait.
       processes do: [:each| each terminate].]

] fork

In your test, a MessageTally will pop up a zeons windows, do you
really think its needed to be placed in test?
I suspect that it can work fine, but because of too many morphs
created it either run out of memory or spends too much time trying to
redraw screen. Also, since morphic was not designed with concurrency
in mind, adding morphs to world from different processes may lead to
race and damage different containers it using.


2008/4/27 Igor Stasenko <[hidden email]>:

> 2008/4/27  <[hidden email]>:
>
>
> >
>  >  Run the following in a 3.10 image:
>  >    100 timesRepeat:
>  >         [processes := (1 to: 10) collect: [:each| [[(MessageTally spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
>  >         (Delay forSeconds: 1) wait.
>  >         processes do: [:each| each terminate].]
>  >
>  >  It locks up the image, debuggers will start popping up at some stage.
>  >  This works on 3.9. I've run the code above in a fresh 3.10 image and
>  >  reproduced the problem.
>  >
>  >  I'll try to find time to investigate further. It's probably the final
>  >  showstopper to me upgrading to 3.10. The code above is reworked from
>  >  on of Exupery's SUnits which locks up the image intermittently.
>  >
>
>  AFAIK, since 3.9 there was some fixes in Delay & Semaphores.
>  Its hard to say, if its a new bug in Delay or Process & scheduler or
>  this is remnants of code (in MessageTally) which was not fixed and
>  relies on old behavior.
>
>  As for me, its no wonder, that given code refuses to work correctly.
>  The implementation of Processes and all what is around them is very
>  brittle, and given stress test exposing this. As you know there are
>  different hacks & workarounds to keep things running without modifying
>  VM.
>  Thanks for giving another reason why squeak threading model requires
>  serious review. :)
>
>  >  Bryce
>  >
>  >
>
>
>
>  --
>  Best regards,
>  Igor Stasenko AKA sig.
>



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Igor Stasenko
To confirm my point of view, see Morph>>privateAddMorph:atIndex:

It manipulates a submorphs collection without guarding it with semaphore.
It really likely that tight loops running in multiple processes may
damage collection, especially when you using Delays which enables
scheduler to switch processes at any point.
Also, i'm sure this is not the only place, where breakage can occur.
Morphic is too complex to say something definitely.

To summarize: manipulating morphs using concurrent code is a big no-no :)

--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Karl-19
Igor Stasenko wrote:

> To confirm my point of view, see Morph>>privateAddMorph:atIndex:
>
> It manipulates a submorphs collection without guarding it with semaphore.
> It really likely that tight loops running in multiple processes may
> damage collection, especially when you using Delays which enables
> scheduler to switch processes at any point.
> Also, i'm sure this is not the only place, where breakage can occur.
> Morphic is too complex to say something definitely.
>
> To summarize: manipulating morphs using concurrent code is a big no-no :)
>
>  
Shouldn't other processes add to the morphic process using WorldState
addDeferredUIMessage:  ?


Karl

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Bryce Kampjes
In reply to this post by Igor Stasenko
Igor Stasenko writes:
 > To confirm my point of view, see Morph>>privateAddMorph:atIndex:
 >
 > It manipulates a submorphs collection without guarding it with semaphore.
 > It really likely that tight loops running in multiple processes may
 > damage collection, especially when you using Delays which enables
 > scheduler to switch processes at any point.
 > Also, i'm sure this is not the only place, where breakage can occur.
 > Morphic is too complex to say something definitely.
 >
 > To summarize: manipulating morphs using concurrent code is a big no-no :)

The real test doesn't touch morphic. That was just a recreation that
seemed to reproduce the problem without relying on any Exupery code.
The actual test is using ExuperyProfiler which writes results to a
SharedQueue.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Bryce Kampjes
In reply to this post by Igor Stasenko
Igor Stasenko writes:
 > I removed the MessageTally  and added outer fork , just to not lock ui
 > when test running. Everything works fine.
 >
 > | processes |
 > [
 > 100 timesRepeat:
 >        [processes := (1 to: 10) collect: [:each|
 > [ [ ((Delay forMilliseconds: 500) wait) ] repeat ] fork ].
 >        (Delay forSeconds: 1) wait.
 >        processes do: [:each| each terminate].]
 >
 > ] fork
 >
 > In your test, a MessageTally will pop up a zeons windows, do you
 > really think its needed to be placed in test?
 > I suspect that it can work fine, but because of too many morphs
 > created it either run out of memory or spends too much time trying to
 > redraw screen. Also, since morphic was not designed with concurrency
 > in mind, adding morphs to world from different processes may lead to
 > race and damage different containers it using.

Thanks Igor, here's a bit more background and a reworked test. I
didn't realise that I was using Morphic in multiple threads in the
3.10 "clean" version of the test.

Here's the real test I started from:
   testCompilingTheCompilerSemaphoreBug
        | processes  |
        Exupery
                compileMethod: #critical:
                class: Semaphore.
       
        processes := (1 to: 10) collect: [:each| [[(ExuperyProfiler spyOn: [(Delay forMilliseconds: 500) wait]) queueCompilationCommandsOn: SharedQueue2 new] repeat] fork ].
        (Delay forSeconds: 1) wait.
        processes do: [:each| each terminate].

Yes, it does need the profilers, they are there to generate plenty of
process switches to trigger a bug.

My next attempt at formulating a version of the same bug that'll
run in a vanilla 3.10 image is:

   100 timesRepeat:
        [processes := (1 to: 10) collect: [:each| [[(MessageTally new spyEvery: 1 on: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
        (Delay forSeconds: 1) wait.
        processes do: [:each| each terminate].]

Debugging in an image with Exupery loaded (to provide logging to a
file only) it's locking up in "processes do: [:each| each terminate]"

It look's like the VM is still alive from poking around with gdb but
that the UI process is dead. In lockup printAllStacks is showing 87
profiling processes. The only other processes running are the idle
process and the finalisation process.

Bryce

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Process bug introduced in 3.10

Igor Stasenko
In reply to this post by Karl-19
2008/4/27 karl <[hidden email]>:

>
> Igor Stasenko wrote:
>
> > To confirm my point of view, see Morph>>privateAddMorph:atIndex:
> >
> > It manipulates a submorphs collection without guarding it with semaphore.
> > It really likely that tight loops running in multiple processes may
> > damage collection, especially when you using Delays which enables
> > scheduler to switch processes at any point.
> > Also, i'm sure this is not the only place, where breakage can occur.
> > Morphic is too complex to say something definitely.
> >
> > To summarize: manipulating morphs using concurrent code is a big no-no :)
> >
> >
> >
>  Shouldn't other processes add to the morphic process using WorldState
> addDeferredUIMessage:  ?
>
>
Maybe it should, but it not for MessageTally. Here the stack dump (i
put a break in #openInWorld)

SystemWindow(Object)>>break
SystemWindow(Morph)>>openInWorld
StringHolder>>openAsMorphLabel:
StringHolder>>openLabel:andTerminate:
StringHolder>>openLabel:
MessageTally class>>spyOn:

>  Karl
>
>



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Andreas.Raab
In reply to this post by Bryce Kampjes
[hidden email] wrote:
> My next attempt at formulating a version of the same bug that'll
> run in a vanilla 3.10 image is:
>
>    100 timesRepeat:
> [processes := (1 to: 10) collect: [:each| [[(MessageTally new spyEvery: 1 on: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
> (Delay forSeconds: 1) wait.
> processes do: [:each| each terminate].]

Running MessageTallies concurrently is a big, big no-no. Check out class
MessageTally and in particular its (shared) class variables
ObservedProcess and Timer.

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Bryce Kampjes
Andreas Raab writes:
 > [hidden email] wrote:
 > > My next attempt at formulating a version of the same bug that'll
 > > run in a vanilla 3.10 image is:
 > >
 > >    100 timesRepeat:
 > > [processes := (1 to: 10) collect: [:each| [[(MessageTally new spyEvery: 1 on: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
 > > (Delay forSeconds: 1) wait.
 > > processes do: [:each| each terminate].]
 >
 > Running MessageTallies concurrently is a big, big no-no. Check out class
 > MessageTally and in particular its (shared) class variables
 > ObservedProcess and Timer.

Good point, but I can reproduce the bug without using MessageTally
or any class variables. ExuperyProfiler doesn't use class variables.

The following will reproduce it:
   TestProfiler>>spyOn: aBlock
        | myDelay millisecs timer |
        millisecs := 1.
        (aBlock isMemberOf: BlockContext) ifFalse:
            [self error: 'spy needs a block here'].
        myDelay := Delay forMilliseconds: millisecs.
        timer := [[true] whileTrue:
                                        [myDelay wait].
                         nil] newProcess.
        timer priority: Processor userInterruptPriority.
        "activate the probe and evaluate the block"
        timer resume.
        aBlock ensure: [timer terminate].
        ^ self

  100 timesRepeat:
        [processes := (1 to: 10) collect: [:each| [[(TestProfiler new spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
        (Delay forSeconds: 1) wait.
        processes do: [:each| each terminate]]

TestProfiler is a new subclass of Object without any variables or
any other methods.

I'm not convinced this is a new bug with 3.10. It feels similar to
something I've encountered earlier. It is happening frequently in
3.10 but not in 3.9.

Bryce

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Andreas.Raab
[hidden email] wrote:
> I'm not convinced this is a new bug with 3.10. It feels similar to
> something I've encountered earlier. It is happening frequently in
> 3.10 but not in 3.9.

Can you at least describe the kind of problem you are seeing? It seems
to me that we're all just fishing in the dark because you're just saying
"there is a problem somewhere" but nobody knows what they're looking for
so Igors and my posts are all perfectly applicable to what we observe
but may have nothing to do with what you're looking for.

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Process bug introduced in 3.10

Igor Stasenko
In reply to this post by Bryce Kampjes
2008/4/27  <[hidden email]>:

>
>  Good point, but I can reproduce the bug without using MessageTally
>  or any class variables. ExuperyProfiler doesn't use class variables.
>
>  The following will reproduce it:
>    TestProfiler>>spyOn: aBlock
>         | myDelay millisecs timer |
>         millisecs := 1.
>         (aBlock isMemberOf: BlockContext) ifFalse:
>             [self error: 'spy needs a block here'].
>         myDelay := Delay forMilliseconds: millisecs.
>         timer := [[true] whileTrue:
>                                         [myDelay wait].
>                          nil] newProcess.
>         timer priority: Processor userInterruptPriority.
>         "activate the probe and evaluate the block"
>         timer resume.
>         aBlock ensure: [timer terminate].
>         ^ self
>

Ok, here we see a scheduler stress test. Given that infinite loop
running at higher priority, and you having 10 concurrent loops, there
is a good chance that processes with lower priorities will starve
forever (never gain control to evaluate anything).

Lets analyze why it can be. The Delay interrupt watcher uses single
semaphore and single primitive to tell VM when to signal it.
So, whenever this semaphore get signaled, a #timerInterruptWatcher
gains control, takes active delay and signals delay's semaphore.
Then it takes another delay and if its not yet elapsed, waits for it
.. and so on.

So, what we having :
- 10 processes with highest priority, each waiting for 1 msec delay
-  1 process with highest priority (a #timerInterruptWatcher)
- 10 processes with user priority each waiting 500 msec delay.

Squeak's scheduler never passing control to lower priority process if
there are higher priority process waiting to be executed, right?
To have a chance for process at user priority to be executed, we
should have 10+1 processes waiting for semaphore at some point of
time, which is very unlikely (consider time, which interpreter need to
spend to switch processes and reschedule 10 delays - it may be more
than 1 msec), so this leads to an infinite loop which bounces between
11 topmost priority processes.

In ideal environment, where all processes running in parallel, your
code SHOULD work. But we not in ideal environment, and squeak's
scheduler design having own limits in this regard :)

>   100 timesRepeat:
>         [processes := (1 to: 10) collect: [:each| [[(TestProfiler new spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
>         (Delay forSeconds: 1) wait.
>         processes do: [:each| each terminate]]
>
>  TestProfiler is a new subclass of Object without any variables or
>  any other methods.
>
>  I'm not convinced this is a new bug with 3.10. It feels similar to
>  something I've encountered earlier. It is happening frequently in
>  3.10 but not in 3.9.
>
>  Bryce
>
>



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Process bug introduced in 3.10

stephane ducasse
In reply to this post by Bryce Kampjes
Bryce
which version of 3.9?
The latest I created with the fixes of andreas? 3.9.1
or the one before, because I have the impression that the latest 3.9.1  
has the same fixes than
3.10. So may be there is a problem with the fixes I harvested too.
Can you check? I'm travelling right now.

Stef

On Apr 27, 2008, at 9:19 PM, <[hidden email]> <[hidden email]
 > wrote:

> Andreas Raab writes:
>> [hidden email] wrote:
>>> My next attempt at formulating a version of the same bug that'll
>>> run in a vanilla 3.10 image is:
>>>
>>>   100 timesRepeat:
>>> [processes := (1 to: 10) collect: [:each| [[(MessageTally new  
>>> spyEvery: 1 on: [(Delay forMilliseconds: 500) wait]) ] repeat]  
>>> fork ].
>>> (Delay forSeconds: 1) wait.
>>> processes do: [:each| each terminate].]
>>
>> Running MessageTallies concurrently is a big, big no-no. Check out  
>> class
>> MessageTally and in particular its (shared) class variables
>> ObservedProcess and Timer.
>
> Good point, but I can reproduce the bug without using MessageTally
> or any class variables. ExuperyProfiler doesn't use class variables.
>
> The following will reproduce it:
>   TestProfiler>>spyOn: aBlock
> | myDelay millisecs timer |
> millisecs := 1.
> (aBlock isMemberOf: BlockContext) ifFalse:
>    [self error: 'spy needs a block here'].
> myDelay := Delay forMilliseconds: millisecs.
> timer := [[true] whileTrue:
> [myDelay wait].
> nil] newProcess.
> timer priority: Processor userInterruptPriority.
> "activate the probe and evaluate the block"
> timer resume.
> aBlock ensure: [timer terminate].
> ^ self
>
>  100 timesRepeat:
> [processes := (1 to: 10) collect: [:each| [[(TestProfiler new  
> spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
> (Delay forSeconds: 1) wait.
> processes do: [:each| each terminate]]
>
> TestProfiler is a new subclass of Object without any variables or
> any other methods.
>
> I'm not convinced this is a new bug with 3.10. It feels similar to
> something I've encountered earlier. It is happening frequently in
> 3.10 but not in 3.9.
>
> Bryce
>
>


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Process bug introduced in 3.10

Bryce Kampjes
stephane ducasse writes:
 > Bryce
 > which version of 3.9?
 > The latest I created with the fixes of andreas? 3.9.1
 > or the one before, because I have the impression that the latest 3.9.1  
 > has the same fixes than
 > 3.10. So may be there is a problem with the fixes I harvested too.
 > Can you check? I'm travelling right now.

The original 3.9 version. But some other problems have gotten better
so the fixes are probably a net improvement even if they are involved
with my lock-ups which hasn't yet been proven. I'd wait until these
problems are fixed before doing anything, at this stage we don't know
what's causing them. It's easy to reproduce, so should be debuggable.

One thing that's gotten better is the tests now clear up all Exupery's
processes which they start. Profiling processes used to build up in
the image slowly in 3.9. I assumed that was due to a failing in one of
my tests, but given it's not happening in 3.10 it may either be due to
this test, or be fixed by Andreas's fixes.

Bryce

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Bryce Kampjes
In reply to this post by Andreas.Raab
Andreas Raab writes:
 > [hidden email] wrote:
 > > I'm not convinced this is a new bug with 3.10. It feels similar to
 > > something I've encountered earlier. It is happening frequently in
 > > 3.10 but not in 3.9.
 >
 > Can you at least describe the kind of problem you are seeing? It seems
 > to me that we're all just fishing in the dark because you're just saying
 > "there is a problem somewhere" but nobody knows what they're looking for
 > so Igors and my posts are all perfectly applicable to what we observe
 > but may have nothing to do with what you're looking for.

Squeak stops responding, Alt-. will not recover. It's consuming
100% of the CPU but is spending a decent amount of time in the
idle loop. Alt-. may occasionally pop up a debugger but normally
the image becomes completely unresponsive. printAllStacks has
shown over 80 profiling threads running which indicates that
they are not getting terminated.

Success for that test is not crashing.

Consuming 100% of a CPU's time may be due to a single profiling
thread. Even though the tread consumes very little CPU, it consumes a
little in each tick which leads to misleading time reporting.

It locks up while running the line:
   processes do: [:each| each terminate].

This can be shown either by using Exupery's logging (which uses a
primitive to serialise and write to the file), or a simple
Transcript>>show: which is executing in the controlling UI thread,
not the forked threads.

It may be helpful if someone else tries to reproduce give the last
version using TestProfiler.

The original bug was Exupery's test suite started occasionally locking
up the image. The bug reproduces after commenting out Exupery's
compilation. I then tried to reproduce in a vanilla 3.10 image, and
messed up, which you and Igor caught. The TestProfiler reproduction is
Exupery's test with the tallying code and compilation removed to
reduce the size of the test.The "100 timesRepeat:" is to ensure that
the lock-up happens, the actual test is only run once but doesn't lock
up every time.

The combination of an image locking up while consuming 100% of the CPU
while in the idle loop is something that I've seen before but not
recently. If I then demonstrated that the bug was not in Exupery, I'd
have deleted the test. Now, Exupery is much more reliable, and the
test has been running regularly since September 2007. The reason
to mention that I may have seen this before is because the bug may
not be new to 3.10, it may just be triggered differently.

I'm going to keep digging to find out what's going on.

Bryce

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Andreas.Raab
[hidden email] wrote:
> Squeak stops responding, Alt-. will not recover. It's consuming
> 100% of the CPU but is spending a decent amount of time in the
> idle loop. Alt-. may occasionally pop up a debugger but normally
> the image becomes completely unresponsive. printAllStacks has
> shown over 80 profiling threads running which indicates that
> they are not getting terminated.

Thanks. I think the problem is the simulation of nested unwind blocks.
Here is an example for the kind of problem:

p1 := [
        (Delay forMilliseconds: 1000) wait.
] forkAt: Processor activePriority+1.
p2 := [ | timout |
        timout := Time millisecondClockValue+1000.
        [[timeout < Time millisecondClockValue] whileTrue]
                ensure:[p1 terminate].
] forkAt: Processor activePriority-1.
(Delay forMilliseconds: 500) wait.
p2 terminate.

For some reason this explodes with a nil DNU: < but I think the root
cause is the (nested) simulation of unwind handling. And to make things
more interesting, for some inexplicable reason the simpler variant works
fine:

p1 := [(Delay forMilliseconds: 1000) wait]
        forkAt: Processor activePriority+1.
p2 := [[[(Delay forMilliseconds: 1000) wait] whileTrue]
        ensure:[p1 terminate]
] forkAt: Processor activePriority-1.
(Delay forMilliseconds: 500) wait.
p2 terminate.

I'm pretty certain this is what you're seeing. The reason it would show
in 3.10 and not in 3.9 is that in 3.10 Delay>>wait has an unwind protect
block to ensure that ridiculously long delays that are terminated get
taken out of the timer queue.

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Andreas.Raab
Andreas Raab wrote:

> p1 := [
>     (Delay forMilliseconds: 1000) wait.
> ] forkAt: Processor activePriority+1.
> p2 := [ | timout |
>     timout := Time millisecondClockValue+1000.
>     [[timeout < Time millisecondClockValue] whileTrue]
>         ensure:[p1 terminate].
> ] forkAt: Processor activePriority-1.
> (Delay forMilliseconds: 500) wait.
> p2 terminate.
>
> For some reason this explodes with a nil DNU: < but I think the root
> cause is the (nested) simulation of unwind handling.

Of course, if I'd ever learn how to spell "timeout" correctly, I might
find out that that's not the problem. Back to the drawing board... (but
I'm still not convinced that the problem isn't somewhere in the nested
unwind-protect stuff).

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

RE: [squeak-dev] Re: Process bug introduced in 3.10

Sebastian Sastre-2
In reply to this post by Bryce Kampjes
Hi Bryce,

        maybe if you use some #yield they have the chance to terminate before
creating new ones restoring balance?

        cheers,

Sebastian

 

> -----Mensaje original-----
> De: [hidden email]
> [mailto:[hidden email]] En
> nombre de [hidden email]
> Enviado el: Domingo, 27 de Abril de 2008 16:19
> Para: The general-purpose Squeak developers list
> Asunto: [squeak-dev] Re: Process bug introduced in 3.10
>
> Andreas Raab writes:
>  > [hidden email] wrote:
>  > > My next attempt at formulating a version of the same bug that'll
>  > > run in a vanilla 3.10 image is:
>  > >
>  > >    100 timesRepeat:
>  > > [processes := (1 to: 10) collect: [:each|
> [[(MessageTally new spyEvery: 1 on: [(Delay forMilliseconds:
> 500) wait]) ] repeat] fork ].
>  > > (Delay forSeconds: 1) wait.
>  > > processes do: [:each| each terminate].]
>  >
>  > Running MessageTallies concurrently is a big, big no-no.
> Check out class
>  > MessageTally and in particular its (shared) class variables
>  > ObservedProcess and Timer.
>
> Good point, but I can reproduce the bug without using MessageTally
> or any class variables. ExuperyProfiler doesn't use class variables.
>
> The following will reproduce it:
>    TestProfiler>>spyOn: aBlock
> | myDelay millisecs timer |
> millisecs := 1.
> (aBlock isMemberOf: BlockContext) ifFalse:
>    [self error: 'spy needs a block here'].
> myDelay := Delay forMilliseconds: millisecs.
> timer := [[true] whileTrue:
> [myDelay wait].
> nil] newProcess.
> timer priority: Processor userInterruptPriority.
> "activate the probe and evaluate the block"
> timer resume.
> aBlock ensure: [timer terminate].
> ^ self
>
>   100 timesRepeat:
> [processes := (1 to: 10) collect: [:each|
> [[(TestProfiler new spyOn: [(Delay forMilliseconds: 500)
> wait]) ] repeat] fork ].
> (Delay forSeconds: 1) wait.
> processes do: [:each| each terminate]]
>
> TestProfiler is a new subclass of Object without any variables or
> any other methods.
>
> I'm not convinced this is a new bug with 3.10. It feels similar to
> something I've encountered earlier. It is happening frequently in
> 3.10 but not in 3.9.
>
> Bryce
>


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Bryce Kampjes
In reply to this post by Andreas.Raab
Andreas Raab writes:
 > Of course, if I'd ever learn how to spell "timeout" correctly, I might
 > find out that that's not the problem. Back to the drawing board... (but
 > I'm still not convinced that the problem isn't somewhere in the nested
 > unwind-protect stuff).

The first version crashes reliably during the first few iterations but
the second runs to completion.  This indicates to me that it's a race
inside terminate, but this isn't sufficient proof. All that's changed
between the two expressions is the timeout of the TestProfiler delay.
In the first expression it's 500 milliseconds, which will line up with
the outer termination loop while in the second it's 600 milliseconds
which doesn't syncronise so often.


    100 timesRepeat:
        [processes := (1 to: 10) collect: [:each| [[(TestProfiler new spyOn: [(Delay forMilliseconds: 500) wait]) ] repeat] fork ].
        (Delay forSeconds: 1) wait.
        Exupery log: 'terminate'..
        processes do: [:each| each terminate.].
        Exupery log: 'done'.]


    100 timesRepeat:
        [processes := (1 to: 10) collect: [:each| [[(TestProfiler new spyOn: [(Delay forMilliseconds: 600) wait]) ] repeat] fork ].
        (Delay forSeconds: 1) wait.
        Exupery log: 'terminate'..
        processes do: [:each| each terminate.].
        Exupery log: 'done'.]

Bryce

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Process bug introduced in 3.10

Bryce Kampjes
In reply to this post by Andreas.Raab
Andreas Raab writes:
 > Of course, if I'd ever learn how to spell "timeout" correctly, I might
 > find out that that's not the problem. Back to the drawing board... (but
 > I'm still not convinced that the problem isn't somewhere in the nested
 > unwind-protect stuff).

It definately looks like it's caused by the changes to Delay>>wait.

  wait
        "Schedule this Delay, then wait on its semaphore. The current
         process will be suspended for the amount of time specified
         when this Delay was created."

        self schedule.
        [delaySemaphore wait] ifCurtailed:[self unschedule].

If I comment out the "self unschedule" it works reliably. If I replace
the "self unschedule" with "Processor yield" the bug occurs but takes
a few more iterations (about 10 instead of 1 or two).

I'm going to leave this bug and go back to finishing up the next
Exupery release. I can make my test work by increasing the delay so
that the profilers never kill themselves.

Bryce

12