More Delay/Semaphore "fun"

More Delay/Semaphore "fun"

Simon Kirk
Hi all. We've been playing with Semaphores and Delays recently, as we
were having some problems with them. We tried installing Andreas'
fixes, but with them in place the code below yields some strange
results when evaluated in a workspace:


s := Semaphore forMutualExclusion.
i := 1.

"Loop while there are not too many signals on the semaphore (should be
0 or 1)"
[(s instVarNamed: #excessSignals) < 2] whileTrue: [

    "Fork two processes to do things inside the semaphore"
    p := [[true] whileTrue: [s critical: [(Delay forMilliseconds: 10) wait]]]
        forkNamed: 'p', i printString.

    q := [[true] whileTrue: [s critical: [(Delay forMilliseconds: 10) wait]]]
        forkNamed: 'q', i printString.

    "Increment the counter just to make it easy to ID the processes"
    i := i + 1.

    "Delay to give the processes a chance to resume and potentially get
    into the critical on the Semaphore"
    (Delay forMilliseconds: 500 atRandom) wait.
    p terminate.
    (Delay forMilliseconds: 500 atRandom) wait.
    q terminate.

    "After terminating the two processes, the excess signals should be 1
    on the Semaphore because nothing is in critical any more, assuming
    the processes terminated properly and the unwind happened as it
    should, but... "

    "In our images with Andreas' recent Delay and Semaphore fixes, we
    always get one of these two error conditions (although normally too
    many signals). This would cause deadlock in 'real world' situations"
    (s instVarNamed: #excessSignals) = 0 ifTrue: [
        WorldState addDeferredUIMessage: [
            self inform: 'Too few signals after ', i printString, ' loops']]].
self inform: 'Too many signals after ', i printString, ' loops'.


Don't know if we've done something wrong here, but if not I hope it  
may give cleverer people than me some more pointers to the root of  
the whole Semaphore problem :)

S


Re: More Delay/Semaphore "fun"

Andreas.Raab
Excellent, thank you! I've been wanting an example which makes this
problem happen reliably for a while now. I'll look at it.

Cheers,
   - Andreas


Re: More Delay/Semaphore "fun"

Andreas.Raab
In reply to this post by Simon Kirk
Hi Simon -

I think I got it. Look at this:

Semaphore>>critical: mutuallyExcludedBlock
   | blockValue caught |
   caught := false.
   [
     caught := true.
     self wait.
     blockValue := mutuallyExcludedBlock value
   ] ensure: [
     caught ifTrue: [self signal].
   ].
   ^blockValue

Now let's run a little thought experiment:
* p1 enters the #critical:
* enters ensure: block
* sets "caught"
* waits on semaphore
* enters mutually excluded block; does its stuff

And then:
* p2 enters #critical:
* enters ensure: block
* sets "caught"
* waits on semaphore
* gets killed

At this point, we enter the ensure: handler for p2. And will 'ya look at
that: "caught" is set even though p2 didn't ever own the critical
section. This will make it signal the semaphore and screw up the rest of it.
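
The sequence above can be tried more deterministically than with the random loop. What follows is only a minimal workspace sketch: it assumes the #critical: shown above is the one installed, and that terminating a process blocked in #wait runs its ensure: blocks. The process names and delay lengths are arbitrary choices, not taken from the original report.

```smalltalk
| s holder waiter excess |
s := Semaphore forMutualExclusion.

"p1: takes the semaphore and stays inside the critical section for a while"
holder := [s critical: [(Delay forSeconds: 2) wait]] forkNamed: 'holder'.
(Delay forMilliseconds: 100) wait.

"p2: blocks in #wait because p1 still owns the semaphore"
waiter := [s critical: []] forkNamed: 'waiter'.
(Delay forMilliseconds: 100) wait.

"Kill p2 while it is still waiting; its ensure: block runs during unwinding"
waiter terminate.

"Let p1 leave the critical section, then look at the signal count"
(Delay forSeconds: 3) wait.
excess := s instVarNamed: #excessSignals.
```

If the reasoning above holds, excess should come out as 2 with that version of #critical: (one signal from the killed waiter, one from the holder), whereas a correct implementation should leave it at 1.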

Okay, obviously this needs to be changed to say, e.g.:

     self wait.
     caught := true. "entered semaphore successfully"

I have an image running in the background with your loop and it seems to
be doing fine for now. Incidentally, I have not seen the effect that
too *few* signals were generated; are you certain you ever had that
effect? It couldn't be explained with the above sequence at all.
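
Put together, the change described above might look like the sketch below. The selector criticalSketch: is hypothetical and used only so that the sketch does not replace the real #critical:. It is not a complete fix: a process terminated after #wait returns but before caught is set would still own the semaphore and never signal it.

```smalltalk
Semaphore>>criticalSketch: mutuallyExcludedBlock
    "Hypothetical selector, for illustration only.
     Same as #critical:, except that caught is set after the wait,
     so a process killed while still waiting does not signal."
    | blockValue caught |
    caught := false.
    [
        self wait.
        caught := true. "entered semaphore successfully"
        blockValue := mutuallyExcludedBlock value
    ] ensure: [
        caught ifTrue: [self signal]
    ].
    ^blockValue
```

In the normal, uninterrupted case this behaves like #critical:, answering the value of the block and leaving one excess signal on a mutual-exclusion semaphore.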

Cheers,
   - Andreas


RE: More Delay/Semaphore "fun"

Gary Chambers-4
*I* do get "too few signals" even after the extra change... At least the
"too many" problem seems to be fixed! Is a start...

Best to fork the thing off with a halt unless you like unresponsive UIs with
lots of popups...

--------------------

s := Semaphore forMutualExclusion.
i := 1.

"Loop while there are not too many signals on the semaphore (should be
0 or 1)"
[[(s instVarNamed: #excessSignals) < 2] whileTrue: [

    "Fork two processes to do things inside the semaphore"
    p := [[true] whileTrue: [s critical: [(Delay forMilliseconds: 10) wait]]]
        forkNamed: 'p', i printString.

    q := [[true] whileTrue: [s critical: [(Delay forMilliseconds: 10) wait]]]
        forkNamed: 'q', i printString.

    "Increment the counter just to make it easy to ID the processes"
    i := i + 1.

    "Delay to give the processes a chance to resume and potentially get
    into the critical on the Semaphore"
    (Delay forMilliseconds: 500 atRandom) wait.
    p terminate.
    (Delay forMilliseconds: 500 atRandom) wait.
    q terminate.

    "After terminating the two processes, the excess signals should be 1
    on the Semaphore because nothing is in critical any more, assuming
    the processes terminated properly and the unwind happened as it
    should, but... "

    "In our images with Andreas' recent Delay and Semaphore fixes, we
    always get one of these two error conditions (although normally too
    many signals). This would cause deadlock in 'real world' situations"
    (s instVarNamed: #excessSignals) = 0 ifTrue: [
        WorldState addDeferredUIMessage: [
            self inform: 'Too few signals after ', i printString, ' loops'].
        self halt]].
self inform: 'Too many signals after ', i printString, ' loops'] fork

--------------

Varies in the number of successful loops



RE: More Delay/Semaphore "fun"

Gary Chambers-4
Though I think the problem may be down to the unwind handling. I'm sure we
saw the ensure block being run multiple times (with the previous version,
where "caught := true" comes before the wait). Moving the caught to after
the wait seems to increase the chance of underflow on the signals. Really
nasty!

What would be helpful is a VM driven critical primitive, I think.



RE: More Delay/Semaphore "fun"

Simon Kirk
In reply to this post by Gary Chambers-4
Sadly I concur that, using a forked loop, I get the "too few signals" as well.

Gary Chambers-4 wrote:
> *I* do get "too few signals" even after the extra change... At least the
> "too many" problem seems to be fixed! Is a start...
>
> Best to fork the thing off with a halt unless you like unresponsive UIs with
> lots of popups...


Re: More Delay/Semaphore "fun"

Andreas.Raab
In reply to this post by Gary Chambers-4
Hi Gary -

Gary Chambers wrote:
> *I* do get "too few signals" even after the extra change... At least the
> "too many" problem seems to be fixed! Is a start...

Hm ... I've been running this code for a while now and didn't have a
single problem. What platform are you running on? Do you have anything
else running in the background when you try it? How many loops does it
usually go through before exploding?

Cheers,
   - Andreas


RE: More Delay/Semaphore "fun"

Gary Chambers-4
Win and Linux. Depending on "other" activities (moving the mouse, changing
active windows), it can be anywhere from 3 to 33 loops! Depends on
processor speed, I guess (the critical "window").

We're on IRC at the moment, would be nice to chat with you there...
(freenode.net).



Re: More Delay/Semaphore "fun"

Igor Stasenko
The problem is obviously in the wait primitive.
Instead of simply returning nothing, it should distinguish between a
successful wait and a wait that is abandoned due to process
termination.

I don't know if that is possible to implement without VM modifications,
but the wait primitive should work like:

self wait: afterwaitBlock

where afterwaitBlock accepts a single boolean value: true means the wait
was successful, and false means it was abandoned (the process was
terminated).

Or the same, in a slightly different form:

self waitOk: [ wait ok ] failed: [ wait failed ].

Then you should never signal for an abandoned wait, while you can do so
if the wait succeeded:

self waitOk: [ [...] ensure: [ self signal ] ] failed: [ ".. do nothing" ]

^^ But again, this code can sometimes fail (you can manage to terminate
the process after it has entered the #waitOk: block but before it sends
#ensure:), so again there is a chance that the signal is never raised if
your process is terminated at that moment.

Another variant: since we don't really need a block to handle abandoned
waits, the wait primitive can be a composition of wait and ensure,
which is:

self waitOk: [ ".... working code .." ] ensureAfter: [ "This block
will be evaluated only if we entered the working block, regardless of
whether it was terminated or finished in the regular way" ]

And if the wait is abandoned, neither block should be evaluated.

--
Best regards,
Igor Stasenko AKA sig.


Re: More Delay/Semaphore "fun"

Igor Stasenko
> self waitOk: [ ".... working code .." ] ensureAfter: [ "This block
> will be evaluated only if we entered working block and regardless
> after it terminated or finished in regular way"  ]
>

Ohh.. yet again, there's no guarantee that the code:

self waitOk: [ ".... working code .." ] ensureAfter: [ self signal ]

will exit in a signalled state, because you can terminate the process
after it enters the ensureAfter block but before it sends the #signal
message.

Then the only way is to have it all handled by the VM. And the #critical:
code should look simply like:

critical: mutuallyExcludedBlock

   self waitThenEvaluateAndSignalAfter: mutuallyExcludedBlock


--
Best regards,
Igor Stasenko AKA sig.


Re: More Delay/Semaphore "fun"

Andreas.Raab
Igor Stasenko wrote:
> Then the only way is to make it to handle all by VM.

Maybe not quite. Check out http://bugs.squeak.org/view.php?id=6588 to
see if this addresses your problem. It appears to work remarkably well
for me.

Cheers,
   - Andreas


Re: More Delay/Semaphore "fun"

stephane ducasse
Hi Andreas,

Do you know if your fixes have been harvested in 3.10? I'm about to
write some chapters on concurrency
and it would be good to have the bug fixed :)

Stef

On 6 oct. 07, at 03:27, Andreas Raab wrote:

> Igor Stasenko wrote:
>> Then the only way is to make it to handle all by VM.
>
> Maybe not quite. Check out http://bugs.squeak.org/view.php?id=6588 
> to see if this addresses your problem. It appears to work  
> remarkably well for me.
>
> Cheers,
>   - Andreas
>
>



Re: More Delay/Semaphore "fun"

Edgar J. De Cleene



On 10/6/07 5:19 AM, "stephane ducasse" <[hidden email]> wrote:

> HI andreas
>
> do you know if your fixes have been harvested in 3.10. I'm about to
> write some chapters on concurrency
> and it would be good to have the bug fixed :)
>
> Stef

Andreas is a super master!
I'm just trying the latest clue for 6576 and doing the last batch of
updates (I hope) before going to gamma.

Cheers.

Edgar




Re: More Delay/Semaphore "fun"

Igor Stasenko
In reply to this post by Andreas.Raab
On 06/10/2007, Andreas Raab <[hidden email]> wrote:
> Igor Stasenko wrote:
> > Then the only way is to make it to handle all by VM.
>
> Maybe not quite. Check out http://bugs.squeak.org/view.php?id=6588 to
> see if this addresses your problem. It appears to work remarkably well
> for me.

A bit hacky approach, don't you think?

>
> Cheers,
>    - Andreas
>
>


--
Best regards,
Igor Stasenko AKA sig.


Re: More Delay/Semaphore "fun"

Andreas.Raab
Igor Stasenko wrote:
> On 06/10/2007, Andreas Raab <[hidden email]> wrote:
>> Igor Stasenko wrote:
>>> Then the only way is to make it to handle all by VM.
>> Maybe not quite. Check out http://bugs.squeak.org/view.php?id=6588 to
>> see if this addresses your problem. It appears to work remarkably well
>> for me.
>
> A bit hacky approach, don't you think?

Without any doubt. But I don't see a quick solution to the problem that
isn't a bit of a hack. It's not clear what a VM-driven solution looks
like either btw.

Cheers,
   - Andreas


Re: More Delay/Semaphore "fun"

Paolo Bonzini-2
Andreas Raab wrote:

> Igor Stasenko wrote:
>> On 06/10/2007, Andreas Raab <[hidden email]> wrote:
>>> Igor Stasenko wrote:
>>>> Then the only way is to make it to handle all by VM.
>>> Maybe not quite. Check out http://bugs.squeak.org/view.php?id=6588 to
>>> see if this addresses your problem. It appears to work remarkably well
>>> for me.
>>
>> A bit hacky approach, don't you think?
>
> Without any doubt. But I don't see a quick solution to the problem that
> isn't a bit of a hack. It's not clear what a VM-driven solution looks
> like either btw.

GNU Smalltalk does it like this:

  | caught |
  ^[
      [
          "The VM will not preempt the process between the two statements.
           Note that it is *wrong* to set the variable to true before
           obtaining the semaphore, but the exception handler straightens
           that.  On the other hand, setting the variable to true after the
           wait would be wrong if the process was preempted and terminated
           after the wait (hence, with `caught' still set to false and the
           semaphore obtained)."
          caught := true.
          self wait ] on: ProcessBeingTerminated do: [ :ex |
              ex semaphore isNil ifFalse: [ caught := false ] ].
      aBlock value ]
          ensure: [caught ifTrue: [self signal] ]


with Process>>#terminate sending a ProcessBeingTerminated notification
before issuing the final "suspend" call.  If the process was waiting on
a semaphore (as seen from the suspendingList), that semaphore is stored
into the notification.  Seems to work.

Paolo



Re: More Delay/Semaphore "fun"

Andreas.Raab
Paolo Bonzini wrote:

> GNU Smalltalk does it like this:
>
>  | caught |
>  ^[
>      [
>          "The VM will not preempt the process between the two statements.
>           Note that it is *wrong* to set the variable to true before
>           obtaining the semaphore, but the exception handler straightens
>           that.  On the other hand, setting the variable to true after the
>           wait would be wrong if the process was preempted and terminated
>           after the wait (hence, with `caught' still set to false and the
>           semaphore obtained)."
>          caught := true.
>          self wait ] on: ProcessBeingTerminated do: [ :ex |
>              ex semaphore isNil ifFalse: [ caught := false ] ].
>      aBlock value ]
>          ensure: [caught ifTrue: [self signal] ]
>
>
> with Process>>#terminate sending a ProcessBeingTerminated notification
> before issuing the final "suspend" call.  If the process was waiting on
> a semaphore (as seen from the suspendingList), that semaphore is stored
> into the notification.  Seems to work.

Hm ... it looks wrong to me unless there is a part which isn't shown
here or GST works differently. In Squeak, when the semaphore gets
signaled, the process gets transferred from the semaphore to one of
Processor's suspendedList. So testing for nil wouldn't work in a
situation like here:

   s := Semaphore new.

   "Create a process that enters the critical section at low priority"
   p := [s critical:[]] forkAt: Processor activePriority-1.
   [p suspendingList == s] whileFalse:[(Delay forMilliseconds: 10) wait].

   "At this point, the suspendingList is the semaphore"
   self assert:[p suspendingList == s].

   "Signal the semaphore"
   s signal.

   "At this point, the suspendingList is inside Processor (but non-nil)"
   self assert:[p suspendingList == (Processor waitingProcessesAt: p priority)].

   "Now terminate p"
   p terminate.

   "And check the signal count"
   self assert:[(s instVarNamed: 'excessSignals') = 1].

I would really expect the code above to test for "ex semaphore ==
self" instead of nil, which would make a lot more sense to me. Also, how
do you ensure atomicity in the operation when you manipulate the
suspendingList itself?
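
For illustration, an untested sketch of that variant: the quoted GNU Smalltalk method with only the handler test changed. It assumes, as in the quoted code, that the ProcessBeingTerminated notification answers the semaphore the process was waiting on via #semaphore; the selector critical: and the argument name aBlock are inferred from context rather than shown in the original.

```smalltalk
critical: aBlock
    "Sketch only. Same structure as the quoted method; the handler now
     compares against the receiver instead of testing for nil."
    | caught |
    ^[
        [ caught := true.
          self wait ]
            on: ProcessBeingTerminated
            do: [ :ex |
                "Terminated while still queued on this very semaphore,
                 so it was never obtained and must not be signaled."
                ex semaphore == self ifTrue: [ caught := false ] ].
        aBlock value ]
            ensure: [ caught ifTrue: [ self signal ] ]
```

With the identity test, a process that was terminated while waiting on some other semaphore would leave caught untouched, so the decision depends only on whether this receiver was actually obtained.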

Cheers,
   - Andreas


Re: More Delay/Semaphore "fun"

Paolo Bonzini-2

> Hm ... it looks wrong to me unless there is a part which isn't shown
> here or GST works differently. In Squeak, when the semaphore gets
> signaled, the process gets transferred from the semaphore to one of
> Processor's suspendedList.

If the suspendedList is one of the processor's, ex semaphore is nil.
That's part of how the ProcessBeingTerminated is built in
Process>>#terminate.

> I would really expect the code above to test for "ex semaphore ==
> self" instead of nil, which would make a lot more sense to me.

Yes, it would be okay too.

> Also, how
> do you ensure atomicity in the operation when you manipulate the
> suspendingList itself?

I don't really understand the question, but the atomicity of some operations (for
example #queueInterrupt:, which is the same as in VisualAge and is how
the ProcessBeingTerminated exception is sent to the process) is
guaranteed by telling the VM to disable preemption.

Plus, as in ST-80 (Blue Book) and Squeak, no interruption can happen
between "caught := true" and "self wait" because of when interrupts
are tested for.

Paolo


RE: More Delay/Semaphore "fun"

Gary Chambers-4
In reply to this post by Andreas.Raab
Well, we're finding that your fixes are helping a lot (maybe not 100% watertight, but much better than without!).

For a more difficult challenge:

Monitor>>critical:

!

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]]On Behalf Of
> Andreas Raab
> Sent: 06 October 2007 7:18 PM
> To: The general-purpose Squeak developers list
> Subject: Re: More Delay/Semaphore "fun"
>
>
> Igor Stasenko wrote:
> > On 06/10/2007, Andreas Raab <[hidden email]> wrote:
> >> Igor Stasenko wrote:
> >>> Then the only way is to make it to handle all by VM.
> >> Maybe not quite. Check out http://bugs.squeak.org/view.php?id=6588 to
> >> see if this addresses your problem. It appears to work remarkably well
> >> for me.
> >
> > A bit hacky approach, don't you think?
>
> Without any doubt. But I don't see a quick solution to the problem that
> isn't a bit of a hack. It's not clear what a VM-driven solution looks
> like either btw.
>
> Cheers,
>    - Andreas
>



Re: More Delay/Semaphore "fun"

Paolo Bonzini-2
Gary Chambers wrote:
> Well, we're finding that your fixes are helping a lot (maybe not 100% watertight, but much better than without!).
>
> For a more difficult challenge:
>
> Monitor>>critical:

Not really... It would be more complicated to support *everything* in
Monitor.  But if all you want is a recursion-safe mutex, you can inline
enter and exit (and the nestingLevel variable is useless now):

| us blockValue |
us := Processor activeProcess.
ownerProcess == us
     ifTrue: [ ^aBlock value ]
     ifFalse: [
        mutex critical: [
            ["When we enter, the mutex is free so ownerProcess is nil.
               So the unwinding does not mess up anything."
             ownerProcess := us.
            blockValue := aBlock value ]
                ensure: [ ownerProcess := nil ] ] ].
     ^blockValue

The complicated stuff is done in Semaphore>>#critical: and Monitor can
simply leverage that.
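
A hypothetical usage sketch, assuming the method above is installed as Monitor>>critical: and that blockValue is a temporary of that method; the names m and result are made up for the example:

```smalltalk
| m result |
m := Monitor new.
"The outer call takes the mutex and records the active process as owner.
 The inner call sees ownerProcess == Processor activeProcess and just
 evaluates its block, so it does not wait on the mutex a second time."
result := m critical: [ m critical: [ 3 + 4 ] ].
```

Only the outermost call clears ownerProcess in its ensure: block, which is why the nestingLevel counter is no longer needed in this reduced form.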

Alternatively, you could have a primitive to notify a waiter on a
semaphore without adding a signal (a no-op if there is no one waiting).
This simplifies everything because the excess signals are always
zero!  Then the code would look like this:

reset := false.
[ ownerProcess == us ] whileFalse: [
     semaphore wait.
     ownerProcess isNil ifTrue: [ownerProcess := us. reset := true]].
[ blockValue := aBlock value ]
     ensure: [reset ifTrue: [ownerProcess := nil. semaphore notify]].
^blockValue

Paolo

