On 2/1/06, Cees De Groot <[hidden email]> wrote:
> If you look at it, there's one funny thing. I added an #on:do: handler > around the actual addition to the "SuspendedDelays add:..." bit in > Delay>>schedule: > Not funny at all. Just my stupidity - there are two calls to #add: in that method, I protected only one. On IRC, Jecel Assumpção remarks that the Semaphore in both cases is 'in rest' - excessSignals is 1, where 0 would be expected when the critical section is active. And -1 when two critical sections are active :-) |
On 2/1/06, Cees De Groot <[hidden email]> wrote:
> On IRC, Jecel Assumpção remarks that the Semaphore in both cases is > 'in rest' - excessSignals is 1, where 0 would be expected when the > critical section is active. And -1 when two critical sections are > active :-) > More debugging, it turns out that AccessProtect's excessSignals indeed can become > 1 in the application. Whether that is due to the many times that we terminate processes inside delays or not, I don't know. Fact is that in my current image, the application starts up, does a lot of stuff exercising Delay, and then hits this condition. It must be some sort of race condition. On IRC, Craig suggested I move over to the VM level for debugging. However, I'm not sure how to proceed - I could presumably trap excessSignals becoming '2' on all semaphores, but only some of them are used as Mutex so that breakpoint would be called spuriously (unless someone knows how to formulate a bp that only hits on semaphores setup as mutex...). And then, the culprit could very well be the first time the semaphore is signalled, but I can only set a breakpoint on the second time. Manually checking all the times the BP is hit is undoable, hundreds of Delays are scheduled before the bug hits. Any suggestions on how we could proceed? Thanks, Cees |
Try making an Array to put them (the sems you know are going to be
used as mutex) in so GC keeps everything up to date and 'register' this array. Easiest way to do that would be to stick it in the specialOops array somewhere, there must be an empty slot. Hell, nobody uses 40/41 so far as I know, abuse them for a bit. Once you have this anchor you can follow the oops in the vm easily. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Strange OpCodes: RDL: Rotate Disk Left |
In reply to this post by Cees De Groot
Well if you recall a couple of weeks back I had a changeset that
turns on printing of the stack in message send, so you can see the stack on each message send by process. Mind you'll need to build a VM for this, plus write some code to read the log file and determine how the semaphore signalling is done. Or of course change the VM to note the excessive semaphore signalling and print the stack when it happens, and as Tim points out use the Special objects array to track the semaphore of interest. On 1-Feb-06, at 3:42 PM, Cees De Groot wrote: > On 2/1/06, Cees De Groot <[hidden email]> wrote: >> On IRC, Jecel Assumpção remarks that the Semaphore in both cases is >> 'in rest' - excessSignals is 1, where 0 would be expected when the >> critical section is active. And -1 when two critical sections are >> active :-) > > > Any suggestions on how we could proceed? > > Thanks, > > Cees > -- ======================================================================== === John M. McIntosh <[hidden email]> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
Thanks John & Tim, for these suggestions. I'll continue debugging
tonight (have to work in VisualAge today, work hard, earn money) and post as I make progress (or not). Regards, Cees On 2/2/06, John M McIntosh <[hidden email]> wrote: > Well if you recall a couple of weeks back I had a changeset that > turns on printing of the stack > in message send, so you can see the stack on each message send by > process. > > Mind you'll need to build a VM for this, plus write some code to read > the log file and > determine how the semaphore signalling is done. > > Or of course change the VM to note the excessive semaphore signalling > and print the stack when it > happens, and as Tim points out use the Special objects array to track > the semaphore of interest. > > > > On 1-Feb-06, at 3:42 PM, Cees De Groot wrote: > > > On 2/1/06, Cees De Groot <[hidden email]> wrote: > >> On IRC, Jecel Assumpção remarks that the Semaphore in both cases is > >> 'in rest' - excessSignals is 1, where 0 would be expected when the > >> critical section is active. And -1 when two critical sections are > >> active :-) > > > > > > Any suggestions on how we could proceed? > > > > Thanks, > > > > Cees > > > > -- > ======================================================================== > === > John M. McIntosh <[hidden email]> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================== > === > > > |
BTW, you wouldn't kill a process *while* it is trying to execute
SortedCollection>>add: would you? Because this might get you a nil in SuspendedDelays faster than you could say "oops". And I agree with your analysis that something must be signaling the mutex twice but I will point out that one of the processes was the wx event loop which may very well be invoked via unlikely places (if I remember the discussion with Rob then he had serious problems getting wx to work asynchronously). Cheers, - Andreas Cees De Groot wrote: > Thanks John & Tim, for these suggestions. I'll continue debugging > tonight (have to work in VisualAge today, work hard, earn money) and > post as I make progress (or not). > > Regards, > > Cees > > On 2/2/06, John M McIntosh <[hidden email]> wrote: > >>Well if you recall a couple of weeks back I had a changeset that >>turns on printing of the stack >>in message send, so you can see the stack on each message send by >>process. >> >>Mind you'll need to build a VM for this, plus write some code to read >>the log file and >>determine how the semaphore signalling is done. >> >>Or of course change the VM to note the excessive semaphore signalling >>and print the stack when it >>happens, and as Tim points out use the Special objects array to track >>the semaphore of interest. >> >> >> >>On 1-Feb-06, at 3:42 PM, Cees De Groot wrote: >> >> >>>On 2/1/06, Cees De Groot <[hidden email]> wrote: >>> >>>>On IRC, Jecel Assumpção remarks that the Semaphore in both cases is >>>>'in rest' - excessSignals is 1, where 0 would be expected when the >>>>critical section is active. And -1 when two critical sections are >>>>active :-) >>> >>> >>>Any suggestions on how we could proceed? >>> >>>Thanks, >>> >>>Cees >>> >> >>-- >>======================================================================== >>=== >>John M. McIntosh <[hidden email]> 1-800-477-2659 >>Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com >>======================================================================== >>=== >> >> >> > > > |
On 2/2/06, Andreas Raab <[hidden email]> wrote:
> BTW, you wouldn't kill a process *while* it is trying to execute > SortedCollection>>add: would you? I wouldn't know, of course, but yes, it could happen. Thanks for pointing it out. We do lots of network searches, which are sent into the network, and then a process sleeps for a while and shuts down the search. If the search gives results, the process is terminated. Conceivably, this could happen while the process is being scheduled. Unlikely, but possible. Shouldn't this whole Delay>>schedule thing maybe be done in an un-preemteable block? Or is this something we would need to solve on our application level? > And I agree with your > analysis that something must be signaling the mutex twice but I will > point out that one of the processes was the wx event loop which may very > well be invoked via unlikely places (if I remember the discussion with > Rob then he had serious problems getting wx to work asynchronously). > Yup. I'm still not 100% convinced that we can't have a situation like two native threads going into the VM at the same time, but that is probably mostly ignorance. |
In reply to this post by Cees De Groot
That such a low-level operation appears to be failing
like this is very disturbing.. > Message: 4 > Date: Wed, 1 Feb 2006 15:06:33 +0100 > From: Cees De Groot <[hidden email]> > Subject: Re: nil in SuspendedDelays??? > To: The general-purpose Squeak developers list > <[hidden email]> > Message-ID: > > <[hidden email]> > Content-Type: text/plain; charset=ISO-8859-1 > > Well, I have "proof" now: > http://www.cdegroot.com/blog/wp-content/uploads/screenshot.png > shows a > screenshot with two debuggers that seem to be inside > the same critical > section... > > The VM's dump stack results in: > > SortedCollection>>add: > [] in Delay>>schedule {[beingWaitedOn := true. > resumptionTime := Time > millisecondClockValue + dela...]} > [] in Semaphore>>critical: {[[self wait] > ifCurtailed: [self signal]. > recover := true. result := mutu...]} > BlockContext>>ifCurtailed: > Semaphore>>critical: > Delay>>schedule > Delay>>wait > WorldState>>interCyclePause: > WorldState>>doOneCycleFor: > PasteUpMorph>>doOneCycle > [] in Project class>>spawnNewProcess {[[World > doOneCycle. Processor > yield. false] whileFalse. nil]} > [] in BlockContext>>newProcess {[self value. > Processor terminateActive]} > > It *seems* that this only happens in wxSqueak. I > have SysInternals' > process explorer running, and the Squeak VM is > running a bunch of > threads but I'm not sure what these threads do. But > apart from the > guess that something like this could happen when > multiple native > threads start executing Squeak code, I'm at a loss > what other cause > could be behind this... > > HEEELP!!! ;-) |
Free forum by Nabble | Edit this page |