Delay and Server reliability

38 messages

Re: Delay and Server reliability

keith1y
Andreas Raab wrote:

> Damien Cassou wrote:
>>> Is there any chance that this patch goes to 3.10?
>>
>> Chance would be greater if unit tests were included.
>
> Good luck with that. I tried for a couple of hours to find a reliable
> way of creating this problem with not even so much as a hint of being
> able to make it happen. The problem is that on any local machine
> you're never completely independent from the time source of that
> machine and if you are dependent on the time source you are in sync
> with Delay and everything will be fine. You need an independent source
> of events and I've yet to find someone who shows me how to write unit
> tests across multiple machines (and no, running multiple images on the
> same machine doesn't work because your process scheduler uses the same
> time source that your image uses so it's not independent).
>
> Cheers,
>   - Andreas
>
>
I don't know if this would help, but I used the ProcessSpecific package
to warp the clock, so that you can specify your own DateAndTime
implementation on a per-process basis. With this technique you can run
the clock at 2x speed or even backwards, so you may be able to schedule
specific event times to recreate the bug.

The code is in the Monticello repository at http://gjallar.krampe.se/, in
the ProcessSpecific package.

Keith



Re: Delay and Server reliability

keith1y
 
> I don't know if this would help,

On second thoughts, I doubt it would help at all.

Keith


Re: Delay and Server reliability

johnmci
In reply to this post by Andreas.Raab

On Jul 24, 2007, at 9:03 AM, Andreas Raab wrote:

> Ouch. You are right. Here is a variant with the class definition


OK, well, I wonder then if I should close

http://bugs.squeak.org/view.php?id=4882

which sounds similar: no CPU usage, nothing works. Looking at some
stacks on the Mac when this happens (very rarely), everything seems to
be waiting on Delays of some sort, and just the idle loop process is
running (well, sleeping...).

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




Re: Delay and Server reliability

Andreas.Raab
John M McIntosh wrote:
> OK, well, I wonder then if I should close
>
> http://bugs.squeak.org/view.php?id=4882
>
> which sounds similar: no CPU usage, nothing works. Looking at some
> stacks on the Mac when this happens (very rarely), everything seems to
> be waiting on Delays of some sort, and just the idle loop process is
> running (well, sleeping...).

Precisely. Those are the exact symptoms of the problem.

Cheers,
   - Andreas


Re: Delay and Server reliability

stephane ducasse
In reply to this post by Andreas.Raab
Hi Andreas,

I was reading your code to exercise my weak concurrency skills. I guess
I'm hopeless at that, but I always try :)

"This change set fixes this problem by moving *all* manipulation of
Delay's internal structures out of the calling process."

Is this statement implemented by the AccessProtect critical: [...] section?

stef
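The design described in the quoted sentence (calling processes never touch the timer's internal structures themselves) can be illustrated outside Squeak. What follows is a minimal Python sketch under stated assumptions, not the change set under discussion: the class TimerService and its methods are hypothetical, a daemon thread stands in for Squeak's timer process, and queue.Queue stands in for whatever hand-off the real code uses. Callers only post a (deadline, Event) request and block on their own Event; the heap of pending deadlines is read and modified exclusively by the timer thread.

```python
import heapq
import itertools
import queue
import threading
import time


class TimerService:
    """Hypothetical sketch: only the timer thread ever touches the deadline heap."""

    def __init__(self):
        self._requests = queue.Queue()   # thread-safe hand-off; the only shared object callers use
        self._heap = []                  # (deadline, seq, event) entries, owned by the timer thread
        self._seq = itertools.count()    # tie-breaker so Event objects are never compared
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            # Sleep until the earliest deadline, or until a new request arrives.
            timeout = None
            if self._heap:
                timeout = max(0.0, self._heap[0][0] - time.monotonic())
            try:
                deadline, event = self._requests.get(timeout=timeout)
                heapq.heappush(self._heap, (deadline, next(self._seq), event))
            except queue.Empty:
                pass
            # Wake every caller whose deadline has passed.
            now = time.monotonic()
            while self._heap and self._heap[0][0] <= now:
                _, _, event = heapq.heappop(self._heap)
                event.set()

    def wait(self, seconds):
        """Block the calling thread for about `seconds` seconds."""
        event = threading.Event()
        # The caller only enqueues a request; it never modifies the heap.
        self._requests.put((time.monotonic() + seconds, event))
        event.wait()
```

For example, TimerService().wait(0.5) blocks the calling thread for about half a second. Because callers only append to a thread-safe queue, a caller that abandons its wait partway through cannot leave the heap half-updated; at worst, the timer thread later sets an Event that nobody is waiting on.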




Re: Delay and Server reliability

cdavidshaffer
In reply to this post by Andreas.Raab
Andreas Raab wrote:
> Ouch. You are right. Here is a variant with the class definition
> included.
>
> Cheers,
>   - Andreas
>

Andreas,

Are you sure that this is the complete patch?  We are currently having a
very similar problem: VNC doesn't respond to UI events, 0% cpu usage,
several processes frozen in Delay although our Seaside server still
responds.  I tried your patch against Squeak3.9 but it had no effect on
the problem.  It could be that our problem isn't Delay but I was hoping
it was ;-)

David


Re: Delay and Server reliability

Andreas.Raab
I'm pretty sure it's complete. If you want some help, do this:
* Launch the VM with its output redirected to a file
* Wait until it locks up
* Attach gdb to the running process, e.g.:
   gdb --pid=<pid of vm>
* Have it print all the call stacks, e.g.:
   p (int)printAllStacks()
* Look at the resulting output file.
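If the attach-and-dump steps need repeating, they can be scripted. A minimal Python sketch, assuming gdb is on the PATH, that the VM binary exposes the printAllStacks symbol used above, and that you are permitted to attach to the process (on Linux, ptrace restrictions may require elevated privileges); the function names are hypothetical:

```python
import subprocess


def build_gdb_command(pid):
    """Build an argv that attaches gdb to a running VM and dumps all stacks."""
    return [
        "gdb",
        "-batch",                              # run the -ex commands, then exit
        "-p", str(pid),                        # attach to the already running VM
        "-ex", "call (int)printAllStacks()",   # assumes the VM exports printAllStacks
    ]


def dump_vm_stacks(pid):
    """Attach, dump the stacks, detach; returns gdb's exit status.

    The stacks are printed by the VM itself, so they appear in the VM's
    redirected output file, not in gdb's own output.
    """
    result = subprocess.run(build_gdb_command(pid))
    return result.returncode
```

For example, call dump_vm_stacks(12345) for a VM with process id 12345, then check the VM's output file. Because gdb attached to the process rather than starting it, it detaches when the batch run ends and leaves the VM running.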

Cheers,
   -Andreas


David Shaffer wrote:

> Andreas Raab wrote:
>> Ouch. You are right. Here is a variant with the class definition
>> included.
>>
>> Cheers,
>>   - Andreas
>>
>
> Andreas,
>
> Are you sure that this is the complete patch?  We are currently having a
> very similar problem: VNC doesn't respond to UI events, 0% cpu usage,
> several processes frozen in Delay although our Seaside server still
> responds.  I tried your patch against Squeak3.9 but it had no effect on
> the problem.  It could be that our problem isn't Delay but I was hoping
> it was ;-)
>
> David
>
>



Re: Delay and Server reliability

cdavidshaffer
Andreas Raab wrote:
> I'm pretty sure it's complete. If you want some help, do this:
> * Launch the VM with its output redirected to a file
> * Wait until it locks up
> * Attach gdb to the running process, e.g.:
>   gdb --pid=<pid of vm>
> * Have it print all the call stacks, e.g.:
>   p (int)printAllStacks()
> * Look at the resulting output file.
>
Thanks for the gdb tip. I can also look at the processes via my Seaside
server, since it is still responding. Anyway, the debugging output is
below. The TdTimer processes are not making progress although, in one
case, the sleep should only be for 60 seconds. The VNC server accepts
connections but doesn't respond to user input, including alt-. (although
the VNC cursor tracks and there is sometimes UI activity if, for example,
the Transcript window is open). If I enter (Delay forSeconds: 5) wait in
a web-browser-based workspace, it hangs forever, although, as I
mentioned, I am able to interact with the image in other ways through
this workspace.

The list below seems incomplete, since I have a web server process
blocked waiting for connections, but in any case the image exhibits
this behavior with and without your Delay patch applied.

David


Process
2064888972 >idleProcess
2064858556 [] in >startUp
2064858648 [] in BlockContext>newProcess



Process
2064885004 >finalizationProcess
2064884820 [] in >restartFinalizationProcess
2064884912 [] in BlockContext>newProcess


Process
2085972252 Semaphore>critical:
2085972068 Delay>scheduleEvent
2085971932 Delay>schedule
2085971840 Delay>wait
2085971748 WorldState>interCyclePause:
2085971656 WorldState>doOneCycleFor:
2085971564 PasteUpMorph>doOneCycle
2054607980 [] in >spawnNewProcess
2054608164 [] in BlockContext>newProcess
Process
2085972988 Semaphore>critical:
2085972804 Delay>scheduleEvent
2085972712 Delay>schedule
2085972620 Delay>wait
2085972436 [] in EventSensor>eventTickler
2085972344 BlockContext>on:do:
2064857692 EventSensor>eventTickler
2064857416 [] in EventSensor>installEventTickler
2064857600 [] in BlockContext>newProcess
Process
2085973768 Semaphore>critical:
2085973584 Delay>scheduleEvent
2085973448 Delay>schedule
2085973356 Delay>wait
2085973264 [] in ApplicationService>sleepFor:
2085973172 >terminationOkDuring:
2085973080 ApplicationService>sleepFor:
2065158156 TdTimer>runWhile:
2065157788 [] in ApplicationService>start
2065157972 BlockContext>ensure:
2064894444 [] in ApplicationService>start
2065157696 BlockContext>on:do:
2065157512 BlockContext>valueWithBindingsContext:
2065157420 BlockContext>valueWithBindings:
2064894536 [] in BlockContext>newProcessWithBindings:
2064894628 [] in BlockContext>newProcess
Process
2086064536 Semaphore>critical:
2086064308 Delay>scheduleEvent
2086064184 Delay>schedule
2086064092 Delay>wait
2086063816 [] in Semaphore>waitTimeoutMSecs:
2086064000 [] in BlockContext>newProcess
Process
2086089116 Semaphore>critical:
2086088932 Delay>scheduleEvent
2086088796 Delay>schedule
2086088612 Delay>wait
2086088704 [] in ApplicationService>sleepFor:
2086088520 >terminationOkDuring:
2086088428 ApplicationService>sleepFor:
2064888880 TdTimer>runWhile:
2064888512 [] in ApplicationService>start
2064888696 BlockContext>ensure:
2064886188 [] in ApplicationService>start
2064888420 BlockContext>on:do:
2064888236 BlockContext>valueWithBindingsContext:
2064888144 BlockContext>valueWithBindings:
2064886280 [] in BlockContext>newProcessWithBindings:
2064886372 [] in BlockContext>newProcess
Process
2092063728 Semaphore>critical:
2092063544 Delay>scheduleEvent
2092063408 Delay>schedule
2092063200 Delay>wait
2092063316 [] in ApplicationService>sleepFor:
2092063092 >terminationOkDuring:
2092063000 ApplicationService>sleepFor:
2065160244 TdTimer>runWhile:
2065157144 [] in ApplicationService>start
2065157328 BlockContext>ensure:
2064893660 [] in ApplicationService>start
2065157052 BlockContext>on:do:
2065156868 BlockContext>valueWithBindingsContext:
2065156776 BlockContext>valueWithBindings:
2064893752 [] in BlockContext>newProcessWithBindings:
2064893844 [] in BlockContext>newProcess
Process
2099204904 Semaphore>critical:
2099204676 Delay>scheduleEvent
2099204552 Delay>schedule
2099204460 Delay>wait
2099187636 [] in Semaphore>waitTimeoutMSecs:
2099187820 [] in BlockContext>newProcess

Process
2099550248 >handleTimerEvent
2059283788 [] in >runTimerEventLoop
2059283368 BlockContext>on:do:
2059283140 >runTimerEventLoop
2059283572 [] in >startTimerEventLoop
2059283664 [] in BlockContext>newProcess
Process
2064857200 InputSensor>userInterruptWatcher
2064857016 [] in InputSensor>installInterruptWatcher
2064857108 [] in BlockContext>newProcess


Process
2064858136 SystemDictionary>lowSpaceWatcher
2064858228 [] in SystemDictionary>installLowSpaceWatcher
2064858320 [] in BlockContext>newProcess

Process
2086063724 Semaphore>waitTimeoutMSecs:
2086063632 Socket>waitForConnectionFor:ifTimedOut:
2086063448 Socket>waitForConnectionFor:
2086063172 [] in Socket>waitForAcceptFor:
2086063356 BlockContext>on:do:
2086063080 Socket>waitForAcceptFor:
2086062896 [] in RFBServer>runLoop
2086062804 BlockContext>on:do:
2064890492 RFBServer>runLoop
2064890612 [] in RFB




Re: Delay and Server reliability

Andreas.Raab
Ah yes, of course. You're missing another batch of fixes that we have
long since applied to our servers. In this case it's the handling of
Semaphore>>critical: (which is broken in all Squeak versions). Give the
attached changes a try, and if it still doesn't work, I'm sure there are
more fixes that we've applied in the meantime ;-)

Cheers,
   - Andreas

David Shaffer wrote:

> Andreas Raab wrote:
>> I'm pretty sure it's complete. If you want some help, do this:
>> * Launch the VM with its output redirected to a file
>> * Wait until it locks up
>> * Attach gdb to the running process, e.g.:
>>   gdb --pid=<pid of vm>
>> * Have it print all the call stacks, e.g.:
>>   p (int)printAllStacks()
>> * Look at the resulting output file.
>>
> Thanks for the gdb tip.  I can look at the processes via my Seaside
> server as well since it is still responding.  Anyway the debugging
> output is below.  The TdTimer processes are not making progress
> although, in one case, the sleep should only be for 60 seconds.  The VNC
> server accepts connections but isn't responding to user input including
> alt-. (although the VNC cursor tracks and there is sometimes UI activity
> if, for example, the Transcript window is open).  If I enter (Delay
> forSeconds: 5) wait in  a web-browser based workspace it will hang
> forever although, as I mentioned, I am able to interact with the image
> in other ways through this workspace.
>
> It seems like the list below isn't complete since I have a web server
> process blocked waiting for connections...but anyway the image exhibits
> this behavior with and without your Delay patch applied.
>
> David



Attachment: SemaphoreCritical-ar.1.cs (4K)

Re: Delay and Server reliability

keith1y
In reply to this post by cdavidshaffer
David,

I concur with these observations.


Keith


Re: Delay and Server reliability

johnmci
In reply to this post by Andreas.Raab
Mmm, I couldn't help but notice this is different code from the Tweak
code we have in Sophie, even adjusting for the difference between Tweak
logic and Squeak logic.
So do you have Tweak updates for Semaphore too?

Oddly, these are all timestamped today. Are these new, or have they been
in use for months?


On Jul 27, 2007, at 6:27 PM, Andreas Raab wrote:

> <SemaphoreCritical-ar.1.cs>

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




Re: Delay and Server reliability

Andreas.Raab
John M McIntosh wrote:
> Mmm, I couldn't help but notice this is different code from the Tweak
> code we have in Sophie, even adjusting for the difference between Tweak
> logic and Squeak logic.
> So do you have Tweak updates for Semaphore too?

Yes, see the attachment.

> Oddly, these are all timestamped today. Are these new, or have they been
> in use for months?

They have new timestamps only because I had to twiddle the changes to
make them work in a non-Tweak environment.

Cheers,
   - Andreas



Attachment: TweakSemaFixes.cs (4K)

Re: Delay and Server reliability

Adrian Lienhard
In reply to this post by Andreas.Raab
We have seen this exact problem as well, although not often. An
interesting observation was that you could bring the image back to
life through the Seaside screenshot application...

Nice to see these kinds of production issues fixed.
Andreas, if you have more patches lying around, let us know ;)

I think it would make sense to have a wiki page to keep track of
important fixes and their associated Mantis reports, as well as
instructions for debugging (e.g., gdb, VM instrumentation).

Cheers,
Adrian

On Jul 28, 2007, at 03:27, Andreas Raab wrote:

> Ah yes, of course. You're missing another batch of fixes that we
> have long since applied to our servers. In this case it's the handling of
> Semaphore>>critical: (which is broken in all Squeak versions). Give
> the attached changes a try, and if it still doesn't work, I'm sure
> there are more fixes that we've applied in the meantime ;-)
>
> Cheers,
>   - Andreas
>
>
> <SemaphoreCritical-ar.1.cs>



Re: Delay and Server reliability

cdavidshaffer
In reply to this post by Andreas.Raab
Andreas Raab wrote:
> Ah yes, of course. You're missing another batch of fixes that we have
> long since applied to our servers. In this case it's the handling of
> Semaphore>>critical: (which is broken in all Squeak versions). Give
> the attached changes a try, and if it still doesn't work, I'm sure there
> are more fixes that we've applied in the meantime ;-)
>
> Cheers,
>   - Andreas
>
Well, it's been about 12 hours since I patched, and everything seems to
be chugging along. I'll stress it a little this afternoon to be sure.
Thanks for the help!

David



Re: Delay and Server reliability

Andreas.Raab
In reply to this post by Adrian Lienhard
Adrian Lienhard wrote:
> Nice to see these kinds of production issues fixed.
> Andreas, if you have more patches lying around, let us know ;)

You may want to check out the Croquet repositories [1][2]. We've posted
quite a few changes there that helped with general robustness issues.
The last round [3] had various interesting fixes, some of which helped
with reliability in general (like the handling of out-of-memory conditions).

[1] http://hedgehog.software.umn.edu:8888/
[2] http://jabberwocky.croquetproject.org:8889/
[3] https://lists.duke.edu/sympa/arc/croquet-dev/2007-05/msg00035.html

Cheers,
   - Andreas


Re: Delay and Server reliability

Serge Stinckwich-4
Andreas Raab wrote:

> Adrian Lienhard wrote:
>> Nice to see these kinds of production issues fixed.
>> Andreas, if you have more patches lying around, let us know ;)
>
> You may want to check out the Croquet repositories [1][2]. We've posted
> quite a few changes there that helped with general robustness issues.
> The last round [3] had various interesting fixes, some of which helped
> with reliability in general (like the handling of out-of-memory conditions).
>
> [1] http://hedgehog.software.umn.edu:8888/
> [2] http://jabberwocky.croquetproject.org:8889/
> [3] https://lists.duke.edu/sympa/arc/croquet-dev/2007-05/msg00035.html

Great! Maybe we could reuse some of your modifications in the Squeak
packages.

-- Serge Stinckwich
http://doesnotunderstand.free.fr/



RE: Delay and Server reliability

Jason Johnson-5
In reply to this post by Andreas.Raab
It would be really nice if these changes could find their way back into the base image.

> Date: Sat, 28 Jul 2007 12:00:18 -0700
> From: [hidden email]
> To: [hidden email]
> Subject: Re: Delay and Server reliability
>
> Adrian Lienhard wrote:
> > Nice to see these kinds of production issues fixed.
> > Andreas, if you have more patches lying around, let us know ;)
>
> You may want to check out the Croquet repositories [1][2]. We've posted
> quite a few changes there that helped with general robustness issues.
> The last round [3] had various interesting fixes, some of which helped
> with reliability in general (like the handling of out-of-memory conditions).
>
> [1] http://hedgehog.software.umn.edu:8888/
> [2] http://jabberwocky.croquetproject.org:8889/
> [3] https://lists.duke.edu/sympa/arc/croquet-dev/2007-05/msg00035.html
>
> Cheers,
> - Andreas
>



Re: Delay and Server reliability

Adrian Lienhard
In reply to this post by cdavidshaffer
I created a Mantis report for this bug here:
http://bugs.squeak.org/view.php?id=6588

I suggest closing report "0004882: VM lockup"
(http://bugs.squeak.org/view.php?id=4882). The problems described in it
seem to be either this bug (#6588) or the other freezing bug,
http://bugs.squeak.org/view.php?id=6581.

Cheers,
Adrian


On Jul 28, 2007, at 17:04, David Shaffer wrote:

> Andreas Raab wrote:
>> Ah yes, of course. You're missing another batch of fixes that we
>> have long since applied to our servers. In this case it's the handling
>> of Semaphore>>critical: (which is broken in all Squeak versions).
>> Give the attached changes a try, and if it still doesn't work, I'm
>> sure there are more fixes that we've applied in the meantime ;-)
>>
>> Cheers,
>>   - Andreas
>>
> Well, it's been about 12 hours since I patched, and everything
> seems to be chugging along. I'll stress it a little this afternoon
> to be sure. Thanks for the help!
>
> David
>
>

