Image freeze because handleTimerEvent and Seaside process gone?!

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Image freeze because handleTimerEvent and Seaside process gone?!

Adrian Lienhard

Hi all,

We've been experiencing an "interesting" problem: the image freezes and does not response to HTTP requests anymore after it has been running for days.

Here some basic information about our setup:

Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
PharoCore 1.1
OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)

- We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9 on the identical machine and with the same application source (modulo some adaptations to make it run on Pharo).
- We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the UI process is suspended (Project uiProcess suspend)
- VM does not hog the CPU and memory usage is normal
- The meantime between failure is several weeks and we haven't managed to reproduce the problem
- The application mainly serves HTTP requests. When the image does not receive requests for some time we send it a STOP signal, when a request comes in it is sent a CONT signal.
- lsof shows
        TCP *:9093 (LISTEN)
        TCP server:9093->server:46930 (CLOSE_WAIT)

Below is a GDB backtrace and the Smalltalk stacks from an image that was frozen (the VM had been running for almost 100 hours):

=============================================================
(gdb) bt
#0  0x08072020 in ?? ()
#1  <signal handler called>
#2  0xb766f5e0 in malloc () from /lib/libc.so.6
#3  <function called from gdb>
#4  0xb76c50c8 in select () from /lib/libc.so.6
#5  0x08071063 in aioPoll ()
#6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
#7  0x000003e8 in ?? ()
#8  0x997b5a34 in ?? ()
#9  0xbfe7cb28 in ?? ()
#10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
Backtrace stopped: frame did not save the PC

(gdb) call printCallStack()
-1719969228 >idleProcess
-1719969320 >startUp
-1740134028 BlockClosure>newProcess
$3 = -1755344892

(gdb) call (int) printAllStacks()
Process
-1719969228 >idleProcess
-1719969320 >startUp
-1740134028 BlockClosure>newProcess

Process
-1740113860 >finalizationProcess
-1740113952 >restartFinalizationProcess
-1740113532 BlockClosure>newProcess

Process
-1740134424 SmalltalkImage>lowSpaceWatcher
-1740134516 SmalltalkImage>installLowSpaceWatcher
-1740134300 BlockClosure>newProcess

Process
-1719451488 Delay>wait
-1719451580 BlockClosure>ifCurtailed:
-1719451704 Delay>wait
-1719451796 InputEventPollingFetcher>waitForInput
-1740126940 InputEventFetcher>eventLoop
-1740127032 InputEventFetcher>installEventLoop
-1740126816 BlockClosure>newProcess

Process
-1719557780 UnixOSProcessAccessor>grimReaperProcess
-1740113624 BlockClosure>repeat
-1740113716 UnixOSProcessAccessor>grimReaperProcess
-1740117340 BlockClosure>newProcess

[omitted many newlines between output above]
=============================================================

What is striking from the above process listing is that two processes are missing: the handleTimerEvent process and the Seaside process (that is, the TCP listener loop). How comes these processes vanished?

This may be related to Pharo or to the Squeak VM.

Has anybody else seen this problem? Any idea how to debug/fix this issue is very much appreciated!

Cheers,
Adrian


CCed to pharo-dev since this may be related to Pharo; please respond on the squeak-vm list


Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

David T. Lewis
 
Have a look at /proc/<vmpid>/fd/* for a VM process that has been
running for a while, and check for accumulation of open file handles
over time. If sockets or files are not being closed, the open file
handles can accumulate over time and eventually hit whatever
per-process limit is in place (typically 1024 per process). I could
imagine this leading to a condition where the process that is trying
to open new session requests would fail.

If this turns out to be the case, check what your are doing with
OSProcess, as it's quite easy to e.g. use a PipeableOSProcess and
forget to close the output pipes when done.

Dave


On Mon, Dec 06, 2010 at 11:55:55AM +0100, Adrian Lienhard wrote:

>
> Hi all,
>
> We've been experiencing an "interesting" problem: the image freezes and does not response to HTTP requests anymore after it has been running for days.
>
> Here some basic information about our setup:
>
> Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
> PharoCore 1.1
> OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
>
> - We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9 on the identical machine and with the same application source (modulo some adaptations to make it run on Pharo).
> - We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the UI process is suspended (Project uiProcess suspend)
> - VM does not hog the CPU and memory usage is normal
> - The meantime between failure is several weeks and we haven't managed to reproduce the problem
> - The application mainly serves HTTP requests. When the image does not receive requests for some time we send it a STOP signal, when a request comes in it is sent a CONT signal.
> - lsof shows
> TCP *:9093 (LISTEN)
> TCP server:9093->server:46930 (CLOSE_WAIT)
>
> Below is a GDB backtrace and the Smalltalk stacks from an image that was frozen (the VM had been running for almost 100 hours):
>
> =============================================================
> (gdb) bt
> #0  0x08072020 in ?? ()
> #1  <signal handler called>
> #2  0xb766f5e0 in malloc () from /lib/libc.so.6
> #3  <function called from gdb>
> #4  0xb76c50c8 in select () from /lib/libc.so.6
> #5  0x08071063 in aioPoll ()
> #6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
> #7  0x000003e8 in ?? ()
> #8  0x997b5a34 in ?? ()
> #9  0xbfe7cb28 in ?? ()
> #10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
> Backtrace stopped: frame did not save the PC
>
> (gdb) call printCallStack()
> -1719969228 >idleProcess
> -1719969320 >startUp
> -1740134028 BlockClosure>newProcess
> $3 = -1755344892
>
> (gdb) call (int) printAllStacks()
> Process
> -1719969228 >idleProcess
> -1719969320 >startUp
> -1740134028 BlockClosure>newProcess
>
> Process
> -1740113860 >finalizationProcess
> -1740113952 >restartFinalizationProcess
> -1740113532 BlockClosure>newProcess
>
> Process
> -1740134424 SmalltalkImage>lowSpaceWatcher
> -1740134516 SmalltalkImage>installLowSpaceWatcher
> -1740134300 BlockClosure>newProcess
>
> Process
> -1719451488 Delay>wait
> -1719451580 BlockClosure>ifCurtailed:
> -1719451704 Delay>wait
> -1719451796 InputEventPollingFetcher>waitForInput
> -1740126940 InputEventFetcher>eventLoop
> -1740127032 InputEventFetcher>installEventLoop
> -1740126816 BlockClosure>newProcess
>
> Process
> -1719557780 UnixOSProcessAccessor>grimReaperProcess
> -1740113624 BlockClosure>repeat
> -1740113716 UnixOSProcessAccessor>grimReaperProcess
> -1740117340 BlockClosure>newProcess
>
> [omitted many newlines between output above]
> =============================================================
>
> What is striking from the above process listing is that two processes are missing: the handleTimerEvent process and the Seaside process (that is, the TCP listener loop). How comes these processes vanished?
>
> This may be related to Pharo or to the Squeak VM.
>
> Has anybody else seen this problem? Any idea how to debug/fix this issue is very much appreciated!
>
> Cheers,
> Adrian
>
>
> CCed to pharo-dev since this may be related to Pharo; please respond on the squeak-vm list
>
Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Adrian Lienhard

Thanks for the response, David

I checked .../fd/* when the image was frozen and there were only 7 or so open file handles. Nothing suspicious there...

Cheers,
Adrian

On Dec 6, 2010, at 12:54 , David T. Lewis wrote:

>
> Have a look at /proc/<vmpid>/fd/* for a VM process that has been
> running for a while, and check for accumulation of open file handles
> over time. If sockets or files are not being closed, the open file
> handles can accumulate over time and eventually hit whatever
> per-process limit is in place (typically 1024 per process). I could
> imagine this leading to a condition where the process that is trying
> to open new session requests would fail.
>
> If this turns out to be the case, check what your are doing with
> OSProcess, as it's quite easy to e.g. use a PipeableOSProcess and
> forget to close the output pipes when done.
>
> Dave
>
>
> On Mon, Dec 06, 2010 at 11:55:55AM +0100, Adrian Lienhard wrote:
>>
>> Hi all,
>>
>> We've been experiencing an "interesting" problem: the image freezes and does not response to HTTP requests anymore after it has been running for days.
>>
>> Here some basic information about our setup:
>>
>> Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
>> PharoCore 1.1
>> OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
>>
>> - We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9 on the identical machine and with the same application source (modulo some adaptations to make it run on Pharo).
>> - We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the UI process is suspended (Project uiProcess suspend)
>> - VM does not hog the CPU and memory usage is normal
>> - The meantime between failure is several weeks and we haven't managed to reproduce the problem
>> - The application mainly serves HTTP requests. When the image does not receive requests for some time we send it a STOP signal, when a request comes in it is sent a CONT signal.
>> - lsof shows
>> TCP *:9093 (LISTEN)
>> TCP server:9093->server:46930 (CLOSE_WAIT)
>>
>> Below is a GDB backtrace and the Smalltalk stacks from an image that was frozen (the VM had been running for almost 100 hours):
>>
>> =============================================================
>> (gdb) bt
>> #0  0x08072020 in ?? ()
>> #1  <signal handler called>
>> #2  0xb766f5e0 in malloc () from /lib/libc.so.6
>> #3  <function called from gdb>
>> #4  0xb76c50c8 in select () from /lib/libc.so.6
>> #5  0x08071063 in aioPoll ()
>> #6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
>> #7  0x000003e8 in ?? ()
>> #8  0x997b5a34 in ?? ()
>> #9  0xbfe7cb28 in ?? ()
>> #10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
>> Backtrace stopped: frame did not save the PC
>>
>> (gdb) call printCallStack()
>> -1719969228 >idleProcess
>> -1719969320 >startUp
>> -1740134028 BlockClosure>newProcess
>> $3 = -1755344892
>>
>> (gdb) call (int) printAllStacks()
>> Process
>> -1719969228 >idleProcess
>> -1719969320 >startUp
>> -1740134028 BlockClosure>newProcess
>>
>> Process
>> -1740113860 >finalizationProcess
>> -1740113952 >restartFinalizationProcess
>> -1740113532 BlockClosure>newProcess
>>
>> Process
>> -1740134424 SmalltalkImage>lowSpaceWatcher
>> -1740134516 SmalltalkImage>installLowSpaceWatcher
>> -1740134300 BlockClosure>newProcess
>>
>> Process
>> -1719451488 Delay>wait
>> -1719451580 BlockClosure>ifCurtailed:
>> -1719451704 Delay>wait
>> -1719451796 InputEventPollingFetcher>waitForInput
>> -1740126940 InputEventFetcher>eventLoop
>> -1740127032 InputEventFetcher>installEventLoop
>> -1740126816 BlockClosure>newProcess
>>
>> Process
>> -1719557780 UnixOSProcessAccessor>grimReaperProcess
>> -1740113624 BlockClosure>repeat
>> -1740113716 UnixOSProcessAccessor>grimReaperProcess
>> -1740117340 BlockClosure>newProcess
>>
>> [omitted many newlines between output above]
>> =============================================================
>>
>> What is striking from the above process listing is that two processes are missing: the handleTimerEvent process and the Seaside process (that is, the TCP listener loop). How comes these processes vanished?
>>
>> This may be related to Pharo or to the Squeak VM.
>>
>> Has anybody else seen this problem? Any idea how to debug/fix this issue is very much appreciated!
>>
>> Cheers,
>> Adrian
>>
>>
>> CCed to pharo-dev since this may be related to Pharo; please respond on the squeak-vm list
>>

Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Andreas.Raab
In reply to this post by Adrian Lienhard
 
At a guess, I'd say it's either one of two issues:

1) Your STOP/CONT handling. This sounds suspicious and it could affect
the timer handling. I'm assuming that the issue happens after receiving
the CONT signal, no? If you can, you might want to a) make sure that you
only get the STOP signal when the VM is in ioRelinquish() and not (for
example) currently executing the delay process and b) consider to dump
the call stacks whenever the VM gets the CONT signal to see what the
status is.

2) Some set of incomplete process/delay/semaphore changes in Pharo. One
of the problems with processes and delays is that this part of the
system reacts very badly to random "cleaning". I.e., changing "foo ==
nil" to "foo isNil" can have dramatic effects (since it introduces a
suspension point) with just the kind of weird issue you're seeing.

With regards to these processes not being printed, that's a side effect
of how printAllStacks gathers the processes - it will not print
suspended processes which explains why the UI process doesn't print and
most likely handleTimerEvent is suspended in a debugger.

Depending on how important this issue is you can also try to dissect the
object memory itself. If you call writeImageFile (or is it
writeImageFileIO?) from gdb it will dump the .image file and you can use
the simulator to look at it more closely. Most likely you'll be able to
find the processes and look at their stacks.

Cheers,
   - Andreas

On 12/6/2010 2:55 AM, Adrian Lienhard wrote:

>
> Hi all,
>
> We've been experiencing an "interesting" problem: the image freezes and does not response to HTTP requests anymore after it has been running for days.
>
> Here some basic information about our setup:
>
> Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
> PharoCore 1.1
> OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
>
> - We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9 on the identical machine and with the same application source (modulo some adaptations to make it run on Pharo).
> - We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the UI process is suspended (Project uiProcess suspend)
> - VM does not hog the CPU and memory usage is normal
> - The meantime between failure is several weeks and we haven't managed to reproduce the problem
> - The application mainly serves HTTP requests. When the image does not receive requests for some time we send it a STOP signal, when a request comes in it is sent a CONT signal.
> - lsof shows
> TCP *:9093 (LISTEN)
> TCP server:9093->server:46930 (CLOSE_WAIT)
>
> Below is a GDB backtrace and the Smalltalk stacks from an image that was frozen (the VM had been running for almost 100 hours):
>
> =============================================================
> (gdb) bt
> #0  0x08072020 in ?? ()
> #1<signal handler called>
> #2  0xb766f5e0 in malloc () from /lib/libc.so.6
> #3<function called from gdb>
> #4  0xb76c50c8 in select () from /lib/libc.so.6
> #5  0x08071063 in aioPoll ()
> #6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
> #7  0x000003e8 in ?? ()
> #8  0x997b5a34 in ?? ()
> #9  0xbfe7cb28 in ?? ()
> #10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
> Backtrace stopped: frame did not save the PC
>
> (gdb) call printCallStack()
> -1719969228>idleProcess
> -1719969320>startUp
> -1740134028 BlockClosure>newProcess
> $3 = -1755344892
>
> (gdb) call (int) printAllStacks()
> Process
> -1719969228>idleProcess
> -1719969320>startUp
> -1740134028 BlockClosure>newProcess
>
> Process
> -1740113860>finalizationProcess
> -1740113952>restartFinalizationProcess
> -1740113532 BlockClosure>newProcess
>
> Process
> -1740134424 SmalltalkImage>lowSpaceWatcher
> -1740134516 SmalltalkImage>installLowSpaceWatcher
> -1740134300 BlockClosure>newProcess
>
> Process
> -1719451488 Delay>wait
> -1719451580 BlockClosure>ifCurtailed:
> -1719451704 Delay>wait
> -1719451796 InputEventPollingFetcher>waitForInput
> -1740126940 InputEventFetcher>eventLoop
> -1740127032 InputEventFetcher>installEventLoop
> -1740126816 BlockClosure>newProcess
>
> Process
> -1719557780 UnixOSProcessAccessor>grimReaperProcess
> -1740113624 BlockClosure>repeat
> -1740113716 UnixOSProcessAccessor>grimReaperProcess
> -1740117340 BlockClosure>newProcess
>
> [omitted many newlines between output above]
> =============================================================
>
> What is striking from the above process listing is that two processes are missing: the handleTimerEvent process and the Seaside process (that is, the TCP listener loop). How comes these processes vanished?
>
> This may be related to Pharo or to the Squeak VM.
>
> Has anybody else seen this problem? Any idea how to debug/fix this issue is very much appreciated!
>
> Cheers,
> Adrian
>
>
> CCed to pharo-dev since this may be related to Pharo; please respond on the squeak-vm list
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

David T. Lewis
 
On Mon, Dec 06, 2010 at 12:33:59PM -0800, Andreas Raab wrote:

>
> At a guess, I'd say it's either one of two issues:
>
> 1) Your STOP/CONT handling. This sounds suspicious and it could affect
> the timer handling. I'm assuming that the issue happens after receiving
> the CONT signal, no? If you can, you might want to a) make sure that you
> only get the STOP signal when the VM is in ioRelinquish() and not (for
> example) currently executing the delay process and b) consider to dump
> the call stacks whenever the VM gets the CONT signal to see what the
> status is.
>
> 2) Some set of incomplete process/delay/semaphore changes in Pharo. One
> of the problems with processes and delays is that this part of the
> system reacts very badly to random "cleaning". I.e., changing "foo ==
> nil" to "foo isNil" can have dramatic effects (since it introduces a
> suspension point) with just the kind of weird issue you're seeing.

Actually #2 does seem like a likely culprit. I found a Pharo 1.1 image
and loaded the CommandShell and OSProcess test suites. The CommandShell
tests put a heavy load on process switching, and are rather timing
dependent. On Pharo 1.1 I get intermittent and non-reproducible errors
and test failures, and I can't get a clean run of the test suite. The
errors seem to be different each time.

On Pharo 1.1.1 and 1.2 I can get clean runs of the CommandShell/OSProcess
tests, so I think there must be some issues in Pharo 1.1. If you are
using PharoCore 1.1 now and have the option of moving to Pharo 1.1.1
or 1.2, I suspect you may see the problems go away.

Dave


>
> With regards to these processes not being printed, that's a side effect
> of how printAllStacks gathers the processes - it will not print
> suspended processes which explains why the UI process doesn't print and
> most likely handleTimerEvent is suspended in a debugger.
>
> Depending on how important this issue is you can also try to dissect the
> object memory itself. If you call writeImageFile (or is it
> writeImageFileIO?) from gdb it will dump the .image file and you can use
> the simulator to look at it more closely. Most likely you'll be able to
> find the processes and look at their stacks.
>
> Cheers,
>   - Andreas
>
> On 12/6/2010 2:55 AM, Adrian Lienhard wrote:
> >
> >Hi all,
> >
> >We've been experiencing an "interesting" problem: the image freezes and
> >does not response to HTTP requests anymore after it has been running for
> >days.
> >
> >Here some basic information about our setup:
> >
> >Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
> >PharoCore 1.1
> >OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
> >
> >- We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9
> >on the identical machine and with the same application source (modulo some
> >adaptations to make it run on Pharo).
> >- We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the
> >UI process is suspended (Project uiProcess suspend)
> >- VM does not hog the CPU and memory usage is normal
> >- The meantime between failure is several weeks and we haven't managed to
> >reproduce the problem
> >- The application mainly serves HTTP requests. When the image does not
> >receive requests for some time we send it a STOP signal, when a request
> >comes in it is sent a CONT signal.
> >- lsof shows
> > TCP *:9093 (LISTEN)
> > TCP server:9093->server:46930 (CLOSE_WAIT)
> >
> >Below is a GDB backtrace and the Smalltalk stacks from an image that was
> >frozen (the VM had been running for almost 100 hours):
> >
> >=============================================================
> >(gdb) bt
> >#0  0x08072020 in ?? ()
> >#1<signal handler called>
> >#2  0xb766f5e0 in malloc () from /lib/libc.so.6
> >#3<function called from gdb>
> >#4  0xb76c50c8 in select () from /lib/libc.so.6
> >#5  0x08071063 in aioPoll ()
> >#6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
> >#7  0x000003e8 in ?? ()
> >#8  0x997b5a34 in ?? ()
> >#9  0xbfe7cb28 in ?? ()
> >#10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
> >Backtrace stopped: frame did not save the PC
> >
> >(gdb) call printCallStack()
> >-1719969228>idleProcess
> >-1719969320>startUp
> >-1740134028 BlockClosure>newProcess
> >$3 = -1755344892
> >
> >(gdb) call (int) printAllStacks()
> >Process
> >-1719969228>idleProcess
> >-1719969320>startUp
> >-1740134028 BlockClosure>newProcess
> >
> >Process
> >-1740113860>finalizationProcess
> >-1740113952>restartFinalizationProcess
> >-1740113532 BlockClosure>newProcess
> >
> >Process
> >-1740134424 SmalltalkImage>lowSpaceWatcher
> >-1740134516 SmalltalkImage>installLowSpaceWatcher
> >-1740134300 BlockClosure>newProcess
> >
> >Process
> >-1719451488 Delay>wait
> >-1719451580 BlockClosure>ifCurtailed:
> >-1719451704 Delay>wait
> >-1719451796 InputEventPollingFetcher>waitForInput
> >-1740126940 InputEventFetcher>eventLoop
> >-1740127032 InputEventFetcher>installEventLoop
> >-1740126816 BlockClosure>newProcess
> >
> >Process
> >-1719557780 UnixOSProcessAccessor>grimReaperProcess
> >-1740113624 BlockClosure>repeat
> >-1740113716 UnixOSProcessAccessor>grimReaperProcess
> >-1740117340 BlockClosure>newProcess
> >
> >[omitted many newlines between output above]
> >=============================================================
> >
> >What is striking from the above process listing is that two processes are
> >missing: the handleTimerEvent process and the Seaside process (that is,
> >the TCP listener loop). How comes these processes vanished?
> >
> >This may be related to Pharo or to the Squeak VM.
> >
> >Has anybody else seen this problem? Any idea how to debug/fix this issue
> >is very much appreciated!
> >
> >Cheers,
> >Adrian
> >
> >
> >CCed to pharo-dev since this may be related to Pharo; please respond on
> >the squeak-vm list
> >
> >
> >
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] Fwd: [Vm-dev] Image freeze because handleTimerEvent and Seaside process gone?!

Eliot Miranda-2
 


On Tue, Dec 7, 2010 at 3:41 AM, Lukas Renggli <[hidden email]> wrote:
> One change that I don't understand, although it probably is unrelated, is in [2]:
>
> LargePositiveInteger removeSelector: #=!
> LargePositiveInteger removeSelector: #bitAnd:!
> LargePositiveInteger removeSelector: #bitOr:!
> LargePositiveInteger removeSelector: #bitShift:!
> LargePositiveInteger removeSelector: #bitXor:!
> LargePositiveInteger removeSelector: #'~='!
>
> Why would one want to remove these primitive calls from large integers?

AFAIK, Cog needs those.

That's right.  Those methods used SmallInteger primitives (7, 8, 14, 15, 16 & 17) and in Cog SmallInteger primitives only work on SmallIntegers, not on up to 64-bit LargeIntegers; i.e. they don't bother to test the receiver for being a SmallInteger and hence crash if installed on LargeInteger.  I probably made a mistake in deciding to save a tag test, but I did.  Instead these methods should use the relevant LargeInteger primitives.  The issue is I think those are missing in the standard VM.  See primitives 27 28 34 35 36 & 37 primitiveEqualLargeIntegers, primitiveNotEqualLargeIntegers, primitiveBitAndLargeIntegers, primitiveBitOrLargeIntegers, primitiveBitXorLargeIntegers, primitiveBitShiftLargeIntegers in the Cog VM and perhaps the standard VM.  If they're missing in teh standard VM it should be fine to redefine those methods to include the 27 28 34 35 36 & 37 primitive numbers.  If they're defined differently someone needs to check that using those numbers works in the standard VM.

HTH
Eliot

Lukas

--
Lukas Renggli
www.lukas-renggli.ch


Reply | Threaded
Open this post in threaded view
|

LargePositiveInteger prims (was: [Pharo-project] Fwd: [Vm-dev] Image freeze because handleTimerEvent and Seaside process gone?!)

David T. Lewis
 
On Tue, Dec 07, 2010 at 10:47:39AM -0800, Eliot Miranda wrote:

>  
> On Tue, Dec 7, 2010 at 3:41 AM, Lukas Renggli <[hidden email]> wrote:
>
> > > One change that I don't understand, although it probably is unrelated, is
> > in [2]:
> > >
> > > LargePositiveInteger removeSelector: #=!
> > > LargePositiveInteger removeSelector: #bitAnd:!
> > > LargePositiveInteger removeSelector: #bitOr:!
> > > LargePositiveInteger removeSelector: #bitShift:!
> > > LargePositiveInteger removeSelector: #bitXor:!
> > > LargePositiveInteger removeSelector: #'~='!
> > >
> > > Why would one want to remove these primitive calls from large integers?
> >
> > AFAIK, Cog needs those.
> >
>
> That's right.  Those methods used SmallInteger primitives (7, 8, 14, 15, 16
> & 17) and in Cog SmallInteger primitives only work on SmallIntegers, not on
> up to 64-bit LargeIntegers; i.e. they don't bother to test the receiver for
> being a SmallInteger and hence crash if installed on LargeInteger.  I
> probably made a mistake in deciding to save a tag test, but I did.  Instead
> these methods should use the relevant LargeInteger primitives.  The issue is
> I think those are missing in the standard VM.  See primitives 27 28 34 35 36
> &
> 37 primitiveEqualLargeIntegers, primitiveNotEqualLargeIntegers,
> primitiveBitAndLargeIntegers, primitiveBitOrLargeIntegers,
> primitiveBitXorLargeIntegers, primitiveBitShiftLargeIntegers
> in the Cog VM and perhaps the standard VM.  If they're missing in teh
> standard VM it should be fine to redefine those methods to include the 27 28
> 34 35 36 & 37 primitive numbers.  If they're defined differently someone
> needs to check that using those numbers works in the standard VM.

r.e. saving the tag test on primitives 7, 8, 14, 15, 16, and 17:

Actually, this does have a significant (2X) performance hit for
LargePositiveInteger values in the 32-bit range for the interpreter
VM.  I ran the test below with a trunk image, then repeated the test
after restoring the original bindings in LargePositiveInteger for #=,
#~=, #bitAnd:, #bitOr:, #bitShift: and #bitXor:. The results show a
better than 2X improvement for large positive integers within the
32-bit range of values with the primitive bindings restored.

Conclusion: It would be good if the tag check could be added to Cog,
and the LargePositiveInteger primitive bindings should be restored at
some future time when this change is in general circulation for all VMs.

Note: The LargeInteger primitives 27, 28, 34, 35, 35, 36 & 37 are now
added to the interpreter VM but were not used in this test. These
primitives use 64-bit values and I would therefore not expect them
to be as fast as the 32-bit primitives 7, 8 ... 17 but I did not
actually test this to confirm.

- Dave

To verify the above:

large32Ints := 16r40000000 to: 16r41000000.
testBlock := [ | t1 t2 |
        t1 := Time primMillisecondClock.
        large32Ints do: [:e |
        e = e.
        e bitAnd: e.
        e bitOr: e.
        e bitShift: 1.
        e bitXor: e.
        e ~= e.
        t2 := Time primMillisecondClock].
        t2 - t1].

"Squeak trunk (large int prim bindings removed:"
testBlock value ==> 69370

"After restoring original large int prim bindings"
testBlock value ==> 30560


Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Adrian Lienhard
In reply to this post by David T. Lewis

Thanks Andreas and David for the responses!

In the meantime I've gathered more information. From the mail of Andreas I assumed that the most likely reason for the freeze is that the timer event loop throws an unhandled exception and therefore gets suspended.

So I added a guard to catch any error in handleTimerEvent, restart the loop, and then pass the exception to open a debugger:
 
runTimerEventLoop
        [RunTimerEventLoop] whileTrue: [
                [ self handleTimerEvent ]
                        on: Error
                        do: [ :e |
                                self startTimerEventLoop.
                                ...write a warning to stdout...
                                e pass ] ]

And voila, after 10 days or so I got the stack trace below.

I haven't had time to dive into it, but from the stack it seems like a concurrency issue in linked list (although I wonder whether that's possible since the timer event loop runs at the highest priority...).

Maybe something catches somebody's eye.

Cheers,
Adrian


THERE_BE_DRAGONS_HERE
Error: no such method!
30 December 2010 10:32:28 pm

VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest update: #7179]
Image: Pharo1.1 [Latest update: #11410]

Semaphore(Object)>>error:
        Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        Arguments and temporary variables:
                t1: 'no such method!'
        Receiver's instance variables:
                firstLink: a Process in [] in DelayWaitTimeout>>wait
                lastLink: a Process in [] in DelayWaitTimeout>>wait
                excessSignals: 0


[] in Semaphore(LinkedList)>>removeLink:
        Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        Arguments and temporary variables:

        Receiver's instance variables:
                firstLink: a Process in [] in DelayWaitTimeout>>wait
                lastLink: a Process in [] in DelayWaitTimeout>>wait
                excessSignals: 0


Semaphore(LinkedList)>>removeLink:ifAbsent:
        Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        Arguments and temporary variables:
                aLink: a Process in [] in DelayWaitTimeout>>wait
                aBlock: [self error: 'no such method!']
                tempLink: nil
        Receiver's instance variables:
                firstLink: a Process in [] in DelayWaitTimeout>>wait
                lastLink: a Process in [] in DelayWaitTimeout>>wait
                excessSignals: 0


Semaphore(LinkedList)>>removeLink:
        Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        Arguments and temporary variables:
                aLink: a Process in [] in DelayWaitTimeout>>wait
        Receiver's instance variables:
                firstLink: a Process in [] in DelayWaitTimeout>>wait
                lastLink: a Process in [] in DelayWaitTimeout>>wait
                excessSignals: 0


Semaphore(LinkedList)>>remove:ifAbsent:
        Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        Arguments and temporary variables:
                aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
                aBlock: []
                link: a Process in [] in DelayWaitTimeout>>wait
        Receiver's instance variables:
                firstLink: a Process in [] in DelayWaitTimeout>>wait
                lastLink: a Process in [] in DelayWaitTimeout>>wait
                excessSignals: 0


Process>>suspend
        Receiver: a Process in [] in DelayWaitTimeout>>wait
        Arguments and temporary variables:
                t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        Receiver's instance variables:
                nextLink: nil
                suspendedContext: [] in DelayWaitTimeout>>wait
                priority: 30
                myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
                errorHandler: nil
                name: 'seaside'
                env: nil


DelayWaitTimeout>>signalWaitingProcess
        Receiver: a DelayWaitTimeout(10000 msecs)
        Arguments and temporary variables:

        Receiver's instance variables:
                delayDuration: 10000
                resumptionTime: 217048389
                delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
                beingWaitedOn: false
                process: a Process in [] in DelayWaitTimeout>>wait
                expired: true


Delay class>>handleTimerEvent
        Receiver: Delay
        Arguments and temporary variables:
                t1: 217128602
                t2: nil
        Receiver's instance variables:
                superclass: Object
                methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
                format: 138
                instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
                organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
                subclasses: {MonitorDelay. DelayWaitTimeout}
                name: #Delay
                classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
                sharedPools: nil
                environment: a SystemDictionary(lots of globals)
                category: #'Kernel-Processes'
                traitComposition: {}
                localSelectors: nil


[] in Delay class>>runTimerEventLoop
        Receiver: Delay
        Arguments and temporary variables:

        Receiver's instance variables:
                superclass: Object
                methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
                format: 138
                instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
                organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
                subclasses: {MonitorDelay. DelayWaitTimeout}
                name: #Delay
                classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
                sharedPools: nil
                environment: a SystemDictionary(lots of globals)
                category: #'Kernel-Processes'
                traitComposition: {}
                localSelectors: nil


BlockClosure>>on:do:
        Receiver: [self handleTimerEvent]
        Arguments and temporary variables:
                exception: Error
                handlerAction: [:e |
self startTimerEventLoop.
        FileStream
                fileNamed: '/dev/...etc...
                handlerActive: false
        Receiver's instance variables:
                outerContext: Delay class>>runTimerEventLoop
                startpc: 108
                numArgs: 0


Delay class>>runTimerEventLoop
        Receiver: Delay
        Arguments and temporary variables:

        Receiver's instance variables:
                superclass: Object
                methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
                format: 138
                instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
                organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
                subclasses: {MonitorDelay. DelayWaitTimeout}
                name: #Delay
                classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
                sharedPools: nil
                environment: a SystemDictionary(lots of globals)
                category: #'Kernel-Processes'
                traitComposition: {}
                localSelectors: nil


[] in Delay class>>startTimerEventLoop
        Receiver: Delay
        Arguments and temporary variables:

        Receiver's instance variables:
                superclass: Object
                methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
                format: 138
                instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
                organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
                subclasses: {MonitorDelay. DelayWaitTimeout}
                name: #Delay
                classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
                sharedPools: nil
                environment: a SystemDictionary(lots of globals)
                category: #'Kernel-Processes'
                traitComposition: {}
                localSelectors: nil


[] in BlockClosure>>newProcess
        Receiver: [self runTimerEventLoop]
        Arguments and temporary variables:

        Receiver's instance variables:
                outerContext: Delay class>>startTimerEventLoop
                startpc: 144
                numArgs: 0



--- The full stack ---
Semaphore(Object)>>error:
[] in Semaphore(LinkedList)>>removeLink:
Semaphore(LinkedList)>>removeLink:ifAbsent:
Semaphore(LinkedList)>>removeLink:
Semaphore(LinkedList)>>remove:ifAbsent:
Process>>suspend
DelayWaitTimeout>>signalWaitingProcess
Delay class>>handleTimerEvent
[] in Delay class>>runTimerEventLoop
BlockClosure>>on:do:
Delay class>>runTimerEventLoop
[] in Delay class>>startTimerEventLoop
[] in BlockClosure>>newProcess
------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Andreas.Raab
 
Revert LinkedList>>remove:ifAbsent: back to the version in Squeak and
your problems will go away.

Cheers,
   - Andreas

On 12/30/2010 11:50 PM, Adrian Lienhard wrote:

>
> Thanks Andreas and David for the responses!
>
> In the meantime I've gathered more information. From the mail of Andreas I assumed that the most likely reason for the freeze is that the timer event loop throws an unhandled exception and therefore gets suspended.
>
> So I added a guard to catch any error in handleTimerEvent, restart the loop, and then pass the exception to open a debugger:
>
> runTimerEventLoop
> [RunTimerEventLoop] whileTrue: [
> [ self handleTimerEvent ]
> on: Error
> do: [ :e |
> self startTimerEventLoop.
> ...write a warning to stdout...
> e pass ] ]
>
> And voila, after 10 days or so I got the stack trace below.
>
> I haven't had time to dive into it, but from the stack it seems like a concurrency issue in linked list (although I wonder whether that's possible since the timer event loop runs at the highest priority...).
>
> Maybe something catches somebody's eye.
>
> Cheers,
> Adrian
>
>
> THERE_BE_DRAGONS_HERE
> Error: no such method!
> 30 December 2010 10:32:28 pm
>
> VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest update: #7179]
> Image: Pharo1.1 [Latest update: #11410]
>
> Semaphore(Object)>>error:
> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> Arguments and temporary variables:
> t1: 'no such method!'
> Receiver's instance variables:
> firstLink: a Process in [] in DelayWaitTimeout>>wait
> lastLink: a Process in [] in DelayWaitTimeout>>wait
> excessSignals: 0
>
>
> [] in Semaphore(LinkedList)>>removeLink:
> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> Arguments and temporary variables:
>
> Receiver's instance variables:
> firstLink: a Process in [] in DelayWaitTimeout>>wait
> lastLink: a Process in [] in DelayWaitTimeout>>wait
> excessSignals: 0
>
>
> Semaphore(LinkedList)>>removeLink:ifAbsent:
> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> Arguments and temporary variables:
> aLink: a Process in [] in DelayWaitTimeout>>wait
> aBlock: [self error: 'no such method!']
> tempLink: nil
> Receiver's instance variables:
> firstLink: a Process in [] in DelayWaitTimeout>>wait
> lastLink: a Process in [] in DelayWaitTimeout>>wait
> excessSignals: 0
>
>
> Semaphore(LinkedList)>>removeLink:
> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> Arguments and temporary variables:
> aLink: a Process in [] in DelayWaitTimeout>>wait
> Receiver's instance variables:
> firstLink: a Process in [] in DelayWaitTimeout>>wait
> lastLink: a Process in [] in DelayWaitTimeout>>wait
> excessSignals: 0
>
>
> Semaphore(LinkedList)>>remove:ifAbsent:
> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> Arguments and temporary variables:
> aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
> aBlock: []
> link: a Process in [] in DelayWaitTimeout>>wait
> Receiver's instance variables:
> firstLink: a Process in [] in DelayWaitTimeout>>wait
> lastLink: a Process in [] in DelayWaitTimeout>>wait
> excessSignals: 0
>
>
> Process>>suspend
> Receiver: a Process in [] in DelayWaitTimeout>>wait
> Arguments and temporary variables:
> t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> Receiver's instance variables:
> nextLink: nil
> suspendedContext: [] in DelayWaitTimeout>>wait
> priority: 30
> myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> errorHandler: nil
> name: 'seaside'
> env: nil
>
>
> DelayWaitTimeout>>signalWaitingProcess
> Receiver: a DelayWaitTimeout(10000 msecs)
> Arguments and temporary variables:
>
> Receiver's instance variables:
> delayDuration: 10000
> resumptionTime: 217048389
> delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
> beingWaitedOn: false
> process: a Process in [] in DelayWaitTimeout>>wait
> expired: true
>
>
> Delay class>>handleTimerEvent
> Receiver: Delay
> Arguments and temporary variables:
> t1: 217128602
> t2: nil
> Receiver's instance variables:
> superclass: Object
> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
> format: 138
> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
> subclasses: {MonitorDelay. DelayWaitTimeout}
> name: #Delay
> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
> sharedPools: nil
> environment: a SystemDictionary(lots of globals)
> category: #'Kernel-Processes'
> traitComposition: {}
> localSelectors: nil
>
>
> [] in Delay class>>runTimerEventLoop
> Receiver: Delay
> Arguments and temporary variables:
>
> Receiver's instance variables:
> superclass: Object
> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
> format: 138
> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
> subclasses: {MonitorDelay. DelayWaitTimeout}
> name: #Delay
> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
> sharedPools: nil
> environment: a SystemDictionary(lots of globals)
> category: #'Kernel-Processes'
> traitComposition: {}
> localSelectors: nil
>
>
> BlockClosure>>on:do:
> Receiver: [self handleTimerEvent]
> Arguments and temporary variables:
> exception: Error
> handlerAction: [:e |
> self startTimerEventLoop.
> FileStream
> fileNamed: '/dev/...etc...
> handlerActive: false
> Receiver's instance variables:
> outerContext: Delay class>>runTimerEventLoop
> startpc: 108
> numArgs: 0
>
>
> Delay class>>runTimerEventLoop
> Receiver: Delay
> Arguments and temporary variables:
>
> Receiver's instance variables:
> superclass: Object
> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
> format: 138
> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
> subclasses: {MonitorDelay. DelayWaitTimeout}
> name: #Delay
> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
> sharedPools: nil
> environment: a SystemDictionary(lots of globals)
> category: #'Kernel-Processes'
> traitComposition: {}
> localSelectors: nil
>
>
> [] in Delay class>>startTimerEventLoop
> Receiver: Delay
> Arguments and temporary variables:
>
> Receiver's instance variables:
> superclass: Object
> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
> format: 138
> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
> subclasses: {MonitorDelay. DelayWaitTimeout}
> name: #Delay
> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
> sharedPools: nil
> environment: a SystemDictionary(lots of globals)
> category: #'Kernel-Processes'
> traitComposition: {}
> localSelectors: nil
>
>
> [] in BlockClosure>>newProcess
> Receiver: [self runTimerEventLoop]
> Arguments and temporary variables:
>
> Receiver's instance variables:
> outerContext: Delay class>>startTimerEventLoop
> startpc: 144
> numArgs: 0
>
>
>
> --- The full stack ---
> Semaphore(Object)>>error:
> [] in Semaphore(LinkedList)>>removeLink:
> Semaphore(LinkedList)>>removeLink:ifAbsent:
> Semaphore(LinkedList)>>removeLink:
> Semaphore(LinkedList)>>remove:ifAbsent:
> Process>>suspend
> DelayWaitTimeout>>signalWaitingProcess
> Delay class>>handleTimerEvent
> [] in Delay class>>runTimerEventLoop
> BlockClosure>>on:do:
> Delay class>>runTimerEventLoop
> [] in Delay class>>startTimerEventLoop
> [] in BlockClosure>>newProcess
> ------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Andreas.Raab
 
Oh, I also just noticed something else: It seems as if the VMMaker
changes for atomic suspend never made it into the main line. I just
posted the relevant bits but that means you have another workaround. You
can use the Cog VM, which has had these changes from the beginning. By
using Cog, the primitive suspend will never fail and consequently the
(still broken) code in LinkedList>>remove:ifAbsent: will never be executed.

Cheers,
   - Andreas

On 12/31/2010 10:08 AM, Andreas Raab wrote:

>
> Revert LinkedList>>remove:ifAbsent: back to the version in Squeak and
> your problems will go away.
>
> Cheers,
> - Andreas
>
> On 12/30/2010 11:50 PM, Adrian Lienhard wrote:
>>
>> Thanks Andreas and David for the responses!
>>
>> In the meantime I've gathered more information. From the mail of
>> Andreas I assumed that the most likely reason for the freeze is that
>> the timer event loop throws an unhandled exception and therefore gets
>> suspended.
>>
>> So I added a guard to catch any error in handleTimerEvent, restart the
>> loop, and then pass the exception to open a debugger:
>>
>> runTimerEventLoop
>> [RunTimerEventLoop] whileTrue: [
>> [ self handleTimerEvent ]
>> on: Error
>> do: [ :e |
>> self startTimerEventLoop.
>> ...write a warning to stdout...
>> e pass ] ]
>>
>> And voila, after 10 days or so I got the stack trace below.
>>
>> I haven't had time to dive into it, but from the stack it seems like a
>> concurrency issue in linked list (although I wonder whether that's
>> possible since the timer event loop runs at the highest priority...).
>>
>> Maybe something catches somebody's eye.
>>
>> Cheers,
>> Adrian
>>
>>
>> THERE_BE_DRAGONS_HERE
>> Error: no such method!
>> 30 December 2010 10:32:28 pm
>>
>> VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest
>> update: #7179]
>> Image: Pharo1.1 [Latest update: #11410]
>>
>> Semaphore(Object)>>error:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> t1: 'no such method!'
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> [] in Semaphore(LinkedList)>>removeLink:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> aLink: a Process in [] in DelayWaitTimeout>>wait
>> aBlock: [self error: 'no such method!']
>> tempLink: nil
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Semaphore(LinkedList)>>removeLink:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> aLink: a Process in [] in DelayWaitTimeout>>wait
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Semaphore(LinkedList)>>remove:ifAbsent:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
>> aBlock: []
>> link: a Process in [] in DelayWaitTimeout>>wait
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Process>>suspend
>> Receiver: a Process in [] in DelayWaitTimeout>>wait
>> Arguments and temporary variables:
>> t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Receiver's instance variables:
>> nextLink: nil
>> suspendedContext: [] in DelayWaitTimeout>>wait
>> priority: 30
>> myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> errorHandler: nil
>> name: 'seaside'
>> env: nil
>>
>>
>> DelayWaitTimeout>>signalWaitingProcess
>> Receiver: a DelayWaitTimeout(10000 msecs)
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> delayDuration: 10000
>> resumptionTime: 217048389
>> delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> beingWaitedOn: false
>> process: a Process in [] in DelayWaitTimeout>>wait
>> expired: true
>>
>>
>> Delay class>>handleTimerEvent
>> Receiver: Delay
>> Arguments and temporary variables:
>> t1: 217128602
>> t2: nil
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a
>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>> 'beingWa...etc...
>> organization: ('as yet unclassified'
>> adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>> Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> [] in Delay class>>runTimerEventLoop
>> Receiver: Delay
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a
>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>> 'beingWa...etc...
>> organization: ('as yet unclassified'
>> adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>> Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> BlockClosure>>on:do:
>> Receiver: [self handleTimerEvent]
>> Arguments and temporary variables:
>> exception: Error
>> handlerAction: [:e |
>> self startTimerEventLoop.
>> FileStream
>> fileNamed: '/dev/...etc...
>> handlerActive: false
>> Receiver's instance variables:
>> outerContext: Delay class>>runTimerEventLoop
>> startpc: 108
>> numArgs: 0
>>
>>
>> Delay class>>runTimerEventLoop
>> Receiver: Delay
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a
>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>> 'beingWa...etc...
>> organization: ('as yet unclassified'
>> adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>> Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> [] in Delay class>>startTimerEventLoop
>> Receiver: Delay
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a
>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>> 'beingWa...etc...
>> organization: ('as yet unclassified'
>> adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>> Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> [] in BlockClosure>>newProcess
>> Receiver: [self runTimerEventLoop]
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> outerContext: Delay class>>startTimerEventLoop
>> startpc: 144
>> numArgs: 0
>>
>>
>>
>> --- The full stack ---
>> Semaphore(Object)>>error:
>> [] in Semaphore(LinkedList)>>removeLink:
>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>> Semaphore(LinkedList)>>removeLink:
>> Semaphore(LinkedList)>>remove:ifAbsent:
>> Process>>suspend
>> DelayWaitTimeout>>signalWaitingProcess
>> Delay class>>handleTimerEvent
>> [] in Delay class>>runTimerEventLoop
>> BlockClosure>>on:do:
>> Delay class>>runTimerEventLoop
>> [] in Delay class>>startTimerEventLoop
>> [] in BlockClosure>>newProcess
>> ------------------------------------------------------------
>
Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

stephane ducasse-2

Andreas

thanks for the fix. Now I imagine that henrik is on holidays (I cannot ask him what was the intention of his changes).
Am'I correct to suspect that the bug is due to the fact that == are not preemptable while the code of pharo definition using
plain message sends can be interrupted?

Stef

On Dec 31, 2010, at 10:32 AM, Andreas Raab wrote:

> Oh, I also just noticed something else: It seems as if the VMMaker changes for atomic suspend never made it into the main line. I just posted the relevant bits but that means you have another workaround. You can use the Cog VM, which has had these changes from the beginning. By using Cog, the primitive suspend will never fail and consequently the (still broken) code in LinkedList>>remove:ifAbsent: will never be executed.
>
> Cheers,
>  - Andreas
>
> On 12/31/2010 10:08 AM, Andreas Raab wrote:
>>
>> Revert LinkedList>>remove:ifAbsent: back to the version in Squeak and
>> your problems will go away.
>>
>> Cheers,
>> - Andreas
>>
>> On 12/30/2010 11:50 PM, Adrian Lienhard wrote:
>>>
>>> Thanks Andreas and David for the responses!
>>>
>>> In the meantime I've gathered more information. From the mail of
>>> Andreas I assumed that the most likely reason for the freeze is that
>>> the timer event loop throws an unhandled exception and therefore gets
>>> suspended.
>>>
>>> So I added a guard to catch any error in handleTimerEvent, restart the
>>> loop, and then pass the exception to open a debugger:
>>>
>>> runTimerEventLoop
>>> [RunTimerEventLoop] whileTrue: [
>>> [ self handleTimerEvent ]
>>> on: Error
>>> do: [ :e |
>>> self startTimerEventLoop.
>>> ...write a warning to stdout...
>>> e pass ] ]
>>>
>>> And voila, after 10 days or so I got the stack trace below.
>>>
>>> I haven't had time to dive into it, but from the stack it seems like a
>>> concurrency issue in linked list (although I wonder whether that's
>>> possible since the timer event loop runs at the highest priority...).
>>>
>>> Maybe something catches somebody's eye.
>>>
>>> Cheers,
>>> Adrian
>>>
>>>
>>> THERE_BE_DRAGONS_HERE
>>> Error: no such method!
>>> 30 December 2010 10:32:28 pm
>>>
>>> VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest
>>> update: #7179]
>>> Image: Pharo1.1 [Latest update: #11410]
>>>
>>> Semaphore(Object)>>error:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> t1: 'no such method!'
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> [] in Semaphore(LinkedList)>>removeLink:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> aLink: a Process in [] in DelayWaitTimeout>>wait
>>> aBlock: [self error: 'no such method!']
>>> tempLink: nil
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Semaphore(LinkedList)>>removeLink:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> aLink: a Process in [] in DelayWaitTimeout>>wait
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Semaphore(LinkedList)>>remove:ifAbsent:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
>>> aBlock: []
>>> link: a Process in [] in DelayWaitTimeout>>wait
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Process>>suspend
>>> Receiver: a Process in [] in DelayWaitTimeout>>wait
>>> Arguments and temporary variables:
>>> t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Receiver's instance variables:
>>> nextLink: nil
>>> suspendedContext: [] in DelayWaitTimeout>>wait
>>> priority: 30
>>> myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> errorHandler: nil
>>> name: 'seaside'
>>> env: nil
>>>
>>>
>>> DelayWaitTimeout>>signalWaitingProcess
>>> Receiver: a DelayWaitTimeout(10000 msecs)
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> delayDuration: 10000
>>> resumptionTime: 217048389
>>> delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> beingWaitedOn: false
>>> process: a Process in [] in DelayWaitTimeout>>wait
>>> expired: true
>>>
>>>
>>> Delay class>>handleTimerEvent
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>> t1: 217128602
>>> t2: nil
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a
>>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>>
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>>> 'beingWa...etc...
>>> organization: ('as yet unclassified'
>>> adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>>> Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> [] in Delay class>>runTimerEventLoop
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a
>>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>>
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>>> 'beingWa...etc...
>>> organization: ('as yet unclassified'
>>> adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>>> Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> BlockClosure>>on:do:
>>> Receiver: [self handleTimerEvent]
>>> Arguments and temporary variables:
>>> exception: Error
>>> handlerAction: [:e |
>>> self startTimerEventLoop.
>>> FileStream
>>> fileNamed: '/dev/...etc...
>>> handlerActive: false
>>> Receiver's instance variables:
>>> outerContext: Delay class>>runTimerEventLoop
>>> startpc: 108
>>> numArgs: 0
>>>
>>>
>>> Delay class>>runTimerEventLoop
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a
>>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>>
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>>> 'beingWa...etc...
>>> organization: ('as yet unclassified'
>>> adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>>> Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> [] in Delay class>>startTimerEventLoop
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a
>>> MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>>
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore'
>>> 'beingWa...etc...
>>> organization: ('as yet unclassified'
>>> adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a
>>> Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> [] in BlockClosure>>newProcess
>>> Receiver: [self runTimerEventLoop]
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> outerContext: Delay class>>startTimerEventLoop
>>> startpc: 144
>>> numArgs: 0
>>>
>>>
>>>
>>> --- The full stack ---
>>> Semaphore(Object)>>error:
>>> [] in Semaphore(LinkedList)>>removeLink:
>>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>>> Semaphore(LinkedList)>>removeLink:
>>> Semaphore(LinkedList)>>remove:ifAbsent:
>>> Process>>suspend
>>> DelayWaitTimeout>>signalWaitingProcess
>>> Delay class>>handleTimerEvent
>>> [] in Delay class>>runTimerEventLoop
>>> BlockClosure>>on:do:
>>> Delay class>>runTimerEventLoop
>>> [] in Delay class>>startTimerEventLoop
>>> [] in BlockClosure>>newProcess
>>> ------------------------------------------------------------
>>

Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Adrian Lienhard
In reply to this post by Andreas.Raab

I'll try that...

But I still don't understand why the change in Pharo that makes LinkedList>>remove:ifAbsent: non thread safe can cause the problem since this code is executed by the timerEvent process, which runs at the highest priority. This process should never be suspended during the execution of remove:ifAbsent:. What do I miss?

Cheers,
Adrian

On Dec 31, 2010, at 10:08 , Andreas Raab wrote:

> Revert LinkedList>>remove:ifAbsent: back to the version in Squeak and your problems will go away.
>
> Cheers,
>  - Andreas
>
> On 12/30/2010 11:50 PM, Adrian Lienhard wrote:
>>
>> Thanks Andreas and David for the responses!
>>
>> In the meantime I've gathered more information. From the mail of Andreas I assumed that the most likely reason for the freeze is that the timer event loop throws an unhandled exception and therefore gets suspended.
>>
>> So I added a guard to catch any error in handleTimerEvent, restart the loop, and then pass the exception to open a debugger:
>>
>> runTimerEventLoop
>> [RunTimerEventLoop] whileTrue: [
>> [ self handleTimerEvent ]
>> on: Error
>> do: [ :e |
>> self startTimerEventLoop.
>> ...write a warning to stdout...
>> e pass ] ]
>>
>> And voila, after 10 days or so I got the stack trace below.
>>
>> I haven't had time to dive into it, but from the stack it seems like a concurrency issue in linked list (although I wonder whether that's possible since the timer event loop runs at the highest priority...).
>>
>> Maybe something catches somebody's eye.
>>
>> Cheers,
>> Adrian
>>
>>
>> THERE_BE_DRAGONS_HERE
>> Error: no such method!
>> 30 December 2010 10:32:28 pm
>>
>> VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest update: #7179]
>> Image: Pharo1.1 [Latest update: #11410]
>>
>> Semaphore(Object)>>error:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> t1: 'no such method!'
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> [] in Semaphore(LinkedList)>>removeLink:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> aLink: a Process in [] in DelayWaitTimeout>>wait
>> aBlock: [self error: 'no such method!']
>> tempLink: nil
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Semaphore(LinkedList)>>removeLink:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> aLink: a Process in [] in DelayWaitTimeout>>wait
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Semaphore(LinkedList)>>remove:ifAbsent:
>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Arguments and temporary variables:
>> aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
>> aBlock: []
>> link: a Process in [] in DelayWaitTimeout>>wait
>> Receiver's instance variables:
>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>> excessSignals: 0
>>
>>
>> Process>>suspend
>> Receiver: a Process in [] in DelayWaitTimeout>>wait
>> Arguments and temporary variables:
>> t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> Receiver's instance variables:
>> nextLink: nil
>> suspendedContext: [] in DelayWaitTimeout>>wait
>> priority: 30
>> myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> errorHandler: nil
>> name: 'seaside'
>> env: nil
>>
>>
>> DelayWaitTimeout>>signalWaitingProcess
>> Receiver: a DelayWaitTimeout(10000 msecs)
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> delayDuration: 10000
>> resumptionTime: 217048389
>> delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>> beingWaitedOn: false
>> process: a Process in [] in DelayWaitTimeout>>wait
>> expired: true
>>
>>
>> Delay class>>handleTimerEvent
>> Receiver: Delay
>> Arguments and temporary variables:
>> t1: 217128602
>> t2: nil
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> [] in Delay class>>runTimerEventLoop
>> Receiver: Delay
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> BlockClosure>>on:do:
>> Receiver: [self handleTimerEvent]
>> Arguments and temporary variables:
>> exception: Error
>> handlerAction: [:e |
>> self startTimerEventLoop.
>> FileStream
>> fileNamed: '/dev/...etc...
>> handlerActive: false
>> Receiver's instance variables:
>> outerContext: Delay class>>runTimerEventLoop
>> startpc: 108
>> numArgs: 0
>>
>>
>> Delay class>>runTimerEventLoop
>> Receiver: Delay
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> [] in Delay class>>startTimerEventLoop
>> Receiver: Delay
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> superclass: Object
>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>> format: 138
>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>> subclasses: {MonitorDelay. DelayWaitTimeout}
>> name: #Delay
>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>> sharedPools: nil
>> environment: a SystemDictionary(lots of globals)
>> category: #'Kernel-Processes'
>> traitComposition: {}
>> localSelectors: nil
>>
>>
>> [] in BlockClosure>>newProcess
>> Receiver: [self runTimerEventLoop]
>> Arguments and temporary variables:
>>
>> Receiver's instance variables:
>> outerContext: Delay class>>startTimerEventLoop
>> startpc: 144
>> numArgs: 0
>>
>>
>>
>> --- The full stack ---
>> Semaphore(Object)>>error:
>> [] in Semaphore(LinkedList)>>removeLink:
>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>> Semaphore(LinkedList)>>removeLink:
>> Semaphore(LinkedList)>>remove:ifAbsent:
>> Process>>suspend
>> DelayWaitTimeout>>signalWaitingProcess
>> Delay class>>handleTimerEvent
>> [] in Delay class>>runTimerEventLoop
>> BlockClosure>>on:do:
>> Delay class>>runTimerEventLoop
>> [] in Delay class>>startTimerEventLoop
>> [] in BlockClosure>>newProcess
>> ------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: Image freeze because handleTimerEvent and Seaside process gone?!

Andreas.Raab
 
On 1/3/2011 10:09 AM, Adrian Lienhard wrote:
> I'll try that...
>
> But I still don't understand why the change in Pharo that makes LinkedList>>remove:ifAbsent: non thread safe can cause the problem since this code is executed by the timerEvent process, which runs at the highest priority. This process should never be suspended during the execution of remove:ifAbsent:. What do I miss?

The problem isn't thread-safety, at least in the classical definition.
What happens is that if you're removing processes by using
LinkedList>>remove: you are subject to a race condition where the
semaphore gets signaled *while* you are removing the the process.
Obviously, hillarity ensues at this point, which is why I made primitive
suspend do the Right Thing (i.e., remove the process primitively). There
are two parameters which affect if you're likely to see the effect or
not: One is the number of suspension points (real sends) in the method.
The more you have, the more likely you're affected. The second one is
whether the method can tolerate having the process removed "underneith
its feet". Both are far worse in Pharo.

Cheers,
   - Andreas

>
> Cheers,
> Adrian
>
> On Dec 31, 2010, at 10:08 , Andreas Raab wrote:
>
>> Revert LinkedList>>remove:ifAbsent: back to the version in Squeak and your problems will go away.
>>
>> Cheers,
>>   - Andreas
>>
>> On 12/30/2010 11:50 PM, Adrian Lienhard wrote:
>>>
>>> Thanks Andreas and David for the responses!
>>>
>>> In the meantime I've gathered more information. From the mail of Andreas I assumed that the most likely reason for the freeze is that the timer event loop throws an unhandled exception and therefore gets suspended.
>>>
>>> So I added a guard to catch any error in handleTimerEvent, restart the loop, and then pass the exception to open a debugger:
>>>
>>> runTimerEventLoop
>>> [RunTimerEventLoop] whileTrue: [
>>> [ self handleTimerEvent ]
>>> on: Error
>>> do: [ :e |
>>> self startTimerEventLoop.
>>> ...write a warning to stdout...
>>> e pass ] ]
>>>
>>> And voila, after 10 days or so I got the stack trace below.
>>>
>>> I haven't had time to dive into it, but from the stack it seems like a concurrency issue in linked list (although I wonder whether that's possible since the timer event loop runs at the highest priority...).
>>>
>>> Maybe something catches somebody's eye.
>>>
>>> Cheers,
>>> Adrian
>>>
>>>
>>> THERE_BE_DRAGONS_HERE
>>> Error: no such method!
>>> 30 December 2010 10:32:28 pm
>>>
>>> VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest update: #7179]
>>> Image: Pharo1.1 [Latest update: #11410]
>>>
>>> Semaphore(Object)>>error:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> t1: 'no such method!'
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> [] in Semaphore(LinkedList)>>removeLink:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> aLink: a Process in [] in DelayWaitTimeout>>wait
>>> aBlock: [self error: 'no such method!']
>>> tempLink: nil
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Semaphore(LinkedList)>>removeLink:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> aLink: a Process in [] in DelayWaitTimeout>>wait
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Semaphore(LinkedList)>>remove:ifAbsent:
>>> Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Arguments and temporary variables:
>>> aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
>>> aBlock: []
>>> link: a Process in [] in DelayWaitTimeout>>wait
>>> Receiver's instance variables:
>>> firstLink: a Process in [] in DelayWaitTimeout>>wait
>>> lastLink: a Process in [] in DelayWaitTimeout>>wait
>>> excessSignals: 0
>>>
>>>
>>> Process>>suspend
>>> Receiver: a Process in [] in DelayWaitTimeout>>wait
>>> Arguments and temporary variables:
>>> t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> Receiver's instance variables:
>>> nextLink: nil
>>> suspendedContext: [] in DelayWaitTimeout>>wait
>>> priority: 30
>>> myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> errorHandler: nil
>>> name: 'seaside'
>>> env: nil
>>>
>>>
>>> DelayWaitTimeout>>signalWaitingProcess
>>> Receiver: a DelayWaitTimeout(10000 msecs)
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> delayDuration: 10000
>>> resumptionTime: 217048389
>>> delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
>>> beingWaitedOn: false
>>> process: a Process in [] in DelayWaitTimeout>>wait
>>> expired: true
>>>
>>>
>>> Delay class>>handleTimerEvent
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>> t1: 217128602
>>> t2: nil
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> [] in Delay class>>runTimerEventLoop
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> BlockClosure>>on:do:
>>> Receiver: [self handleTimerEvent]
>>> Arguments and temporary variables:
>>> exception: Error
>>> handlerAction: [:e |
>>> self startTimerEventLoop.
>>> FileStream
>>> fileNamed: '/dev/...etc...
>>> handlerActive: false
>>> Receiver's instance variables:
>>> outerContext: Delay class>>runTimerEventLoop
>>> startpc: 108
>>> numArgs: 0
>>>
>>>
>>> Delay class>>runTimerEventLoop
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> [] in Delay class>>startTimerEventLoop
>>> Receiver: Delay
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> superclass: Object
>>> methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
>>> format: 138
>>> instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
>>> organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
>>> subclasses: {MonitorDelay. DelayWaitTimeout}
>>> name: #Delay
>>> classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
>>> sharedPools: nil
>>> environment: a SystemDictionary(lots of globals)
>>> category: #'Kernel-Processes'
>>> traitComposition: {}
>>> localSelectors: nil
>>>
>>>
>>> [] in BlockClosure>>newProcess
>>> Receiver: [self runTimerEventLoop]
>>> Arguments and temporary variables:
>>>
>>> Receiver's instance variables:
>>> outerContext: Delay class>>startTimerEventLoop
>>> startpc: 144
>>> numArgs: 0
>>>
>>>
>>>
>>> --- The full stack ---
>>> Semaphore(Object)>>error:
>>> [] in Semaphore(LinkedList)>>removeLink:
>>> Semaphore(LinkedList)>>removeLink:ifAbsent:
>>> Semaphore(LinkedList)>>removeLink:
>>> Semaphore(LinkedList)>>remove:ifAbsent:
>>> Process>>suspend
>>> DelayWaitTimeout>>signalWaitingProcess
>>> Delay class>>handleTimerEvent
>>> [] in Delay class>>runTimerEventLoop
>>> BlockClosure>>on:do:
>>> Delay class>>runTimerEventLoop
>>> [] in Delay class>>startTimerEventLoop
>>> [] in BlockClosure>>newProcess
>>> ------------------------------------------------------------
>
>