Crashes due the async queue running over


Holger Freyther
Hi Paolo,

When using sockets together with the gst readline, my app gets aborted. I
wrote about this earlier but couldn't reproduce it back then. This time I
had a TCP connection open in the background (most likely closed by the remote
end, as I didn't respond to a ping) and tried to load some code with
FileStream fileIn.

I have attached the GDB backtrace and the two variables from frame 3 that
decide whether we end up in the abort. I also have the strace output.

Do you have any idea what is going wrong?


(gdb) bt
#0  0x00ca9424 in __kernel_vsyscall ()
#1  0x00a432f1 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x00a44d5e in abort () at abort.c:92
#3  0x00f91d07 in _gst_async_call (func=0xf71a40 <async_signal_polled_files>, arg=0x0) at interp.c:1625
#4  0x00f72a9e in file_polling_handler (sig=29) at sysdep/posix/events.c:345
#5  <signal handler called>
#6  0x00ca9424 in __kernel_vsyscall ()
#7  0x00ae4bfb in __poll (fds=0xbf99df58, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:83
#8  0x00f72c9e in _gst_wait_for_input (fd=0) at sysdep/posix/events.c:467
#9  0x00f6fa81 in poll_and_read (fd=0, buf=0xbf99dfdf "", n=1) at input.c:790
#10 0x00f6fb0c in readline_getc (file=0xb9e440) at input.c:1118
#11 0x004c8393 in rl_read_key () at ../input.c:446
#12 0x004b26be in readline_internal_char () at ../readline.c:517

(gdb) p async_queue_index_sig
$1 = 65

(gdb) p _gst_signal_count
$2 = 1
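
The backtrace reaches abort() from _gst_async_call with async_queue_index_sig
at 65, which would be consistent with a fixed-size array of pending calls that
has filled up. The following is only a minimal sketch of that failure mode, not
the code in interp.c: the capacity of 64 and every bounded_* name are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; the real limit is not shown in this thread. */
#define BOUNDED_CAPACITY 64

typedef void (*bounded_func)(void *);

struct bounded_entry {
    bounded_func func;
    void *data;
};

static struct bounded_entry bounded_queue[BOUNDED_CAPACITY];
static size_t bounded_index;    /* number of pending entries */

/* Append a pending call; returns false when the array is already full.
   A caller that cannot afford to drop calls has little choice but to abort. */
static bool bounded_push(bounded_func func, void *data)
{
    assert(func != NULL);
    if (bounded_index >= BOUNDED_CAPACITY)
        return false;
    bounded_queue[bounded_index].func = func;
    bounded_queue[bounded_index].data = data;
    bounded_index++;
    return true;
}

/* Run and clear all pending calls; returns how many were run. */
static size_t bounded_drain(void)
{
    size_t n = bounded_index;
    for (size_t i = 0; i < n; i++)
        bounded_queue[i].func(bounded_queue[i].data);
    bounded_index = 0;
    return n;
}

/* Example callback (hypothetical) that only counts its invocations. */
static int bounded_calls;
static void bounded_count_call(void *data)
{
    (void) data;
    bounded_calls++;
}
```

If the same callback is pushed faster than bounded_drain() gets to run, the
array fills up with identical entries and the next push fails.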

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Attachment: strace-async-queue.txt.gz (5K)

Re: Crashes due the async queue running over

Paolo Bonzini-2
On Wed, Apr 13, 2011 at 18:12, Holger Hans Peter Freyther
<[hidden email]> wrote:
> I have attached the GDB backtrace and the two variables from frame 3 that
> decide that we end up in the abort. I have also the strace output.
>
> do you have any idea of what is going wrong?

How many sockets are open?  In any case, the simplest thing is to
register the signal handler itself as an async call (a kind of "bottom
half") so that the signals can then be done synchronously.

The problem is that _gst_async_call must be signal-safe, so it cannot
call malloc.

Paolo
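
The "bottom half" idea can be illustrated with a generic POSIX sketch; it is
not GNU Smalltalk's implementation, and all function and variable names are
made up for the example. The handler (top half) only stores to a sig_atomic_t
flag, which is safe in signal context, and the real work runs later from
ordinary code where malloc is allowed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t io_pending;  /* set by the handler, cleared below */
static int bottom_half_runs;              /* how often the deferred work ran */

/* Top half: runs in signal context, so it only sets a flag. */
static void on_signal(int sig)
{
    (void) sig;
    io_pending = 1;
}

/* Bottom half: called from the main loop, outside signal context.
   Returns 1 if there was pending work, 0 otherwise. */
static int run_bottom_half(void)
{
    if (!io_pending)
        return 0;
    io_pending = 0;
    bottom_half_runs++;   /* a real program would poll its descriptors here */
    return 1;
}

/* Install the handler for the given signal; returns 0 on success. */
static int install_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

With this shape, any number of signals arriving before the main loop runs
collapse into a single run of the bottom half.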


Re: Crashes due the async queue running over

Holger Freyther
On 04/14/2011 08:38 AM, Paolo Bonzini wrote:

> On Wed, Apr 13, 2011 at 18:12, Holger Hans Peter Freyther
> <[hidden email]> wrote:
>> I have attached the GDB backtrace and the two variables from frame 3 that
>> decide that we end up in the abort. I have also the strace output.
>>
>> do you have any idea of what is going wrong?
>
> How many sockets are open?  In any case, the simplest thing is to
> register the signal handler itself as an async call (a kind of "bottom
> half") so that the signals can then be done synchronously.

I might have multiple instances of 'Socket' around (I should add them to the
finalizer list to have close called) but only _one_ should be connected.


>
> The problem is that _gst_async_call must be signal-safe, so it cannot
> call malloc.

Ah okay, this is why we are checking if we are in GST. The question I have
is: do we get the SIGIO for the _same_ socket/fd all over again? Can you see
this from the strace, or should I inspect the 'bottom half' queue?


Re: Crashes due the async queue running over

Paolo Bonzini-2
On 04/14/2011 09:23 AM, Holger Hans Peter Freyther wrote:
>> The problem is that _gst_async_call must be signal-safe, so it cannot
>> call malloc.
>
> Ah okay, this is why we are checking if we are in GST. The question I have
> is... do we get the SIGIO for the _same_ socket/fd all over again. Can you see
> this from the strace or should I inspect the 'bottom half' queue?

I think you may be getting many SIGIO for the same socket.

Paolo


Re: Crashes due the async queue running over

Holger Freyther
On 04/14/2011 09:24 AM, Paolo Bonzini wrote:
> On 04/14/2011 09:23 AM, Holger Hans Peter Freyther wrote:
>>> The problem is that _gst_async_call must be signal-safe, so it cannot
>>> call malloc.
>>
>> Ah okay, this is why we are checking if we are in GST. The question I have
>> is... do we get the SIGIO for the _same_ socket/fd all over again. Can you see
>> this from the strace or should I inspect the 'bottom half' queue?
>
> I think you may be getting many SIGIO for the same socket.

Hi Paolo,

Do you have any idea? I am not sure whether I am trying to cover up a bigger
problem, but what about not queuing the call when it is already the last one
in the queue? The other question is why my kernel is sending me SIGIO
multiple times for the same socket.

How should we proceed from here?

(gdb) p queued_async_signals_sig[38]
$6 = {func = 0xf71a40 <async_signal_polled_files>, data = 0x0}
(gdb) p queued_async_signals_sig[39]
$7 = {func = 0xf71a40 <async_signal_polled_files>, data = 0x0}
(gdb) p queued_async_signals_sig[40]
$8 = {func = 0xf71a40 <async_signal_polled_files>, data = 0x0}
(gdb) p queued_async_signals_sig[41]
$9 = {func = 0xf71a40 <async_signal_polled_files>, data = 0x0}
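
The idea of not queuing a call that is already the last one in the queue
could look roughly like the sketch below. It is an illustration under
assumptions, not a patch against interp.c: the capacity and all coalesce_*
names are hypothetical. Skipping a duplicate is only safe when one run of the
callback handles everything that has accumulated since it was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COALESCE_CAPACITY 64    /* hypothetical capacity */

typedef void (*coalesce_func)(void *);

struct coalesce_entry {
    coalesce_func func;
    void *data;
};

static struct coalesce_entry coalesce_queue[COALESCE_CAPACITY];
static size_t coalesce_count;   /* number of pending entries */

/* Queue a call unless it is identical to the most recently queued one.
   Returns false only when the array is full. */
static bool coalesce_push(coalesce_func func, void *data)
{
    assert(func != NULL);
    if (coalesce_count > 0) {
        const struct coalesce_entry *last = &coalesce_queue[coalesce_count - 1];
        if (last->func == func && last->data == data)
            return true;        /* same func/data as the tail: skip it */
    }
    if (coalesce_count >= COALESCE_CAPACITY)
        return false;
    coalesce_queue[coalesce_count].func = func;
    coalesce_queue[coalesce_count].data = data;
    coalesce_count++;
    return true;
}

/* Example callback (hypothetical); it does nothing. */
static void coalesce_poll_files(void *data)
{
    (void) data;
}
```

Only the tail is compared, so the same func/data pair can still appear more
than once if other calls are queued in between.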


Re: Crashes due the async queue running over

Holger Freyther
On 04/14/2011 12:09 PM, Holger Hans Peter Freyther wrote:
> do you have any idea? I am not sure if I am trying to cover up a bigger
> problem, what about not queuing the call when it is already the last one in
> the queue? The other question is why is my kernel sending me SIGIO multiple
> times for the same socket?
>
> How to proceed from here?
>

Hey,
I tried not to queue a signal if the func/data is the same, but that seems to
cause some test failures. What do you think about disabling SIGIO the first
time it arrives, as we are going to poll anyway?


Re: Crashes due the async queue running over

Paolo Bonzini-2
On 04/18/2011 04:43 PM, Holger Hans Peter Freyther wrote:

> On 04/14/2011 12:09 PM, Holger Hans Peter Freyther wrote:
>> do you have any idea? I am not sure if I am trying to cover up a bigger
>> problem, what about not queuing the call when it is already the last one in
>> the queue? The other question is why is my kernel sending me SIGIO multiple
>> times for the same socket?
>>
>> How to proceed from here?
>>
>
> Hey,
> I tried to not queue a signal if the func/data is the same but that seems to
> cause some test failures.

I was thinking of something similar.  I'll look at what these failures are.

Paolo


Re: Crashes due the async queue running over

Holger Freyther
On 04/18/2011 06:10 PM, Paolo Bonzini wrote:

> I was thinking of something similar.  I'll look at what failures are these.

make check hangs on the Delay.st test with the simplest hack, which checks
whether the entry being queued is the same as the last one. I assume (just a
blind guess) that we fill the queue with timers from itimer or such... or that
we set the exception flag even if we did nothing.


Re: Crashes due the async queue running over

Paolo Bonzini-2
On 04/18/2011 06:14 PM, Holger Hans Peter Freyther wrote:
>> I was thinking of something similar.  I'll look at what failures are these.
> make check hangs on the Delay.st test with the most simple hack to check if
> the last entry is the same as the previous. I assume (just a blind guess) that
> we fill the queue with timers from itimer or such... or setting the exception
> flag even if we did nothing.

This should fix it by removing a lot of broken code and getting
{thread,signal}-safety right.  I removed the arrays altogether (replaced
them by lists) so the error you were getting is not there at all anymore! :)

Testing is welcome.

Paolo


Attachment: 0001-rewrite-async-call-queue.patch (20K)

Re: Crashes due the async queue running over

Holger Freyther
On 06/07/2011 06:23 PM, Paolo Bonzini wrote:

>
> This should fix it by removing a lot of broken code and getting
> {thread,signal}-safety right.  I removed the arrays altogether (replaced them
> by lists) so the error you were getting is not there at all anymore! :)


Wow, big change. Do you think you resolved the underlying issue of the poll
always getting the SIGIO for the fd? So, in theory, can we still go OOM as the
list gets very long?

holger


Re: Crashes due the async queue running over

Paolo Bonzini-2
On Wed, Jun 8, 2011 at 16:18, Holger Hans Peter Freyther
<[hidden email]> wrote:
> wow, big change. Do you think you resolved the underlying issue of the poll
> always getting the SIGIO for the fd? So in theory we can still go OOM as the
> list is getting very long?

No, you will have only one callback for multiple SIGIOs.

Paolo


Re: Crashes due the async queue running over

Holger Freyther
On 06/08/2011 06:13 PM, Paolo Bonzini wrote:
> On Wed, Jun 8, 2011 at 16:18, Holger Hans Peter Freyther
> <[hidden email]> wrote:
>> wow, big change. Do you think you resolved the underlying issue of the poll
>> always getting the SIGIO for the fd? So in theory we can still go OOM as the
>> list is getting very long?
>
> No, you will have only one callback for multiple SIGIOs.

Thanks,

I missed that async_queue_entry is static: if e.next is NULL the entry gets
added, and if not, it is already in the list. It is a bit scary that both
lists have the same last element, but it looks sound here.
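
The "next is NULL means not queued" trick can be sketched as follows. This is
a reading of the description above, not the code from the patch: all once_*
names are hypothetical, a single list is shown, and the signal blocking or
atomic operations a real VM would need are left out. A sentinel terminates the
list, so a queued entry never has a NULL next pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*once_func)(void *);

struct once_entry {
    once_func func;
    void *data;
    struct once_entry *next;    /* NULL means "not queued" */
};

/* Sentinel that ends the list; queued entries always point to something. */
static struct once_entry once_tail;
static struct once_entry *once_head = &once_tail;

/* Queue a statically allocated entry unless it is already queued.
   Needs no allocation, so the list cannot grow without bound. */
static bool once_enqueue(struct once_entry *e)
{
    assert(e != NULL && e != &once_tail);
    if (e->next != NULL)
        return false;           /* already in the list */
    e->next = once_head;
    once_head = e;
    return true;
}

/* Detach all queued entries and run them (most recently queued first);
   returns the number of calls made. */
static size_t once_run_all(void)
{
    size_t ran = 0;
    struct once_entry *e = once_head;

    once_head = &once_tail;
    while (e != &once_tail) {
        struct once_entry *next = e->next;
        e->next = NULL;         /* the entry may be queued again from here on */
        e->func(e->data);
        ran++;
        e = next;
    }
    return ran;
}

/* Example static entry (hypothetical) whose callback counts its runs. */
static int once_poll_runs;
static void once_poll_files(void *data)
{
    (void) data;
    once_poll_runs++;
}
static struct once_entry once_poll_entry = { once_poll_files, NULL, NULL };
```

Queuing once_poll_entry three times before once_run_all() results in a single
call, which matches the "one callback for multiple SIGIOs" behaviour
described above.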

