Re: Socket>>#sendData:count: does no error checking and hence locks up the system


Re: Socket>>#sendData:count: does no error checking and hence locks up the system

Eliot Miranda-2
Hi All,

    one thing this lock-up suggests is that interrupting should interrupt all processes running at user priority, not just the uiProcess.  Does that make sense?  It does to me, but would be something I'd control by a preference for testing its effects.

On Mon, Feb 29, 2016 at 7:23 PM, Eliot Miranda <[hidden email]> wrote:
Hi Levente, Hi All,

    I'm trying to investigate the socket issues in aio.c but have found a much more basic issue.  With my recent changes to Network that check more carefully for errors, the SocketTest>>testSocketReuse test appears to lock up.  In fact, the VM is fine, happily doing what it's being told by Socket>>#sendData:count:

Socket>>sendData: buffer count: n
    "Send the amount of data from the given buffer"
    | sent |
    sent := 0.
    [sent < n] whileTrue: [
        sent := sent + (self sendSomeData: buffer startIndex: sent+1 count: (n-sent))].

The VM keeps trying to send data on a socket that is being reused, gets an error from sendto, and answers 0 as the number of bytes sent, as required, but Socket>>#sendData:count: pays no heed and spins hard.  Here are the traces:

The test is SocketTest>>testSocketReuse which spawns two processes, one to send and one to receive data.  Here are the processes:

Process  0x48641f8 priority 40
0xbfec0498 M Socket>sendSomeData:startIndex:count:for: 0x4864d18: a(n) Socket
0xbfec04c0 M Socket>sendSomeData:startIndex:count: 0x4864d18: a(n) Socket
0xbfec04ec M Socket>sendData:count: 0x4864d18: a(n) Socket
0xbfec0520 I [] in SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest
0xbfec0540 I [] in BlockClosure>newProcess 0x4864df0: a(n) BlockClosure

Process  0x6543178 priority 40
0xbfec22c8 I [] in Delay>wait 0x4864ea0: a(n) Delay
0xbfec22f0 I BlockClosure>ifCurtailed: 0x4864eb8: a(n) BlockClosure
0xbfec2314 I Delay>wait 0x4864ea0: a(n) Delay
0xbfec2340 I [] in SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest
0xbfec2360 M BlockClosure>ensure: 0x4864fa8: a(n) BlockClosure
0xbfec2390 I SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest

Process  0x4864168 priority 40
0xbfec3438 I [] in DelayWaitTimeout>wait 0x48652f8: a(n) DelayWaitTimeout
0xbfec3458 M BlockClosure>ensure: 0x4865378: a(n) BlockClosure
0xbfec347c I DelayWaitTimeout>wait 0x48652f8: a(n) DelayWaitTimeout
0xbfec34a0 I Semaphore>waitTimeoutMSecs: 0x48652e0: a(n) Semaphore
0xbfec34c4 I Socket>waitForDataIfClosed: 0x4865408: a(n) Socket
0xbfec34f0 I Socket>receiveDataInto:startingAt: 0x4865408: a(n) Socket
0xbfec3520 I [] in SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest
0xbfec3540 I [] in BlockClosure>newProcess 0x48654c0: a(n) BlockClosure

And here's the VM spinning:
   15726    0 sqUnixSocket.c:1128 UDP sendData(11, 16)
   15726    0 sqUnixSocket.c:1134 UDP send failed 56 Socket is already connected
   15726    0 sqUnixSocket.c:1128 UDP sendData(11, 16)
   15726    0 sqUnixSocket.c:1134 UDP send failed 56 Socket is already connected
   15726    0 sqUnixSocket.c:1128 UDP sendData(11, 16)
   15726    0 sqUnixSocket.c:1134 UDP send failed 56 Socket is already connected
   ...etc...

Ah!! Of course.  Because I have changed the default scheduling semantics in Squeak 5 to make preemption not a yield point, Socket>>#sendData:count:  never yields to the other processes.  Previously when the Delay process woke up this would implicitly yield the process spinning in Socket>>#sendData:count:.

So Socket>>#sendData:count: needs to do a yield if no data is sent.  However, shouldn't it also check for errors if no data is sent and do something like return an error if it discovers, via Socket>>primSocketError:, that the socket is not happy?
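
A minimal sketch of what that could look like, not a tested change.  It assumes that Socket>>primSocketError: answers 0 when the socket has no pending error, that the socket handle lives in the instance variable socketHandle, and that signalling NetworkError is an acceptable way to report the failure; all three are assumptions for illustration only.

```smalltalk
Socket>>sendData: buffer count: n
    "Send the amount of data from the given buffer.
     Sketch only: when nothing was sent, check the socket for an error
     and otherwise yield so other processes at this priority can run."
    | sent count |
    sent := 0.
    [sent < n] whileTrue: [
        count := self sendSomeData: buffer startIndex: sent + 1 count: n - sent.
        count = 0
            ifTrue: [
                "Assumption: a non-zero answer means the socket is in error."
                (self primSocketError: socketHandle) = 0
                    ifFalse: [^NetworkError new signal: 'send failed'].
                "No error but no progress: let equal-priority processes run."
                Processor yield]
            ifFalse: [sent := sent + count]].
```

Note that Processor yield only gives way to processes at the same priority, which is enough here because the sender and receiver in testSocketReuse both run at priority 40.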

_,,,^..^,,,_
best, Eliot



--
_,,,^..^,,,_
best, Eliot



Re: Socket>>#sendData:count: does no error checking and hence locks up the system

Bert Freudenberg
On 01.03.2016, at 21:14, Eliot Miranda <[hidden email]> wrote:
>
> Hi All,
>
>     one thing this lock-up suggests is that interrupting should interrupt all processes running at user priority, not just the uiProcess.  Does that make sense?  It does to me, but would be something I'd control by a preference for testing its effects.


I thought it interrupted the active process? Wouldn’t that make most sense?

- Bert -






Re: [Vm-dev] Re: [squeak-dev] Re: Socket>>#sendData:count: does no error checking and hence locks up the system

Eliot Miranda-2


On Tue, Mar 1, 2016 at 12:32 PM, Bert Freudenberg <[hidden email]> wrote:
 
On 01.03.2016, at 21:14, Eliot Miranda <[hidden email]> wrote:
>
> Hi All,
>
>     one thing this lock-up suggests is that interrupting should interrupt all processes running at user priority, not just the uiProcess.  Does that make sense?  It does to me, but would be something I'd control by a preference for testing its effects.


I thought it interrupted the active process? Wouldn’t that make most sense?

Not necessarily.  For example, in the test I referred to, testSocketReuse, the ui process (the process running the test) spawns two other processes that spin hard, one trying to write to a socket and one trying to read from a socket.  If the socket code doesn't detect errors properly then these processes continue to spin hard.  If one interrupts then /nothing/ appears to happen.  The ui process is indeed interrupted, but because the other two processes continue to spin hard they shut out the notifier, which doesn't appear.  And even if the notifier did appear those processes would still be spinning hard, making it difficult for the user to interact with the notifier.  So in this case it makes sense to interrupt all processes running at user priority.  Arguably it makes sense to interrupt any and all processes running at or above user priority and below user interrupt priority.  Usually there's only the ui process in this range, but occasionally there are more, and errors can cause them to make an interrupt ineffective if it only interrupts the ui process.
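
As a rough illustration of that priority band, here is a workspace sketch that finds every process at or above user priority and below user interrupt priority, other than the one evaluating the code, and suspends it.  Suspension stands in for "interrupting" here, which is a simplification: a real interrupt would open a notifier on the process rather than just stop it.  The variable names are hypothetical and none of this is existing interrupt-handling code.

```smalltalk
"Sketch only: suspend every process in the user priority band
 except the active one. Suspending stands in for interrupting."
| low high candidates |
low := Processor userSchedulingPriority.
high := Processor userInterruptPriority.
candidates := Process allSubInstances select: [:p |
    p ~~ Processor activeProcess
        and: [p isTerminated not
        and: [p priority >= low and: [p priority < high]]]].
candidates do: [:p | p suspend].
candidates
```

The answered collection can be inspected, and any process in it can be restarted with resume.  One caveat is that suspending a process that is waiting on a semaphore takes it off that semaphore's queue, so this is only suitable for experimenting, for example behind the kind of preference suggested earlier in the thread.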


- Bert -






--
_,,,^..^,,,_
best, Eliot