Hi Guys -
I debugged a *really* interesting problem today. For some reason, our Croquet sessions failed seemingly at random with socket timeouts in strange places. The main clue we had was that it was somehow related to a rather large space being replicated over a rather slow line (a DSL uplink as the source for replication). Tracking this down into its gory details, I ended up with a test case like this:

  data := ByteArray new: 10000000.
  socket := Socket newTCP.
  socket connectTo: 'myHost' port: myPort.
  socket sendData: data count: count.
  socket sendData: 'Hello' count: 5.

When I did this over a slow uplink it would *reliably* time out on the second sendData:count: call. But why? Simply put, because the Windows sockets interface doesn't quite function like I *thought* it would. I had expected the Windows send() call to accept only a "TCP packet size" worth of data, but it turns out it takes *everything* right down to the last byte in the first call. That means the first sendData: call returns immediately, but after that call the OS is still chugging along trying to get the data out to the interface, while the next sendData: call wants a response within the default ConnectionTimeOut (which is less than the time needed to complete the previous send).

Why is this relevant? I believe pretty much all code we currently have is written under the assumption that the primitive will only accept "reasonable" amounts of data. Any code that pushes large amounts of data and expects the socket interface to handle it will be affected by this problem. I also suspect that other platforms may show similar behavior, so some testing is in order.

If you have had random, unexplained timeouts when sending large data buffers over slow lines, splitting them up into smaller ones may just be your ticket as a workaround until I fix this problem in the VM, e.g., by making the VM only take "reasonable" amounts of data in each call, such that the caller can rest assured that the timeout values are meaningful.

I would also be interested in what other platforms do. Basically, the question is whether the primitive returns immediately in a single call, consuming all the data, or whether it will loop in Socket>>sendData:count:. If you have evidence towards either end, please post your results to VM-dev (incl. the precise version of your OS).

Cheers,
  - Andreas
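A minimal sketch of the chunking workaround Andreas suggests, assuming Squeak's standard Socket API; the 64K chunkSize is an illustrative tuning parameter, and serverAddr/myPort are assumed to be set up as in the test case above:

  | data socket chunkSize index |
  data := ByteArray new: 10000000.
  chunkSize := 65536.  "illustrative; small enough to drain within the timeout"
  socket := Socket newTCP.
  socket connectTo: serverAddr port: myPort.
  index := 1.
  [index <= data size] whileTrue: [
      | last |
      last := (index + chunkSize - 1) min: data size.
      "Each call queues at most chunkSize bytes, so each per-call
       timeout only has to cover one chunk, not the whole buffer."
      socket sendData: (data copyFrom: index to: last).
      index := last + 1].
  socket sendData: 'Hello'.

Whether 64K is small enough depends on the line speed and the default ConnectionTimeOut; the point is only that each call's worst case is bounded.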
On 2-Oct-06, at 8:01 PM, Andreas Raab wrote:
> data := ByteArray new: 10000000.
> socket := Socket newTCP.
> socket connectTo: 'myHost' port: myPort.
> socket sendData: data count: count.
> socket sendData: 'Hello' count: 5.

You mean an example like:

  serverAddr := NetNameResolver addressForName: 'localhost' timeout: 54323.
  count := 10000000.
  data := ByteArray new: count.
  socket := Socket newTCP.
  socket connectTo: serverAddr port: myPort.
  socket sendData: data count: count.
  socket sendData: 'Hello' count: 5.

man send

Depending on your flavor of Unix it may or may not allow you to grab 10,000,000 bytes of storage. If it does, then:

  If no messages space is available at the socket to hold the message
  to be transmitted, then send() normally blocks, unless the socket
  has been placed in non-blocking I/O mode. The select(2) call may be
  used to determine when it is possible to send more data.

We don't run the socket in non-blocking mode, so the socket will block if there isn't space to transmit the message. 10MB will block and time out, I note, if I don't read the data on the server, but at 100K it will cheerfully accept and say it sent the bytes. How much it will accept before blocking is dependent on window size etc.; likely my home gigabit intranet (RFC 1323) is configured to allow lots of bytes in flight, btw, so other networks might abort at 100K. In this case the socket won't block until I've sent the agreed window size of data, which is > 100K. We have no idea if the data has been received by the other side yet, and if sending to Mars we still have many minutes to wait.

One problem people have encountered in the past is sending, oh, say 64K, then closing the socket on a slow connection. The socket *lingers* around open for the linger time, a few seconds after the close request, to flush any data, but on a slow connection this linger time is insufficient to ensure all the data is transmitted before the close. If your model is "send 10MB, then close the socket", I suspect it's terminating before the data is fully sent on Unix-based machines.

Btw, sqSocketSendDone on the Unix platforms checks to see if sending more data would block; if not, then it says the send is done. Which isn't quite true, since the send might not be done: send done doesn't mean all the bytes are sent and delivered to the remote host; that is a different question.

sendDone ~= messageDelivered

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
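A sketch of how an image-side sender might avoid the linger pitfall John describes, assuming Squeak's Socket>>waitForSendDoneFor: (which answers true once the VM reports that further sends would not block; per John's caveat, that still is not proof of delivery). The 60-second bound is an illustrative assumption:

  socket sendData: data.
  "Wait until the VM's notion of send-done holds before closing, so the
   close + linger window isn't racing a still-draining send buffer."
  (socket waitForSendDoneFor: 60)
      ifFalse: [Transcript show: 'send still pending after 60 seconds'; cr].
  socket closeAndDestroy.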
In reply to this post by Andreas.Raab
Oops, forgot to mention: we don't really use send(); rather, the Unix socket plugin uses an async write(). However, it checks the socket's ability to accept data via the sendDone logic in order not to block and/or retry the write. And the ability of the socket to accept up to the window size is a feature.

On 2-Oct-06, at 8:01 PM, Andreas Raab wrote:
> Hi Guys -
>
> I debugged a *really* interesting problem today.

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
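Andreas's cross-platform question -- does the primitive consume everything in a single call, or loop in Socket>>sendData:count:? -- could be probed from the image. A rough sketch, assuming Squeak's Socket>>sendSomeData:startIndex:count: (which answers the number of bytes actually taken) and the serverAddr/myPort setup from John's example:

  | data socket accepted |
  data := ByteArray new: 10000000.
  socket := Socket newTCP.
  socket connectTo: serverAddr port: myPort.
  "One low-level send; the answer shows how much the primitive took in a
   single call. Expect it to vary with OS, window size, and buffering."
  accepted := socket sendSomeData: data startIndex: 1 count: data size.
  Transcript show: 'accepted in one call: ', accepted printString; cr.

On a platform behaving as Andreas describes for Windows, this should report the full 10,000,000; John's figure in the next message is one such data point for his configuration.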
In reply to this post by Andreas.Raab
FYI, the socket will digest 146,988 bytes before saying it can't accept more data, on my machine in my intranet configuration.

On 2-Oct-06, at 8:01 PM, Andreas Raab wrote:
> Hi Guys -
>
> I debugged a *really* interesting problem today.

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
In reply to this post by Andreas.Raab
Andreas Raab <[hidden email]> writes:
> Why is this relevant? I believe pretty much all code we currently have
> is written under the assumption that the primitive will only accept
> "reasonable" amounts of data. Any code that pushes large amounts of
> data and expects the socket interface to handle it will be affected by
> this problem.

I don't know the actual behavior of current platforms, but it would sure be nice if you did not have to make this assumption. If I were maintaining an OS and/or a router, I might well think it a *feature* to support large buffers....

-Lex