I know that D6 is scheduled to have a new Sockets implementation, but maybe this would still be of some use for D6 as an interesting example, and of course for the old implementation. So here is something I have occasionally experienced under heavier usage of sockets. This is under D3.06, but I think sockets have not changed significantly since.

Occasionally Socket>>receiveByteArray would raise an exception with socket error code 0, which is not a valid socket error code. Sniffing the network traffic showed that the other side of the TCP connection did not close the socket, and that there is no apparent reason why the connection would go belly up. After some debugging it appeared that:

Socket>>basicReceiveByteArray: anInteger
    "Private - Reads anInteger bytes from the socket.
    Answers a ByteArray representing the bytes read."

    | bytesReceived byteArray grijeska |
    byteArray := ByteArray new: anInteger.
    bytesReceived := WSockLibrary default
                recv: self asParameter
                buf: byteArray
                len: byteArray size
                flags: 0.
    bytesReceived > 0 ifTrue: [
        "Success."
        ^byteArray copyFrom: 1 to: bytesReceived ].
    bytesReceived = 0 ifTrue: [
        "Socket has been closed."
        SocketClosed signal ].
    "Some other error."
    self error.
----------

receives -1 from the socket call, but the subsequent

SocketAbstract>>error
    "Private - Throw a SocketError exception. We MUST do the wsaGetLastError here
    rather than leaving it to the SocketError class. Otherwise it is possible
    (especially with loading classes from STC files) that the last error is lost
    by the time it is fished out by SocketError."

    | err |
    err := WSockLibrary default wsaGetLastError.
    SocketError signalWith: err.
----------

gets 0 into err, which stands for no error and means that the last operation was OK.

My bet is that the original error was a WOULDBLOCK one, but because of some race condition the error code had been reset to 0 before it was read. The consequence is that an exception is raised on the socket, while the appropriate action would probably be to retry the read. As far as I am aware, there is no other user-level socket operation going on at that time.

Moving the wsaGetLastError read a little earlier is possible, but I am not sure it would alleviate the situation. Maybe protecting the recv call and getLastError with a critical section would help, but I am also not completely sure this would protect me from something that is happening in the Dolphin VM or the Windows socket implementation. I am actually inclined to interpret the socket error as WOULDBLOCK.

rush
--
http://www.templatetamer.com/
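
The critical-section idea above could look roughly like the following. This is only a minimal sketch, not the Dolphin implementation: the selector basicReceiveByteArrayGuarded: and the class variable RecvGuard are hypothetical names, and RecvGuard is assumed to have been initialised once with Semaphore forMutualExclusion. Only recv:buf:len:flags:, wsaGetLastError, SocketClosed and SocketError are taken from the code quoted in the post.

```smalltalk
basicReceiveByteArrayGuarded: anInteger
	"Hypothetical variant of #basicReceiveByteArray:. Reads the error code
	inside the same critical section as the recv call, so that no other
	Process that also takes RecvGuard can issue a socket call in between.
	Assumes RecvGuard := Semaphore forMutualExclusion was evaluated once."

	| byteArray bytesReceived err |
	byteArray := ByteArray new: anInteger.
	err := 0.
	RecvGuard critical: [
		bytesReceived := WSockLibrary default
					recv: self asParameter
					buf: byteArray
					len: byteArray size
					flags: 0.
		"Capture the error immediately, before leaving the critical section."
		bytesReceived < 0
			ifTrue: [err := WSockLibrary default wsaGetLastError]].
	bytesReceived > 0 ifTrue: [^byteArray copyFrom: 1 to: bytesReceived].
	bytesReceived = 0 ifTrue: [^SocketClosed signal].
	^SocketError signalWith: err
```

A guard like this can only help against other Smalltalk Processes that take the same guard around their own socket calls. It does nothing if the error code is being reset inside the VM or inside Winsock itself, which is the doubt raised in the post.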
"rush" <[hidden email]> wrote in message
news:cmd3rk$les$[hidden email]...
> I know that D6 is scheduled to have a new Sockets implementation, but maybe
> this would still be of some use for D6 as interesting example, and of course
> for old implementation. So here is something I have occasionally experienced
> under more heavy usage of sockets. This is under D3.06 but I think sockets
> have not changed significantly since.

Just to add: I have checked, and 5.1 has the same relevant socket code, so it would have the same problems.

rush
--
http://www.templatetamer.com/
In reply to this post by rush
rush wrote:
> [details snipped]

[I'm not sure how strongly relevant this is -- only tangentially, perhaps -- but you've reminded me of a point I've been meaning to raise for some time, so...]

There's a general problem with external interfacing with libraries that use the "errno" or GetLastError() style of error reporting. Even if those libraries are carefully designed so that the status flag is thread-safe (thread-local), the way Dolphin multiplexes its Processes onto a single OS thread makes it difficult or impossible to avoid the risk that some Process will pre-emptively see or overwrite an error intended for some other Process.

I have exactly that problem (and no solution) in JNIPort. There, one is expected to check after every call into Java to see if an exception was thrown (so I do), but there's a risk, theoretically at least, that the wrong Process will see the exception.

A related problem is that when handling a callback from Java into Smalltalk, I have to make some "global" (to Smalltalk) state changes for the duration of the callback before anything is "allowed" to talk to Java again. And in a similar way to the above, there's the possibility that once the Dolphin VM is in charge again (in the callback) it will schedule some other Process /before/ the Smalltalk code that is actually handling the callback itself, and that that Process will make use of Java before the necessary changes have been put in place. (Or similarly, at the end of the callback, it may make use of the global state after it has been returned to "normal" but before the VM has really returned from the callback.)

At least the above are theoretical possibilities according to my understanding of how the VM works. I have to admit that I've not been able to make either of them manifest in practice (and I've tried pretty hard), so maybe there's something I'm missing.

If not, then I think what's needed is some way to mark an external call so that when it returns the calling Process is initially non-interruptible. Similarly, it should be possible to mark an ExternalCallback so that the VM enters it in a state where the handling process is initially non-interruptible (and has a way to return to that state to clean up at the end of the callback, before returning to the VM).

    -- chris
"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...
> [details snipped]

Yes, that also seems to be a related (or similar) problem. Anyway, in the case I am seeing, it seems there is no other Smalltalk Process that might clear the error status. I cannot be 100% sure at this moment, but let's say it is in the 95% range that when this happens there is only one Socket>>receiveByteArray call outstanding (blocked on read), and no other socket operations are issued during that time by my program.

What seems like a slightly possible candidate for clearing the error status is the

WinAsyncSocket>>wsaEvent: message wParam: wParam lParam: lParam

Windows message handler in the Sockets implementation. Maybe some of these events, under some circumstances, cause the error to be cleared. It seems that many of these messages are delivered late in the problem case; i.e. the socket has already consumed the received data, and many notifications come afterwards.

I am currently testing a workaround that raises wouldblock when error = 0, and it seems to solve the problem without bad side effects, but I will conduct more testing. As far as I understand, it is a pretty safe workaround for sockets, since if there really is an error on the socket, subsequent calls will fail anyway. If it was actually wouldblock, raising wouldblock is the right thing to do anyway. The only concern is that under some circumstances it might cause the socket to lag until a new wsaEvent for read is generated. But since there generally seems to be a surplus of those messages, and in my case I more or less always have something coming to the socket, it seems this will not be an issue. If it were, periodically unblocking the socket processes "just in case" would probably do the trick.

rush
--
http://www.templatetamer.com/
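
The workaround described in this post, treating an error code of 0 as would-block, might be sketched as a change to SocketAbstract>>error. The assumptions are labelled: 10035 is the Winsock value of WSAEWOULDBLOCK, and since the thread does not show how the actual workaround raises would-block, the substituted code is simply handed to SocketError signalWith: here.

```smalltalk
error
	"Sketch of the workaround: if Winsock reports 0 (no error) after a
	failed call, assume the real error was WSAEWOULDBLOCK and report that
	instead, so that callers retry the read rather than drop the socket."

	| err |
	err := WSockLibrary default wsaGetLastError.
	err = 0 ifTrue: [err := 10035 "WSAEWOULDBLOCK; assumed substitute"].
	SocketError signalWith: err
```

Whether the caller then sees a dedicated would-block exception depends on how SocketError maps error codes to exception classes, which is not shown in the thread.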
In reply to this post by Chris Uppal-3
Chris,
> There's a general problem with external interfacing with libraries that use the
> "errno" or GetLastError() style of error reporting. Even if those libraries
> are carefully designed so that the status flag is thread-safe (thread-local)
> the way the Dolphin multiplexes its Processes onto a single OS thread makes it
> difficult or impossible to avoid the risk that some Process will pre-emptively
> see or overwrite an error intended for some other Process.
>
> I have exactly that problem (and no solution) in JNIPort. There, one is
> expected to check after every call into Java to see if an exception was thrown
> (so I do), but there's a risk, theoretically at least, that the wrong Process
> will see the exception.

I don't know whether you have this option available to you, but I wrote a very simple wrapper DLL around a library that forced me to do such error checking. In my case, I was having problems not so much because anything overwrote the error, but because the error was stored per OS thread, and Dolphin was potentially[*] giving me a different OS thread to check for the error, in which case it didn't work at all.

My wrapper DLL looks suspiciously like what (IMHO) the wrapped library should have been exporting. It is a handful of functions that make the "real" call, check the error status, and return that into a buffer provided by Dolphin. It works regardless of which threads Dolphin uses to make the calls.

[*] IIRC, typically

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
Bill,
> I don't know whether you have this option available to you, but I wrote
> a very simple wrapper DLL around a library that forced me to do such
> error checking.

Mmmyess... I /could/ do that, but it isn't really the solution I want.

For one thing, it doesn't solve the problem with callbacks.

For another thing there are some 250 methods that would need wrapping in this way...

But mainly, I think that this is a general problem and needs a general solution (not necessarily the one I suggested). I mean if the line is "Dolphin has excellent abilities to interface with external code, all you have to do is write a wrapper DLL for it" then something's wrong...

> My wrapper DLL looks suspiciously like what (IMHO) the wrapped library
> should have been exporting.

I think it's fair not to expect Dolphin to be able to compensate for every oddly/wrongly designed external library. If the library's design is bad enough then creating a sensible wrapper for it may be the only feasible approach. But in the cases I described I don't think the external library /is/ badly designed (at least, not in this way ;-) so I'd like to be able to connect to it without "messing".

    -- chris
Chris,
> Mmmyess... I /could/ do that, but it isn't really the solution I want.
>
> For one thing, it doesn't solve the problem with callbacks.
>
> For another thing there are some 250 methods that would need wrapping in this
> way...

I would consider that impractical, to say the least. In my case, I needed one callback (trivial thing too, I have *no* idea why they didn't provide a default - more bad design), and only a few functions. However, it saved my bacon at least in this one case.

> But mainly, I think that this is a general problem and needs a general solution
> (not necessarily the one I suggested). I mean if the line is "Dolphin has
> excellent abilities to interface with external code, all you have to do is
> write a wrapper DLL for it" then something's wrong...

Indeed. IIRC, Blair was intending to allow D6 to associate an OS thread with each Process (presumably one that makes overlapped calls) to avoid the problem I had. But that does not sound ideal either, unless...

Blair, would it make sense to have an overlapSameThread (hopefully you can think of a better name) call type that would signal the VM to begin associating a particular OS thread with the calling Process? That might allow most other Processes to benefit from pooling. Put another way, it would punish only systems that make calls to libraries with thread affinities.

> I think it's fair not to expect Dolphin to be able to compensate for every
> oddly/wrongly designed external library. If the library's design is bad enough
> then creating a sensible wrapper for it may be the only feasible approach. But
> in the cases I described I don't think the external library /is/ badly designed
> (at least, not in this way ;-) so I'd like to be able to connect to it without
> "messing".

Reasonable on both counts. However, I wish that "Writing Solid Code" were required reading somewhere. We would have less trouble if that were the case. Maybe we should have it printed on leaflets and drop them over Redmond :) Yes, I know it is an MS Press book, which only adds to the satire :(

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
In reply to this post by Chris Uppal-3
Chris Uppal wrote:
> rush wrote:
>
>>[details snipped]
>
> [I'm not sure how strongly relevant this is -- only tangentially, perhaps --
> but you've reminded me of a point I've been meaning to raise for some time,
> so...]
>
> There's a general problem with external interfacing with libraries that use the
> "errno" or GetLastError() style of error reporting. Even if those libraries
> are carefully designed so that the status flag is thread-safe (thread-local)
> the way the Dolphin multiplexes its Processes onto a single OS thread makes it

It seems to me that we have a limited number of possibilities here:

1. Dolphin always runs on the same single OS thread.
2. Dolphin runs on a single OS thread at a time, but which one (which OS thread) is effectively random.
3. Dolphin runs on more than one OS thread, but the mapping from Dolphin process to OS thread is not fixed.
4. Dolphin maps Processes 1-1 onto OS threads.

Seems the actual situation must be (3). (Because (4) would, if true, be a well known feature; because (2) would, it appears, make it impossible to receive and/or react to (at least) the conditions reported to the originating, but now wrong, thread; and because (1) would mean that all of Dolphin was blocked until the external call completed.)

Even for case (3), Dolphin should always have the correct lastError *somewhere*, if, as you said, the external library is storing it thread-local. And Dolphin needs only to deliver said flag to the associated (initiating) process. Meaning Dolphin must record that "expectedInfo" on "thisOSThread" goes to "thisProcess" (the running process which is about to cause such info to be expected), before the external call is issued. At that point, the lastError processing is safe, because any overlapped call will occur on a different OS thread.

> difficult or impossible to avoid the risk that
> some Process will pre-emptively see or overwrite
> an error intended for some other Process.

Presumably, each process has a unique "lastError" slot, into which the arriving per-OS-thread flag is stored, by way of the mapping above?

> I have exactly that problem (and no solution) in JNIPort. There, one is
> expected to check after every call into Java to see if an exception was thrown
> (so I do), but there's a risk, theoretically at least, that the wrong Process
> will see the exception.

I don't quite see where this specific concern originates?

>
> A related problem is that when handling a callback from Java into Smalltalk, I
> have to change some "global" (to Smalltalk) state changes for the duration of
> the callback before anything is "allowed" to talk to Java again. And in a
> similar way to the above, there's the possibility that once the Dolphin VM is
> in charge again (in the callback) it will schedule some other Process /before/
> the Smalltalk code that is actually handling the callback itself, and that that
> Process will make use of Java before the necessary changes have been put in
> place. (Or similarly at the end of the callback, may make use of the global
> state after it has been returned to "normal" but before the VM has really
> returned from the callback)

Sounds like a possible circular embrace to me - you're allowing reentry, but you aren't reentrant.

St -> Java
       \ ---- calls back to St
               \ ---- changes Global state
                      *after which* Java is allowed to be reentered.

Is this what you mean? If so, then you either need to "giantLock" the whole St->Java bridge:

<PRE>
St -> critical: [ isJavaBlocked?
                       /\
        EWouldBlock <- Y  N -> beJavaBlocked ]
                                \
                                 Java
                                  \ -> callback St
                                  /
           Change Global State <-
                                  \ -> critical: [beJavaBlockedNot]
                                  /
                       proceed <-
</PRE>

=== or ensure that the entirety of the Java->St callback runs to completion (NEVER yields) and runs at higher priority than anything else.

>
> At least the above are theoretical possibilities according to my understanding
> of how the VM works. I have to admit that I've not been able to make either of
> them manifest in practise (and I've tried pretty hard) so maybe there's
> something I'm missing.

Reentrancy is nearly impossible to test for without special hardware. Usually, the best you can do is "soak test" under known load, for periods which are long enough to have encountered known historical failures with some statistically defensible comfort. (Murphy's law, and corollaries, apply more strongly in this area than perhaps anywhere else.)

>
> If not then I think what's needed is some way to mark an external call so that
> when it returns the calling Process is initially non-interruptible.

Probably too late - if you're coordinating w/r/t other processes, you'll need to protect the entry-into and returning-from parts of the external call. Once the external call completes a transition back to the caller, things had better already be in the necessary (process) state - i.e. the same state as before the call. Because there is no way to prevent another process from having *already* breached the safety of the thing you're concerned about, by the time this call has even begun the returning-from transition.

{ The exception to this is if the "special stuff" is entirely local to the caller, and is *only* referenced *after* the external call. }

> Similarly
> it should be possible to mark an ExternalCallback so that the VM enters it in a
> state where the handling process is initially non-interruptible (and has a way
> to return to that state to clean up at the end of the callback, before
> returning to the VM).

Right.

Regards,

-cstb
In reply to this post by Schwab,Wilhelm K
Bill,
> [...]I wish that "Writing Solid Code"
> were required reading somewhere. We would have less trouble if that
> were the case. Maybe we should have it printed on leaflets and drop
> them over Redmond :) Yes, I know it is an MS Press book, which only
> adds to the satire :(

Whisper it, but I've never read that. Not even on my bookshelf waiting to be read. I've heard that it's pretty good, but frankly I find the provenance off-putting...

    -- chris
In reply to this post by jas
cstb wrote:
> It seems to me that we have a limited number of possibilities here:
> [...]
> Seems the actual situation must be (3).

Well, we don't have to guess -- Blair has gone over this in some detail before. The following is my understanding. Corrections and amplifications from anyone very much appreciated...

All Dolphin Processes share one OS-thread. Always the same thread. Hence all Smalltalk code is running (as far as the OS knows) as one thread -- call that the "Smalltalk thread".

When you issue an external call that has been marked "overlapped" the Dolphin VM will execute that call on a separate OS-thread. That thread is either started specially for the occasion, or is found in a pool of threads that are used for this purpose. Thus for overlapped calls there is no "thread affinity". For non-overlapped calls (the bulk of them) there is perfect thread affinity since the external call will always happen on the Smalltalk thread.

While an external call is outstanding, if it is not overlapped, all Dolphin Processes are blocked (since they all run on the same thread, and that thread is doing something else). If it /is/ overlapped, then other Dolphin Processes can proceed (running on the Smalltalk thread), and only the issuing Process is blocked (by the VM) until such time as the thread for the overlapped call signals the VM that it has completed. At which time the VM passes the response to the Smalltalk thread where the calling Process is unblocked. The overlapped call thread then falls back into the thread pool, where it will die if it is not fished out and reused in a small time (a few seconds, I think).

The overlapped call mechanism is basically a slightly hacky way of allowing one to issue slow external calls without blocking the whole image, but without the very considerable complexity (and probably slowness) of using OS-threads for Smalltalk Processes.

On the Smalltalk thread, the VM schedules the Processes preemptively, taking due note of Process priorities. (I don't know what the algorithm is.) This is /not/ like the classic mostly-non-preemptive scheduling in (as I understand it) ST-80 and VW. (And a bloody good thing too, IMO.)

The Dolphin VM uses a few more OS-threads internally (Windows Task Manager shows that it uses 5 threads before any overlapped calls are issued), but I don't know what for. I think Blair mentioned something to do with the "background" garbage collector once, but I'm not clear on why it needs an OS-thread when it isn't actually a classic "background" gc algorithm (it still halts the VM). Maybe the COM server stuff needs an OS-thread or two as well. Just guessing...

That architecture does make it difficult to interact with external libraries that use the common errno-style, or GetLastError()-style, way of reporting errors (and other data) back to the caller. If such a library is badly designed then it'll use a single global variable to hold the data. Such libraries are becoming rare, and these days the data is normally held thread-local. That's OS-thread-local, since OS-threads are the lingua-franca of multiprocessing (in much the same way as files are the lingua-franca of data persistence).

Since Dolphin is running several Processes on the same OS-thread, and since they are preemptive, the possibility exists[*] that some Process will issue an external call (not overlapped), and that call will write an error status (or an exception record, in the case of JNIPort) into some thread-local place. If the VM schedules another Process to run before the calling Process reads that data, then there's a chance that the other Process will issue another external call (to the same library) that will overwrite the error flag incorrectly.

([*] or at least, it /might/ exist if my understanding's correct and I'm not missing something -- I'm hoping Blair will pop up to clarify ;-)

OTOH, if the external call /is/ overlapped, then the error status will be saved nicely, but in the thread-local storage associated with the helper thread that is probably now back in the thread pool, or is otherwise unavailable.

I think this second scenario is the one that is affecting Bill, and for which he needed a helper DLL. My own theoretical problems, and possibly the real ones that 'rush' is seeing too (though not necessarily), are from the first scenario.

The Dolphin VM does have the ability to run non-preemptively (see BlockClosure>>critical), and my suggestion is that the above problems (for non-overlapped calls) can be avoided by allowing us to tell the VM that it should disable pre-emption before returning from some external calls, and similarly that it should disable pre-emption before executing the Smalltalk code for some ExternalCallbacks.

> > A related problem is that when handling a callback from Java into Smalltalk, I
> > have to change some "global" (to Smalltalk) state changes for the duration of
> > the callback before anything is "allowed" to talk to Java again.
> [...]
> Sounds like a possible circular embrace to me -
> you're allowing reentry, but you aren't reentrant.

There are several possibilities for deadlock (see http://www.metagnostic.org/DolphinSmalltalk/JNIPort/threading.html for lots of detail, if you are interested). And putting a "giant lock" around something that might issue a callback would make it worse. What I need to do (at least theoretically) is something like:

Dolphin calls Java
{
    Java calls back into Dolphin
    {
        The Dolphin VM disables pre-emption then calls my (Smalltalk) code.

        My code executes, changes the global pointer, and then re-enables pre-emption.

        .... stuff happens, including handling the request, but also including other Processes running, until...

        My code disables pre-emption, changes the global pointer back, and then returns to the VM.

        The VM clears the no-preemption state (which has no immediate effect since no Smalltalk code is executing).
    }
    Dolphin returns to Java.
}
Java returns to Dolphin, where the normal, preemptive scheduling of Smalltalk code continues.

(BTW, it's not obvious from the above that it's safe for other Processes that were using the global pointer before the beginning of the sequence to be allowed to continue, using a different pointer, while the callback is executing. Or, similarly, to start an operation during the callback and complete it after the sequence has finished. In fact it is safe since the system's largely stateless (the JNI external library is designed correctly in this sense), so provided I take care never to cache a local copy of the global pointer (and take a couple of other precautions that are only relevant to JNI), I /think/ it should be OK.)

    -- chris
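
The Smalltalk-side half of the callback sequence in this post could be sketched roughly as follows. All names here are hypothetical (JavaEnvPointer, handleCallbackWith:, serviceRequest:), and the sketch relies on the post's own statement that BlockClosure>>critical evaluates a block without pre-emption. It cannot supply the part that needs VM support, namely entering the callback with pre-emption already disabled, so a window remains before the first critical block and again after the last one.

```smalltalk
handleCallbackWith: aCallbackEnv
	"Swap the global pointer for the duration of the callback and put it
	back before returning to the VM. JavaEnvPointer is an assumed global
	(or class) variable; #serviceRequest: is a placeholder for the code
	that actually handles the request."

	| saved |
	"Change the global pointer without being pre-empted part-way through."
	[saved := JavaEnvPointer.
	JavaEnvPointer := aCallbackEnv] critical.
	"Handle the request; other Processes may run while this happens."
	^[self serviceRequest: aCallbackEnv]
		ensure: ["Restore the pointer, again without pre-emption."
			[JavaEnvPointer := saved] critical]
```

Unlike the sequence in the post, this version re-enables pre-emption as soon as the pointer is restored rather than staying non-interruptible until control is back in the VM, which is exactly the gap the proposed VM feature is meant to close.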
In reply to this post by jas
> Presumably, each process has a unique "lastError"
> slot, into which the arriving per-OS-thread flag
> is stored, by way of the mapping above?

My understanding is that it indeed does this, but lastError is not enough for all cases, as there are libraries that do similar things with their own flags.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
In reply to this post by Chris Uppal-3
Chris,
>>[...]I wish that "Writing Solid Code"
>>were required reading somewhere. We would have less trouble if that
>>were the case. Maybe we should have it printed on leaflets and drop
>>them over Redmond :) Yes, I know it is an MS Press book, which only
>>adds to the satire :(
>
> Whisper it, but I've never read that. Not even on my bookshelf waiting to be
> read. I've heard that it's pretty good, but frankly I find the provenance
> off-putting...

Actually, it's the sub-title ("Microsoft's techniques for writing bug-free C programs") that gets me. However, just because they apparently don't read it is no reason for you to ignore it :) It might help that the author gives up some nice dirt on the early years of Word, all with a focus on how to learn from the problems they had. Better yet, it is an excellent book. The disassembler (or is it a CPU emulator?? - something like that) discussion alone is worth the price of the book. Anything that goes from candy machines to how to avoid the problems we've been discussing in this thread is a must-read, right?

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]