Status: Accepted
Owner: [hidden email]

New issue 4768 by [hidden email]: Code loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Tested in Pharo 1.3 one-click on Mac OS X and Linux.

Evaluating the following code hangs with no errors, no log file and no way to interrupt:

    ZnServer removeFromSystem.
    Gofer new
        squeaksource: 'ZincHTTPComponents';
        package: 'Zinc-HTTP';
        load.
    (Smalltalk at: #ZnServer) startDefaultOn: 1701.

Proceed from the warnings; the progress bar says 'Initializing...'.

---

Apparently, just loading the code using Gofer and starting the server is not enough to cause problems; there has to be a significant change, hence the removeFromSystem, before trouble begins.

Could someone please confirm this crash/hang?

This reminds me of the other problem that we had with the unit tests, where the trait compilation tests before a Zn test involving a server caused problems.

http://code.google.com/p/pharo/issues/detail?id=4495

I still don't see a relation, but there has to be something very strange going on here.

_______________________________________________
Pharo-bugtracker mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-bugtracker
Comment #1 on issue 4768 by [hidden email]:

Hangs Pharo 1.4 #14028 with 100% CPU as well.
Updates:
    Labels: Milestone-1.4 Importance-High

Comment #2 on issue 4768 by [hidden email]:

(No comment was entered for this change.)
Comment #3 on issue 4768 by [hidden email]:

Issue 4495 has been merged into this issue.
Comment #4 on issue 4768 by [hidden email]:

Thanks for upgrading the importance of this issue, Stef.

But what I am looking for is a confirmation from you, Marcus, Igor, Mariano or anybody else that the above workspace code does hang on your machine and that it *is* indeed a serious issue.

It took me more than a full day to isolate and reduce these problems to this simple case. I tried to reduce it further but was not successful.

I absolutely understand that everybody's resources are limited, that everybody has their own priorities and that most of us cannot spend 100% of our time on Pharo.

I am sure we can tackle this together; it's just that I cannot do this alone.
Comment #5 on issue 4768 by [hidden email]:

Yes, crashed on Cog JIT and Stack (latest build server) and 14126. But only when executed as one expression; doing it step by step works.
Comment #6 on issue 4768 by [hidden email]:

Further to my comment #17 on issue 4495 (http://code.google.com/p/pharo/issues/detail?id=4495#c17) I see that the following line

    socket := (listeningSocket waitForAcceptFor: self acceptWaitTimeout)

in ZnMultiThreadedServer>>serveConnectionsOn: has an issue in that the socket is always invalid, i.e. the status code in

    Socket>>waitForConnectionFor: timeout ifTimedOut: timeoutBlock
        status := self primSocketConnectionStatus: socketHandle

always returns 1, which is WaitingForConnection. The default timeout is 300 seconds and it will loop indefinitely, so the VM seems stuck at HighIOPriority.

If as mentioned in (comment #17 on issue 4495) the socket returns the correct status. Also, if you insert a self halt anywhere before this code and then click 'proceed', it will work.

Anyone familiar with the socket primitives care to look?

BTW the way I pinpointed this is by sending output to Transcript (set to read-only mode and make sure the Transcript window is visible before running { TraitTest run: #testErrorClassCreation. ZnClientTests run: #testDelete }):

    [(status = WaitingForConnection)
        and: [(msecsEllapsed := Time millisecondsSince: startTime) < msecsDelta]]
        whileTrue: [
            semaphore waitTimeoutMSecs: msecsDelta - msecsEllapsed.
            Transcript
                show: status printString;
                show: ' ' , msecsEllapsed printString, ' ', msecsDelta printString;
                cr.
            status := self primSocketConnectionStatus: socketHandle].
Comment #7 on issue 4768 by [hidden email]:

Hangs on my machine as well (Windows), recent 1.4 image, both Cog from Jenkins and 4.1.1 (non-Cog) release from squeak.org. Also works fine if not removing ZnServer first.
Comment #8 on issue 4768 by [hidden email]:

Carlo (I think that is your name),

I don't understand your remark about the loop in Socket>>#waitForConnectionFor:ifTimedOut:

Even if status always remains the same (WaitingForConnection), the loop will end because of a timeout, no? Unless Semaphore>>#waitTimeoutMSecs: never comes back, of course.

Still, the strange thing remains the fact that all this causes trouble if and only if something related to compilation or code loading (as far as we know) happens before it.

Sven
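The termination argument in the comment above can be tried in isolation. What follows is only a workspace sketch, not the actual Socket code: the symbol #waiting is a hypothetical stand-in for WaitingForConnection that never changes, the 2000 ms budget is arbitrary, and the temporaries are named for illustration. It uses the same Time class>>millisecondsSince: and Semaphore>>waitTimeoutMSecs: calls quoted earlier in the thread.

```smalltalk
"Sketch: a timeout-bounded wait loop whose status never changes.
 #waiting is a hypothetical stand-in for WaitingForConnection."
| status startTime msecsDelta msecsElapsed semaphore iterations |
status := #waiting.
msecsDelta := 2000.                      "arbitrary total budget in ms"
startTime := Time millisecondClockValue.
semaphore := Semaphore new.              "never signalled, so only the timeout can wake us"
iterations := 0.
[ status = #waiting
    and: [ (msecsElapsed := Time millisecondsSince: startTime) < msecsDelta ] ]
    whileTrue: [
        semaphore waitTimeoutMSecs: msecsDelta - msecsElapsed.
        iterations := iterations + 1 ].
Transcript cr;
    show: 'iterations: '; show: iterations printString;
    show: ' elapsed ms: '; show: (Time millisecondsSince: startTime) printString.
```

If the timed wait behaves normally, the loop would be expected to make roughly one pass and leave after about the full budget. If the wait were to return immediately, the loop would still exit once the budget elapses, but only after spinning through many passes, which is consistent with both the "it must time out" and the "hot loop" observations in this thread.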
Comment #9 on issue 4768 by [hidden email]:

As far as I can see this is an endless loop, as there is a repeat call applied. So for some reason the primitive always returns a status of WaitingForConnection and then it repeats this over and over. If there is a halt-then-proceed before the Zn code runs, then the primitive behaves. Seems like the compiler is affecting the primitive call...

    ZnMultiThreadedServer>>listenLoop
        self serveConnectionsOn: serverSocket ] repeat ]

    ZnMultiThreadedServer>>serveConnectionsOn:
        socket := (listeningSocket waitForAcceptFor: self acceptWaitTimeout)
            ifNil: [ ^ self noteAcceptWaitTimedOut ].
Comment #10 on issue 4768 by [hidden email]:

All TCP servers are in an infinite loop: their server process waits indefinitely (or in a loop with a timeout, as ZnMultiThreadedServer does) for incoming connections. Most servers, like ZnMultiThreadedServer, then spawn worker threads to handle each connection.

As far as I can see, execution never leaves the #waitForAcceptFor: call; in other words, the infinite loop has to occur there (or inside the VM).

You can see that by running a server variant with #logToTranscript:

    ZnServer removeFromSystem.
    Gofer new
        squeaksource: 'ZincHTTPComponents';
        package: 'Zinc-HTTP';
        load.
    (Smalltalk at: #ZnServer) logToTranscript; startDefaultOn: 1701.

#noteAcceptWaitTimedOut writes to the log, but its output never appears (although this is not 100% proof, since we have a hang).

Like you note, 'Seems like the compiler is affecting the primitive call...'
Comment #11 on issue 4768 by [hidden email]:

Another point: I tried to reduce this even further, to something like: recompile some class, then start a server process, open a server socket and call #waitForAcceptFor:

But I failed.

I don't think there is something wrong with how the server works (but who knows); this bug is about a weird interaction.
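For readers who want to repeat that kind of reduction, a minimal workspace sketch of the socket half might look like the following. This is an assumption about what such an attempt could look like, not the code that was actually tried: the port 1701 is borrowed from the original report, while the backlog size of 10 and the 5 second timeout are arbitrary. It relies on Socket class>>newTCP, Socket>>listenOn:backlogSize:, Socket>>waitForAcceptFor: and Socket>>closeAndDestroy.

```smalltalk
"Sketch: open a listening socket and time a single accept wait.
 Port, backlog and timeout values are illustrative only."
| serverSocket elapsed |
serverSocket := Socket newTCP.
serverSocket listenOn: 1701 backlogSize: 10.
"With no client connecting, the call should give up after the timeout (in seconds)."
elapsed := Time millisecondsToRun: [ serverSocket waitForAcceptFor: 5 ].
serverSocket closeAndDestroy.
Transcript cr;
    show: 'waitForAcceptFor: returned after ';
    show: elapsed printString;
    show: ' ms'.
```

Running this right after a class recompilation, within the same evaluation, and comparing the reported time with the requested timeout is one way to check whether the wait returns early.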
Comment #12 on issue 4768 by [hidden email]:

Very strange... One thing to check is whether there are other Socket-related bug reports or fixes on the tracker (there is one with fixes: Issue 3346).
Updates:
    Cc: [hidden email]

Comment #13 on issue 4768 by [hidden email]:

I'm really, really puzzled by this bug. Igor, do you have an idea?
Comment #14 on issue 4768 by [hidden email]:

Looks like the semaphore is broken. Try:

    { TraitTest run: #testErrorClassCreation. self runSem. }

where runSem is defined as:

    runSem
        | sem |
        sem := Semaphore new.
        sem waitTimeoutMSecs: 5000.
        Transcript show: 'now.'

You will notice that running runSem on its own takes 5 seconds before outputting to the Transcript, whereas running it after the trait test makes it return straight away. This causes the 'hot' loop I discussed above in the listenLoop.

Also note that putting a halt before runSem executes will make it run as expected.
Comment #15 on issue 4768 by [hidden email]:

Excellent! Thanks a lot.

I rewrote your test a little bit to make it easier to copy and paste:

    {
        TraitTest run: #testErrorClassCreation.
        [ Transcript cr; show: 'Start: '; show: TimeStamp now.
          Semaphore new waitTimeoutMSecs: 5000.
          Transcript cr; show: 'Stop: '; show: TimeStamp now. ] value
    }

I can confirm that it seems as if Semaphore waiting is broken right after this test, on both 1.3 and 1.4. Executing both elements separately keeps on working. Also, the above problem seems to be very temporary.

Now maybe we should also try to reduce the first element to something simpler, because we still don't know what is interacting with the Semaphore waiting: it might be compiler, error handling or unit test related.
Comment #16 on issue 4768 by [hidden email]:

    Delay>>schedule
        ...
        AccessProtect critical: [
            ScheduledDelay := self.
            TimingSemaphore signal.
        ].

Also, it looks like TimingSemaphore is 'temporarily' empty before it is signalled in the broken case, whereas it is not empty in the happy scenario. This means that it is never actually 'scheduled' (in Delay class>>handleTimerEvent).

I.e. you can copy this method onto DelayWaitTimeout and then add

    Transcript show: 'isEmpty? ', TimingSemaphore isEmpty printString.

just before it is signalled to see what I mean.
Comment #18 on issue 4768 by [hidden email]:

We minimized the code from the test a bit more. We can reproduce it with this:

    | tmpCategory t1 t2 |
    tmpCategory := 'TemporaryGeneratedClasses'.
    Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
    nil subclass: #AClass
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: tmpCategory.
    "----------------"
    Object subclass: #AClass
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: tmpCategory.
    Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
    [ t1 := Time millisecondClockValue.
      Semaphore new waitTimeoutMSecs: 5000.
      t2 := Time millisecondClockValue. ] value.
    Transcript cr; show: 'Start: '; show: t1.
    Transcript cr; show: 'Stop: '; show: t2.
Comment #19 on issue 4768 by [hidden email]:

Even simpler:

    | t1 t2 |
    Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
    Number subclass: #AClass
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'TemporaryGeneratedClasses'.
    Object subclass: #AClass
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'TemporaryGeneratedClasses'.
    Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
    t1 := Time millisecondClockValue.
    Semaphore new waitTimeoutMSecs: 5000.
    t2 := Time millisecondClockValue.
    Transcript cr; show: 'Delta: '; show: t2 - t1
Comment #20 on issue 4768 by [hidden email]:

So this means: ClassBuilder. Horror... and people really think that cleaning up is not the way to go?