Issue 4768 in pharo: Code loading/compilation followed by starting a TCP server results in a hang

pharo
Status: Accepted
Owner: [hidden email]

New issue 4768 by [hidden email]: Code loading/compilation followed  
by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Tested in the Pharo 1.3 one-click image on Mac OS X and Linux.

Evaluating the following code hangs the image, with no error, no log file and  
no way to interrupt it:

ZnServer removeFromSystem.
Gofer new
       squeaksource: 'ZincHTTPComponents';
       package: 'Zinc-HTTP';
       load.
(Smalltalk at: #ZnServer) startDefaultOn: 1701.

Proceed from the warnings; the progress bar then says 'Initializing...'.

---

Apparently, just loading the code using Gofer and starting the server is  
not enough to cause problems; there has to be a significant change first  
(hence the removeFromSystem) before trouble begins.

Could someone please confirm this crash/hang?

This reminds me of the other problem we had with the unit tests, where  
running the trait compilation tests before a Zn test involving a server  
caused problems.

http://code.google.com/p/pharo/issues/detail?id=4495

I still don't see the connection, but something very strange has to be  
going on here.



_______________________________________________
Pharo-bugtracker mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-bugtracker

Comment #1 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Hangs Pharo 1.4 #14028 with 100% CPU as well.


Updates:
        Labels: Milestone-1.4 Importance-High

Comment #2 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

(No comment was entered for this change.)


Comment #3 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Issue 4495 has been merged into this issue.


Comment #4 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Thanks for raising the importance of this issue, Stef.

But what I am looking for is a confirmation from you, Marcus, Igor, Mariano  
or anybody else that the above workspace code does hang on your machine and  
that it *is* indeed a serious issue.

It took me more than a full day to isolate and reduce these problems to  
this simple case. I tried to reduce it further but was not successful.

I absolutely understand that everybody's resources are limited, that  
everybody has their own priorities and that most of us cannot spend 100% of  
our time on Pharo.

I am sure we can tackle this together; it's just that I cannot do this  
alone.


Comment #5 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Yes, it crashes with the Cog JIT and Stack VMs (latest from the build server)  
and image 14126.

But only when executed as one expression; doing it step by step works.



Comment #6 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Further to my comment #17 on issue 4495  
(http://code.google.com/p/pharo/issues/detail?id=4495#c17)

I see that the following line
socket := (listeningSocket waitForAcceptFor: self acceptWaitTimeout)
in
ZnMultiThreadedServer>>serveConnectionsOn:
has an issue in that the socket is always invalid,
i.e. the status code in Socket>>waitForConnectionFor: timeout ifTimedOut:  
timeoutBlock
status := self primSocketConnectionStatus: socketHandle
always returns 1, which is WaitingForConnection.
The default timeout is 300 seconds and it will loop indefinitely, so the VM  
seems stuck at HighIOPriority.
If as mentioned in (comment #17 on issue 4495) the socket returns the  
correct status.
Also, if you insert a self halt anywhere before this code and then  
click 'Proceed', it works.

Would anyone familiar with the socket primitives care to take a look?

BTW, I pinpointed this by sending output to the Transcript with the  
instrumented loop below (set the Transcript to read-only mode and make sure  
its window is visible before running {  
TraitTest run: #testErrorClassCreation. ZnClientTests run: #testDelete }):

[(status = WaitingForConnection) and: [(msecsEllapsed := Time millisecondsSince: startTime) < msecsDelta]]
                whileTrue: [
                        semaphore waitTimeoutMSecs: msecsDelta - msecsEllapsed.
                        Transcript show: status printString; show: ' ' , msecsEllapsed printString, ' ', msecsDelta printString; cr.
                        status := self primSocketConnectionStatus: socketHandle].



Comment #7 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Hangs on my machine as well (Windows, recent 1.4 image), with both Cog from  
Jenkins and the 4.1.1 (non-Cog) release from squeak.org.

It also works fine if ZnServer is not removed first.


Comment #8 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Carlo (I think that is your name),

I don't understand your remark about the loop in  
Socket>>#waitForConnectionFor:ifTimedOut:

Even if status always remains the same (WaitingForConnection), the loop  
will end because of the timeout, no? Unless Semaphore>>#waitTimeoutMSecs:  
never returns, of course.
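A minimal workspace sketch of the loop shape being discussed, assuming only the calls already quoted in this thread (Semaphore>>waitTimeoutMSecs:, Time class>>millisecondClockValue and Time class>>millisecondsSince:). The socket status is left out entirely, which amounts to a status that never changes, and the 2000 ms deadline and the iteration counter are illustrative additions, not part of the real method. If the argument above holds, the loop ends once the deadline passes either way; a healthy image should report only a handful of iterations, while a wait that returns immediately would show the loop spinning many times before the deadline.

```smalltalk
"Sketch only: mimics the polling loop in Socket>>waitForConnectionFor:ifTimedOut:
 without any socket, i.e. as if the status never changed.
 Variable names (spelling included) follow the quoted method; the counter is hypothetical."
| semaphore startTime msecsDelta msecsEllapsed iterations |
semaphore := Semaphore new.    "never signalled, so every wait should time out"
msecsDelta := 2000.
iterations := 0.
startTime := Time millisecondClockValue.
[(msecsEllapsed := Time millisecondsSince: startTime) < msecsDelta]
    whileTrue: [
        semaphore waitTimeoutMSecs: msecsDelta - msecsEllapsed.
        iterations := iterations + 1].
"The loop exits once the deadline has passed, whether or not the waits really blocked;
 a large iteration count means the waits were returning early (a hot loop)."
Transcript cr;
    show: 'iterations: ', iterations printString;
    show: ', elapsed ms: ', msecsEllapsed printString.
```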

Still, the strange thing is that all this only causes trouble if something  
related to compilation or code loading (as far as we know) happens before  
it.

Sven


Comment #9 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

As far as I can see, this is an endless loop because of the enclosing  
repeat. So for some reason the primitive always returns a status of  
WaitingForConnection, and then this repeats over and over. If there is a  
halt-then-proceed before the Zn code runs, the primitive behaves. It seems  
like the compiler is affecting the primitive call...

ZnMultiThreadedServer>>listenLoop
        ... [ self serveConnectionsOn: serverSocket ] repeat ...

ZnMultiThreadedServer>>serveConnectionsOn:
        socket := (listeningSocket waitForAcceptFor: self acceptWaitTimeout)
                ifNil: [ ^ self noteAcceptWaitTimedOut ].


Comment #10 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

All TCP servers run an infinite loop: their server process waits  
indefinitely (or, like ZnMultiThreadedServer, in a loop with a timeout) for  
incoming connections. Most servers, like ZnMultiThreadedServer, then spawn  
worker threads to handle each connection.

As far as I can see, execution never leaves the #waitForAcceptFor: call; in  
other words, the infinite loop has to occur there (or inside the VM). You  
can see that by running a server variant with #logToTranscript:

ZnServer removeFromSystem.
Gofer new
       squeaksource: 'ZincHTTPComponents';
       package: 'Zinc-HTTP';
       load.
(Smalltalk at: #ZnServer) logToTranscript; startDefaultOn: 1701.

#noteAcceptWaitTimedOut writes to the log, but its entry never appears  
(although this is not 100% proof, since we have a hang).

As you note, 'Seems like the compiler is affecting the primitive call...'


Comment #11 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Another point: I tried to reduce this even further, to something like

   recompile some class, then start a server process, open a server socket  
   and call #waitForAcceptFor:

But I failed.

I don't think anything is wrong with how the server works (but who  
knows); this bug is about a weird interaction.


Comment #12 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Very strange...

One thing to check is whether there are other Socket-related bug reports or  
fixes on the tracker (there is one with fixes: Issue 3346).


Updates:
        Cc: [hidden email]

Comment #13 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

I'm really, really puzzled by this bug. Igor, do you have an idea?



Comment #14 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

It looks like Semaphore is broken.

Try:
{ TraitTest run: #testErrorClassCreation. self runSem. }

where runSem is:

runSem
        | sem |
        sem := Semaphore new.
        sem waitTimeoutMSecs: 5000.
        Transcript show: 'now.'

You will notice that running runSem on its own takes 5 seconds before  
writing to the Transcript, whereas running it after the trait test makes  
it return straight away.
This is what causes the 'hot' loop in listenLoop that I discussed above.
Also note that putting a halt before runSem executes makes it run as  
expected.


Comment #15 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Excellent! Thanks a lot.

I rewrote your test a little bit to make it easier to copy and paste:

{
        TraitTest run: #testErrorClassCreation.
        [
                Transcript cr; show: 'Start: '; show: TimeStamp now.
                Semaphore new waitTimeoutMSecs: 5000.
                Transcript cr; show: 'Stop: '; show: TimeStamp now.
        ] value
}

I can confirm that Semaphore waiting seems to be broken right after this  
test, on both 1.3 and 1.4.

Executing both elements separately keeps working. Also, the problem seems  
to be very temporary.

Now maybe we should also try to reduce the first element to something  
simpler, because we still don't know what is interacting with the  
Semaphore waiting: it might be related to the compiler, to error handling  
or to the unit tests.


Comment #16 on issue 4768 by [hidden email]: Code loading/compilation  
followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Delay>>schedule
...
AccessProtect critical:[
                ScheduledDelay := self.
                TimingSemaphore signal.
        ].

Also, it looks like TimingSemaphore is 'temporarily' empty just before it is  
signalled in the broken case, whereas it is not empty in the happy scenario.  
This means that it is never actually 'scheduled' (in Delay  
class>>handleTimerEvent).

To see what I mean, copy this method onto DelayWaitTimeout and add

        Transcript show: 'isEmpty? ', TimingSemaphore isEmpty printString.

just before it is signalled.
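For readers less familiar with Semaphore: isEmpty answers whether any process is currently waiting on the semaphore (Semaphore is a LinkedList of waiting processes). A signal sent while nobody is waiting is not lost; it is remembered as an excess signal, and the next wait consumes it and returns at once. The sketch below only illustrates that standard behaviour on a private semaphore; it does not touch TimingSemaphore and makes no claim about what the timer process was doing in the broken case.

```smalltalk
"Sketch: how a Semaphore behaves when it is signalled with no process waiting on it.
 Uses a private semaphore only; TimingSemaphore is not involved."
| sem emptyBefore emptyAfter t1 elapsed |
sem := Semaphore new.
emptyBefore := sem isEmpty.    "true: no process is waiting"
sem signal.                    "nobody waits, so the signal is kept as an excess signal"
emptyAfter := sem isEmpty.     "still true: a stored signal is not a waiting process"
t1 := Time millisecondClockValue.
sem wait.                      "returns at once, consuming the stored signal"
elapsed := Time millisecondsSince: t1.
Transcript cr;
    show: 'isEmpty before/after signal: ', emptyBefore printString, '/', emptyAfter printString;
    show: ', wait took ms: ', elapsed printString.
```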





Comment #18 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

We minimized the code from the test a bit more. We can reproduce it with  
this:


| tmpCategory t1 t2 |
        tmpCategory := 'TemporaryGeneratedClasses'.
        Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
        nil
                subclass: #AClass
                instanceVariableNames: ''
                classVariableNames: ''
                poolDictionaries: ''
                category: tmpCategory. "----------------"
        Object
                subclass: #AClass
                instanceVariableNames: ''
                classVariableNames: ''
                poolDictionaries: ''
                category: tmpCategory.
        Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
        [
                t1 :=  Time millisecondClockValue.
                Semaphore new waitTimeoutMSecs: 5000.
                t2 :=  Time millisecondClockValue.
        ] value.

Transcript cr; show: 'Start: '; show: t1.
Transcript cr; show: 'Stop: '; show: t2.




Comment #19 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

Even simpler:

        |  t1 t2 |
        Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
        Number
                subclass: #AClass
                instanceVariableNames: ''
                classVariableNames: ''
                poolDictionaries: ''
                category: 'TemporaryGeneratedClasses'.
        Object
                subclass: #AClass
                instanceVariableNames: ''
                classVariableNames: ''
                poolDictionaries: ''
                category: 'TemporaryGeneratedClasses'.
        Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
        t1 :=  Time millisecondClockValue.
        Semaphore new waitTimeoutMSecs: 5000.
        t2 :=  Time millisecondClockValue.

        Transcript cr; show: 'Delta: '; show: t2 - t1
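Comment #15 noted that the problem seems to be very temporary. One way to check that, sketched here under the assumption that the reproduction above is still the trigger, is to run the same class rebuild and then time several consecutive waits instead of one. The sample count and the 1000 ms timeout are arbitrary choices for illustration. In a healthy image every sample should be close to 1000 ms; in the broken case one would expect the first sample or samples to be near zero.

```smalltalk
"Sketch: the reproduction from comment #19, followed by several timed waits
 instead of one, to see how long the misbehaviour lasts."
| samples t1 |
Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
Number
    subclass: #AClass
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'TemporaryGeneratedClasses'.
Object
    subclass: #AClass
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'TemporaryGeneratedClasses'.
Smalltalk globals at: #AClass ifPresent: [ :v | v removeFromSystem ].
samples := OrderedCollection new.
5 timesRepeat: [
    t1 := Time millisecondClockValue.
    Semaphore new waitTimeoutMSecs: 1000.    "should block for about one second"
    samples add: (Time millisecondsSince: t1)].
Transcript cr; show: 'Deltas (ms): ', samples printString.
```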





Comment #20 on issue 4768 by [hidden email]: Code  
loading/compilation followed by starting a TCP server results in a hang
http://code.google.com/p/pharo/issues/detail?id=4768

So this means: ClassBuilder. Horror... and people really think that  
cleaning up is not the way to go?

