How do I diagnose an image that locks up (CPU 100%) on save?


How do I diagnose an image that locks up (CPU 100%) on save?

Paul DeBruicker
Hi -

The plain Pharo 20619 + RFB image in my dropbox here:
https://dl.dropboxusercontent.com/u/4460862/pharo2RFB.zip freezes when
you save it while the RFB server is running.  The freeze occurs in the
#snapshotPrimitive.


This is the VM info I'm using:

3.9-7 #1 Wed Mar 13 18:22:44 CET 2013 gcc 4.4.3
NBCoInterpreter NativeBoost-CogPlugin-EstebanLorenzano.18 uuid:
a53445f9-c0c0-4015-97a3-be7db8d9ed6b Mar 13 2013
NBCogit NativeBoost-CogPlugin-EstebanLorenzano.18 uuid:
a53445f9-c0c0-4015-97a3-be7db8d9ed6b Mar 13 2013
git://gitorious.org/cogvm/blessed.git Commit:
412abef33cbed05cf1d75329e451d71c0c6aa5a7 Date: 2013-03-13 17:48:50 +0100
By: Esteban Lorenzano <[hidden email]> Jenkins build #14535
Linux linux-ubuntu-10 2.6.32-38-server #83-Ubuntu SMP Wed Jan 4 11:26:59
UTC 2012 x86_64 GNU/Linux
plugin path: /home/paul/pharo/pharo2.0/bin [default:
/home/paul/pharo/pharo2.0/bin/]


How can I diagnose/fix what is going wrong?


I'm reluctant to have it stop and restart the RFB server around the
snapshot because that would kick off all attached clients. There is at
most one client, and it's me, so it wouldn't be too bad, but it's not
desirable.

It freezes whether there is a client connection or not.


Thanks

Paul


Re: How do I diagnose an image that locks up (CPU 100%) on save?

Mariano Martinez Peck
If you run the VM from the command line and you send it kill -s SIGUSR1, it should display the stack trace of the VM in the console.
See http://marianopeck.wordpress.com/2012/05/19/pharo-tips-and-tricks/,
item "Send kill signal:".

Hope this helps.

Cheers,



On Thu, Aug 22, 2013 at 2:34 PM, Paul DeBruicker <[hidden email]> wrote:
[original message elided]




--
Mariano
http://marianopeck.wordpress.com
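
Mariano's workflow can be sketched with a stand-in process (a hypothetical demo, not the real Pharo VM). The key detail, which causes confusion later in this thread, is that the dump appears on the VM process's own stdout/stderr, not in the shell from which you send the signal:

```shell
# Hypothetical stand-in for the VM: a long-running process that appends a
# diagnostic dump to vm.log when it receives SIGUSR1, then keeps running.
sh -c 'trap "echo STACK-DUMP >> vm.log" USR1
       while :; do sleep 1; done' &
pid=$!

sleep 1
kill -s USR1 "$pid"   # ask for a dump; the process is not terminated
sleep 2               # give the trap handler time to run
kill "$pid"           # tear down the demo process
```

With the real VM the equivalent is `kill -s SIGUSR1 <vm-pid>`, and the trace is printed in the terminal where the VM was started.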

Re: How do I diagnose an image that locks up (CPU 100%) on save?

Paul DeBruicker
In this instance, that doesn't output anything.  Specifically:


$  ps -A | grep pharo
 6001 pts/0    00:00:45 pharo
$ kill -s SIGUSR1 6001
$








Mariano Martinez Peck wrote
If you run the VM from the command line and you send it kill -s SIGUSR1, it should display the stack trace of the VM in the console.
See http://marianopeck.wordpress.com/2012/05/19/pharo-tips-and-tricks/
item "Send kill signal:"

Hope this helps.

Cheers,



On Thu, Aug 22, 2013 at 2:34 PM, Paul DeBruicker <[hidden email]> wrote:

> [original message elided]


--
Mariano
http://marianopeck.wordpress.com

Re: How do I diagnose an image that locks up (CPU 100%) on save?

Paul DeBruicker
Paul DeBruicker wrote
In this instance, that doesn't output anything.  Specifically:


$  ps -A | grep pharo
 6001 pts/0    00:00:45 pharo
$ kill -s SIGUSR1 6001
$

Oh no wait.  I'm an idiot.  It spits out this in the terminal where the pharo process is running:

stack page bytes 4096 available headroom 3300 minimum unused headroom 3504

        (SIGUSR1)

SIGUSR1 Thu Aug 22 11:18:58 2013


pharo VM version: 3.9-7 #1 Wed Mar 13 18:22:44 CET 2013 gcc 4.4.3
Built from: NBCoInterpreter NativeBoost-CogPlugin-EstebanLorenzano.18 uuid: a53445f9-c0c0-4015-97a3-be7db8d9ed6b Mar 13 2013
With: NBCogit NativeBoost-CogPlugin-EstebanLorenzano.18 uuid: a53445f9-c0c0-4015-97a3-be7db8d9ed6b Mar 13 2013
Revision: git://gitorious.org/cogvm/blessed.git Commit: 412abef33cbed05cf1d75329e451d71c0c6aa5a7 Date: 2013-03-13 17:48:50 +0100 By: Esteban Lorenzano <estebanlm@gmail.com> Jenkins build #14535
Build host: Linux linux-ubuntu-10 2.6.32-38-server #83-Ubuntu SMP Wed Jan 4 11:26:59 UTC 2012 x86_64 GNU/Linux
plugin path: /home/paul/Downloads/pharo2.0/bin [default: /home/paul/Downloads/pharo2.0/bin/]


C stack backtrace:
/home/paul/Downloads/pharo2.0/bin/pharo[0x80a0c0c]
/home/paul/Downloads/pharo2.0/bin/pharo[0x80a0e67]
[0xf7771410]
/home/paul/Downloads/pharo2.0/bin/vm-display-X11(+0x10d51)[0xf7765d51]
/home/paul/Downloads/pharo2.0/bin/pharo(ioRelinquishProcessorForMicroseconds+0x14)[0x809e674]
/home/paul/Downloads/pharo2.0/bin/pharo[0x8081a0a]
[0xb7010d11]
/home/paul/Downloads/pharo2.0/bin/pharo(interpret+0x7a6)[0x8094f36]
/home/paul/Downloads/pharo2.0/bin/pharo(main+0x2b3)[0x80a18b3]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0xf7554935]


All Smalltalk process stacks (active first):
Process 0xb899e15c priority 10
0xfff24370 M ProcessorScheduler class>idleProcess 0xb7347240: a(n) ProcessorScheduler class
0xfff24390 I [] in ProcessorScheduler class>startUp 0xb7347240: a(n) ProcessorScheduler class
0xfff243b0 I [] in BlockClosure>newProcess 0xb899e080: a(n) BlockClosure

Process 0xb856be14 priority 50
0xfff263b0 I WeakArray class>finalizationProcess 0xb7347450: a(n) WeakArray class
0xb85ce458 s [] in WeakArray class>restartFinalizationProcess
0xb856bdb4 s [] in BlockClosure>newProcess

Process 0xb85ced20 priority 80
0xfff29350 M Delay class>handleTimerEvent 0xb7349a3c: a(n) Delay class
0xfff29370 I Delay class>runTimerEventLoop 0xb7349a3c: a(n) Delay class
0xfff29390 I [] in Delay class>startTimerEventLoop 0xb7349a3c: a(n) Delay class
0xfff293b0 I [] in BlockClosure>newProcess 0xb85cec44: a(n) BlockClosure

Process 0xb899dc7c priority 60
0xfff2a344 I InputEventFetcher>waitForInput 0xb7326fd8: a(n) InputEventFetcher
0xfff2a370 I InputEventFetcher>eventLoop 0xb7326fd8: a(n) InputEventFetcher
0xfff2a390 I [] in InputEventFetcher>installEventLoop 0xb7326fd8: a(n) InputEventFetcher
0xfff2a3b0 I [] in BlockClosure>newProcess 0xb899dba0: a(n) BlockClosure

Process 0xb899df90 priority 60
0xfff1c370 I SmalltalkImage>lowSpaceWatcher 0xb764de94: a(n) SmalltalkImage
0xfff1c390 I [] in SmalltalkImage>installLowSpaceWatcher 0xb764de94: a(n) SmalltalkImage
0xfff1c3b0 I [] in BlockClosure>newProcess 0xb899deb4: a(n) BlockClosure

Process 0xb74c5848 priority 40
0xfff282fc M [] in Delay>wait 0xb8ae610c: a(n) Delay
0xfff2831c M BlockClosure>ifCurtailed: 0xb8ae6348: a(n) BlockClosure
0xfff28338 M Delay>wait 0xb8ae610c: a(n) Delay
0xfff28358 M WorldState>interCyclePause: 0xb7182620: a(n) WorldState
0xfff28374 M WorldState>doOneCycleFor: 0xb7182620: a(n) WorldState
0xfff28390 M PasteUpMorph>doOneCycle 0xb7173150: a(n) PasteUpMorph
0xfff283b0 I [] in MorphicUIManager>? 0xb7186a5c: a(n) MorphicUIManager
0xb74c57e8 s [] in BlockClosure>?

Most recent primitives
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
    ... (relinquishProcessorForMicroseconds: repeated ~250 more times) ...

stack page bytes 4096 available headroom 3300 minimum unused headroom 3504

        (SIGUSR1)


Re: How do I diagnose an image that locks up (CPU 100%) on save?

Sven Van Caekenberghe-2
In reply to this post by Paul DeBruicker

On 22 Aug 2013, at 19:34, Paul DeBruicker <[hidden email]> wrote:

> [original message elided]

Paul,

Zinc HTTP servers are stopped and restarted on each image save. For HTTP 1.1 that is OK, protocol-wise. I think that RFB should do something similar to prevent issues like the one you are reporting (and there have been many similar reports in the past as well).

Consider this: if you save but do not quit, and you later abort the image hard, you would still expect the saved image to work, right? That can only work with a fresh server socket.

Sven

> Thanks
>
> Paul
>
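
The Zinc-style approach Sven describes could look roughly like this in Pharo (a sketch; `myServer` and the selectors on it are assumptions, not the actual Zinc or RFB API):

```smalltalk
"Bounce the server around the snapshot so the saved image never
 contains a live listening socket."
saveImageSafely
    myServer stop.
    [ Smalltalk snapshot: true andQuit: false ]
        ensure: [ myServer start ]
```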



Re: How do I diagnose an image that locks up (CPU 100%) on save?

Paul DeBruicker
Sven Van Caekenberghe-2 wrote
On 22 Aug 2013, at 19:34, Paul DeBruicker <[hidden email]> wrote:

> [original message elided]

Paul,

Zinc HTTP servers are stopped and restarted on each image save. For HTTP 1.1 that is OK, protocol-wise. I think that RFB should do something similar to prevent issues like the one you are reporting (and there have been many similar reports in the past as well).

Consider this: if you save but do not quit, and you later abort the image hard, you would still expect the saved image to work, right? That can only work with a fresh server socket.

Sven

> Thanks
>
> Paul
>


I think the 'abort the image hard' problem is taken care of by the #startUp: method on RFBServer. It checks whether the image is resuming and whether the server was running, and if so it stops and then restarts the RFB server. I don't want to stop and restart the server on every save; I'm fine doing it on startup after a quit or abort, as you describe.

Anyway, I cannot tell for sure from the stack dump I posted, but it seems like a VM bug.
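
The #startUp: logic Paul describes might be sketched like this (selector names other than #startUp: are assumptions about the RFBServer package, not its actual code):

```smalltalk
"Class-side startup hook: when the image is resuming (after a quit or a
 hard abort), restart the server only if it was running at save time."
RFBServer class >> startUp: resuming
    resuming ifTrue: [
        self serverWasRunning ifTrue: [
            self stop;
                 start ] ]
```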



Re: How do I diagnose an image that locks up (CPU 100%) on save?

Igor Stasenko
In reply to this post by Paul DeBruicker
Looks quite healthy to me; the image is idle, doing nothing.


On 22 August 2013 20:21, Paul DeBruicker <[hidden email]> wrote:
> [previous message, including the full stack dump above, elided]
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:
relinquishProcessorForMicroseconds:

stack page bytes 4096 available headroom 3300 minimum unused headroom 3504

        (SIGUSR1)





--
View this message in context: http://forum.world.st/How-do-diagnose-image-locks-up-cpu-100-on-save-tp4704639p4704653.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.




--
Best regards,
Igor Stasenko.

Re: How do diagnose image locks up (cpu 100%) on save?

Paul DeBruicker
So when you open the image I posted and in the workspace run

RFBServer start.
Smalltalk snapshot: true andQuit: false.


Everything works fine?  It doesn't go to 100% cpu use?

Re: How do diagnose image locks up (cpu 100%) on save?

NorbertHartl
Strange but true: as of today I have a similar problem. I don't have RFB installed; I just installed Zinc and use it. I can partially reproduce the behavior:

Opening the image and saving works. Opening it and starting a Zinc server works as well. But opening the image, starting the Zinc server, and issuing a request from a browser freezes the image when saving it. If I issue only one request from a browser, the image freezes for somewhere between half a minute and a minute. That smells like a timeout problem to me. The request issued from the browser ends in "self halt", so there is an exception going on. I didn't switch Zinc into debug mode for this.
I wanted to get more information by sending a USR1 signal to the VM while it hangs, but in my case it does not write a dump file into my working directory.

Shouldn't it be guaranteed behavior that the VM always writes a file whenever it receives a USR1 signal? I have plenty of space left on my device.

Norbert

Am 23.08.2013 um 02:13 schrieb Paul DeBruicker <[hidden email]>:

> So when you open the image I posted and in the workspace run
>
> RFBServer start.
> Smalltalk snapshot: true andQuit: false.
>
>
> Everything works fine?  It doesn't go to 100% cpu use?
>
>
>
> --
> View this message in context: http://forum.world.st/How-do-diagnose-image-locks-up-cpu-100-on-save-tp4704639p4704698.html
> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>



Re: How do diagnose image locks up (cpu 100%) on save?

NorbertHartl
I'm not sure my saying "open and saving worked" was right: I cannot open any image that I have saved in the meantime. I only get a white window and no world.


Norbert

Am 23.08.2013 um 12:45 schrieb Norbert Hartl <[hidden email]>:

> strange but true I have a similar problem as of today. I don't have RFB installed I just installed zinc and use it. I can reproduce the behavior partially:
>
> Opening the image and saving works. Opening, starting a zinc server does as well. But opening, starting the zinc server and issue a request from a browser freezes the image when saving it. If I only issue one request from a browser the image freezes for something between half a minute and a minute. That smells like a timeout problem to me. The issue requested from the browser ends in "self halt" so there is an exception going on. I didn't switch zinc into debugMode for this.
> I wanted to get some more information in the loop by issuing a USR1 signal to the vm when it hangs. But in my case it does not write a dump file into my working directory.
>
> This should be assured behavior that whenever a USR1 signal is received by the vm that it always writes a file? I have plenty of space left on my device.
>
> Norbert
>
> Am 23.08.2013 um 02:13 schrieb Paul DeBruicker <[hidden email]>:
>
>> So when you open the image I posted and in the workspace run
>>
>> RFBServer start.
>> Smalltalk snapshot: true andQuit: false.
>>
>>
>> Everything works fine?  It doesn't go to 100% cpu use?
>>
>>
>>
>> --
>> View this message in context: http://forum.world.st/How-do-diagnose-image-locks-up-cpu-100-on-save-tp4704639p4704698.html
>> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
>>
>
>



Re: How do diagnose image locks up (cpu 100%) on save?

NorbertHartl
Wooh, it is getting weirder and weirder. I can open the image, start Zinc, issue a request, shut down Zinc manually, and save. I can reopen the saved image, but then the keyboard is broken: I cannot type ^ and the like.

Norbert

Am 23.08.2013 um 12:50 schrieb Norbert Hartl <[hidden email]>:

> I'm not sure me saying "open and saving worked" was right. I cannot open any image that I have saved in the meantime. I only get a white window and no world.
>
>
> Norbert



Re: How do diagnose image locks up (cpu 100%) on save?

Sven Van Caekenberghe-2
In reply to this post by NorbertHartl

On 23 Aug 2013, at 12:45, Norbert Hartl <[hidden email]> wrote:

> strange but true I have a similar problem as of today. I don't have RFB installed I just installed zinc and use it. I can reproduce the behavior partially:
>
> Opening the image and saving works. Opening, starting a zinc server does as well. But opening, starting the zinc server and issue a request from a browser freezes the image when saving it. If I only issue one request from a browser the image freezes for something between half a minute and a minute. That smells like a timeout problem to me. The issue requested from the browser ends in "self halt" so there is an exception going on. I didn't switch zinc into debugMode for this.
> I wanted to get some more information in the loop by issuing a USR1 signal to the vm when it hangs. But in my case it does not write a dump file into my working directory.
>
> This should be assured behavior that whenever a USR1 signal is received by the vm that it always writes a file? I have plenty of space left on my device.
>
> Norbert

I don't know what is happening, but I just tried something similar in Paul's image:

ZnServer startDefaultOn: 1701.
ZnServer default logToTranscript.

Open http://localhost:1701/dw-bench in a browser (which keeps the connection open for a while)

Save the image from the World menu, all is OK, with this on the Transcript

2013-08-23 13:14:05 269265 D Executing request/response loop
2013-08-23 13:14:05 269265 I Read a ZnRequest(GET /dw-bench)
2013-08-23 13:14:05 269265 T GET /dw-bench 200 7738B 3ms
2013-08-23 13:14:05 269265 I Wrote a ZnResponse(200 OK text/html;charset=utf-8 7738B)

----SNAPSHOT----an Array(23 August 2013 1:14:19 pm) rfb.image priorSource: 992874

2013-08-23 13:14:19 660707 D Releasing server socket
2013-08-23 13:14:19 660707 I Stopped ZnManagingMultiThreadedServer HTTP port 1701
2013-08-23 13:14:19 660707 D Closing SocketStream[inbuf:4kb/outbuf:16kb]
2013-08-23 13:14:19 269265 D PrimitiveFailed: primitive #primSocketReceiveDataAvailable: in Socket failed while reading request
2013-08-23 13:14:19 269265 D Closing stream
2013-08-23 13:14:19 269265 D Could not remove SocketStream[inbuf:4kb/outbuf:16kb] ignoring

2013-08-23 13:14:19 660707 I Starting ZnManagingMultiThreadedServer HTTP port 1701
2013-08-23 13:14:19 605507 D Initializing server socket

After the save, the number of Processes is OK (one server listener process) and the number of Sockets is OK (the server socket, with its finalization double).

As you can see above, Zn restarts running (managed) servers and tries to close open (worker) connections, swallowing any failures, and eventually cleaning up and recovering.

Now, this is the normal behaviour, maybe something did change somewhere.

Sven




Re: How do diagnose image locks up (cpu 100%) on save?

NorbertHartl

Am 23.08.2013 um 13:21 schrieb Sven Van Caekenberghe <[hidden email]>:

> Now, this is the normal behaviour, maybe something did change somewhere.

most likely :)

Norbert

Re: How do diagnose image locks up (cpu 100%) on save?

Paul DeBruicker

Norbert Hartl wrote
> Am 23.08.2013 um 13:21 schrieb Sven Van Caekenberghe <[hidden email]>:
>
>> Now, this is the normal behaviour, maybe something did change somewhere.
>
> most likely :)
>
> Norbert

This:

RFBServer start.
Smalltalk snapshot:true andQuit: false.


works fine in Pharo 1.4 on Eliot's VM (version 2732) but locks up on the Pharo VM.



The SIGUSR1 report looks similar to the one I posted above.


Should I just downgrade to Pharo 1.4 if I want RFB?  


Thanks

Paul

Re: How do diagnose image locks up (cpu 100%) on save?

Eliot Miranda-2
In reply to this post by NorbertHartl
Hi Norbert,


On Fri, Aug 23, 2013 at 3:45 AM, Norbert Hartl <[hidden email]> wrote:
strange but true I have a similar problem as of today. I don't have RFB installed I just installed zinc and use it. I can reproduce the behavior partially:

Opening the image and saving works. Opening, starting a zinc server does as well. But opening, starting the zinc server and issue a request from a browser freezes the image when saving it. If I only issue one request from a browser the image freezes for something between half a minute and a minute. That smells like a timeout problem to me. The issue requested from the browser ends in "self halt" so there is an exception going on. I didn't switch zinc into debugMode for this.
I wanted to get some more information in the loop by issuing a USR1 signal to the vm when it hangs. But in my case it does not write a dump file into my working directory.

This should be assured behavior that whenever a USR1 signal is received by the vm that it always writes a file? I have plenty of space left on my device.

Weeelllll....  the problem is that if the signal is received during the right phase of garbage collection (specifically the pointer-reversal mark phase) then printing the back-trace can cause a crash.  The VM writes the stack trace to crash.dmp first, and then writes it to stdout.  So if you see an incomplete crash.dmp file you can probably safely infer that the VM crashed when trying to print the stack trace.


The VM could be written to defer stack trace generation until the GC finishes, which would be safer. It could perhaps immediately print "in GC, stack trace generation deferred" or some such. When time allows I'll get to this (volunteer efforts welcome).


Norbert






--
best,
Eliot