Re: [Seaside] my site is completely dead..

Sven Van Caekenberghe-2

On 29 Jan 2013, at 19:29, Gastón Dall' Oglio <[hidden email]> wrote:

> Hi Sergio.
>
> Some weeks ago I had to deal with an image that worked normally, while a Seaside app within it was not responding (until then, whenever an app stopped responding it was always because the whole image was hung).
>
> I did some forensic analysis on this image :), and I saw several zombie forked processes, each actually waiting on a semaphore with a very long timeout. See the screenshot: on the left side I inspected the Semaphore to see the DelayWaitTimeout of one sane forked process, and on the right the same for a broken one. Also, note that these broken processes don't die if you stop and start the Seaside server adaptor.
>
> To check for this in your image, open a Process Browser (and turn on auto-update) and see if there are several "ZnManagingMultiThreadedServer HTTP worker" processes. If so, terminate some of them and see if the site begins to respond. My app began to respond after I terminated several of them.
>
> I guess that this problem occurred when I saved AND QUIT the image while those forked processes existed.
>
> The solution to make the image respond again was to manually kill (terminate) all "ZnManagingMultiThreadedServer HTTP worker" processes, and in the future to make sure that there are no workers running when I save the image.
>
> I don't know if it is a bug; if we think it is, I can give more data about the context (my image, package versions, OS, …).

Some clarifications: ZnManagingMultiThreadedServer has one server process listening for and accepting incoming requests, forking a worker process each time. Such a worker process will loop over HTTP 1.1 request/response cycles until the other end closes or something goes wrong. There is currently no timeout as such but of course the socket connection dies eventually, so that is almost the same thing.

The 'Managing' aspect means that the server keeps track of all open connections or socket streams. When the server is stopped, all the connections will be closed. The idea is that all the worker processes using these connections (a one to one mapping) will eventually get an exception that is then handled by cleaning up and finally stopping.
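
To make the 'Managing' idea concrete, here is a minimal sketch in Python (not Zinc's actual code, and not Smalltalk). The class and method names (`ManagingServer`, `adopt`, `stop`, `open_connections`) are hypothetical and chosen only for illustration. It assumes one worker thread per connection and uses `shutdown()` rather than a plain `close()`, because a close from another thread is not guaranteed to wake a blocked reader.

```python
import socket
import threading


class ManagingServer:
    """Toy model of a 'managing' server: it remembers every open
    connection so that stop() can force all workers to finish."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connections = set()

    def adopt(self, conn):
        """Register an accepted connection and fork one worker for it."""
        with self._lock:
            self._connections.add(conn)
        worker = threading.Thread(target=self._serve, args=(conn,), daemon=True)
        worker.start()
        return worker

    def open_connections(self):
        with self._lock:
            return len(self._connections)

    def _serve(self, conn):
        # Stand-in for the request/response loop: echo until the peer
        # closes, the server shuts the socket down, or an error occurs.
        try:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data)
        except OSError:
            pass  # the connection died under us; fall through to clean up
        finally:
            with self._lock:
                self._connections.discard(conn)
            conn.close()

    def stop(self):
        """Shut down every tracked connection so each worker wakes up,
        cleans up and terminates."""
        with self._lock:
            connections = list(self._connections)
        for conn in connections:
            try:
                conn.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # the worker already closed this connection
```

In this sketch a worker that is blocked reading only ends because `stop()` reaches its socket. If that wake-up never arrives, the worker stays around, which is the kind of leftover process described earlier in the thread.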

This last mechanism, the closing of a socket stream from another process resulting in an exception in the process using that connection, does not work identically or equally well on the different platforms (Mac, Windows, Linux), because these have completely different socket implementations in the VM. Saving an image interacts with this in various subtle ways.

On my main development platform, Mac, I see no problems. In my production deploys on Linux things are fine too. But I do various things to minimise problems:

- my images hold no 'running' server(s), these are always created and started freshly using a startup script
- I never save images after that
- all the images are controlled by init.d scripts to start automatically with the machine
- all my images are controlled by monit so that they restart automatically when they stop working
- most of the time, I have multiple images under a load balancer, stateful or stateless, to improve availability and capacity
- the load balancer also functions as a sanitizer and controller of incoming requests protecting the images
- the load balancer can handle static resources directly, offloading work from the images
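
As an illustration of the kind of check a monitor such as monit performs before deciding to restart an image, here is a small Python sketch. It is not monit configuration, and the function name `image_is_responding`, its parameters and the "any status below 500 counts as alive" rule are assumptions made for this example.

```python
import http.client


def image_is_responding(host, port, path="/", timeout=5.0):
    """Return True if an HTTP GET gets a non-5xx answer within the timeout.

    A hung image typically accepts no connection or never answers, so
    connection errors and timeouts are both treated as 'not responding'.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status < 500
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A supervisor would call this periodically and restart the image after a few consecutive failures. Restarting on a single failure is usually too aggressive.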

http://zn.stfx.eu/zn/index.html#livedemo
http://stfx.eu/pharo-server/

Yes, like any computer program, a Smalltalk VM+image combination has limits: there is some maximum number of processes and connections that can be running and open at the same time, and there are general memory limits. I am pretty sure that with a setup like the one I described above, production systems handling hundreds to thousands of requests per second are possible.

Sven

> 2013/1/29 sergio_101 <[hidden email]>
> i think i need to bring the image local, and see what's going on.. i am moving it to a new server this week anyway..
>
> thanks!
>
>
> On Tue, Jan 29, 2013 at 8:18 AM, sergio_101 <[hidden email]> wrote:
> hey, dale.. it seems like lately, i am seeing this problem at least once a week. there were times when i would run problem free for months, but not lately..
>
>
> On Tue, Jan 29, 2013 at 1:24 AM, Dale Henrichs <[hidden email]> wrote:
> Sergio,
>
> Most of my experience is from working with GemStone, which is a different animal, so take what I say with a grain of salt.
>
> If the running image is completely frozen, then you don't have much choice but to kill it and restart ... hopefully you haven't lost any data ...
>
> If after restart you see the problem again, then you might be able to debug the issue by copying the image to a local machine and bringing it up ... If the problem doesn't reproduce, I'd still be inclined to take a copy of the image and attempt to understand the particular problem.
>
> It's hard to tell from the screen shot what the thread is doing or even which thread it is ... it's not likely that the thread is a seaside application thread because those are normally forked and will sit around with an open debugger, but not necessarily affect the image itself. So I can't really guess what operation is causing trouble ...
>
> If you're lucky you can reproduce the problem on your local machine ... If you search the pharo bug list you might find a bug in this area and from that we might be able to figure out which thread is the bad boy and there might even be a fix ..
>
> You mentioned stability...are you seeing this particular problem occur often or are you seeing different issues?
>
> Dale
>
> ----- Original Message -----
> | From: "sergio t. ruiz" <[hidden email]>
> | To: "discussion" <[hidden email]>
> | Sent: Monday, January 28, 2013 9:59:11 PM
> | Subject: [Seaside] my site is completely dead..
> |
> |
> | my site completely died today. i tried logging in with vnc, and it
> | seems just stuck.. i can't do anything to it..
> |
> | anyone have any ideas?  i really need this thing to run consistently
> | ..
> |
> | here is a screenshot of its current state:
> |
> | http://db.tt/eVxJX6lr
> |
> | thanks!
> |
> |
> | ----
> | peace,
> | sergio
> | photographer, journalist, visionary
> |
> | http://www.ThoseOptimizeGuys.com
> | http://www.CodingForHire.com
> | http://www.coffee-black.com
> | http://www.painlessfrugality.com
> | http://www.twitter.com/sergio_101
> | http://www.facebook.com/sergio101
> |
> |
> |
> | _______________________________________________
> | seaside mailing list
> | [hidden email]
> | http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
> |

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill