Squeak and Seaside Stability

Squeak and Seaside Stability

Ramon Leon-4
Has anyone had an issue with a Seaside deployment locking up and
killing network access to the box when hosting on Windows Server
2003?  I'm running a 3.9 image in prod and on occasion it stops working.
When it dies, it seems to take out the network, and other scheduled
processes that copy files across the network start failing as well.
Killing Squeak and restarting it fixes it and gets the network
working again.  I haven't a clue what's going on, and Squeak itself
goes black and I lose the UI, so I can't even debug the
issue.  I've had to resort to running Squeak as a service and resetting
it on a schedule to give the appearance of stability, but I'm not too
happy with that solution.  Anyone have any ideas?

Ramon Leon
http://onsmalltalk.com



RE: Squeak and Seaside Stability

Peter Crowther-2
> From: Ramon Leon
> Has anyone had an issue with a deployment of Seaside locking up, and
> killing access to the box via networking when hosting on windows
> server 2003?
[...]
> I've had to resort to running Squeak as a service and resetting
> it on a schedule to give the appearance of stability, but I'm not too
> happy with that solution.  Anyone have any ideas?

Sounds like exhaustion of some OS network-related resource, that is then
released when the process exits.  This is reinforced by your observation
that regular restarts of the process remove the problem.  Naively I'd
suggest monitoring the handle count for the Squeak process as a first
step, but Andreas probably has some much better ideas for monitoring!

                - Peter


Re: Squeak and Seaside Stability

Ramon Leon-4
> Sounds like exhaustion of some OS network-related resource, that is then
> released when the process exits.  This is reinforced by your observation
> that regular restarts of the process remove the problem.  Naively I'd
> suggest monitoring the handle count for the Squeak process as a first
> step, but Andreas probably has some much better ideas for monitoring!
>
> - Peter

Looking at the handle count, I'm seeing a fresh image start with around
40 or so, and an image that's been up a bit, pushing 5000, much more
than any other process on the box.  I'm still waiting for another crash,
but this seems a likely suspect, any more ideas?



RE: Squeak and Seaside Stability

Peter Crowther-2
> From: Ramon Leon
> Looking at the handle count, I'm seeing a fresh image start with around
> 40 or so, and an image that's been up a bit, pushing 5000, much more
> than any other process on the box.

OK, this feels like a promising source.

> I'm still waiting for another crash,
> but this seems a likely suspect, any more ideas?

If it's a networking issue, copies of handle.exe and tdimon.exe (both
from www.sysinternals.com) may be useful - if I recall correctly, they
may tell you for what the handles are being used.  Then it's a case of
reviewing the code that opens that kind of thing and seeing whether it
disposes of the object correctly afterwards - could be a VM issue, could
be an image issue.

                - Peter


Re: Squeak and Seaside Stability

Ramon Leon-4
Peter Crowther wrote:

>> From: Ramon Leon
>> Looking at the handle count, I'm seeing a fresh image start with around
>> 40 or so, and an image that's been up a bit, pushing 5000, much more
>> than any other process on the box.
>
> OK, this feels like a promising source.
>
>> I'm still waiting for another crash,
>> but this seems a likely suspect, any more ideas?
>
> If it's a networking issue, copies of handle.exe and tdimon.exe (both
> from www.sysinternals.com) may be useful - if I recall correctly, they
> may tell you for what the handles are being used.  Then it's a case of
> reviewing the code that opens that kind of thing and seeing whether it
> disposes of the object correctly afterwards - could be a VM issue, could
> be an image issue.
>
> - Peter

Appreciate the tips, I'll try them out and see if it leads anywhere when
I go to work tomorrow.



RE: Squeak and Seaside Stability

Ramon Leon-5
In reply to this post by Peter Crowther-2
> If it's a networking issue, copies of handle.exe and
> tdimon.exe (both from www.sysinternals.com) may be useful -
> if I recall correctly, they may tell you for what the handles
> are being used.  Then it's a case of reviewing the code that
> opens that kind of thing and seeing whether it disposes of
> the object correctly afterwards - could be a VM issue, could
> be an image issue.
>
> - Peter

OK, playing with handle, it seems they are thread handles.

Squeak.exe in Task Manager has 14,702 handles, 7 threads.

Yet handle -s shows 15000+ Thread handles.

Locally, I can see both the handle count and thread count spike when I do a
SOAP call in a loop, forking each call.  That roughly simulates my live
environment: the Seaside app does SOAP calls in a forked process and
polls for the result.  It seems I'm somehow leaving thread handles hanging
around.  Any idea what might cause this or how I can track it down?

Ramon Leon
http://onsmalltalk.com 



RE: Squeak and Seaside Stability

Ramon Leon-5
> OK, playing with handle, and seems they are thread handles.
>
> Squeak.exe in task manager has 14,702 handles, 7 threads.
>
> Yet handle -s shows 15000+ Thread handles
>
> Locally, I can see both the handle count and thread count
> spike when I do a soap call in a loop forking each call,
> which would kind of simulate my live environment, the Seaside
> app doing soap calls on a forked process and polling for the
> result.  Seems somehow I'm leaving thread handles hanging
> around, any idea what might cause this or how I can track it down?
>
> Ramon Leon
> http://onsmalltalk.com 
>
>

OK, I've found the offending line of code.  I'm using

NetNameResolver localHostName

to print the web server name in the HTML source code for debugging purposes,
and it turns out that each time it's called, it leaves a handle hanging.

10000 timesRepeat: [NetNameResolver localHostName]

Confirms to me that this is my bug.  So... anyone know a reliable method of
getting the computer's name that doesn't leak like a sieve?

Ramon Leon
http://onsmalltalk.com 
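
One possible stopgap while the leak itself is tracked down, sketched under the assumption that the machine name does not change while the image is running: look the name up once and reuse the cached value, so the leaking call runs once per image start instead of once per page render. The names cachedHostName and hostNameBlock are made up for this illustration; only NetNameResolver localHostName comes from the thread.

```smalltalk
"Workspace sketch: cache the host name so the lookup runs only once.
 Assumes the machine name stays the same for the life of the image."
| cachedHostName hostNameBlock |
cachedHostName := nil.
hostNameBlock := [cachedHostName ifNil: [cachedHostName := NetNameResolver localHostName]].

"Every evaluation after the first answers the cached string
 without calling NetNameResolver again."
10000 timesRepeat: [hostNameBlock value].
```

In an application the cached value would more likely sit in a class-side variable that is cleared on image startup. This only avoids triggering the leak repeatedly; it does not fix it.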



Re: Squeak and Seaside Stability

Andreas.Raab
Ramon Leon wrote:
> 10000 timesRepeat: [NetNameResolver localHostName]
>
> Confirms to me that this is my bug.  So... Anyone know a reliable method of
> getting the computers name that doesn't leak like a sieve?

What Windows version are you running? I've just run the above code on XP
and everything went fine, i.e., no handles were leaked.

Cheers,
   - Andreas


RE: Squeak and Seaside Stability

Ramon Leon-5
> Ramon Leon wrote:
> > 10000 timesRepeat: [NetNameResolver localHostName]
> >
> > Confirms to me that this is my bug.  So... Anyone know a reliable
> > method of getting the computers name that doesn't leak like a sieve?
>
> What Windows version are you running? I've just run the above
> code on XP and everything went fine, i.e., no handles were leaked.
>
> Cheers,
>    - Andreas

I'm using XP Professional Service Pack 2, and I just reconfirmed that this
leaks handles for both Squeak 3.8.1 and Squeak 3.9 for me.

Ramon Leon
http://onsmalltalk.com 



RE: Squeak and Seaside Stability

Ramon Leon-5
> > Ramon Leon wrote:
> > > 10000 timesRepeat: [NetNameResolver localHostName]
> > >
> > > Confirms to me that this is my bug.  So... Anyone know a reliable
> > > method of getting the computers name that doesn't leak like a sieve?
> >
> > What Windows version are you running? I've just run the above code on
> > XP and everything went fine, i.e., no handles were leaked.
> >
> > Cheers,
> >    - Andreas
>
> I'm using XP Professional Service Pack 2, and I just
> reconfirmed that this leaks handles for both Squeak 3.8.1 and
> Squeak 3.9 for me.

And in production, I'm using Windows Server 2003, also leaks.

Ramon Leon
http://onsmalltalk.com 



Re: Squeak and Seaside Stability

Jon Hylands
In reply to this post by Andreas.Raab
On Tue, 19 Dec 2006 10:08:10 -0800, Andreas Raab <[hidden email]>
wrote:

> What Windows version are you running? I've just run the above code on XP
> and everything went fine, i.e., no handles were leaked.

I am running Windows XP Pro (Version 5.1.2600 Service Pack 2 Build 2600)
and confirmed that the same thing happens to me as what happened to Ramon.

The handle count went from 142 to 10,142...

Later,
Jon

--------------------------------------------------------------
   Jon Hylands      [hidden email]      http://www.huv.com/jon

  Project: Micro Raptor (Small Biped Velociraptor Robot)
           http://www.huv.com/blog


Re: Squeak and Seaside Stability

Andreas.Raab
In reply to this post by Ramon Leon-5
Ramon Leon wrote:
> I'm using XP Professional Service Pack 2, and I just reconfirmed that this
> leaks handles for both Squeak 3.8.1 and Squeak 3.9 for me.

Then I need you to debug some more stuff:

1) Which VM are you using?

2) Does the following leak?

      1000 timesRepeat: [NetNameResolver localHostAddress]

3) Does the following leak?
    How fast does it execute?
    How many stars do you get in the transcript?

    addr := NetNameResolver localHostAddress.
    [
      1000 timesRepeat: [
          (NetNameResolver
              nameForAddress: addr
              timeout: 5) ifNil:[Transcript show: '*'].
      ]
    ] timeToRun.

4) If test 3) does leak, increase the timeout to 50 and re-run. Does it
still leak? How fast does it execute?

Cheers,
   - Andreas


Re: Squeak and Seaside Stability

Jon Hylands
In reply to this post by Jon Hylands
On Tue, 19 Dec 2006 13:23:04 -0500, Jon Hylands <[hidden email]> wrote:

> I am running Windows XP Pro (Version 5.1.2600 Service Pack 2 Build 2600)
> and confirmed that the same thing happens to me as what happened to Ramon.

I also tried the same thing on my other laptop, which is running XP Home
Edition (also Version 5.1.2600 Service Pack 2 Build 2600).

Same results - handle count went up by 10,000.

Later,
Jon

--------------------------------------------------------------
   Jon Hylands      [hidden email]      http://www.huv.com/jon

  Project: Micro Raptor (Small Biped Velociraptor Robot)
           http://www.huv.com/blog


Re: Squeak and Seaside Stability

Jon Hylands
In reply to this post by Andreas.Raab
On Tue, 19 Dec 2006 10:26:25 -0800, Andreas Raab <[hidden email]>
wrote:

> 1) Which VM are you using?

3.7.1 (release) from Sept 23, 2004
Compiler: gcc 2.95.2 19991024 (release)

> 2) Does the following leak?
>
>       1000 timesRepeat: [NetNameResolver localHostAddress]

No.

> 3) Does the following leak?

Yes.

>     How fast does it execute?

283 ms

>     How many stars do you get in the transcript?

None.

>     addr := NetNameResolver localHostAddress.
>     [
>       1000 timesRepeat: [
>           (NetNameResolver
>               nameForAddress: addr
>               timeout: 5) ifNil:[Transcript show: '*'].
>       ]
>     ] timeToRun.
>
> 4) If test 3) does leak, increase the timeout to 50 and re-run. Does it
> still leak? How fast does it execute?

Same results, 1000 more handles, no stars on the transcript, run time was
265 ms.

This is on Squeak 3.8 (6665) on XP Pro.

Later,
Jon

--------------------------------------------------------------
   Jon Hylands      [hidden email]      http://www.huv.com/jon

  Project: Micro Raptor (Small Biped Velociraptor Robot)
           http://www.huv.com/blog


RE: Squeak and Seaside Stability

Ramon Leon-5
In reply to this post by Andreas.Raab
> Then I need you to debug some more stuff:
>
> 1) Which VM are you using?

3.7.1

> 2) Does the following leak?
>
>       1000 timesRepeat: [NetNameResolver localHostAddress]

Nope.

> 3) Does the following leak?

Yup

>     How fast does it execute?

387

>     How many stars do you get in the transcript?

None.

>
> 4) If test 3) does leak, increase the timeout to 50 and
> re-run. Does it still leak? How fast does it execute?

Still leaks, finishes in about the same time on average.

>
> Cheers,
>    - Andreas
 



Re: Squeak and Seaside Stability

keith1y
Ramon Leon wrote:
>> Then I need you to debug some more stuff:
>>
>> 1) Which VM are you using?

If the millisecondClock is close to rolling over then the deadline may
be set in the future to a number greater than SmallInteger maxVal and
timeouts will never complete.

For Socket the default timeout is 45 seconds. So thinking about it this
error is only likely to occur for 45 seconds every 12 days or so, but it
will occur if there is code which relies upon the timeout itself.

Now this is a complete guess, but it might be your explanation. What if
your millisecond clock has somehow failed to roll over and is
stuck in that dangerous region, i.e. at SmallInteger maxVal?

Keith




               


Re: Squeak and Seaside Stability

keith1y
A fix for the Smalltalk bug is to make sure that the timeout calculation
similarly rolls over, i.e.

deadlineSecs: secs
    "Return a deadline time the given number of seconds from now."

    ^ (Time millisecondClockValue + (secs * 1000) truncated) \\ SmallInteger maxVal.


The code in question does not use Socket-#deadlineSecs: and there are
many many places in the image that could be caught by this. The solution
is to put this code on Time and encourage its use.

Time-deadlineSecs:
Time-pastDeadline: deadline

Of course, if your millisecondClock has got stuck, then it's a VM problem.

Keith
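
A different way to sidestep the rollover, sketched here rather than taken from the thread: instead of computing an absolute deadline, record the start time and compare the elapsed time against the timeout. This assumes Time class>>millisecondsSince: is present in the image and compensates for rollover, as its method comment in Squeak describes; start, timeoutMs and timedOut are illustrative names only.

```smalltalk
"Workspace sketch: a timeout check based on elapsed time rather than
 an absolute deadline, so a clock rollover between start and check
 does not leave the deadline unreachable."
| start timeoutMs timedOut |
start := Time millisecondClockValue.
timeoutMs := 5 * 1000.
timedOut := false.
[timedOut] whileFalse: [
    (Delay forMilliseconds: 100) wait.
    "millisecondsSince: is assumed to compensate for rollover."
    timedOut := (Time millisecondsSince: start) >= timeoutMs].
```

The loop simply waits out the timeout; in real code the body would also check whether the awaited result has arrived. It does not help if the clock is genuinely stuck, since no time would ever appear to elapse.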

> If the millisecondClock is close to rolling over then the deadline may
> be set in the future to a number greater than SmallInteger maxVal and
> timeouts will never complete.
>
> For Socket the default timeout is 45 seconds. So thinking about it
> this error is only likely to occur for 45 seconds every 12 days or so,
> but it will occur if there is code which relies upon the timeout itself.
>
> Now this is a complete guess but it might be your explanation. What if
> somehow your millisecond clock has failed to roll over and might be
> stuck in that dangerous region ie. at SmallInteger maxVal.
>
> Keith


               


Re: Squeak and Seaside Stability

Philippe Marschall
In reply to this post by Ramon Leon-5
2006/12/19, Ramon Leon <[hidden email]>:

> > Ramon Leon wrote:
> > > 10000 timesRepeat: [NetNameResolver localHostName]
> > >
> > > Confirms to me that this is my bug.  So... Anyone know a reliable
> > > method of getting the computers name that doesn't leak like a sieve?
> >
> > What Windows version are you running? I've just run the above
> > code on XP and everything went fine, i.e., no handles were leaked.
> >
> > Cheers,
> >    - Andreas
>
> I'm using XP Professional Service Pack 2, and I just reconfirmed that this
> leaks handles for both Squeak 3.8.1 and Squeak 3.9 for me.

You don't happen to have the Windows "firewall" on, do you?

Philippe


RE: Squeak and Seaside Stability

Ramon Leon-5
> > I'm using XP Professional Service Pack 2, and I just reconfirmed that
> > this leaks handles for both Squeak 3.8.1 and Squeak 3.9 for me.
>
> You don't happen to have the Windows "firewall" on, do you?
>
> Philippe

Nope, I hate that thing.

Ramon Leon
http://onsmalltalk.com 



Re: Squeak and Seaside Stability

Andreas.Raab
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> You don't happen to have the Windows "firewall" on, do you?

I don't, but have you seen problems when the Windows firewall was turned on?

Cheers,
   - Andreas
