Has anyone had an issue with a deployment of Seaside locking up and killing network access to the box when hosted on Windows Server 2003? I'm running a 3.9 image in production, and on occasion it stops working. When it dies, it seems to take the network down with it: other scheduled processes that copy files across the network start failing as well. Killing Squeak and restarting it fixes the problem and lets the network work again. I haven't a clue what's going on. Squeak itself goes black and I lose the UI, so I can't even debug the issue.

I've had to resort to running Squeak as a service and resetting it on a schedule to give the appearance of stability, but I'm not too happy with that solution. Anyone have any ideas?

Ramon Leon
http://onsmalltalk.com
> From: Ramon Leon
> Has anyone had an issue with a deployment of Seaside locking up, and
> killing access to the box via networking when hosting on windows server
> 2003? [...]
> I've had to resort to running Squeak as a service and resetting
> it on a schedule to give the appearance of stability, but I'm not too
> happy with that solution. Anyone have any ideas?

Sounds like exhaustion of some OS network-related resource that is then released when the process exits. This is reinforced by your observation that regular restarts of the process remove the problem. Naively, I'd suggest monitoring the handle count for the Squeak process as a first step, but Andreas probably has much better ideas for monitoring!

- Peter
> Sounds like exhaustion of some OS network-related resource, that is then
> released when the process exits. This is reinforced by your observation
> that regular restarts of the process remove the problem. Naively I'd
> suggest monitoring the handle count for the Squeak process as a first
> step, but Andreas probably has some much better ideas for monitoring!
>
> - Peter

Looking at the handle count, I see a freshly started image at around 40 handles, and an image that's been up a while pushing 5,000, far more than any other process on the box. I'm still waiting for another crash, but this seems a likely suspect. Any more ideas?
> From: Ramon Leon
> Looking at the handle count, I'm seeing a fresh image start with around
> 40 or so, and an image that's been up a bit, pushing 5000, much more
> than any other process on the box.

OK, this feels like a promising source.

> I'm still waiting for another crash,
> but this seems a likely suspect, any more ideas?

If it's a networking issue, copies of handle.exe and tdimon.exe (both from www.sysinternals.com) may be useful; if I recall correctly, they can tell you what the handles are being used for. Then it's a case of reviewing the code that opens that kind of resource and checking whether it disposes of the object correctly afterwards. It could be a VM issue, or it could be an image issue.

- Peter
Peter Crowther wrote:
> If it's a networking issue, copies of handle.exe and tdimon.exe (both
> from www.sysinternals.com) may be useful [...]

Appreciate the tips. I'll try them out when I get to work tomorrow and see if they lead anywhere.
> If it's a networking issue, copies of handle.exe and
> tdimon.exe (both from www.sysinternals.com) may be useful [...]

OK, after playing with handle.exe, it seems they are thread handles. Squeak.exe in Task Manager shows 14,702 handles but only 7 threads, yet handle -s shows 15,000+ thread handles.

Locally, I can see both the handle count and the thread count spike when I make a SOAP call in a loop, forking each call. That roughly simulates my live environment, where the Seaside app makes SOAP calls in a forked process and polls for the result. Somehow I'm leaving thread handles hanging around. Any idea what might cause this, or how I can track it down?

Ramon Leon
http://onsmalltalk.com
> OK, playing with handle, and seems they are thread handles.
> [...] Seems somehow I'm leaving thread handles hanging
> around, any idea what might cause this or how I can track it down?

OK, I've found the offending line of code. I'm using

    NetNameResolver localHostName

to print the web server name in the HTML source for debugging purposes, and it turns out each call leaves a handle hanging.

    10000 timesRepeat: [NetNameResolver localHostName]

confirms to me that this is my bug. So... does anyone know a reliable method of getting the computer's name that doesn't leak like a sieve?

Ramon Leon
http://onsmalltalk.com
Ramon Leon wrote:
> 10000 timesRepeat: [NetNameResolver localHostName]
>
> Confirms to me that this is my bug. So... Anyone know a reliable method of
> getting the computers name that doesn't leak like a sieve?

What Windows version are you running? I've just run the above code on XP and everything went fine, i.e., no handles were leaked.

Cheers,
  - Andreas
> Ramon Leon wrote:
> > 10000 timesRepeat: [NetNameResolver localHostName]
> > [...]
> What Windows version are you running? I've just run the above
> code on XP and everything went fine, i.e., no handles were leaked.

I'm using XP Professional Service Pack 2, and I just reconfirmed that this leaks handles for me in both Squeak 3.8.1 and Squeak 3.9.

Ramon Leon
http://onsmalltalk.com
> I'm using XP Professional Service Pack 2, and I just
> reconfirmed that this leaks handles for both Squeak 3.8.1 and
> Squeak 3.9 for me.

And in production I'm using Windows Server 2003, where it also leaks.

Ramon Leon
http://onsmalltalk.com
On Tue, 19 Dec 2006 10:08:10 -0800, Andreas Raab wrote:
> What Windows version are you running? I've just run the above code on XP
> and everything went fine, i.e., no handles were leaked.

I am running Windows XP Pro (Version 5.1.2600 Service Pack 2 Build 2600) and confirmed that the same thing happens to me as happened to Ramon. The handle count went from 142 to 10,142...

Later, Jon

--------------------------------------------------------------
Jon Hylands      http://www.huv.com/jon

  Project: Micro Raptor (Small Biped Velociraptor Robot)
           http://www.huv.com/blog
Ramon Leon wrote:
> I'm using XP Professional Service Pack 2, and I just reconfirmed that this
> leaks handles for both Squeak 3.8.1 and Squeak 3.9 for me.

Then I need you to debug some more stuff:

1) Which VM are you using?

2) Does the following leak?

    1000 timesRepeat: [NetNameResolver localHostAddress]

3) Does the following leak? How fast does it execute? How many stars do you get in the Transcript?

    addr := NetNameResolver localHostAddress.
    [
        1000 timesRepeat: [
            (NetNameResolver
                nameForAddress: addr
                timeout: 5) ifNil: [Transcript show: '*'].
        ]
    ] timeToRun.

4) If test 3) does leak, increase the timeout to 50 and re-run. Does it still leak? How fast does it execute?

Cheers,
  - Andreas
On Tue, 19 Dec 2006 13:23:04 -0500, Jon Hylands wrote:
> I am running Windows XP Pro (Version 5.1.2600 Service Pack 2 Build 2600)
> and confirmed that the same thing happens to me as what happened to Ramon.

I also tried the same thing on my other laptop, which is running XP Home Edition (also Version 5.1.2600 Service Pack 2 Build 2600). Same result: the handle count went up by 10,000.

Later, Jon
On Tue, 19 Dec 2006 10:26:25 -0800, Andreas Raab wrote:
> 1) Which VM are you using?

3.7.1 (release) from Sept 23, 2004
Compiler: gcc 2.95.2 19991024 (release)

> 2) Does the following leak?
>
> 1000 timesRepeat: [NetNameResolver localHostAddress]

No.

> 3) Does the following leak?

Yes.

> How fast does it execute?

283 ms

> How many stars do you get in the transcript?

None.

> addr := NetNameResolver localHostAddress.
> [...]
> 4) If test 3) does leak, increase the timeout to 50 and re-run. Does it
> still leak? How fast does it execute?

Same results: 1000 more handles, no stars in the Transcript, and a run time of 265 ms.

This is on Squeak 3.8 (6665) on XP Pro.

Later, Jon
> Then I need you to debug some more stuff:
>
> 1) Which VM are you using?

3.7.1

> 2) Does the following leak?
>
> 1000 timesRepeat: [NetNameResolver localHostAddress]

Nope.

> 3) Does the following leak?

Yup.

> How fast does it execute?

387 ms

> How many stars do you get in the transcript?

None.

> 4) If test 3) does leak, increase the timeout to 50 and
> re-run. Does it still leak? How fast does it execute?

It still leaks, and on average it finishes in about the same time.
Ramon Leon wrote:
>> Then I need you to debug some more stuff:
>>
>> 1) Which VM are you using?

If the millisecond clock is close to rolling over, the deadline may be set in the future to a number greater than SmallInteger maxVal, and timeouts will never complete.

For Socket the default timeout is 45 seconds. So, thinking about it, this error is only likely to occur for 45 seconds every 12 days or so, but it will occur if there is code that relies upon the timeout itself.

Now, this is a complete guess, but it might be your explanation: what if your millisecond clock has somehow failed to roll over and is stuck in that dangerous region, i.e. at SmallInteger maxVal?

Keith
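Keith's "12 days or so" can be checked under his assumption that the millisecond clock wraps at SmallInteger maxVal, which on a 32-bit image is 2^30 - 1 (the VM may in fact mask the clock to a different range). The second figure is the share of each wrap cycle that falls inside the 45-second window before the wrap:

```latex
\[
\texttt{SmallInteger maxVal} = 2^{30} - 1 = 1\,073\,741\,823~\text{ms}
\approx \frac{1\,073\,742~\text{s}}{86\,400~\text{s/day}} \approx 12.4~\text{days},
\qquad
\frac{45~\text{s}}{1\,073\,742~\text{s}} \approx 4.2 \times 10^{-5}.
\]
```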
A fix for the Smalltalk bug is to make sure that the timeout calculation rolls over in the same way, i.e.:

    deadlineSecs: secs
        "Return a deadline time the given number of seconds from now."
        ^ (Time millisecondClockValue + (secs * 1000) truncated) \\ SmallInteger maxVal.

The code in question does not use Socket-#deadlineSecs:, and there are many, many places in the image that could be caught by this. The solution is to put this code on Time and encourage its use:

    Time-deadlineSecs:
    Time-pastDeadline: deadline

Of course, if your millisecond clock has got stuck, then it's a VM problem.

Keith
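Keith names Time-pastDeadline: but does not show it. Below is a minimal sketch of a wrap-aware check to pair with his deadlineSecs: formula, shown in browser notation. Both methods are hypothetical class-side extensions, not existing Squeak 3.8/3.9 API. The sketch assumes, as Keith does, that the clock wraps at SmallInteger maxVal, and also that no deadline lies more than half a clock cycle away; if your VM masks the millisecond clock to a smaller range, that range would have to replace SmallInteger maxVal throughout.

```smalltalk
"Hypothetical class-side extensions to Time; not part of Squeak.
 Assumes the millisecond clock wraps at SmallInteger maxVal."

Time class >> deadlineSecs: secs
    "Answer a deadline secs seconds from now, wrapped into the clock range (Keith's formula)."
    ^ (self millisecondClockValue + (secs * 1000) truncated) \\ SmallInteger maxVal

Time class >> pastDeadline: deadline
    "Answer true once deadline has passed. The forward distance from now to the
     deadline, taken modulo the clock range, stays small while the deadline is
     ahead and jumps to nearly the whole range once it is behind, so comparing
     it with half the range works across a rollover."
    | ahead |
    ahead := (deadline - self millisecondClockValue) \\ SmallInteger maxVal.
    ^ ahead > (SmallInteger maxVal // 2)
```

A caller would poll with something like [Time pastDeadline: deadline] whileFalse: [(Delay forMilliseconds: 100) wait]. Comparing a wrapped distance rather than the raw clock value is what keeps the check from hanging when the deadline wraps past zero.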
2006/12/19, Ramon Leon:
> I'm using XP Professional Service Pack 2, and I just reconfirmed that this
> leaks handles for both Squeak 3.8.1 and Squeak 3.9 for me.

You don't happen to have the Windows "firewall" on, do you?

Philippe
> You don't happen to have the Windows "firewall" on, do you?
>
> Philippe

Nope, I hate that thing.

Ramon Leon
http://onsmalltalk.com
Philippe Marschall wrote:
> You don't happen to have the Windows "firewall" on, do you?

I don't, but have you seen this problem when the Windows firewall was turned on?

Cheers,
  - Andreas