[vwnc] Strange case of hanging image (100% CPU)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[vwnc] Strange case of hanging image (100% CPU)

Boris Popov, DeepCove Labs (SNN)
I was wondering if anyone has seen anything similar or can provide suggestions for debugging. We've recently changed some of our startup code in the subsystem among many other things and now have images that hang shortly after snapshots and other events, like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even with emergency evaluator. I had attempted to get some trace information by dumping process stacks every second at 100 priority and attached are last 3 files before image goes haywire.
 
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64 240]
 
OS Name:                   Microsoft(R) Windows(R) Server 2003, Standard Edition
OS Version:                5.2.3790 Service Pack 2 Build 3790
 
I've looked over the startup code in the subsystem a 100 times and can't see anything suspicious or substantially different from what we were doing before. The subsystem itself prereqs ImageConfigurationSystem, but I've tried others as well, like RuntimeSystem or EarlyInterestNotificationSystem.
 
The only thing that catches my eye in the sequence of logs is the following bit, so any ideas and/or suggestions would be most welcomed.
 
"Semaphore>>signal
Delay class>>postSnapshot
DelaySystem>>resumeAction
DelaySystem(Subsystem)>>privateResume"
 
Here's the snippet I used to dump process stacks regularly,
 
[120
 timesRepeat:
  [(Delay forSeconds: 1) wait.
  Processor printProcessStacks]] forkAt: 100.
 
Here's a modified version of stock #pringProcessStacks
 
printProcessStacks
 | stream |
 stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream.
 [stream
  nextPutAll: 'ping';
  cr.
 Process allInstances
  do:
   [:proc |
   stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority).
   stream cr.
   proc printStackOn: stream.
   stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk"
 
Thanks!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

stkdump.20090424-144035.txt (3K) Download Attachment
stkdump.20090424-144038.txt (7K) Download Attachment
stkdump.20090424-144039.txt Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Strange case of hanging image (100% CPU)

Paul Baumann
Most likely tight recursion in a #printOn:. VA debugging is more reliable than VW because VA debugging tools use #debugPrintOn: instead of the common #printOn: that everyone wants to override. Some time ago VW added #basicPrintOn: to do a similar thing but in not nearly as nice a way. Startup is attempting to print the bad object. If you can get to a debugger then it does the same thing. Hosed either way.
 
The approach to debug this is to modify the debug tools to use #basicPrintOn: so you can see what is going on without causing secondary recursion. The stack you dump will then show the process and printing code causing the problem. Avoid stack dumping with details. Flush the file buffer more frequently. If you need to load those debug changes into a locking image then look into the image startup commands that filein code from a file.
 
Paul Baumann 


From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov
Sent: Friday, April 24, 2009 2:52 PM
To: VWNC
Subject: [vwnc] Strange case of hanging image (100% CPU)

I was wondering if anyone has seen anything similar or can provide suggestions for debugging. We've recently changed some of our startup code in the subsystem among many other things and now have images that hang shortly after snapshots and other events, like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even with emergency evaluator. I had attempted to get some trace information by dumping process stacks every second at 100 priority and attached are last 3 files before image goes haywire.
 
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64 240]
 
OS Name:                   Microsoft(R) Windows(R) Server 2003, Standard Edition
OS Version:                5.2.3790 Service Pack 2 Build 3790
 
I've looked over the startup code in the subsystem a 100 times and can't see anything suspicious or substantially different from what we were doing before. The subsystem itself prereqs ImageConfigurationSystem, but I've tried others as well, like RuntimeSystem or EarlyInterestNotificationSystem.
 
The only thing that catches my eye in the sequence of logs is the following bit, so any ideas and/or suggestions would be most welcomed.
 
"Semaphore>>signal
Delay class>>postSnapshot
DelaySystem>>resumeAction
DelaySystem(Subsystem)>>privateResume"
 
Here's the snippet I used to dump process stacks regularly,
 
[120
 timesRepeat:
  [(Delay forSeconds: 1) wait.
  Processor printProcessStacks]] forkAt: 100.
 
Here's a modified version of stock #pringProcessStacks
 
printProcessStacks
 | stream |
 stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream.
 [stream
  nextPutAll: 'ping';
  cr.
 Process allInstances
  do:
   [:proc |
   stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority).
   stream cr.
   proc printStackOn: stream.
   stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk"
 
Thanks!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Strange case of hanging image (100% CPU)

Boris Popov, DeepCove Labs (SNN)
In reply to this post by Boris Popov, DeepCove Labs (SNN)
Terry,
 
It's not the subsystem resume that hangs the image, it's trying to do things after snapshot finishes. If I have a workspace up, I can still do basic things like 1/1 Ctrl-P, but as soon I do anything that would bring up new window, be it hover over toolbar or right-click the image hangs and pins the CPU. So something gets messed up in the snapshot sequence...

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 


From: Terry Raymond [mailto:[hidden email]]
Sent: Friday, April 24, 2009 1:34 PM
To: Boris Popov
Subject: RE: [vwnc] Strange case of hanging image (100% CPU)

Boris

 

If you can do file I/O then try putting probes into different places in

the subsystem startup so each probe writes a unique file containing the

microsecond clock value. Then you can find out far the startup got

and the sequence it took.

 

Terry

===========================================================
Terry Raymond
Crafted Smalltalk
80 Lazywood Ln.
Tiverton, RI  02878
(401) 624-4517      [hidden email]
<http://www.craftedsmalltalk.com>
===========================================================


From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov
Sent: Friday, April 24, 2009 2:52 PM
To: VWNC
Subject: [vwnc] Strange case of hanging image (100% CPU)

 

I was wondering if anyone has seen anything similar or can provide suggestions for debugging. We've recently changed some of our startup code in the subsystem among many other things and now have images that hang shortly after snapshots and other events, like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even with emergency evaluator. I had attempted to get some trace information by dumping process stacks every second at 100 priority and attached are last 3 files before image goes haywire.

 

ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64 240]

 

OS Name:                   Microsoft(R) Windows(R) Server 2003, Standard Edition

OS Version:                5.2.3790 Service Pack 2 Build 3790

 

I've looked over the startup code in the subsystem a 100 times and can't see anything suspicious or substantially different from what we were doing before. The subsystem itself prereqs ImageConfigurationSystem, but I've tried others as well, like RuntimeSystem or EarlyInterestNotificationSystem.

 

The only thing that catches my eye in the sequence of logs is the following bit, so any ideas and/or suggestions would be most welcomed.

 

"Semaphore>>signal
Delay class>>postSnapshot
DelaySystem>>resumeAction
DelaySystem(Subsystem)>>privateResume"

 

Here's the snippet I used to dump process stacks regularly,

 

[120
 timesRepeat:
  [(Delay forSeconds: 1) wait.
  Processor printProcessStacks]] forkAt: 100.

 

Here's a modified version of stock #pringProcessStacks

 

printProcessStacks
 | stream |
 stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream.
 [stream
  nextPutAll: 'ping';
  cr.
 Process allInstances
  do:
   [:proc |
   stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority).
   stream cr.
   proc printStackOn: stream.
   stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk"

 

Thanks!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Strange case of hanging image (100% CPU)

Andres Valloud-6
In reply to this post by Boris Popov, DeepCove Labs (SNN)
If memory usage increases over time, then that is a symptom compatible with the infinite recursion Paul mentions.  Start the image with the -xq switch, let it get into the recursion, and wait until it runs out of memory and crashes.  At that point, you will have the stack dump and the recursion should be easier to identify.  If it is a tight loop that does not necessarily increase memory use, then this approach will not tell you much.


From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov
Sent: Friday, April 24, 2009 11:52 AM
To: VWNC
Subject: [vwnc] Strange case of hanging image (100% CPU)

I was wondering if anyone has seen anything similar or can provide suggestions for debugging. We've recently changed some of our startup code in the subsystem among many other things and now have images that hang shortly after snapshots and other events, like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even with emergency evaluator. I had attempted to get some trace information by dumping process stacks every second at 100 priority and attached are last 3 files before image goes haywire.
 
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64 240]
 
OS Name:                   Microsoft(R) Windows(R) Server 2003, Standard Edition
OS Version:                5.2.3790 Service Pack 2 Build 3790
 
I've looked over the startup code in the subsystem a 100 times and can't see anything suspicious or substantially different from what we were doing before. The subsystem itself prereqs ImageConfigurationSystem, but I've tried others as well, like RuntimeSystem or EarlyInterestNotificationSystem.
 
The only thing that catches my eye in the sequence of logs is the following bit, so any ideas and/or suggestions would be most welcomed.
 
"Semaphore>>signal
Delay class>>postSnapshot
DelaySystem>>resumeAction
DelaySystem(Subsystem)>>privateResume"
 
Here's the snippet I used to dump process stacks regularly,
 
[120
 timesRepeat:
  [(Delay forSeconds: 1) wait.
  Processor printProcessStacks]] forkAt: 100.
 
Here's a modified version of stock #pringProcessStacks
 
printProcessStacks
 | stream |
 stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream.
 [stream
  nextPutAll: 'ping';
  cr.
 Process allInstances
  do:
   [:proc |
   stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority).
   stream cr.
   proc printStackOn: stream.
   stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk"
 
Thanks!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Strange case of hanging image (100% CPU)

Boris Popov, DeepCove Labs (SNN)
Andres,
 
You are a savior! It wasn't growing too quickly, but I just put the mem max really low to trigger the out of memory sooner. The recursion happens as soon as there's an exception while reopening MC_FileBTree, specifically in our case the catalogs weren't flushed, so paths that were stored in indices didn't match the deployment environment and unhandled exception needs catalogs to print its own description,
 
[...]
context 0x7f56aee4 UserMessage>>asString
context 0x7f56aecc UserMessage>>expandMacrosWithArguments:
context 0x7f56aeb4 UserMessage>>expandMacrosWith:
context 0x7f56ae9c UnhandledException>>description
context 0x7f56ae84 UnhandledException>>defaultAction
context 0x7f56ae6c UnhandledException(GenericException)>>performHandler:
context 0x7f56ae54 UnhandledException(GenericException)>>propagatePrivateFrom:
context 0x7f573bf0 UnhandledException(GenericException)>>propagateFrom:
context 0x7f56af44 UnhandledException(GenericException)>>propagate
context 0x7f573bcc UnhandledException(GenericException)>>raiseSignal
context 0x7f56b0c4 OsInaccessibleError(GenericException)>>noHandler
context 0x7f56b0ac OsInaccessibleError(GenericException)>>defaultAction
context 0x7f56b094 OsInaccessibleError(GenericException)>>performHandler:
context 0x7f56b07c OsInaccessibleError(GenericException)>>propagatePrivateFrom:
context 0x7f573b48 OsInaccessibleError(GenericException)>>propagateFrom:
context 0x7f573aac OsInaccessibleError(GenericException)>>propagate
context 0x7f573ba8 OsInaccessibleError(GenericException)>>raiseRequest
context 0x7f573bfc HandleRegistry>>evaluateWithFullProtection:
context 0x7f56b058 HandleRegistry>>registerValueOf:
context 0x7f56b040 PCDiskFileAccessor class(OSHandle class)>>handleValue:
context 0x7f56b028 IOAccessor class>>openFileNamed:direction:creation:
context 0x7f56b010 MC_FileBTree>>reopen:
context 0x7f56aff8 MC_FileBTree>>beOpenForRead
context 0x7f56afe0 MC_FileBTree>>fileLock
context 0x7f56afc8 MC_FileBTree>>currentLockState
context 0x7f56afb0 IndexedFileMessageCatalog>>at:
context 0x7f56af98 IndexedFileMessageCatalog>>lookUp:
context 0x7f56af80 [] in BlockClosure>>unboundMethod
context 0x7f56af68 OrderedCollection>>do:
context 0x7f573b24 MessageCatalogManager>>lookUp:
context 0x7f56b19c Locale>>lookUp:
context 0x7f56b184 UserMessage>>asString
context 0x7f56b16c UserMessage>>expandMacrosWithArguments:
context 0x7f56b154 UserMessage>>expandMacrosWith:
context 0x7f56b13c UnhandledException>>description
[....]
 
I flushed everything manually and it seems to have fixed our problem,
 
    MessageCatalogManager.Managers do: [:each | each flushCatalogs].
 
It might be a good idea to remove dependency on functioning catalogs from unhandled exception in the future so that one could debug broken catalogs.
 
Thanks!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 


From: [hidden email] [mailto:[hidden email]] On Behalf Of Valloud, Andres
Sent: Friday, April 24, 2009 2:29 PM
To: VWNC List NC
Subject: Re: [vwnc] Strange case of hanging image (100% CPU)

If memory usage increases over time, then that is a symptom compatible with the infinite recursion Paul mentions.  Start the image with the -xq switch, let it get into the recursion, and wait until it runs out of memory and crashes.  At that point, you will have the stack dump and the recursion should be easier to identify.  If it is a tight loop that does not necessarily increase memory use, then this approach will not tell you much.


From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov
Sent: Friday, April 24, 2009 11:52 AM
To: VWNC
Subject: [vwnc] Strange case of hanging image (100% CPU)

I was wondering if anyone has seen anything similar or can provide suggestions for debugging. We've recently changed some of our startup code in the subsystem among many other things and now have images that hang shortly after snapshots and other events, like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even with emergency evaluator. I had attempted to get some trace information by dumping process stacks every second at 100 priority and attached are last 3 files before image goes haywire.
 
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64 240]
 
OS Name:                   Microsoft(R) Windows(R) Server 2003, Standard Edition
OS Version:                5.2.3790 Service Pack 2 Build 3790
 
I've looked over the startup code in the subsystem a 100 times and can't see anything suspicious or substantially different from what we were doing before. The subsystem itself prereqs ImageConfigurationSystem, but I've tried others as well, like RuntimeSystem or EarlyInterestNotificationSystem.
 
The only thing that catches my eye in the sequence of logs is the following bit, so any ideas and/or suggestions would be most welcomed.
 
"Semaphore>>signal
Delay class>>postSnapshot
DelaySystem>>resumeAction
DelaySystem(Subsystem)>>privateResume"
 
Here's the snippet I used to dump process stacks regularly,
 
[120
 timesRepeat:
  [(Delay forSeconds: 1) wait.
  Processor printProcessStacks]] forkAt: 100.
 
Here's a modified version of stock #pringProcessStacks
 
printProcessStacks
 | stream |
 stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream.
 [stream
  nextPutAll: 'ping';
  cr.
 Process allInstances
  do:
   [:proc |
   stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority).
   stream cr.
   proc printStackOn: stream.
   stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk"
 
Thanks!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]

CONFIDENTIALITY NOTICE

This email is intended only for the persons named in the message header. Unless otherwise indicated, it contains information that is private and confidential. If you have received it in error, please notify the sender and delete the entire message including any attachments.

Thank you.

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc