I was wondering if
anyone has seen anything similar or can provide suggestions for debugging. We've
recently changed some of our startup code in the subsystem among many other
things and now have images that hang shortly after snapshots and other events,
like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into
even with emergency evaluator. I had attempted to get some trace information by
dumping process stacks every second at 100 priority and attached are last 3
files before image goes haywire.
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64
240]
OS
Name:
Microsoft(R) Windows(R) Server 2003, Standard Edition
OS
Version:
5.2.3790 Service Pack 2 Build 3790
I've looked over the
startup code in the subsystem a 100 times and can't see anything suspicious or
substantially different from what we were doing before. The subsystem itself
prereqs ImageConfigurationSystem, but I've tried others as well, like
RuntimeSystem or EarlyInterestNotificationSystem.
The only thing that
catches my eye in the sequence of logs is the following bit, so any ideas and/or
suggestions would be most welcomed.
"Semaphore>>signal
Delay class>>postSnapshot DelaySystem>>resumeAction DelaySystem(Subsystem)>>privateResume" Here's the snippet I
used to dump process stacks regularly,
[120
timesRepeat: [(Delay forSeconds: 1) wait. Processor printProcessStacks]] forkAt: 100. Here's a modified
version of stock #pringProcessStacks
printProcessStacks
| stream | stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream. [stream nextPutAll: 'ping'; cr. Process allInstances do: [:proc | stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority). stream cr. proc printStackOn: stream. stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk" Thanks!
-Boris _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc stkdump.20090424-144035.txt (3K) Download Attachment stkdump.20090424-144038.txt (7K) Download Attachment stkdump.20090424-144039.txt Download Attachment |
Most likely tight recursion in a #printOn:. VA debugging is more reliable than VW because VA debugging tools use #debugPrintOn: instead of the common #printOn:
that everyone wants to override. Some time ago VW added #basicPrintOn: to do a similar thing but in not nearly as nice a way. Startup is attempting to print the bad object. If you can get to a debugger then it does the same thing. Hosed either way.
The approach to debug this is to modify the debug tools to use #basicPrintOn: so you can see what is going on without causing secondary recursion. The stack
you dump will then show the process and printing code causing the problem. Avoid stack dumping with details. Flush the file buffer more frequently. If you need to load those debug changes into a locking image then look into the image startup commands that
filein code from a file.
Paul Baumann
From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov Sent: Friday, April 24, 2009 2:52 PM To: VWNC Subject: [vwnc] Strange case of hanging image (100% CPU) I was wondering if anyone has seen anything similar or can provide suggestions for debugging. We've recently changed some of our startup code in the subsystem among many other things and now
have images that hang shortly after snapshots and other events, like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even with emergency evaluator. I had attempted to get some trace information by dumping process stacks every second
at 100 priority and attached are last 3 files before image goes haywire.
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64 240]
OS Name: Microsoft(R) Windows(R) Server 2003, Standard Edition
OS Version: 5.2.3790 Service Pack 2 Build 3790
I've looked over the startup code in the subsystem a 100 times and can't see anything suspicious or substantially different from what we were doing before. The subsystem itself prereqs ImageConfigurationSystem,
but I've tried others as well, like RuntimeSystem or EarlyInterestNotificationSystem.
The only thing that catches my eye in the sequence of logs is the following bit, so any ideas and/or suggestions would be most welcomed.
"Semaphore>>signal
Delay class>>postSnapshot DelaySystem>>resumeAction DelaySystem(Subsystem)>>privateResume" Here's the snippet I used to dump process stacks regularly,
[120
timesRepeat: [(Delay forSeconds: 1) wait. Processor printProcessStacks]] forkAt: 100. Here's a modified version of stock #pringProcessStacks
printProcessStacks
| stream | stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream. [stream nextPutAll: 'ping'; cr. Process allInstances do: [:proc | stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority). stream cr. proc printStackOn: stream. stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk" Thanks!
-Boris This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired. _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
In reply to this post by Boris Popov, DeepCove Labs (SNN)
Terry,
It's not the subsystem resume that hangs the image, it's
trying to do things after snapshot finishes. If I have a workspace up, I can
still do basic things like 1/1 Ctrl-P, but as soon I do anything that would
bring up new window, be it hover over toolbar or right-click the image hangs and
pins the CPU. So something gets messed up in the snapshot
sequence...
-Boris From: Terry Raymond [mailto:[hidden email]] Sent: Friday, April 24, 2009 1:34 PM To: Boris Popov Subject: RE: [vwnc] Strange case of hanging image (100% CPU) Boris If you can do file I/O
then try putting probes into different places in the subsystem startup
so each probe writes a unique file containing the microsecond clock
value. Then you can find out far the startup got and the sequence it
took. Terry From:
[hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov I was wondering if anyone has seen
anything similar or can provide suggestions for debugging. We've recently
changed some of our startup code in the subsystem among many other things and
now have images that hang shortly after snapshots and other events, like
debugging sessions etc. The VM goes to 100% CPU and cannot be broken into even
with emergency evaluator. I had attempted to get some trace information by
dumping process stacks every second at 100 priority and attached are last 3
files before image goes haywire. ObjectMemory versionId #[64 41 64
240 76 0 0 0 64 41 64 240] OS
Name:
Microsoft(R) Windows(R) Server 2003, Standard
Edition OS
Version:
5.2.3790 Service Pack 2 Build 3790 I've looked over the startup code in
the subsystem a 100 times and can't see anything suspicious or substantially
different from what we were doing before. The subsystem itself prereqs
ImageConfigurationSystem, but I've tried others as well, like RuntimeSystem or
EarlyInterestNotificationSystem. The only thing that catches my eye
in the sequence of logs is the following bit, so any ideas and/or suggestions
would be most welcomed. "Semaphore>>signal Here's the snippet I used to dump
process stacks regularly, [120 Here's a modified version of stock
#pringProcessStacks printProcessStacks Thanks! -Boris _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
In reply to this post by Boris Popov, DeepCove Labs (SNN)
If memory usage increases over time, then that is a symptom
compatible with the infinite recursion Paul mentions. Start the image
with the -xq switch, let it get into the recursion, and wait until it runs out
of memory and crashes. At that point, you will have the stack dump and the
recursion should be easier to identify. If it is a tight
loop that does not necessarily increase memory use, then this approach will not
tell you much. From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov Sent: Friday, April 24, 2009 11:52 AM To: VWNC Subject: [vwnc] Strange case of hanging image (100% CPU) I was wondering if
anyone has seen anything similar or can provide suggestions for debugging. We've
recently changed some of our startup code in the subsystem among many other
things and now have images that hang shortly after snapshots and other events,
like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into
even with emergency evaluator. I had attempted to get some trace information by
dumping process stacks every second at 100 priority and attached are last 3
files before image goes haywire.
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64
240]
OS
Name:
Microsoft(R) Windows(R) Server 2003, Standard Edition
OS
Version:
5.2.3790 Service Pack 2 Build 3790
I've looked over the
startup code in the subsystem a 100 times and can't see anything suspicious or
substantially different from what we were doing before. The subsystem itself
prereqs ImageConfigurationSystem, but I've tried others as well, like
RuntimeSystem or EarlyInterestNotificationSystem.
The only thing that
catches my eye in the sequence of logs is the following bit, so any ideas and/or
suggestions would be most welcomed.
"Semaphore>>signal
Delay class>>postSnapshot DelaySystem>>resumeAction DelaySystem(Subsystem)>>privateResume" Here's the snippet I
used to dump process stacks regularly,
[120
timesRepeat: [(Delay forSeconds: 1) wait. Processor printProcessStacks]] forkAt: 100. Here's a modified
version of stock #pringProcessStacks
printProcessStacks
| stream | stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream. [stream nextPutAll: 'ping'; cr. Process allInstances do: [:proc | stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority). stream cr. proc printStackOn: stream. stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk" Thanks!
-Boris _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Andres,
You are a savior! It wasn't growing too quickly, but I just put the
mem max really low to trigger the out of memory sooner. The recursion happens as
soon as there's an exception while reopening MC_FileBTree, specifically in our
case the catalogs weren't flushed, so paths that were stored in indices didn't
match the deployment environment and unhandled exception needs catalogs to print
its own description,
[...]
context 0x7f56aee4 UserMessage>>asString
context 0x7f56aecc UserMessage>>expandMacrosWithArguments: context 0x7f56aeb4 UserMessage>>expandMacrosWith: context 0x7f56ae9c UnhandledException>>description context 0x7f56ae84 UnhandledException>>defaultAction context 0x7f56ae6c UnhandledException(GenericException)>>performHandler: context 0x7f56ae54 UnhandledException(GenericException)>>propagatePrivateFrom: context 0x7f573bf0 UnhandledException(GenericException)>>propagateFrom: context 0x7f56af44 UnhandledException(GenericException)>>propagate context 0x7f573bcc UnhandledException(GenericException)>>raiseSignal context 0x7f56b0c4 OsInaccessibleError(GenericException)>>noHandler context 0x7f56b0ac OsInaccessibleError(GenericException)>>defaultAction context 0x7f56b094 OsInaccessibleError(GenericException)>>performHandler: context 0x7f56b07c OsInaccessibleError(GenericException)>>propagatePrivateFrom: context 0x7f573b48 OsInaccessibleError(GenericException)>>propagateFrom: context 0x7f573aac OsInaccessibleError(GenericException)>>propagate context 0x7f573ba8 OsInaccessibleError(GenericException)>>raiseRequest context 0x7f573bfc HandleRegistry>>evaluateWithFullProtection: context 0x7f56b058 HandleRegistry>>registerValueOf: context 0x7f56b040 PCDiskFileAccessor class(OSHandle class)>>handleValue: context 0x7f56b028 IOAccessor class>>openFileNamed:direction:creation: context 0x7f56b010 MC_FileBTree>>reopen: context 0x7f56aff8 MC_FileBTree>>beOpenForRead context 0x7f56afe0 MC_FileBTree>>fileLock context 0x7f56afc8 MC_FileBTree>>currentLockState context 0x7f56afb0 IndexedFileMessageCatalog>>at: context 0x7f56af98 IndexedFileMessageCatalog>>lookUp: context 0x7f56af80 [] in BlockClosure>>unboundMethod context 0x7f56af68 OrderedCollection>>do: context 0x7f573b24 MessageCatalogManager>>lookUp: context 0x7f56b19c Locale>>lookUp: context 0x7f56b184 UserMessage>>asString context 0x7f56b16c UserMessage>>expandMacrosWithArguments: context 0x7f56b154 UserMessage>>expandMacrosWith: context 0x7f56b13c UnhandledException>>description [....]
I
flushed everything manually and it seems to have fixed our
problem,
MessageCatalogManager.Managers do: [:each | each
flushCatalogs].
It might be a good idea to remove dependency on functioning catalogs
from unhandled exception in the future so that one could debug broken
catalogs.
Thanks!
-Boris From: [hidden email] [mailto:[hidden email]] On Behalf Of Valloud, Andres Sent: Friday, April 24, 2009 2:29 PM To: VWNC List NC Subject: Re: [vwnc] Strange case of hanging image (100% CPU) If memory usage increases over time, then that is a symptom
compatible with the infinite recursion Paul mentions. Start the image
with the -xq switch, let it get into the recursion, and wait until it runs out
of memory and crashes. At that point, you will have the stack dump and the
recursion should be easier to identify. If it is a tight
loop that does not necessarily increase memory use, then this approach will not
tell you much. From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov Sent: Friday, April 24, 2009 11:52 AM To: VWNC Subject: [vwnc] Strange case of hanging image (100% CPU) I was wondering if
anyone has seen anything similar or can provide suggestions for debugging. We've
recently changed some of our startup code in the subsystem among many other
things and now have images that hang shortly after snapshots and other events,
like debugging sessions etc. The VM goes to 100% CPU and cannot be broken into
even with emergency evaluator. I had attempted to get some trace information by
dumping process stacks every second at 100 priority and attached are last 3
files before image goes haywire.
ObjectMemory versionId #[64 41 64 240 76 0 0 0 64 41 64
240]
OS
Name:
Microsoft(R) Windows(R) Server 2003, Standard Edition
OS
Version:
5.2.3790 Service Pack 2 Build 3790
I've looked over the
startup code in the subsystem a 100 times and can't see anything suspicious or
substantially different from what we were doing before. The subsystem itself
prereqs ImageConfigurationSystem, but I've tried others as well, like
RuntimeSystem or EarlyInterestNotificationSystem.
The only thing that
catches my eye in the sequence of logs is the following bit, so any ideas and/or
suggestions would be most welcomed.
"Semaphore>>signal
Delay class>>postSnapshot DelaySystem>>resumeAction DelaySystem(Subsystem)>>privateResume" Here's the snippet I
used to dump process stacks regularly,
[120
timesRepeat: [(Delay forSeconds: 1) wait. Processor printProcessStacks]] forkAt: 100. Here's a modified
version of stock #pringProcessStacks
printProcessStacks
| stream | stream := ('stkdump.<1s>.txt' expandMacrosWith: (Timestamp now printUsing: 'yyyymmdd-hhmmss')) asFilename writeStream. [stream nextPutAll: 'ping'; cr. Process allInstances do: [:proc | stream nextPutAll: (#ProcessPriorityC1p << #dialogs >> 'Process priority: <1p>' expandMacrosWith: proc priority). stream cr. proc printStackOn: stream. stream cr]] ensure: [stream close]. "Smalltalk Professional Debug Package Version 3.0 (c) 2002 Crafted Smalltalk" Thanks!
-Boris _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Free forum by Nabble | Edit this page |