Hi guys,
I am running some code that processes a huge CSV file and inserts persistent data into GemStone. This is GemStone 3.2.9 with 1GB of SPC, GEM_TEMPOBJ_CACHE_SIZE of 700MB and GEM_TEMPOBJ_POMGEN_PRUNE_ON_VOTE at 100. What is funny is that my code runs inside commitOnAlmostOutOfMemoryDuring:. I have this method:

    FAGemStoneCompatibility >> commitOnAlmostOutOfMemoryDuring: aBlock threshold: aThreshold
        [ MCPlatformSupport installAlmostOutOfMemoryStaticHandler: aThreshold.
          aBlock value ]
            ensure: [ MCPlatformSupport uninstallAlmostOutOfMemoryStaticHandler ]

And this is how I use it:

    System commitTransaction.
    FACompatibilityUtils current
        commitOnAlmostOutOfMemoryDuring: [ WhateverClass whateverThingThatNeedsMemory ]
        threshold: 10.
    System commitTransaction.

And even with a threshold of 10... I am getting a

    VM temporary object memory is full , too many failed pom_gen scavenges

Any idea what could be going on?

Thanks in advance,
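For context, the handler above is essentially automating the kind of batched commit a bulk load would otherwise do by hand. A rough sketch of that manual pattern, with a placeholder row collection and per-row selector rather than the real loader code, would be:

    | rows count |
    rows := WhateverClass rowsToProcess.    "placeholder for the parsed CSV rows"
    count := 0.
    rows do: [ :each |
        WhateverClass processRow: each.     "placeholder for the per-row persistent work"
        count := count + 1.
        count \\ 10000 = 0
            ifTrue: [ System commitTransaction
                ifFalse: [ nil error: 'commit failed' ] ] ].
    System commitTransaction

The AlmostOutOfMemory handler replaces the fixed batch size with "commit whenever temp space crosses the threshold", which is why the threshold value matters so much in what follows.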
Mariano,
The handler for AlmostOutOfMemory relies on being able to resume after a successful commit, but a GemStone VM will sometimes cross the threshold in the middle of "random C code" called by a primitive ... in these cases we have to defer the signalling of AlmostOutOfMemory until we reach a point where we can safely resume. The implication is that, depending upon the details of your call, it is possible to physically run out of memory before the deferred signalling can take place.

If you lower the threshold, you should be able to find a limit that gives you room to finish the memory-hungry primitive call and still get the deferred AlmostOutOfMemory exception signalled. It is also possible that your temp obj cache is filling with objects that have not yet been connected to the persistent root ...

If you want to see what's happening with respect to the exceptions that are being signalled, you could add logging to MCPlatformSupport class>>installAlmostOutOfMemoryStaticHandler: ...

Dale

On 03/07/2016 08:35 AM, Mariano Martinez Peck via Glass wrote:
On Mon, Mar 7, 2016 at 3:18 PM, Dale Henrichs via Glass <[hidden email]> wrote:
Yes, but I am already running with a threshold of 10% and a 700MB temp space... how much lower than that should it be? mmmm
mmm OK I will check this. But now I get this same error in another cron job. Both were running correctly a few weeks/months ago.
mmmm I am a bit lost there. What kind of logging could I add? Thanks in advance,
BTW Dale... I think the line "System _vmInstanceCounts: 3" in #installAlmostOutOfMemoryStaticHandler: is wrong and instead it should be (System _vmInstanceCountsReport: 3), and it will also fail the sort: [ :a :b | (a value at: 2) > (b value at: 2) ]. Was this fixed somewhere after 3.2.9?

Cheers,

On Mon, Mar 7, 2016 at 4:57 PM, Mariano Martinez Peck <[hidden email]> wrote:
Also... I do not get the instance counts written either to the object log (could be the bug I mentioned in my previous email) or to the gem log file... shouldn't I get that? At least I remember getting that once upon a time.

Thanks!

On Mon, Mar 7, 2016 at 5:11 PM, Mariano Martinez Peck <[hidden email]> wrote:
Dale, this is the gem log I get:

    topaz>
    topaz> display oops
    topaz> iferror where
    topaz> login
    successful login
    fileformat is now utf8
    sourcestringclass is now Unicode16
    topaz 1>
    topaz 1> run
    Transcript disableLoggingToObjectLogForSession.
    Transcript enableLoggingToGemLogFileForSession.
    12350651905 asObject sessionId: ((System descriptionOfSession: System session) at: 10).
    System commit.
    System _cacheName: (#BackgroundProcess asString, 12350651905 asString).
    12350651905 asObject runInForeground
    %
    -----------------------------------------------------
    GemStone: Error         Fatal
    VM temporary object memory is full , too many failed pom_gen scavenges
    Error Category: 231169 [GemStone] Number: 4067 Arg Count: 1
    Context : 20 exception : 20
    Arg 1: 20
    topaz > exec iferr 1 : where
    WHERE can't be used prior to logging in.
    topaz>
    topaz>

This is a "background job" gem and the code is actually invoked via "12350651905 asObject runInForeground". Anyway, as you can see, there is NOTHING printed in the gem log, nor is there in the object log. So I cannot even get a list of instances...

Any clues?

Thanks

On Mon, Mar 7, 2016 at 5:39 PM, Mariano Martinez Peck <[hidden email]> wrote:
Hi Mariano, the following is an excerpt of a message I sent to a developer at one of our other customers. He had much the same kind of question.
Well,
In 2.4.x, this primitive returned an Array of Associations ... class and instance count ... somewhere along the line this obviously changed ... the danger of using #_* methods ... I assume that this also means that the condition that was being tested for has never been hit either :)

I haven't worked my way up the GemStone version chain to find out when the behavior actually changed, but #_* methods are not documented by definition and cannot be depended upon to even exist from version to version ... still, sometimes a #_* method is the only way to get at a bit of interesting information :)

I've submitted an issue for this [1] and I will check into what might have happened ... getting objects back was the intent, and that happens to be quite useful, so I want to find out if the capability is still present in a different form.

Dale

[1] https://github.com/GsDevKit/GsDevKit/issues/90

On 03/07/2016 12:11 PM, Mariano Martinez Peck wrote:
In 3.3, instead of returning an Array of Associations, the report is written to the gem log ... you are using 3.2.9?

Dale

On 03/07/2016 12:39 PM, Mariano Martinez Peck wrote:
You are not getting a report written to the object log or gem log because the code `System _vmInstanceCounts: 3` is obviously not being triggered ...

It used to be (back in the 2.4 days) that instance count reports were unconditionally written to the gem log; somewhere along the way that behavior was changed in favor of the environment variables that Richard mentions ... so those env vars need to be set in the shell before you start your netldi, so that they'll be available to the gems, and then, when you get a 4067 error, the instance count information will be written to the gem log.

Dale

On 03/07/2016 01:28 PM, Mariano Martinez Peck wrote:
On Mon, Mar 7, 2016 at 6:44 PM, Dale Henrichs <[hidden email]> wrote:
Yes, I am using 3.2.9. But... I thought the guy that writes to the gem log was #_vmPrintInstanceCounts:, not #_vmInstanceCountsReport: (I thought the latter would simply answer the string).
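For reference, the way such a report ends up being consumed is to sort it and stick it in the object log. Assuming it answers a collection of Associations whose values are indexable with the instance count in slot 2 (which is exactly the shape that seems to change from release to release), that consumption would look roughly like this:

    | report sorted |
    report := System _vmInstanceCountsReport: 3.    "assumed here to answer a collection rather than write to the gem log"
    sorted := report asSortedCollection: [ :a :b | (a value at: 2) > (b value at: 2) ].
    (ObjectLogEntry trace: 'instance counts' object: sorted asArray) addToLog.
    System commitTransaction

On a release where the selector writes the report to the gem log instead, the sort and the object log entry are exactly what stop working.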
On 03/07/2016 01:49 PM, Mariano Martinez Peck wrote:
Well, as I've mentioned, there is no guarantee of behavior for a #_* method and "it is not surprising" that it does different things from release to release ... apparently both calls write to the gem log ... and of course, in your case neither of them is getting called, although I haven't tried an experiment in 3.2.9 yet, as I've been busy writing email :)

Dale
On 03/07/2016 11:57 AM, Mariano Martinez Peck wrote:
This implies that you are doing something that consumes quite a bit of memory while in a primitive ... a stack trace would be helpful ... but it is likely that the only way to get a stack trace in this situation is to set the env vars as suggested by Richard ...

Dale
Hi guys,

First of all, thanks to Richard for the suggestion. Adding the env var did write the useful info to the gem log. The problem is that even though I tried to fix it (in my code), I still failed :( I attach the whole log in case someone wants to take a look.

However, I think there is something strange in #installAlmostOutOfMemoryStaticHandler:. I added some logging to this method to see what was going on (the Transcript lines are my additions):

    installAlmostOutOfMemoryStaticHandler: initialCommitThreshold
        | handler commitThreshold used |
        commitThreshold := initialCommitThreshold.
        handler := AlmostOutOfMemory
            addDefaultHandler: [ :ex |
                "Exception caught, do a commit."
                Transcript show: 'exception handled: ' , ex printString; cr.
                System commitTransaction
                    ifFalse: [
                        Transcript show: 'AutoCommit failed'; cr.
                        nil error: 'AutoCommit failed' ].
                System _vmMarkSweep.
                Transcript show: 'commit threshold value: ' , commitThreshold printString; cr.
                used := System _tempObjSpacePercentUsedLastMark.
                Transcript show: 'percent used ' , used printString; cr.
                used < commitThreshold
                    ifTrue: [
                        "We dropped below the threshold, reenable the signal"
                        Transcript show: 're-enable the signal '; cr.
                        System enableAlmostOutOfMemoryError ]
                    ifFalse: [
                        "commit and mark sweep did not drop us below threshold. cut threshold in half and try again. Record an entry in the object log"
                        commitThreshold := (100 - commitThreshold) // 2 + commitThreshold.
                        Transcript show: 'commit threshold value: ' , commitThreshold printString; cr.
                        commitThreshold < 98
                            ifTrue: [
                                Transcript show: 'signalAlmostOutOfMemoryThreshold: '; cr.
                                System signalAlmostOutOfMemoryThreshold: commitThreshold ]
                            ifFalse: [
                                "we're within 10% of out of memory, commit instance counts structure into object log for posterity."
                                Transcript show: '10% out of memory... record in the object log '; cr.
                                (ObjectLogEntry
                                    trace: 'almost out of memory: ' , commitThreshold printString
                                    object: (System _vmInstanceCountsReport: 3)) addToLog.
                                System commitTransaction
                                    ifFalse: [
                                        Transcript show: 'Last AutoCommit failed '; cr.
                                        nil error: 'AutoCommit failed' ] ] ].
                ex resume ].
        SessionTemps current
            at: #'CommitOnAlmostOutOfMemoryStaticException'
            put: handler.
        System signalAlmostOutOfMemoryThreshold: commitThreshold

Now... if I check the gem log, I get this:

    --transcript--'exception handled: a AlmostOutOfMemory occurred (notification 6013), Session''s temporary object memory is almost full'
    --transcript--'commit threshold value: 10'
    --transcript--'percent used 0'
    --transcript--'re-enable the signal '

And those 4 lines repeated several times, until the log gets to:

    *********** Fatal error 4067 OutOfMemory *****
    Printing Smalltalk stack because of OutOfMemory:
    Smalltalk stack:
    printing to topaz .out file at [03/08/2016 12:36:11.827 EST]

And this is what I don't understand... from the transcript it seems that "System _tempObjSpacePercentUsedLastMark" always answered 0, hence we are below the threshold (no problem!). But then I get the OutOfMemory anyway... so... how can I run out of memory if #_tempObjSpacePercentUsedLastMark answered 0 each time?

Thanks in advance,

On Mon, Mar 7, 2016 at 7:07 PM, Dale Henrichs <[hidden email]> wrote:
Attachment: outOfMemory.log (176K)
On 03/08/2016 09:50 AM, Mariano Martinez Peck wrote:
Okay, the `--transcript--'re-enable the signal '` entry simply means that the AlmostOutOfMemory signal was re-enabled - the handler is armed - it does not guarantee that you won't run out of memory.

The `*********** Fatal error 4067 OutOfMemory *****` means that you ran out of memory without triggering an AlmostOutOfMemory exception ... for the reasons I mentioned earlier: most likely you ran out of memory during a primitive call BEFORE the deferred exception could be signaled.

BTW, I tried looking at the selectors in my 3.2.9 and didn't see symbols, so I don't know what methods are actually being called in the first couple of frames, so I can't guess about what might be consuming memory ...

Dale
On Tue, Mar 8, 2016 at 3:13 PM, Dale Henrichs <[hidden email]> wrote:
OK, I got it, and that would explain it. OK.
OK... I did some analysis. The most complicated part, which I didn't know how to debug, was the "block" entries in the stack. After some time I figured out I could use:

    (Object _objectForOop: XXX) _sourceStringForBlock

where XXX is the OOP of the GsNMethod printed at the END of the frame line, not the home method printed at the beginning.

This is the resulting, translated analysis. Please start reading from the bottom to the top. Thanks!!

1. Object >> #'at:' (receiver is an OrderedCollection)

2. The closure '[ :coDataRowDict | coDataRowDict row at: index ]'

3. SequenceableCollection >> #'collect:'

4. Looks to me like the closure of #mapSelectedRows:toQuuveUsing:prependKeysWith:, that is:

    [ :rowDict | | key tmp values dataType |
        ...
        values := aListOfRowDictionaries collect: [ :coDataRowDict | coDataRowDict row at: index ].
        ... ]

5. The closure of #doWithRowDictionaries:, that is:

    [ :row | | t |
        t := self rowAsRowDictionary: row.
        "now the user can use symbolic references for testing in the aTestBlock"
        (aBlock value: t) ]

6. Collection >> #'do:'

7. FaTableAccessor >> #'doWithRowDictionaries:', that is:

    doWithRowDictionaries: aBlock
        self table do: [ :row | | t |
            t := self rowAsRowDictionary: row.
            "now the user can use symbolic references for testing in the aTestBlock"
            aBlock value: t ]

8. FaQuandlZacksFundamentalsDataLoader >> mapSelectedRows: inputListOfRowDictionaries toQuuveUsing: quuveKeysForZacksTableAccessor prependKeysWith: aPrefix

    "some code here"
    ...
    quuveKeysForZacksTableAccessor doWithRowDictionaries: [ :rowDict | | key tmp values dataType |
        ...
        values := aListOfRowDictionaries collect: [ :coDataRowDict | coDataRowDict row at: index ].
        ... ]
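Putting those frames together, the shape of the computation is roughly the nested loop below (names follow the frames above; the elided '...' sections of the real method are reduced to comments, and index is presumably derived in the omitted code). The point is that the inner collect: builds a fresh values collection for every row the outer doWithRowDictionaries: visits:

    mapSelectedRows: aListOfRowDictionaries toQuuveUsing: quuveKeysForZacksTableAccessor prependKeysWith: aPrefix
        "sketch only"
        quuveKeysForZacksTableAccessor doWithRowDictionaries: [ :rowDict |
            | key tmp values dataType index |
            "... elided: key / index / dataType are presumably derived from rowDict here ..."
            values := aListOfRowDictionaries
                collect: [ :coDataRowDict | coDataRowDict row at: index ].
            "... elided ..." ]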
Okay ... given your stack, OrderedCollection>>at: is probably not going to be causing you to run out of memory in and of itself.

...

The error `too many failed pom_gen scavenges` indicates that you are running into trouble in POM gen and, by extension, OLD gen. From your memory dump, you've got 250K OrderedCollection instances taking up 500MB and another 1M Strings taking up 300MB. Objects are held in POM gen and OLD gen if they've been modified or if they are directly referenced from temp objects or the stack, and the error specifically means that you've filled both OLD gen and POM gen ... it is interesting that you are able to reference so many strings and ordered collections from the stack and/or temp objects ...

First we want to know the setting for GEM_TEMPOBJ_AGGRESSIVE_STUBBING (perhaps you could send us the banner that gets dumped when topaz starts up, so we can get more info about what's happening). If GEM_TEMPOBJ_AGGRESSIVE_STUBBING is FALSE, then that could be the source of your problem; the default for GEM_TEMPOBJ_AGGRESSIVE_STUBBING is TRUE ... but seeing the banner will help us nonetheless.

We took a 3.2.9 topaz and started it with a 700MB temp obj cache, and this memory space info was printed out by topaz:

    (vmGc spaceSizes: eden init 2048K max 134400K, survivor init 384K max 22400K,
     vmGc old max 537600K, code max 143360K, perm max 71680K,
     pom 10 * 59776K = 597760K, vmGc remSet 13828K,
     meSpace max 685324K oopMapSize 4194304 max footprint 2229M)

giving you a POM gen size of ~600MB and an OLD gen of ~500MB.

In order to get additional information, we want you to reduce the size of POM gen to 300MB (which will in turn cause OLD gen to be increased in size) with the following settings in your system.conf:

    GEM_TEMPOBJ_AGGRESSIVE_STUBBING = TRUE;
    GEM_TEMPOBJ_POMGEN_SIZE = 300MB;
    GEM_TEMPOBJ_CACHE_SIZE = 700MB;

This may shift the problem to a different point, but at least we will learn more ... especially from the startup banner from topaz.

Dale

On 03/08/2016 10:57 AM, Mariano Martinez Peck wrote:
On Tue, Mar 8, 2016 at 6:38 PM, Dale Henrichs <[hidden email]> wrote:
Exactly, that's what I thought too.
OK.
BTW, I think strings are holding 30MB, not 300MB.
Hold on!!!! I have a question here. I do something like this (pseudo code):

    myBootstrapMethod
        | storage |
        storage := FaUserContextInformation current outerSpace cache
            at: 'ZacksQuarterlyLoadingCache'
            put: Dictionary new.
        self parsedData do: [ :each |
            storage at: aKey put: (self processInfoFrom: each) ].
        ...

As you can see, I have a temp object (part of the stack) that directly refers to a persistent object (a dictionary). I thought that under memory pressure the objects referenced from this dictionary would go to disk. But now I am not sure, because of what you said. So maybe avoiding the temp var "storage" and instead getting it each time from "FaUserContextInformation current outerSpace cache at: 'ZacksQuarterlyLoadingCache'" would allow what I was thinking? mmmm
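A sketch of that alternative, re-fetching the persistent dictionary on each use instead of holding it in a temp, and committing every so often so that already-committed objects can be flushed from temp space, might look like this (the batch size and the keyFor: helper are placeholders for however aKey is derived above):

    | count |
    FaUserContextInformation current outerSpace cache
        at: 'ZacksQuarterlyLoadingCache'
        put: Dictionary new.
    System commitTransaction.
    count := 0.
    self parsedData do: [ :each |
        (FaUserContextInformation current outerSpace cache at: 'ZacksQuarterlyLoadingCache')
            at: (self keyFor: each)             "placeholder for however aKey is computed"
            put: (self processInfoFrom: each).
        count := count + 1.
        count \\ 10000 = 0
            ifTrue: [ System commitTransaction
                ifFalse: [ nil error: 'commit failed' ] ] ].
    System commitTransaction

Whether this actually helps is the open question being discussed here.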
This is a bootstrap of a huge text file of 600MB :) Of course, I am using temporary persistent dictionaries (similar to the SIXX approach when dealing with large graphs).
I didn't change that. I checked and it's TRUE. BTW, please find attached the whole topaz script. Note that in this case I tried with the default POM gen size.
Indeed, similar here:

    (vmGc spaceSizes: eden init 2048K max 131264K, survivor init 448K max 21888K,
     vmGc old max 524928K, code max 140032K, perm max 70016K,
     pom 10 * 58368K = 583680K, vmGc remSet 13252K,
     meSpace max 669196K oopMapSize 4194304 max footprint 2178M)

BTW, thanks for pointing out this information, I wasn't aware of it :)
Thanks. I tried but it fails in the same place.
Attachment: wholeTopazOutOfMemory.log (177K)