On Tue, Sep 8, 2015 at 4:26 PM, Dale Henrichs <[hidden email]> wrote:
Hi Dale, no worries, thanks for pushing!
This doesn't compile because 'sKind' was defined inside the 'scanBlk' and 'scanSetThisTime' is the argument to the closure. Since this problem was related to temp vars, I am not sure which is the correct solution. Let me know,
_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
Just rename the temps to ones that compile:)
This time around we are not suspecting that blockClosures and block temps are the problem; we are just trying to get the args to the primitive call when it fails, so we can trace things further in the C code and try to determine the code path that leads to a nil return value ...
Dale
On 09/08/2015 12:49 PM, Mariano Martinez Peck wrote:
OK Dale, I found out what the problem was: the printing code should have been placed inside the scanBlock. Anyway, I did that, and then it did not work either because the gem was crashing and so I couldn't see the log from GemTools. So I then replaced "Transcript show:" with "GsFile gciLogServer:" and now I got the log:

--LIST-FAILURE--_scanPomWithMaxThreads failure: 1 95 anIdentitySet( FaSecurityAdjustedClosingPriceRecord) 0 0 nil

That doesn't look wrong, does it?
Cheers,
On Tue, Sep 8, 2015 at 5:03 PM, Dale Henrichs <[hidden email]> wrote:
Thanks Mariano - yeah the args look okay - At this point, I'm
suspicious that we're running out of memory during the scan and not
failing "gracefully", but no evidence of that quite yet ...
Dale On 09/08/2015 02:00 PM, Mariano
Martinez Peck wrote:
Mariano,
I just talked with engineering and they concur that this is likely to be a malloc failure, and that this area of the code has been substantially reworked in recent releases to attempt to reduce the amount of RAM consumed during list instances ... So for 3.1.0.6, you might try this operation with more RAM available, or perhaps just adding more swap space will allow the malloc to complete ... Running statmon with a 1 second interval and looking at the heap consumption of the gem might show growth and a "sudden decline" when the malloc fails ...
Dale
On 09/08/2015 02:51 PM, Dale Henrichs wrote:
On Tue, Sep 8, 2015 at 7:00 PM, Dale Henrichs <[hidden email]> wrote:
Hi Dale,
Just for the record, I tried with this scenario:

[marianopeck@quuveserver1 ~]$ free -m
              total   used   free  shared  buff/cache  available
Mem:           8014    388   6850     359         775       7205
Swap:         16639      0  16639

And still didn't work. Note that I have 7GB of RAM free. At the end, when the system crashed, this was the resulting state:

[marianopeck@quuveserver1 ~]$ free -m
              total   used   free  shared  buff/cache  available
Mem:           8014    338   1316     973        6359       6639
Swap:         16639      0  16639

Anyway, no problem, I would assume this is a problem in 3.1.0.6 and hopefully I will never need to list instances / migrate this class until I am in 3.2/3.3...
Thanks for the effort!
On 09/09/2015 06:24 AM, Mariano Martinez Peck wrote:
I'm not sure that I can interpret the `free -m` numbers correctly. Are you confirming that this is a near out of RAM situation?
We've got an engineer pursuing the "out of memory" scenario and looking for a smoking gun in the code for 3.1.0.6, so that we can be assured that we don't have an existing bug in 3.2/3.3...
Thank you for your help in tracking this down ...
Dale
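For readers who also find the `free -m` columns hard to read, here is a minimal Python sketch that parses the two snapshots Mariano posted and prints the change per column. The helper name `parse_free` and the way the snapshots are embedded as strings are assumptions made for illustration; the numbers are the ones from the thread.

```python
# Sketch: parse two `free -m` snapshots and compare them column by column.
# Column order follows the header printed by `free` in the thread.

COLUMNS = ["total", "used", "free", "shared", "buff/cache", "available"]


def parse_free(text):
    """Return {"Mem": {...}, "Swap": {...}} with values in MiB."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts or not parts[0].endswith(":"):
            continue  # skip the header line
        label = parts[0].rstrip(":")
        values = [int(p) for p in parts[1:]]
        # The Swap row only carries total/used/free, so zip stops early.
        rows[label] = dict(zip(COLUMNS, values))
    return rows


BEFORE = """
              total  used  free  shared  buff/cache  available
Mem:           8014   388  6850     359         775       7205
Swap:         16639     0 16639
"""

AFTER = """
              total  used  free  shared  buff/cache  available
Mem:           8014   338  1316     973        6359       6639
Swap:         16639     0 16639
"""

before = parse_free(BEFORE)
after = parse_free(AFTER)

for column in COLUMNS:
    old, new = before["Mem"][column], after["Mem"][column]
    print(f"Mem {column:<11} {old:>6} -> {new:>6} ({new - old:+d} MiB)")

print("Swap used after the crash:", after["Swap"]["used"], "MiB")
```

Read this way, the drop in "free" is mostly matched by growth in "buff/cache", which Linux can largely reclaim, while "available" stays above 6 GB and swap is untouched. Note that the second snapshot was taken after the crash, so it cannot show the gem's peak memory use.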
On Wed, Sep 9, 2015 at 3:36 PM, Dale Henrichs <[hidden email]> wrote:
I am saying that I cannot make it work even with 7GB of RAM free/available... and I also have plenty of swap space from what I can tell. Sounds like this should be plenty of RAM to list 66MM objects (66154585 instances to be accurate). But maybe I am wrong...

> We've got an engineer pursuing the "out of memory" scenario and looking for a smoking gun in the code for 3.1.0.6, so that we can be assured that we don't have an existing bug in 3.2/3.3...

No problem. It usually takes me some time because I must stop everything, restore from backup, modify the listMethod via topaz with SystemUser, then run the update code... then as soon as it fails I must revert again to the "corrected" extent so that the system is not in a bad state for long...
But still, if this is of help to you by any means, I am happy to continue trying. My offer is still valid if you want to enter via web and use some "code workspace" I have, or the seaside debugger, etc. I could also log the exception in the object log if you want. Or I could temporarily open a port in the firewall in case you want remote-gemtools (but that is very very slow). But as said, we should coordinate the date for this.
Cheers,
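As a rough sanity check on the "plenty of RAM" intuition, the arithmetic can be sketched in Python. The 8 bytes per reference is an assumption (one 64-bit reference per instance), not a figure from the thread, and the estimate covers only a flat list of references, not whatever working memory the scan itself needs.

```python
# Back-of-envelope estimate of the memory needed just to hold one
# reference to every instance found by a list-instances scan.

INSTANCE_COUNT = 66_154_585   # figure quoted in the thread
BYTES_PER_REFERENCE = 8       # ASSUMPTION: one 64-bit reference per instance
MIB = 1024 * 1024


def reference_list_mib(count, bytes_per_reference=BYTES_PER_REFERENCE):
    """Size in MiB of a flat array holding one reference per instance."""
    return count * bytes_per_reference / MIB


result_mib = reference_list_mib(INSTANCE_COUNT)
available_mib = 7205          # "available" column of the first `free -m` snapshot

print(f"reference list: {result_mib:.0f} MiB")
print(f"share of available RAM: {result_mib / available_mib:.1%}")
```

Under that assumption the list is a small fraction of the machine's available RAM, which supports Mariano's point about the operating system. It says nothing about per-process limits inside the gem, which are smaller than total RAM and are a separate question.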
On 09/09/2015 11:47 AM, Mariano
Martinez Peck wrote:
Well, we are still guessing, and the problem with looking at the Smalltalk stack is that all of the interesting things that are going on are happening in a separate OS process running C code ... Having 7GB of free memory does not immediately rule out a "memory problem": we had a list instances bug that used way too much memory ... A statmon run using 1 second sampling should allow you to see whether or not the gem's memory consumption is rising during the list instances run (we expect it to return to normal after the failure) ...
We _are_ still guessing because we have not found a smoking gun yet ... Can you tell me whether there is a time delay before the error is raised, or does it happen "immediately"? The guys are reading code here and we haven't found anything yet ...
Dale
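The "growth followed by a sudden decline" pattern Dale describes can be checked mechanically once per-second samples are available as a list of numbers. The Python sketch below assumes the samples have already been exported somehow (how to get them out of statmon is not shown), and the sample values are invented purely to exercise the function; they are not measurements from this system.

```python
def find_sudden_drop(samples, drop_fraction=0.5):
    """Return the index of the first sample that fell by more than
    drop_fraction relative to the previous sample, or None."""
    for i in range(1, len(samples)):
        previous, current = samples[i - 1], samples[i]
        if previous > 0 and (previous - current) / previous > drop_fraction:
            return i
    return None


# INVENTED values in MiB, one per second; not data from the thread.
heap_mib = [120, 180, 260, 390, 560, 790, 130, 125]

drop_at = find_sudden_drop(heap_mib)
peak = max(heap_mib[:drop_at]) if drop_at is not None else max(heap_mib)
print("drop at sample", drop_at, "after a peak of", peak, "MiB")
```

With 1 second sampling, the index of the drop also gives a rough answer to the "time delay or immediately" question, since it counts the seconds of growth before the failure.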
On 09/09/2015 06:24 AM, Mariano
Martinez Peck wrote:
Okay, we've read code and, to sorta confirm your experience, we _do not_ return a nil when the malloc fails ... So we're reading more code, but our suspicion now is that you are running out of TOC and the "normal" failure mechanisms aren't being triggered ... To help confirm this suspicion we think that you can try two independent things:

1. trigger an in-vm scavenge before making a call, and/or
2. bump up the TOC for that particular vm and see if you can find a size that works ...

The journey continues...
Dale
Hi Dale,
Ok, I increased the SPC to 2GB and I set a TOC of 1.8GB. Now, the code update DOES WORK and does not crash anymore. However, the result is again the 2 metaclasses / 2 classes for the same class. So I think we are dealing with 2 problems:

1) The listInstances thingy was clearly failing because of the TOC size, as you just found out.
2) The kind of code refactoring I needed does not seem to be correctly performed by Monticello. The way to solve this was performing the manual steps that James and Martin recommended at the very beginning of this thread. This change also avoided the migration and so avoided the listInstances issue too.

So... I think those are the 2 problems and conclusions. I don't think we should continue investigating more. Thoughts?
Thank you very much for continuing to search for this, and thanks to the engineers also.
On Fri, Sep 11, 2015 at 2:03 PM, Dale Henrichs <[hidden email]> wrote:
Okay ... now that the bug is characterized we'll be able to
determine if it exists in older versions or not ... the code in this
area has been reworked for 3.2+ ...
Which brings us to the second problem ... since I am entering the bug sweep, it will be worth creating a test case to produce the "2 metaclasses / 2 classes for the same class" and I plan to do that (if I can) and then see if there is a reasonable resolution (not sure:) ... Dale On 09/11/2015 11:31 AM, Mariano
Martinez Peck wrote:
On Fri, Sep 11, 2015 at 4:06 PM, Dale Henrichs <[hidden email]> wrote:
Indeed.
Yes! I will see if I can reproduce that too today. Basically, I had this:

Object
- FaSecurityClosingPriceRecord (no instances)
- SpecialSuperclass
- - FaSecurityClosingPriceRecord2 (many instances)
- - - FSCPR2a (instances)
- - - FSCPR2b (instances)

and then I committed a Monticello change with this:

Object
- SpecialSuperclass
- - FaSecurityClosingPriceRecord (many instances... and note there is no 2 at the end)
- - - FSCPR2a (instances)
- - - FSCPR2b (instances)

I will see if I can reproduce it too using dummy classes.
Cheers,
Excellent! There's a bug for that[1] ... if you can reproduce it ..
Dale [1] https://github.com/GsDevKit/GsDevKit/issues/74 On 09/11/2015 12:10 PM, Mariano
Martinez Peck wrote:
On Fri, Sep 11, 2015 at 4:17 PM, Dale Henrichs <[hidden email]> wrote:
"Challengeeeee ..... Accepted!!!" like Barny hahahaha. Ok...will see if I can reproduce it.
Ok... it seems I am able to reproduce the bug. I have added all the steps and details in the issue tracker. Let me know!
Cheers,
On Fri, Sep 11, 2015 at 4:28 PM, Mariano Martinez Peck <[hidden email]> wrote: