Hi Eliot, list,

I'm working here with Pablo (Tesone) on moving the Ephemeron implementation forward. We first installed Eliot's changeset, added a #mourn method and an EphemeronDictionary collection, and then started testing something like this:

    f := ObjectFinalizer receiver: 'Hello' selector: #logCr.
    d := EphemeronDictionary new.

    d at: f put: f.

    f := nil.
    Smalltalk garbageCollect.

However, as soon as we garbage collect twice, the VM crashes. We started debugging the VM to look for more clues.

The first thing we noticed is that the first time the GC runs, the mournQueue is nil. This is of course expected, because the new finalization mechanism was not active and so there was no need to create the mournQueue. We saw that the mournQueue is actually created lazily when queuing a mourned object (see #queueMourner: and #ensureRoomOnObjStackAt:). So the second time the GC runs, the mournQueue is there. So far so good, but it still crashes.

The crash happens in the call to

    markAndTraceObjStackandContents(GIV(mournQueue), 1);

after the

    if (!markAndTraceContents) {
        return;
    }

But why it crashes is less clear to us :). We used the printObjStack() function:

    call printObjStack(markStack)
    call printObjStack(weaklingStack)

and the console output made sense. However, printing the mournQueue in the same manner produces some strange output:

    call printObjStack(mournQueue)

    head 0xb06e980 cx 18 (18) fmt 10 (10) sz 4092 (4092) myx: 4098 (4098) unmkd
    topx: 14 next: 0x0 free: 0x0

We noticed that free and next are 0x0 while the others are not...

Finally we saw there is isValidObjStack(), which gave us the following results:

    call isValidObjStack(markStack) => 1

    call isValidObjStack(weaklingStack) => 0
    p objStackInvalidBecause = "marking but page is unmarked"

    call isValidObjStack(mournQueue) => 0
    p objStackInvalidBecause = "marking but page is unmarked"

So we assume that the stack creation is wrong? We are a bit lost here.

Guille and Pablo
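For readers following along, the lazy creation described in this message (the queue is nil on the first GC and only exists once something has been queued) can be pictured with a small C sketch. This is only an illustration under assumptions: ToyQueue, toy_queue_mourner and toy_count_queued_mourners are hypothetical names, and nothing here is the VM's actual obj stack code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object stack such as the mournQueue. */
typedef struct {
    size_t count;
    size_t capacity;
    void **items;
} ToyQueue;

/* NULL until the first mourner is queued, like a nil mournQueue. */
static ToyQueue *toy_mourn_queue = NULL;

/* Toy analogue of queuing a mourner: create the queue on first use,
   grow it when full, then append. Returns 0 on success, -1 if an
   allocation fails. */
int toy_queue_mourner(void *mourner)
{
    assert(mourner != NULL);
    if (toy_mourn_queue == NULL) {
        toy_mourn_queue = calloc(1, sizeof *toy_mourn_queue);
        if (toy_mourn_queue == NULL) return -1;
    }
    ToyQueue *q = toy_mourn_queue;
    if (q->count == q->capacity) {
        size_t new_capacity = q->capacity ? q->capacity * 2 : 4;
        void **grown = realloc(q->items, new_capacity * sizeof *grown);
        if (grown == NULL) return -1;
        q->items = grown;
        q->capacity = new_capacity;
    }
    q->items[q->count++] = mourner;
    return 0;
}

/* Anything that walks the queue (a GC pass, say) has to cope with the
   queue not existing yet. */
size_t toy_count_queued_mourners(void)
{
    return toy_mourn_queue == NULL ? 0 : toy_mourn_queue->count;
}
```

The point of the sketch is only the two states a collector can observe: no queue at all before the first mourner, and a populated queue afterwards.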
Now we found that the stackObject becomes invalid within compact(). Before arriving at eliminateAndFreeForwardersForPigCompact() the stack is already invalid.
In reply to this post by Guillermo Polito
Hi Guille, Hi Pablo (and welcome),
On Tue, May 17, 2016 at 8:37 AM, Guille Polito <[hidden email]> wrote:
So this looks like something simulatable. Are you able to use the simulator? If not, why not?

When debugging the VM there are two main levels of support. One is the simulator, where there is maximum support for debugging:
- asserts on all the time
- arbitrary breakpoints
- attempting every GC in a copy of the heap before doing the real GC, so that bugs in the GC can be investigated without needing to construct a reproducible case after a crash
- the Smalltalk environment to inspect and browse

The next level is the assert and debug VMs. If you look in the build directories on the Cog svn branch you'll see that all of them build three VMs: a production VM with maximum optimisation and asserts excluded, an assert VM with -O1 and asserts enabled, and a debug VM with -O0 and asserts enabled. So if the simulator is too slow for the case being examined, or if the bug doesn't show up in the simulator (the worst of all worlds), build both assert and debug VMs and run with the assert VM first.

Note that there is a heap leak checker which can be enabled both in the simulator and in the assert and debug VMs. See the checkForLeaks method and the -leakcheck argument.

Without the simulator or the assert and debug VMs you are flying blind. It is /really/ productive to use the simulator for debugging, provided the bug is reproducible within a short amount of time, as your case above is.
_,,,^..^,,,_ best, Eliot |
Hello,
For some reason bytesToShift is negative when opening the image:

    bytesToShift := objectMemory memoryBaseForImageRead - oldBaseAddr. "adjust pointers for zero base address"

So I cannot continue loading, because addresses become negative and I get "Improper store into indexable object" kinds of errors.

Well, so far we have been using a VM compiled for debug with a graphical C debugger. It was not so bad. However, I cannot say I'm missing a better debugger.
ok!
Ok, gotcha! By this afternoon I'll have some news probably. Thanks a lot!
|
Hi Guille,
On Wed, May 18, 2016 at 12:46 AM, Guille Polito <[hidden email]> wrote:
Where's "here"? Are you in Lille?
That is to be expected. In the real VM the heap is located somewhere well above the bottom of the address space, typically above the program code. In the simulator the heap is located either at 0 (an interpreter or stack VM) or immediately above the code zone (in a Cogit VM). So when an image that has been saved on the real VM is loaded into the simulator all oops have to be adjusted down and hence bytesToShift is negative.
Can you post a back trace? Where is this happening? What version of VMMaker.oscog are you using? Are you running in Pharo or Squeak? If you're in Lille you could perhaps visit Clément's office and get him to take a look. Clément, would that be ok?
"Compiled for debug" is vague. Do you mean it is compiled with -g -O0, or in addition is compiled with -g -O0 -DDEBUGVM=0 -DNDEBUG=1?
_,,,^..^,,,_ best, Eliot |
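As an aside on the bytesToShift explanation in this reply, here is a minimal C sketch of the arithmetic, assuming a flat heap in which every oop is moved by the same signed distance. The function names and the saved base address 0x0b000000 used in the note below are hypothetical; only the sample oop 0xb06e980 comes from the printObjStack output earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Signed distance between where the heap is loaded now and where it was
   when the image was saved. Negative when the new base is lower, which is
   what happens when an image saved on the real VM is loaded at or near
   address 0. Assumes both addresses fit in 63 bits. */
int64_t toy_bytes_to_shift(uint64_t new_base, uint64_t old_base)
{
    return (int64_t)new_base - (int64_t)old_base;
}

/* Move one oop by the shift. For any oop that really was inside the saved
   heap, the result is still a non-negative address. */
uint64_t toy_relocate_oop(uint64_t oop, int64_t bytes_to_shift)
{
    int64_t moved = (int64_t)oop + bytes_to_shift;
    assert(moved >= 0);
    return (uint64_t)moved;
}
```

With an assumed old base of 0x0b000000 and a new base of 0, the shift is -0x0b000000 and the oop 0x0b06e980 ends up at 0x6e980. A negative shift on its own is therefore harmless; the sketch says nothing about what causes the store errors reported above.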
Hi again! I snipped most of the old thread to keep only the relevant parts.
Ok, it looks good to me. I did not know where would be a good place to put them without breaking something :)
cool! I thought so that would be the best. ok!
Ha, there is no debt (or at least it would be in the opposite direction).

Now I'm working on another crash that I can easily reproduce using this script:

    Smalltalk supportsQueueingFinalization: true.
    e := (1 to: 200000) collect: [ :i |
        Ephemeron key: (ObjectFinalizer receiver: 'test', 'asd' selector: #logCr) value: Object new ].
    Smalltalk garbageCollect.

While debugging that in the simulator, we saw with Clément that in #fireEphemeronsInRememberedSet and #fireEphemeronsOnEphemeronList we always fire, tenure and scavenge ephemerons, regardless of whether they are marked or not.

    fireEphemeronsInRememberedSet
        [SNIP]
        coInterpreter fireEphemeron: ephemeron.
        manager
            storePointerUnchecked: 0
            ofObject: ephemeron
            withValue: (self copyAndForward: (manager keyOfEphemeron: ephemeron)).
        (self scavengeReferentsOf: ephemeron)
        [SNIP]

So I added an assertion to check that the key of the ephemeron is marked when scavenging, and that is not always the case...

    self assert: (manager isMarked: (manager keyOfEphemeron: ephemeron)).

Is the code wrong, or am I missing something?

Guille
|
Hi Guille,
First, and very important: can you put the image(s) you're working with somewhere I can download them, and send me a URL? I can be more helpful if I can run the tests too.

On Wed, May 25, 2016 at 6:24 AM, Guille Polito <[hidden email]> wrote:
The implementation of ephemerons in the scavenger and the implementation in mark-sweep are quite different. Remember that what we're trying to do is find out if an object is only referenced from the transitive closure of ephemerons or not. And the way that we do that in both cases is avoid processing ephemerons whose keys are not yet reachable from the roots until all objects reachable from the roots have been reached. We do this by putting "unscanned" ephemerons in a queue, saving them until later.

In the scavenger the roots are the objects in the remembered set, the interpreter state (newMethod etc) and the stack zone, and GC is performed by copying all objects reachable from these roots in past and new spaces into future space, possibly tenuring overflowed objects into old space. In the mark-sweep the roots are the specialObjectsArray, the interpreter state (newMethod etc) and the stack zone.

In the scavenger (a copying collector) this means that when we process the unscanned ephemerons their keys will either have been copied into future space or tenured to old space, in which case they were reachable from the roots, or they will not have been copied yet, in which case they are reachable only from ephemerons. So in the scavenger marking is irrelevant; in fact /no/ objects should be marked when scavenging. The important thing is whether a key is in past and new spaces or is in future and old spaces.

In the mark-sweep this means that when we process the unscanned ephemerons their keys will either be marked, in which case they were reachable from the roots, or unmarked, in which case they are reachable only from ephemerons.

So the assert for marked-ness only makes sense in ephemeron processing in the mark-sweep collector.

HTH
_,,,^..^,,,_ best, Eliot |
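To make the mark-sweep half of this explanation concrete, here is a toy C model of "defer ephemerons whose key is not yet marked, finish tracing from the roots, then look at the deferred ones again". It is a sketch under assumptions, not the Spur code: the heap is a small array of objects with two reference slots each, and ToyHeap, toy_set, toy_mark and the rest are hypothetical names. Firing every remaining ephemeron in one batch is also a simplification chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX  16
#define TOY_NONE (-1)

/* Hypothetical toy object: two reference slots, held as heap indices.
   For an ephemeron, slots[0] is the key and slots[1] is the value. */
typedef struct {
    int  slots[2];
    bool is_ephemeron, marked, fired;
} ToyObject;

typedef struct {
    ToyObject objs[TOY_MAX];
    int stack[TOY_MAX];     int stack_top;        /* mark stack */
    int unscanned[TOY_MAX]; int unscanned_count;  /* deferred ephemerons */
} ToyHeap;

void toy_init(ToyHeap *h)
{
    h->stack_top = h->unscanned_count = 0;
    for (int i = 0; i < TOY_MAX; i++)
        h->objs[i] = (ToyObject){ { TOY_NONE, TOY_NONE }, false, false, false };
}

void toy_set(ToyHeap *h, int i, bool is_ephemeron, int slot0, int slot1)
{
    assert(i >= 0 && i < TOY_MAX);
    h->objs[i] = (ToyObject){ { slot0, slot1 }, is_ephemeron, false, false };
}

static void toy_push(ToyHeap *h, int i)
{
    if (i == TOY_NONE || h->objs[i].marked) return;
    h->objs[i].marked = true;          /* marked once, so pushed at most once */
    h->stack[h->stack_top++] = i;
}

/* Trace everything on the mark stack, but defer any ephemeron whose key
   has not been marked yet instead of tracing through it. */
static void toy_drain(ToyHeap *h)
{
    while (h->stack_top > 0) {
        int i = h->stack[--h->stack_top];
        ToyObject *o = &h->objs[i];
        if (o->is_ephemeron && o->slots[0] != TOY_NONE
                && !h->objs[o->slots[0]].marked) {
            h->unscanned[h->unscanned_count++] = i;
            continue;
        }
        toy_push(h, o->slots[0]);
        toy_push(h, o->slots[1]);
    }
}

/* Mark from the roots; returns how many ephemerons fired. */
int toy_mark(ToyHeap *h, const int *roots, int root_count)
{
    int fired = 0;
    for (int r = 0; r < root_count; r++) toy_push(h, roots[r]);
    toy_drain(h);
    while (h->unscanned_count > 0) {
        bool progress = false;
        for (int n = 0; n < h->unscanned_count; ) {
            ToyObject *e = &h->objs[h->unscanned[n]];
            if (h->objs[e->slots[0]].marked) {   /* key was marked meanwhile */
                toy_push(h, e->slots[1]);
                h->unscanned[n] = h->unscanned[--h->unscanned_count];
                progress = true;
            } else {
                n++;
            }
        }
        toy_drain(h);
        if (progress) continue;
        /* Remaining keys are reachable only from ephemerons: fire them,
           then trace key and value so they survive to be finalized. */
        for (int n = 0; n < h->unscanned_count; n++) {
            ToyObject *e = &h->objs[h->unscanned[n]];
            e->fired = true;
            fired++;
            toy_push(h, e->slots[0]);
            toy_push(h, e->slots[1]);
        }
        h->unscanned_count = 0;
        toy_drain(h);
    }
    return fired;
}
```

As a usage example, take object 0 holding two ephemerons (objects 1 and 2), where the key of ephemeron 1 (object 3) is referenced from nowhere else and the key of ephemeron 2 (object 5) is also held by a root object 7. Marking from roots 7 and 0 fires ephemeron 1 only; ephemeron 2 is first deferred, then rescanned once its key turns out to be marked.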
Hi Guille,
On Wed, May 25, 2016 at 10:43 AM, Eliot Miranda <[hidden email]> wrote:
Why the extra level of indirection? The Ephemeron /is/ the object finaliser. It can send finalise to its key. Why bother with the extra level of wrapper? It's wasteful; especially if we're attaching lots of ephemerons to things cuz we want to finalise something that has lots of instances.
_,,,^..^,,,_ best, Eliot |
On Wed, May 25, 2016 at 10:46 AM, Eliot Miranda <[hidden email]> wrote:
Ignore this stupid question. I see it's an old facility. And it is a helpful example :-). Sorry for the noise!
_,,,^..^,,,_ best, Eliot |
Hi Eliot,

- Regarding yesterday's email: yesterday I took the time with Pablo to read and understand the entire scavenging code, and I understood that my first email was nonsense :).
  * We were thinking of enhancing the comment in #scavengeUnfiredEphemeronsInRememberedSet to explain how the remembered set is managed. It was a bit unclear to us at first that the ephemerons to fire were being swapped to occupy the first places, which was important afterwards for firing them.
  * Also, there is something that we think is a bug: throughout ephemeron processing during scavenging, we always check whether the key has survived the scavenge. However, we never check whether the key is old. If the key is old, it looks like the algorithm treats the ephemerons as ephemerons to fire, while it should not, should it?
- Regarding the images to test, I'm actually using a plain Pharo image with your changes (+ modifications in the class builder to create Ephemeric classes). The snippets that I share crash those images. I'll try to push these changes into Pharo to make it even easier to reproduce.
- Then, about the finalizer: it is just a facility, as you say. The important thing is that with those snippets I can reproduce the crashes with almost 100% probability and no manual intervention :).

Guille
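The "is the key old?" question in this message can be stated as a tiny predicate. The C sketch below follows the criterion given earlier in the thread (a key in future space or old space counts as reached from the roots; a key still in past or new space does not). The enum and function names are hypothetical, and the sketch makes no claim about what the VM's scavenger code actually checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where an ephemeron's key lives when the
   scavenger looks at the unscanned ephemerons. */
typedef enum {
    TOY_EDEN,          /* new space: not copied by this scavenge */
    TOY_PAST_SPACE,    /* not copied by this scavenge */
    TOY_FUTURE_SPACE,  /* copied by this scavenge */
    TOY_OLD_SPACE      /* tenured, now or earlier */
} ToySpace;

/* A key counts as reached from the roots if it has been copied to future
   space or lives in old space. */
bool toy_key_is_reachable_after_scavenge(ToySpace key_space)
{
    assert(key_space >= TOY_EDEN && key_space <= TOY_OLD_SPACE);
    return key_space == TOY_FUTURE_SPACE || key_space == TOY_OLD_SPACE;
}

/* Under that criterion an ephemeron fires only when its key was left
   behind in eden or past space. */
bool toy_ephemeron_should_fire(ToySpace key_space)
{
    return !toy_key_is_reachable_after_scavenge(key_space);
}
```

Written this way, an old-space key never makes its ephemeron fire during a scavenge, which is the behaviour the message argues for.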
|
Hi Guille,
I hope today to get your example with 200k ObjectFinalizers working. It works with 50k. Of course the difference is really important as I suspect that with 200k there is tenuring but with only 50k there is none (Eden is large at 4mb). Ah, I remember I found I could get the same crash with only 100k elements but not with 75k. So the smallest case we should investigate is 100k. I might try running with a smaller eden to see if I can provoke the bug faster. The 100k case in the simulator is slow enough (~ 5 mins) to be annoying and tempt me into picking up my guitar or some such :-/.
|
Morning,

I'm curious whether anyone has studied the time taken to mourn & finalize a weak object. I ask this because in my past work with VW and VA the most interesting GC issues I tackled related to how much time finalization took versus how long the engineer thought it might take.

Thoughts on this should relate to normal processing, or to situations where the engineer is trying to force finalization via interaction with the GC logic.

--
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
===========================================================================
Yes, see the 2015 IWST paper re: Linked Weak Reference Arrays. You can download the paper using the link here:

http://www.esug.org/wiki/pier/Conferences/2015/International-Workshop-IWST_15/IWST15

See also the short version:

http://blogten.blogspot.com/2015/07/linked-weak-reference-arrays-paper.html

Andres.

On 5/28/16 10:14, John McIntosh wrote: