problem with #become, GC and proxies for compiled methods

Mariano Martinez Peck
 
Hi guys. I am doing some hacky things and I have a problem. I am sure it is not the VM's fault but mine. I would just like to understand what could be happening.

Scenario: I have developed an "object graph swapper" which takes a graph and replaces each object of the graph with a proxy. The graph is then serialized and swapped out to a file. Later, the proxies intercept messages, materialize the graph from the file, and replace the proxies with the materialized objects. When I am swapping out, after I become the original objects into proxies, the only reachable proxies are the proxies for the roots and for objects inside the graph which were also referenced from outside the graph. All the rest of the proxies (for objects only reachable from inside the graph) can be garbage collected without problem.

Problem: I am swapping out several graphs one after another. If I do a GC after swapping out each graph, then I have no problem and I can swap out lots of graphs. However, if I swap out several graphs without running a GC after each one, then the VM crashes. The crash happens when becoming the original objects into proxies. More precisely, it crashes in:

remappedObj: forwardedObj
    "Answer the given forwardedOop's target value
     during a compaction or become: operation."
    | fwdBlock targetObj |
    <inline: true>
    fwdBlock := self forwardingPointerOf: forwardedObj.
    self assert: (self fwdBlockValid: fwdBlock).
    targetObj := self longAt: fwdBlock.
    self assert: (self addressCouldBeObjWhileForwarding: targetObj).
    ^targetObj

The crash is on the line  targetObj := self longAt: fwdBlock.
It seems fwdBlock has a negative value (I debugged the VM and checked its value), and then the longAt: gives EXC_BAD_ACCESS.
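
To illustrate the failure mode described above: remappedObj: trusts fwdBlock and dereferences it, and the assert that guards it is presumably compiled out of a production VM, so a bogus value turns straight into a bad memory access. The following is only a minimal C sketch of a range check in the spirit of fwdBlockValid:, not the VM's actual code. FwdTable, fwd_block_valid and remapped_obj are hypothetical names, and the table bounds and alignment rule are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bounds of the VM's forwarding table. */
typedef struct {
    uintptr_t table_start; /* address of the first forwarding block */
    uintptr_t table_next;  /* address one past the last allocated block */
} FwdTable;

/* Assumed validity rule: the address lies inside the table and is word aligned. */
bool fwd_block_valid(const FwdTable *t, uintptr_t addr)
{
    return addr >= t->table_start
        && addr < t->table_next
        && (addr % sizeof(uintptr_t)) == 0;
}

/* Reads the forwarding target only when the block address passes the check.
   Answers false instead of dereferencing a bogus address. */
bool remapped_obj(const FwdTable *t, uintptr_t fwd_block, uintptr_t *target)
{
    assert(t != NULL && target != NULL);
    if (!fwd_block_valid(t, fwd_block)) {
        return false; /* e.g. a "negative" value reinterpreted as an address */
    }
    *target = *(const uintptr_t *)fwd_block;
    return true;
}
```

A value that looks negative as a signed integer is a very large address when treated as unsigned, so it falls outside any plausible table range and is rejected by a check like this one.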

My question is: can you imagine something that could be causing the crash? I would like to understand what is happening. Maybe I am becoming objects which are not reachable anymore, and that fails?
Any hint is really appreciated.

Some investigations: sometimes it fails, but sometimes it works (I think this depends on when the GC runs). It seems that when it breaks, in #remapFieldsAndClassOf: the oop is a CompiledMethod, and so the fieldOffset is that of the last literal, that is, the association to the class. Since I also replace classes with proxies, that association has a ClassProxy as its value. ClassProxy has the 3 instVars superclass, methodDict and format, so as not to crash the VM. The format is set with ClassProxy format.
One more piece of information: it doesn't crash in the StackVM. Moreover, it doesn't crash if I DO NOT create proxies for CompiledMethods, so it looks like something related to them. Can this give you a hint?

Thanks

--
Mariano
http://marianopeck.wordpress.com


Re: problem with #become, GC and proxies for compiled methods

Mariano Martinez Peck
 


On Mon, Jan 30, 2012 at 5:05 PM, Mariano Martinez Peck <[hidden email]> wrote:
> My question is, can you imagine something that could be causing the crash?

Doing some more debugging, it always fails in updatePointersInRangeFromto, and always for the same reason. The oop is always a compiled method and it fails in the same place. After looking a bit, the final cause is that fwdBlock1 is incorrect, but it is incorrect because the variables it is derived from are already incorrect:

fwdBlock1 -> fieldOop -> fieldOffset -> numLiterals -> headerPointer 

So from what I can see in:

    /* begin literalCountOfHeader: */
            /* begin headerOf: */
            methodHeader = longAt((oop + BaseHeaderSize) + (HeaderIndex << ShiftForWord));
            headerPointer = (isCogMethodReference(methodHeader)
                ? (assert(((((CogMethod *) methodHeader)->objectHeader)) == (nullHeaderForMachineCodeMethod())),
                    (((CogMethod *) methodHeader)->methodHeader))
                : methodHeader);
            numLiterals = (((usqInt) headerPointer) >> 10) & 255;

it looks as if the oop is correct, but headerPointer ends up with an incorrect (negative) value. This value is returned by #rawHeaderOf:
So, to sum up, I have a compiled method whose header seems to be corrupted.
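
To make the dependency chain above concrete, here is a minimal C sketch of how a bad header word propagates into a bad field offset. Only the bit extraction, (header >> 10) & 255, is taken from the generated code quoted above. The 4-byte sizes, the layout (header word in slot 0, literals right after it), and the names last_literal_offset and literal_scan_fits are assumptions for illustration, not the VM's actual code.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BASE_HEADER_SIZE = 4, /* assumed: one 32-bit base header word */
    BYTES_PER_WORD   = 4  /* assumed: 32-bit VM */
};

/* Same expression as in the generated code: 8 bits starting at bit 10. */
uint32_t literal_count_of_header(uint32_t header_word)
{
    return (header_word >> 10) & 255u;
}

/* Hypothetical: byte offset (from the oop) of the last literal, assuming the
   method header occupies slot 0 and the literals follow it. */
uint32_t last_literal_offset(uint32_t header_word)
{
    return (uint32_t)BASE_HEADER_SIZE
         + literal_count_of_header(header_word) * (uint32_t)BYTES_PER_WORD;
}

/* Hypothetical sanity check: does the last literal still lie inside an object
   of the given total size in bytes (base header included)? */
int literal_scan_fits(uint32_t header_word, uint32_t object_size_bytes)
{
    assert(object_size_bytes >= (uint32_t)BASE_HEADER_SIZE);
    return last_literal_offset(header_word) + (uint32_t)BYTES_PER_WORD
           <= object_size_bytes;
}
```

With a sane header that encodes 3 literals, the last literal sits at offset 16 of a 20-byte object. With a garbage word such as 0xFFFFFFF5 the count comes out as 255, so the computed offset points far past the end of a small method, which is consistent with fieldOop and then fwdBlock1 being read from unrelated memory.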

If this piece of data, together with what I mentioned in the first email, gives you a hint, please tell me :)

Thanks
 
--
Mariano
http://marianopeck.wordpress.com


Re: problem with #become, GC and proxies for compiled methods

Stefan Marr-3

Hi:

On 30 Jan 2012, at 18:30, Mariano Martinez Peck wrote:

> If with this piece of data, together with what I mentioned in the first email, you have a hint, please tell me :)

Wild guess: did you invalidate all the caches that a GC usually invalidates? Things like primitive caches etc? Not sure what is there in Cog that could be problematic.

Best regards
Stefan


--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525


Re: problem with #become, GC and proxies for compiled methods

Mariano Martinez Peck
 


On Mon, Jan 30, 2012 at 6:47 PM, Stefan Marr <[hidden email]> wrote:
> Wild guess: did you invalidate all the caches that a GC usually invalidates? Things like primitive caches etc? Not sure what is there in Cog that could be problematic.

Thanks Stefan. I tried to do an Object flush after each graph I swap out, but I still get the same problem :(
Thanks for the hint anyway.
 

--
Mariano
http://marianopeck.wordpress.com


Re: problem with #become, GC and proxies for compiled methods

Eliot Miranda-2
 
Hi Mariano,

On Mon, Jan 30, 2012 at 9:57 AM, Mariano Martinez Peck <[hidden email]> wrote:
> Thanks Stefan. I tried to do an Object flush after each graph I swap out, but still, same problem :(

While it may be difficult for Cog to get this right, since there are two copies of a jitted method, the original and the machine-code version, it would be great if Cog could get it right.  So, given that you have a reproducible case, could you do me a huge favour and create an image that reproduces the bug?  Please create the image from a doit that saves the image and then continues to run to the crash.  E.g. if the code that provokes the crash is "MyTest crash", evaluate this in a workspace, and then verify that the resulting snapshot crashes when you start it up:

    SmalltalkImage current snapshot: true andQuit: true.
    MyTest crash

or

    SmalltalkImage current
        garbageCollect;
        snapshot: true andQuit: true.
    MyTest crash

etc
 
 

--
best,
Eliot


Re: problem with #become, GC and proxies for compiled methods

Mariano Martinez Peck
 


On Mon, Jan 30, 2012 at 7:05 PM, Eliot Miranda <[hidden email]> wrote:
> Please create the image as a doit that saves the image and then continues to run to the crash.

Hi Eliot. While trying to make the image for you, I found that if the test is run from image start-up, as suggested above, the crash doesn't happen. If I do the same but from a "DoIt" in a workspace, then it happens. So I think that maybe I am becoming some contexts/CompiledMethods that belong to the executor (the DoIt) of the test... kind of shooting myself in the foot. But it is difficult for me to know whether a context or a method or whatever comes from the executor of the test or not...

If you think an image helps even if you have to run a do-it from a workspace, then I am happy to upload one. Otherwise I will continue investigating what is wrong...

Thanks!
--
Mariano
http://marianopeck.wordpress.com


Re: problem with #become, GC and proxies for compiled methods

Eliot Miranda-2
 


On Mon, Jan 30, 2012 at 11:19 AM, Mariano Martinez Peck <[hidden email]> wrote:
> If you think that even if you have to do a do-it from a workspace it helps, then I am happy to upload an image. Otherwise I will continue investigating what is wrong...

Just keep fiddling, adding other things to run (e.g. open a browser programmatically etc).  There's very little chance of debugging this from an expression in a workspace.  Even if you give me the expression I will have to spend my time making it reproducible from start-up to have a chance of making sense of the crash.  So please, can you keep trying?

Basically the debugging process involves analysing the crash and working back from it in subsequent runs.  Without a case that reproduces from start-up it is, in my experience, very difficult to make sense of.
 
 
--
best,
Eliot