Image Segment semantics and weakness


Image Segment semantics and weakness

Eliot Miranda-2
 
Hi All,

    I want to check my understanding of reference semantics for image segments as I'm close to completing the Spur implementation.  Specifically the question is whether objects reachable only through weak pointers should be included in an image segment or not.

Remember that an image segment is created from the transitive closure of an Array of root objects, the segment roots; i.e., we can think of an image segment as a set of objects created by tracing the object graph from the segment roots.

The segment always includes the segment roots.  Except for the roots, objects that are also reachable from the roots of the system (the system roots, effectively the root environment, Smalltalk, and the stack of the current process) are excluded from the segment.

Consider a weak array in the transitive closure that is not reachable from the system roots, and hence should be included in the segment.  Objects referenced from that weak array may be in one of three categories:

- reachable from the system roots (and hence not to be included in the segment)
- not reachable from the system roots, but reachable from the segment roots via strong pointers (and hence to be included in the segment)
- not reachable from the system roots, not reachable from the segment roots via strong pointers
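To make the three categories concrete, here is a minimal Python sketch under an assumed toy heap model (not Spur's object representation): each object is a string, `strong` maps an object to the objects it references strongly, and the weak array's slots are listed separately. The names `closure`, `classify_weak_referent`, `segRoot` and `weakArray` are hypothetical.

```python
def closure(roots, refs):
    """Objects reachable from `roots`, following the edges in `refs` (obj -> list of objs)."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(refs.get(obj, ()))
    return seen


def classify_weak_referent(obj, system_reachable, segment_strong):
    """Return 1, 2 or 3 for the three categories of weak-array referents."""
    if obj in system_reachable:
        return 1  # stays in the image; excluded from the segment
    if obj in segment_strong:
        return 2  # reachable via strong pointers from the segment roots; included
    return 3      # reachable only through weak pointers: the open question


# Hypothetical heap: strong edges only; the weak array's slots are listed separately.
strong = {
    "Smalltalk": ["Foo"],
    "segRoot": ["weakArray", "a"],
}
weak_slots = {"weakArray": ["Foo", "a", "b"]}

system_reachable = closure(["Smalltalk"], strong)
segment_strong = closure(["segRoot"], strong)
categories = {obj: classify_weak_referent(obj, system_reachable, segment_strong)
              for obj in weak_slots["weakArray"]}
```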

Should this last category be included or excluded from the segment?  I think that it makes no difference, and excluding them is only an optimization.  The argument is as follows.  Imagine that immediately after loading the image segment there is a garbage collection.  That garbage collection will collect all the objects in the last category as they are only reachable from the weak arrays in the segment.  Hence we are free to follow weak references as if they are strong when we create the image segment, leaving it to subsequent events to reclaim those objects.  
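The post-load GC argument can be simulated with a toy mark-sweep collector. This is a sketch on an assumed dict-based heap, not Spur's collector: the segment was built by following weak references as if strong, and after loading, a collection from the segment root should drop the category-three objects and nil the weak slot. `gc` and all object names are hypothetical.

```python
def gc(roots, strong, weak):
    """Toy mark-sweep: trace strong edges only, nil weak slots whose referent
    was not marked, and drop every unmarked object."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(strong.get(obj, ()))
    live_strong = {o: refs for o, refs in strong.items() if o in marked}
    live_weak = {o: [r if r in marked else None for r in slots]
                 for o, slots in weak.items() if o in marked}
    return marked, live_strong, live_weak


# A freshly loaded segment built by following weak references as if strong:
# "b" and "c" are category three (reachable only via the weak array).
strong = {"segRoot": ["weakArray", "a"], "b": ["c"]}
weak = {"weakArray": ["a", "b"]}
survivors, strong_after, weak_after = gc(["segRoot"], strong, weak)
```

After the collection, the surviving objects are exactly what a segment that excluded category three would have held. The catch, raised later in the thread, is anything that runs between loading and that collection.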

An analogous argument accounts for objects reachable from ephemerons.  Is my reasoning sound?
--
best,
Eliot

Re: [squeak-dev] Image Segment semantics and weakness

Andres Valloud-4
 
Are you saying that, for the purpose of tracing, the segment roots
behave like ephemeron key slots?

On 10/19/14 18:01 , Eliot Miranda wrote:
> The segment always includes the segment roots.  Except for the roots,
> objects are excluded from the segment that are also reachable from the
> roots of the system (the /system roots/, effectively the root
> environment, Smalltalk, and the stack of the current process).

Re: [squeak-dev] Image Segment semantics and weakness

Andres Valloud-4
In reply to this post by Eliot Miranda-2
 
At first glance this sounds ok to me... however, I'd feel better if GC
cleaned up the recently loaded segment before anything in the segment
can become active or receive a message or be visible elsewhere.  If that
does not happen, there will be a window of opportunity for really weird
stuff to occur.

For example, consider what happens if you have an ephemeron reachable
from a weak array, the freshly loaded segment turns the weak array into
a strong array (not a far fetched example --- this is done in practice
for performance), and now you have an ephemeron that can go ahead and do
random stuff in finalization.  But how would that ephemeron know it is
no longer living in the image where it was instantiated?

Assuming I am still understanding image segments correctly, the more I
think about them, the more I start liking loadable modules from
declarative specifications --- in that other world, such weird brain
surgery stuff is impossible.  Nevertheless, I understand you may be in a
situation where you have to support existing features.

On 10/19/14 18:01 , Eliot Miranda wrote:

> - /not/ reachable from the system roots, /not/ reachable from the
> segment roots via strong pointers
>
> Should this last category be included or excluded from the segment?  I
> think that it makes no difference, and excluding them is only an
> optimization.  The argument is as follows.  Imagine that immediately
> after loading the image segment there is a garbage collection.  That
> garbage collection will collect all the objects in the last category as
> they are only reachable from the weak arrays in the segment.  Hence we
> are free to follow weak references as if they are strong when we create
> the image segment, leaving it to subsequent events to reclaim those
> objects.
>
> An analogous argument accounts for objects reachable from ephemerons.
> Is my reasoning sound?

Re: [squeak-dev] Image Segment semantics and weakness

Eliot Miranda-2
In reply to this post by Andres Valloud-4
 


On Sun, Oct 19, 2014 at 6:14 PM, Andres Valloud <[hidden email]> wrote:

> Are you saying that, for the purpose of tracing, the segment roots behave like ephemeron key slots?

Yes, that's a nice way of looking at it.  But it's not just me saying that; that's how image segments work, and the way they're implemented makes this clear.  To construct the set of objects in the segment, the system
- starts off with all objects unmarked
- marks the segment roots
- marks the system, starting from the system roots

At the end of this process, the unmarked objects reachable from the segment roots are those that are not reachable from the system roots (except through the segment roots themselves), and they are the ones included in the set.
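The marking order above can be sketched in Python as a toy model with hypothetical names; Spur works on object headers and mark bits, not Python sets. Marking the segment roots before tracing the system makes them act as barriers, which is the ephemeron-key behaviour Andres describes.

```python
def segment_contents(segment_roots, system_roots, refs):
    """Return (contents, out_pointers) for a segment, following the mark order above."""
    # Start with all objects unmarked, then mark the segment roots without tracing them.
    marked = set(segment_roots)
    # Mark the system from the system roots. Tracing stops at any marked object,
    # so objects reachable from the system only through a segment root stay unmarked.
    stack = list(system_roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(refs.get(obj, ()))
    # Unmarked objects reachable from the segment roots form the segment;
    # marked ones are external references (out pointers).
    contents, out_pointers = set(segment_roots), set()
    stack = [child for root in segment_roots for child in refs.get(root, ())]
    while stack:
        obj = stack.pop()
        if obj in contents:
            continue
        if obj in marked:
            out_pointers.add(obj)
            continue
        contents.add(obj)
        stack.extend(refs.get(obj, ()))
    return contents, out_pointers


refs = {
    "Smalltalk": ["Foo", "segRoot"],
    "segRoot": ["a", "Foo"],
    "a": ["b"],
    "Foo": ["x"],
}
contents, out_pointers = segment_contents(["segRoot"], ["Smalltalk"], refs)
```

In the example, `a` and `b` are reachable from `Smalltalk`, but only through `segRoot`, so they still land in the segment, while `Foo` becomes an out pointer.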




--
best,
Eliot

Re: [squeak-dev] Image Segment semantics and weakness

Eliot Miranda-2
In reply to this post by Andres Valloud-4
 
Hi Andres,

On Sun, Oct 19, 2014 at 6:23 PM, Andres Valloud <[hidden email]> wrote:

> At first glance this sounds ok to me... however, I'd feel better if GC cleaned up the recently loaded segment before anything in the segment can become active or receive a message or be visible elsewhere.  If that does not happen, there will be a window of opportunity for really weird stuff to occur.

Oh, that's a very good point.  I knew it was wise to ask :-)  I'll sleep on this, but at the moment I'm close to being convinced that there needs to be either the equivalent of a GC immediately on loading, or the equivalent of a GC on creating the segment, so as not to include category-three objects in the segment.  Damn, I suspect that's tricky to implement efficiently ;-)

> For example, consider what happens if you have an ephemeron reachable from a weak array, the freshly loaded segment turns the weak array into a strong array (not a far fetched example --- this is done in practice for performance), and now you have an ephemeron that can go ahead and do random stuff in finalization.  But how would that ephemeron know it is no longer living in the image where it was instantiated?

> Assuming I am still understanding image segments correctly, the more I think about them, the more I start liking loadable modules from declarative specifications --- in that other world, such weird brain surgery stuff is impossible.  Nevertheless, I understand you may be in a situation where you have to support existing features.

Yes, image segments are used heavily in etoys (project saving and loading) and Terf (exchanging behavior and initial state between replicas).   So Spur does need to support image segments if it is to be easy to adopt.  And that's always been an important criterion for Cog.




--
best,
Eliot

Re: [squeak-dev] Image Segment semantics and weakness

Andres Valloud-4
 
How about this?

Step 1.  Mark segment roots with mark bit B.

Step 2.  Trace from system roots marking with mark bit A.

Step 3.  Trace from segment roots marking with mark bit B.

Step 4.  Hey, all those objects that could not be referenced from either
the system roots or the segment roots are garbage!  And between mark
bits A and B, there is now a complete trace of the system.  Finish off
the GC, and unset any mark bits A only.  This should fix weak arrays.

Step 5.  Trace from the segment roots and write anything marked with
mark bit B to the segment, unsetting mark bits B along the way.

The key is to use 2 mark bits, where setting either means "the object is
marked" as far as the GC is concerned.  What, no available header bits?
  Surely it's possible to pull off a Peter Deutsch, "let's squirrel away
information in bits used for something else as long as nobody notices"
kind of trick :).
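The five steps can be sketched with two Python sets standing in for mark bits A and B. This is a toy model under assumed conventions: `strong` and `weak` map each object to the objects it references, only strong slots are traced in steps 2 and 3, and "finishing off the GC" in step 4 is reduced to nilling weak slots whose referents are unmarked. Function and object names are hypothetical.

```python
def two_bit_segment(segment_roots, system_roots, strong, weak):
    """Sketch of the two-mark-bit scheme.
    Returns (objects written to the segment, weak slots after step 4)."""
    mark_a, mark_b = set(), set()

    def trace(start, bit):
        # Either bit counts as "marked"; weak slots are deliberately not traced.
        stack = list(start)
        while stack:
            obj = stack.pop()
            if obj not in mark_a and obj not in mark_b:
                bit.add(obj)
                stack.extend(strong.get(obj, ()))

    mark_b.update(segment_roots)                                          # step 1
    trace(system_roots, mark_a)                                           # step 2
    trace([c for r in segment_roots for c in strong.get(r, ())], mark_b)  # step 3

    # Step 4: everything unmarked is garbage, so nil weak slots that refer to it,
    # then clear the A bits.
    live = mark_a | mark_b
    weak = {o: [s if s in live else None for s in slots] for o, slots in weak.items()}
    mark_a.clear()

    # Step 5: trace from the segment roots, writing B-marked objects and clearing B.
    segment, stack = [], list(segment_roots)
    while stack:
        obj = stack.pop()
        if obj in mark_b:
            mark_b.discard(obj)
            segment.append(obj)
            stack.extend(strong.get(obj, ()))
            stack.extend(s for s in weak.get(obj, ()) if s is not None)
    return segment, weak


strong = {
    "Smalltalk": ["Foo"],
    "segRoot": ["weakArray", "a", "Foo"],
    "g": ["h"],
}
weak = {"weakArray": ["a", "g", "Foo"]}
segment, weak_after = two_bit_segment(["segRoot"], ["Smalltalk"], strong, weak)
```

Here `g` and `h`, reachable only through the weak array, are treated as garbage in step 4, so the slot that referred to `g` is nilled and neither object is written, while the weak reference to `Foo` survives as an external reference.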

Ephemerons are still problematic.  Should they be finalized both in the
host image, as well as in other images that load the image segment?  I'd
consider letting ephemerons queued for finalization during step 4 become
regular strong objects in the image segment, such that loading them into
a new image does not trigger random finalization.  Steps 4 and 5 produce
that for free (because ephemerons in the finalization queue become
regular strong objects).

Andres.

On 10/19/14 19:05 , Eliot Miranda wrote:
>
> Oh that's a very good point.  I knew it was wise to ask :-)  I'll sleep on
> this but at the moment I'm close to being convinced that there either
> needs to be the equivalent of a GC immediately on loading, or the
> equivalent of a GC on creating the segment, so as not to include
> category three objects in the segment. Damn, I suspect that's tricky to
> implement efficiently ;-)

Re: [squeak-dev] Image Segment semantics and weakness

Eliot Miranda-2

Hi Andres,

On Oct 19, 2014, at 7:54 PM, Andres Valloud <[hidden email]> wrote:

> How about this?
>
> Step 1.  Mark segment roots with mark bit B.
>
> Step 2.  Trace from system roots marking with mark bit A.
>
> Step 3.  Trace from segment roots marking with mark bit B.
>
> Step 4.  Hey, all those objects that could not be referenced from either the system roots or the segment roots are garbage!  And between mark bits A and B, there is now a complete trace of the system.  Finish off the GC, and unset any mark bits A only.  This should fix weak arrays.
>
> Step 5.  Trace from the segment roots and write anything marked with mark bit B to the segment, unsetting mark bits B along the way.
>
> The key is to use 2 mark bits, where setting either means "the object is marked" as far as the GC is concerned.  What, no available header bits?  Surely it's possible to pull off a Peter Deutsch, "let's squirrel away information in bits used for something else as long as nobody notices" kind of trick :).

Yes, that'll work nicely.  At one stage the Spur implementation has all the objects in the segment with their mark bits set, plus a copy of these objects in a word array that becomes the segment.  The next step is to enumerate references in the copy, mapping oops either to segment-relative oops or to indexes into the out pointers, the external references.  At that point I can set the second mark bit on all objects strongly referenced from within the segment.  Then a second pass over weak objects can nil refs to internal objects without the second mark bit set.
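The two passes described above might look like this in a toy Python model (hypothetical names; the segment is modelled as a list of objects, and references are encoded as `("seg", i)` or `("out", j)` pairs rather than real oops). Treating the roots as strongly referenced is an assumption of this sketch.

```python
def encode_segment(objects, roots, strong, weak):
    """Map each reference in a segment copy to ("seg", i) or ("out", j), then
    nil weak slots that point at internal objects with no strong referrer."""
    index = {obj: i for i, obj in enumerate(objects)}
    out_index = {}
    # Assumption: the roots count as strongly referenced (by the roots array).
    second_mark = set(roots)

    def encode(target):
        if target in index:
            return ("seg", index[target])
        return ("out", out_index.setdefault(target, len(out_index)))

    # First pass: encode strong references, setting the second mark on internal targets.
    encoded_strong = {}
    for obj in objects:
        encoded_strong[obj] = [encode(t) for t in strong.get(obj, ())]
        second_mark.update(t for t in strong.get(obj, ()) if t in index)

    # Second pass over weak objects: internal targets without the second mark become nil.
    encoded_weak = {}
    for obj in objects:
        if obj in weak:
            encoded_weak[obj] = [None if t in index and t not in second_mark else encode(t)
                                 for t in weak[obj]]
    return encoded_strong, encoded_weak, list(out_index)


objects = ["segRoot", "weakArray", "a", "g", "h"]
strong = {"segRoot": ["weakArray", "a", "Foo"], "g": ["h"]}
weak = {"weakArray": ["a", "g", "Foo"]}
encoded_strong, encoded_weak, out_pointers = encode_segment(objects, ["segRoot"], strong, weak)
```

In this scheme `g` and `h` are still copied into the segment; only the weak reference to `g` is nilled. `h` even keeps its second mark, because the first pass is not a transitive trace. Both are unreachable after loading and would be reclaimed by the next GC.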


> Ephemerons are still problematic.  Should they be finalized both in the host image, as well as in other images that load the image segment?  I'd consider letting ephemerons queued for finalization during step 4 become regular strong objects in the image segment, such that loading them into a new image does not trigger random finalization.  Steps 4 and 5 produce that for free (because ephemerons in the finalization queue become regular strong objects).

IMO the only sane thing is to treat them as strong objects.  If something gets finalized twice, so be it; there's no way to complete finalization while the segment is being created anyway, so even if segment creation queues ephemerons for finalization they'll still be in the segment.  There are lots of potential issues here which the segment creation primitive can't deal with (stale file handles, dangling C pointers), so expecting it to work miracles with ephemeral references is a waste of effort.  KISS.  No queuing of ephemerons for finalization; just mark as if strong.
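The difference between ordinary GC treatment of ephemerons and "just mark as if strong" can be sketched as follows (a toy Python model with hypothetical names; `ephemerons` maps an ephemeron to its (key, value) pair). In GC mode an ephemeron's value is traced only once its key is reached by other means, iterating to a fixed point; ephemerons whose keys are never reached are reported as the ones a GC would queue for finalization.

```python
def mark(roots, strong, ephemerons, as_strong):
    """Toy marker. Returns (marked objects, ephemerons a GC would queue)."""
    marked, pending, stack = set(), set(), list(roots)

    def drain():
        while stack:
            obj = stack.pop()
            if obj in marked:
                continue
            marked.add(obj)
            stack.extend(strong.get(obj, ()))
            if obj in ephemerons:
                key, value = ephemerons[obj]
                if as_strong:
                    stack.extend((key, value))  # segment creation: mark as if strong
                else:
                    pending.add(obj)            # GC: wait to see whether the key is reached

    drain()
    if not as_strong:
        # Fixed point: an ephemeron's value is traced once its key is marked.
        while True:
            ready = [e for e in pending if ephemerons[e][0] in marked]
            if not ready:
                break
            for e in ready:
                pending.discard(e)
                stack.append(ephemerons[e][1])
            drain()
    # Ephemerons still pending have unreachable keys; a real GC would queue them
    # for finalization (and keep their keys and values alive), which is omitted here.
    return marked, sorted(pending)


ephemerons = {"eph": ("key", "value")}
gc_marked, gc_queued = mark(["segRoot"], {"segRoot": ["eph"]}, ephemerons, as_strong=False)
seg_marked, seg_queued = mark(["segRoot"], {"segRoot": ["eph"]}, ephemerons, as_strong=True)
```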



Eliot (phone)

Re: [squeak-dev] Image Segment semantics and weakness

Andres Valloud-4
 

>> Ephemerons are still problematic.  Should they be finalized both in
>> the host image, as well as in other images that load the image
>> segment?  I'd consider letting ephemerons queued for finalization
>> during step 4 become regular strong objects in the image segment,
>> such that loading them into a new image does not trigger random
>> finalization.  Steps 4 and 5 produce that for free (because
>> ephemerons in the finalization queue become regular strong
>> objects).
>
> IMO the only sane thing is to treat them as strong objects.  If
> something gets finalized twice so be it; there's no way to complete
> finalization while the segment is being created anyway, so even if
> segment creation queues ephemerons for finalization they'll still be
> in the segment.  There are lots of potential issues here which the
> segment creation primitive can't deal with (stale file handles,
> dangling C pointers), so expecting it to work miracles with
> ephemeral references is a waste of effort.  KISS.  No queuing of
> ephemerons for finalization; just mark as if strong.

Hang on, why would the finalization queue live in the segment?  I'd
imagine it will be referenced from the system roots, so queuing
ephemerons will do that in the host image.  By the time the queued
ephemerons are seen in the segment by someone else, they will be strong
objects (because they got to the finalization queue) so they won't
finalize when loaded from a segment by construction.

Ideally, the mechanism would treat both weak arrays and ephemerons'
gc-special behavior the same way.  So, if the desire is to prevent weak
arrays from pulling in garbage, then I'd say that finalizing ephemerons
in the host image should also happen for the sake of consistency.

I agree that if some _other_ ephemerons try to finalize after loading
the segment... how about users do that at their own risk? :)

Andres.

Re: [squeak-dev] Image Segment semantics and weakness

Eliot Miranda-2

Hi Andres,

Eliot (phone)

On Oct 19, 2014, at 9:05 PM, Andres Valloud <[hidden email]> wrote:

> Hang on, why would the finalization queue live in the segment?  

I didn't say they did.  My point re finalization is that the segment creation prim only does a mark and an unmark, not a GC.  So no finalization is done then.


> I'd imagine it will be referenced from the system roots, so queuing ephemerons will do that in the host image.  By the time the queued ephemerons are seen in the segment by someone else, they will be strong objects (because they got to the finalization queue) so they won't finalize when loaded from a segment by construction.
>
> Ideally, the mechanism would treat both weak arrays and ephemerons' gc-special behavior the same way.  So, if the desire is to prevent weak arrays from pulling in garbage, then I'd say that finalizing ephemerons in the host image should also happen for the sake of consistency.
>
> I agree that if some _other_ ephemerons try to finalize after loading the segment... how about users do that at their own risk? :)
>
> Andres.

Re: [squeak-dev] Image Segment semantics and weakness

Andres Valloud-4
 
>> Hang on, why would the finalization queue live in the segment?
>
> I didn't say they did.  My point re finalization is that the segment
> creation prim only does a mark and an unmark, not a GC.  So no
> finalization is done then.

Does performing a GC necessarily imply a compact in Spur?

Andres.

Re: [squeak-dev] Image Segment semantics and weakness

Eliot Miranda-2

Hi Andres,

On Oct 19, 2014, at 9:21 PM, Andres Valloud <[hidden email]> wrote:

> Does performing a GC necessarily imply a compact in Spur?

Only in the scavenger.  But, the way the GC is coded, there is always some number of compaction passes: two in a normal stop-the-world GC and two for the GC on snapshot.

But there are other activities: the sweep to free and coalesce reclaimed space; the sweep to nil dead pointers in, and queue, weak arrays; and reaching a fixed point in the tracing and firing of ephemerons.  None of these are necessary for segment creation.

>
> Andres.


Eliot (phone)

Re: [squeak-dev] Image Segment semantics and weakness

J. Vuletich (mail lists)
In reply to this post by Eliot Miranda-2
 

Hi Eliot,



I think you are right. But there is a risk that somehow, someone gains a strong reference to such an object after the image segment was created, breaking our invariants when the segment is loaded again.

An object might be (not reachable / strongly reachable / weakly reachable) from the system roots and/or the segment roots. This gives us 9 possibilities. Six of them are easy (and I'll not go into them). The other three are tricky:

a- Not reachable from system roots. Weakly reachable from segment roots.
Do not include them. It is best to run a GC before building the image segment, to get rid of them (run termination, etc.). This is to avoid the risk of the object somehow gaining a strong reference after the segment is built, making the segment miss the weak ref to it. Doing it this way would also mean that any objects affected by termination would be consistent, both in the image and in the segment.

b- Weakly reachable from system roots. Weakly reachable from segment roots.
Do not include them. If the object manages to survive by gaining a strong ref from the system roots, the weak ref will be repaired on segment load (am I right on this?). If the original object were included in the segment, then on segment load it would point to a duplicate object that is about to be collected (and maybe terminated?). In any case, doing it this way would also mean that any objects affected by termination would be consistent, both in the image and in the segment.

c- Weakly reachable from system roots. Strongly reachable from segment roots.
Do include them. It seems reasonable to run a GC and get rid of them after unloading the segment, to avoid the risk of the object somehow gaining a strong ref in the image and being duplicated on segment load. But doing as I say means that we would be loading into the image an object that was already terminated, although in the state it had before running termination. Not really sure if this is OK. There could be some risk of objects in the segment being in some pre-termination state, with some objects in the image being in some post-termination state. In any case, this would suggest bad design... So perhaps it makes sense to throw an exception in these cases?

I hope this rant is of use.

Cheers,
Juan Vuletich


Re: [Pharo-dev] Image Segment semantics and weakness

Eliot Miranda-2
In reply to this post by Eliot Miranda-2
 


On Mon, Oct 20, 2014 at 8:26 AM, stepharo <[hidden email]> wrote:
> While I was a big fan of imageSegment and proposed to Mariano to work on imageSegment2 (it was the original idea for his PhD),
> he convinced us that image segments were not worth their complexity.

I absolutely agree.

> So why do you want to have imageSegment?

Because of backwards-compatibility.  If Spur does not provide image segments then the barrier to entry for Terf, eToys and Squeak may be too high.  Spur is supposed to be a plug-in replacement for Cog, not something that requires lots of effort to port to.

> Stef







--
best,
Eliot

Re: [Pharo-dev] Image Segment semantics and weakness

EstebanLM
 

On 20 Oct 2014, at 21:41, Eliot Miranda <[hidden email]> wrote:



> On Mon, Oct 20, 2014 at 8:26 AM, stepharo <[hidden email]> wrote:
>> While I was a big fan of imageSegment and proposed to Mariano to work on imageSegment2 (it was the original idea for his PhD),
>> he convinced us that image segments were not worth their complexity.
>
> I absolutely agree.
>
>> So why do you want to have imageSegment?
>
> Because of backwards-compatibility.  If Spur does not provide image segments then the barrier to entry for Terf, eToys and Squeak may be too high.  Spur is supposed to be a plug-in replacement for Cog, not something that requires lots of effort to port to.

But… (and tell me if I'm saying something stupid) it would probably be better to ask the people using ImageSegments to spend some time writing an adaptor to use Fuel (which is already there, works fine, and is faster than ImageSegments). In the not-so-long term, that is a better investment than having you replicate a technology that we all agree is not the best option (also, I would bet your valuable time is better spent on other stuff).
It's not that there is no alternative to IS… and also, the IS binary format for Spur will not be compatible with the older one, so… why not?

anyway, that’s my 2c

Esteban

 


Re: [squeak-dev] Image Segment semantics and weakness

Mariano Martinez Peck
In reply to this post by Eliot Miranda-2
 
Just a quick note I would like to share....
For my PhD, I did investigate ImageSegment very very deeply:


I didn't want to write Fuel just because. I took quite a lot of time to understand how ImageSegment primitives worked. From that effort, I remember a few conclusions:

1) I found only a few users of ImageSegment
2) The few users I found were NOT using ImageSegment for its real purpose, that is, object swapping. It was used instead as an object serializer. For that, they use #writeForExportOn:, which ended up using SmartRefStream for the rest of the objects.
3) I noticed I could achieve the same performance or even better with an OO serializer built at the language side, with all the benefits this means. Of course, having Cog helped here....

You can find some benchmark comparisons against IS, also in my PhD: http://rmod.lille.inria.fr/archives/phd/PhD-2012-Martinez-Peck.pdf

Cheers, 





On Mon, Oct 20, 2014 at 9:56 PM, <[hidden email]> wrote:
Hi Eliot,


I think you are right. But there is a risk that somehow, someone gains a
strong reference to the object after the image segment was created,
breaking our invariants when the segment is loaded again.

An object might be not reachable, weakly reachable, or strongly reachable
from the system roots, and likewise from the segment roots. This gives us
3 x 3 = 9 possibilities.
Six of them are easy (and I'll not go into them). The other three are
tricky:

a- Not reachable from system roots. Weakly reachable from segment roots.
Do not include them. It is best to run a GC before building the image
segment to get rid of them (running termination, etc.). This avoids the
risk of the object somehow gaining a strong reference after the segment is
built, which would leave the segment missing the weak ref to it. Doing it
this way would also mean that any objects affected by termination would be
consistent, both in the image and in the segment.

b- Weakly reachable from system roots. Weakly reachable from segment
roots.
Do not include them. If the object manages to survive by gaining a strong
ref from the system roots, the weak ref will be repaired on segment load
(am I right on this?). If the original object were included in the segment,
then on segment load the weak ref would point to a duplicate object that is
about to be collected (and maybe terminated?). In any case, doing it this
way would also mean that any objects affected by termination would be
consistent, both in the image and in the segment.

c- Weakly reachable from system roots. Strongly reachable from segment
roots.
Do include them. It seems reasonable to run a GC and get rid of them after
unloading the segment, to avoid the risk of the object somehow gaining a
strong ref in the image and being duplicated on segment load. But doing as
I say means that we would be loading into the image an object that was
already terminated, although in the state it had before termination ran.
I'm not really sure if this is OK. There could be some risk of objects in
the segment being in a pre-termination state, while some objects in the
image are in a post-termination state. In any case, this would suggest bad
design... So perhaps it makes sense to throw an exception in these cases?
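The 3 x 3 grid Juan describes can be enumerated mechanically. The following Python sketch does that. The labels paraphrase his proposals for cases a, b and c. The six "easy" cases are deliberately left unclassified because the post does not spell them out. `LEVELS` and `TRICKY` are hypothetical names.

```python
from itertools import product

LEVELS = ("not", "weakly", "strongly")

# Juan's proposals for his three tricky cases, keyed by
# (reachability from system roots, reachability from segment roots).
TRICKY = {
    ("not", "weakly"): "a: exclude; run a GC before building the segment",
    ("weakly", "weakly"): "b: exclude; weak ref repaired on load (open question)",
    ("weakly", "strongly"): "c: include, or perhaps raise an exception",
}

cases = list(product(LEVELS, repeat=2))
for system, segment in cases:
    note = TRICKY.get((system, segment), "one of the six easy cases")
    print(f"system: {system:<8} segment: {segment:<8} -> {note}")
```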

I hope this rant is of use.

Cheers,
Juan Vuletich






--
Mariano
http://marianopeck.wordpress.com

Re: [squeak-dev] Image Segment semantics and weakness

Yoshiki Ohshima-3
In reply to this post by Eliot Miranda-2
 
I'm not fully following the discussion here, but I do remember seeing
the following email from Dan in 1999:

http://lists.squeakfoundation.org/pipermail/squeak-dev/1999-March.txt

and search for: "From DanI at wdi.disney.com Fri Mar 26 07:17:09 1999"

It does not require two bits to mark.

(Hopefully this email has some relevance to the discussion at hand...)


On Sun, Oct 19, 2014 at 6:01 PM, Eliot Miranda <[hidden email]> wrote:




--
-- Yoshiki

Re: [squeak-dev] Image Segment semantics and weakness

David T. Lewis
 
On Tue, Oct 21, 2014 at 04:29:53PM -0700, Yoshiki Ohshima wrote:

> I'm not fully following the discussion here, but I do remember seeing
> the following email from Dan in 1999:
>
> http://lists.squeakfoundation.org/pipermail/squeak-dev/1999-March.txt
>
> and search for: "From DanI at wdi.disney.com Fri Mar 26 07:17:09 1999"
>
> It does not require two bits to mark.
>
> (Hopefully this email has some relevance to the discussion at hand...

I don't know if it is directly relevant to the specific discussion,
but it is definitely worth rereading this post from Dan. I remember
being amazed at its simplicity when I first read it fifteen years
ago, and what stands out to me in retrospect is that we seem to have
largely overlooked what he apparently considered to be its main potential
application: "Steps to Modularity - Incremental Snapshots".

Thanks for the pointer :-)

Dave

Here is a copy of the squeak-dev post from 1999:

From DanI at wdi.disney.com  Fri Mar 26 07:17:09 1999
From: DanI at wdi.disney.com (Dan Ingalls)
Date: Sat Jan 28 04:56:45 2012
Subject: Steps to Modularity - Incremental Snapshots
Message-ID: <v0300780cb320dafbd7ab@[206.16.10.26]>

Folks -

A week or so ago, I sent out a message describing a technique for extracting segments from the Squeak image.  What I want to know is,

        Does anyone know of such a technique having been used previously?

I figure it must be known, but I have certainly never heard of it.  Please reply directly to me.

To recap, here's how it works:

        1.  Mark the root (or roots) of the subtree desired.

        2.  Do a GC mark pass.  Since this stops at any marked objects,
        the subtree will be unmarked, "in the shadow" of its roots.

        3.  Copy the roots and the unmarked subtree into a byteArray (the image segment)
                Relocate internal pointers as you go
                Copy external pointers into a table of outpointers.

        Reinstalling a segment is incredibly simple -- all you do is remap any
                pointers in one pass and throw away the byteArray header!
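Dan's recipe maps onto a small Python model. This is a sketch under stated assumptions and not the Squeak VM's ImageSegment primitive. Objects are hypothetical `Obj` instances with strong slots only, and a marked set stands in for the mark bit. The segment is a list of records rather than a byteArray. Internal pointers are relocated to record indices, and external pointers go through an outpointer table. `reinstall` builds fresh objects and remaps every pointer in one pass. It does not reuse the byte array in place.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity hashing, like object pointers
class Obj:
    name: str
    slots: list = field(default_factory=list)  # strong pointers only, for simplicity


def mark(roots, already_marked):
    """GC-style mark pass that stops at any object already marked."""
    marked, stack = set(already_marked), list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(obj.slots)
    return marked


def extract(segment_roots, system_roots):
    # Steps 1 and 2: mark the roots, then mark from the system roots.
    # Marking stops at the roots, so their private subtree stays unmarked.
    marked = mark(system_roots, already_marked=segment_roots)
    # Step 3: collect the roots plus the unmarked objects they reach.
    order = list(segment_roots)
    index = {obj: i for i, obj in enumerate(order)}
    i = 0
    while i < len(order):
        for child in order[i].slots:
            if child not in marked and child not in index:
                index[child] = len(order)
                order.append(child)
        i += 1
    outpointers, out_index, records = [], {}, []
    for obj in order:
        encoded = []
        for child in obj.slots:
            if child in index:  # internal pointer: relocate to an offset
                encoded.append(("in", index[child]))
            else:  # external pointer: goes through the outpointer table
                if child not in out_index:
                    out_index[child] = len(outpointers)
                    outpointers.append(child)
                encoded.append(("out", out_index[child]))
        records.append((obj.name, encoded))
    return records, outpointers


def reinstall(records, outpointers):
    """Rebuild the objects, then remap every pointer in a single pass."""
    objs = [Obj(name) for name, _ in records]
    for obj, (_, encoded) in zip(objs, records):
        obj.slots = [objs[i] if kind == "in" else outpointers[i] for kind, i in encoded]
    return objs


smalltalk, shared = Obj("Smalltalk"), Obj("shared")
project, model, view = Obj("project"), Obj("model"), Obj("view")
smalltalk.slots += [project, shared]
project.slots += [model, view]
model.slots += [shared, project]  # an external pointer and a back-pointer to the root

records, outpointers = extract([project], [smalltalk])
print([name for name, _ in records])  # ['project', 'model', 'view']
copy = reinstall(records, outpointers)
assert copy[1].slots == [shared, copy[0]]
```

Only one mark set is needed. The pre-marked roots act as the barrier that leaves the subtree unmarked. This matches the earlier remark in the thread that the technique does not require two mark bits.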

Thanks
        - Dan

PS...
Ted and I have just completed an implementation and it is great.  (It will be out in updates and release 2.4 within a week).  It can trace and copy a 520kb tree of over 15000 objects in about 390ms.  Used for deepCopy, it is about 20 times faster than what we do currently.  Used to swap segments in and out, it finally offers a realistic vehicle for breaking down Squeak's monolithic images.

It's even faster than you would guess from the above.  There is a fixed overhead for the full GC mark and unmark.  This is 350ms on my machine (could surely be improved).  It can then copy out the 520kb segment or reinstall it in about 40ms either way.





Re: [squeak-dev] Image Segment semantics and weakness

Bert Freudenberg
In reply to this post by Mariano Martinez Peck
 
On 20.10.2014, at 18:55, Mariano Martinez Peck <[hidden email]> wrote:

> The few users I found, were NOT using the real purpose of ImageSegment, that is, object swapping. It was used instead as an object serializer. For that, they use #writeForExportOn: which ended up using SmartRefStream for the rest of the objects.  

Well, if you look closer, you will see that projects use image segments in two completely different ways. One is, as you say, for serialization, which is not the best use of image segments, agreed, especially with all the other logic wrapped around it.

But if you enable projectsSentToDisk then entering a project will swap the previous project to disk as an image segment, allowing you to have images with very large projects without having to hold all in main memory at the same time.

This uses a completely different code path and file format than regular project export. The same technique could be used to swap out arbitrary chunks of an image.

- Bert -





Re: [squeak-dev] Image Segment semantics and weakness

Mariano Martinez Peck
 


On Wed, Oct 22, 2014 at 2:05 AM, Bert Freudenberg <[hidden email]> wrote:
 


Totally agree. So it seems we agree that the key, good part of ImageSegment is that one, swapping out, and not its use as a general object graph serializer.


--
Mariano
http://marianopeck.wordpress.com

Re: [squeak-dev] re: Image Segment semantics and weakness

Bert Freudenberg
 

On 22.10.2014, at 08:47, Craig Latta <[hidden email]> wrote:

>
>> So it seems we agree that the key and good part of ImageSegment is
>> that one, swapping out, but not as a general object graph serializer.
>
>     Even for swapping out objects, I think putting them in their own
> normal object memory is a better idea, since object memories can be
> minimal and small. This lets you perform more sophisticated reasoning
> about what to do when class formats change between swap-out and swap-in
> (as well as other meta-operations).

That's a different thing. Swapping is strictly about cutting up a single image into multiple segments. Proper mutation code would have to walk the whole object memory, meaning it would need to swap in all segments in turn. That is fortunately extremely efficient, but likely still hairy enough that we're not actually using it, given today's abundance of main memory. It might, however, still make a lot of sense, e.g. as a deployment mechanism on mobile platforms, which are still severely memory-limited. If it is used strictly for deployment, you don't have to worry about mutation.

- Bert -



