Image Segment semantics and weakness

Eliot Miranda-2

Image Segment semantics and weakness

Hi All,

I want to check my understanding of reference semantics for image segments as I'm close to completing the Spur implementation. Specifically the question is whether objects reachable only through weak pointers should be included in an image segment or not.

Remember that an image segment is created from the transitive closure of an Array of root objects, the segment roots; i.e. we can think of an image segment as a set of objects created by tracing the object graph from the segment roots.

The segment always includes the segment roots. Except for the roots, objects that are also reachable from the roots of the system (the system roots, effectively the root environment, Smalltalk, and the stack of the current process) are excluded from the segment.

Consider a weak array in the transitive closure that is not reachable from the system roots, and hence should be included in the segment. Objects referenced from that weak array may be in one of three categories:

- reachable from the system roots (and hence not to be included in the segment)

- not reachable from the system roots, but reachable from the segment roots via strong pointers (and hence to be included in the segment)

- not reachable from the system roots, not reachable from the segment roots via strong pointers

Should this last category be included or excluded from the segment? I think that it makes no difference, and excluding them is only an optimization. The argument is as follows. Imagine that immediately after loading the image segment there is a garbage collection. That garbage collection will collect all the objects in the last category as they are only reachable from the weak arrays in the segment. Hence we are free to follow weak references as if they are strong when we create the image segment, leaving it to subsequent events to reclaim those objects.

An analogous argument accounts for objects reachable from ephemerons. Is my reasoning sound?
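
The argument above can be tried out on a toy object graph. What follows is only a minimal sketch in Python, not the Spur implementation: the Obj class, the closure function and all object names are hypothetical, and the sketch assumes that the segment roots are strongly held by the image once the segment is loaded. It sorts the referents of a weak array into the three categories and checks that a GC right after loading leaves the same survivors whether or not weak references were followed when the segment was built.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, as for heap objects
class Obj:
    name: str
    strong: list = field(default_factory=list)  # strong slots
    weak: list = field(default_factory=list)    # weak slots (as in a weak array)


def closure(roots, follow_weak=False, stop_at=()):
    """Transitive closure from roots; tracing never enters objects in stop_at."""
    seen, stack, stop = set(), list(roots), set(stop_at)
    while stack:
        obj = stack.pop()
        if obj in seen or obj in stop:
            continue
        seen.add(obj)
        stack.extend(obj.strong)
        if follow_weak:
            stack.extend(obj.weak)
    return seen


# Hypothetical graph: one system root, one segment root, one weak array.
shared, owned, orphan = Obj("shared"), Obj("owned"), Obj("orphan")
weak_array = Obj("weakArray", weak=[shared, owned, orphan])
root = Obj("root", strong=[weak_array, owned])
smalltalk = Obj("Smalltalk", strong=[shared, root])

# Reachable from the system roots without passing through the segment roots.
system_reach = closure([smalltalk], stop_at=[root])

# Two ways of building the segment: weak slots ignored, or traced as if strong.
lean = closure([root], stop_at=system_reach)
generous = closure([root], follow_weak=True, stop_at=system_reach)


def category(obj):
    """Classify a referent of the weak array into the three categories."""
    if obj in system_reach:
        return "system-reachable: exclude"
    if obj in lean:
        return "strongly reachable from segment roots: include"
    return "weakly reachable only: the open question"


categories = {obj.name: category(obj) for obj in weak_array.weak}


def survivors_after_gc(segment):
    """Names of segment objects still alive after a GC that follows loading."""
    live = closure([smalltalk])  # strong pointers only, segment now loaded
    return sorted(obj.name for obj in segment if obj in live)


for name, verdict in categories.items():
    print(f"{name}: {verdict}")
print(survivors_after_gc(lean) == survivors_after_gc(generous))
```

In this toy graph the only difference between the two segments is the object named orphan, and it does not survive the post-load GC, which is the point of the argument. The sketch says nothing about finalization or about references gained between save and load.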

--
best,

Eliot

J. Vuletich (mail lists)

Re: Image Segment semantics and weakness

Hi Eliot,

I think you are right. But there is a risk that someone somehow gains a strong reference to the object after the image segment is created, breaking our invariants when the segment is loaded again.

An object might be (not reachable / strongly reachable / weakly reachable) from system roots and / or segment roots. This gives us 9 possibilities. Six of them are easy (and I'll not go into them). The other three are tricky:

a- Not reachable from system roots. Weakly reachable from segment roots.
Do not include them. It is best to run a GC before building the image segment, to get rid of them (run termination, etc). This avoids the risk of the object somehow gaining a strong reference after the segment is built, which would make the segment miss the weak ref to it. Doing it this way would also mean that any objects affected by termination would be consistent, both in the image and in the segment.

b- Weakly reachable from system roots. Weakly reachable from segment roots.
Do not include them. If the object manages to survive by gaining a strong ref from the system roots, the weak ref will be repaired on segment load (Am I right on this?). If the original object were included in the segment, then on segment load the weak ref would point to a duplicate object that is about to be collected (and maybe terminated?). In any case, doing it this way would also mean that any objects affected by termination would be consistent, both in the image and in the segment.

c- Weakly reachable from system roots. Strongly reachable from segment roots.
Do include them. It seems reasonable to run a GC and get rid of them after unloading the segment, to avoid the risk of the object somehow gaining a strong ref in the image and being duplicated on segment load. But doing as I say means that we would be loading into the image an object that was already terminated, although in the state it had before running termination. Not really sure if this is ok. There could be some risk of objects in the segment being in some pre-termination state, with some objects in the image being in some post-termination state. In any case, this would suggest bad design... So perhaps it makes sense to throw an exception in these cases?
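
The nine combinations can be enumerated mechanically. The following is a small sketch in Python; the names STATES, TRICKY and cases are made up for illustration, and the verdicts are paraphrased from cases a, b and c above. The six remaining combinations are only labelled as easy, since the message does not spell them out.

```python
from itertools import product

STATES = ("not reachable", "strongly reachable", "weakly reachable")

# Keys are (reachability from system roots, reachability from segment roots).
# The verdicts paraphrase cases a, b and c of the message.
TRICKY = {
    ("not reachable", "weakly reachable"):
        "a: do not include; run a GC before building the segment",
    ("weakly reachable", "weakly reachable"):
        "b: do not include",
    ("weakly reachable", "strongly reachable"):
        "c: include; maybe GC after unloading, or raise an exception",
}

# All 3 x 3 combinations of the two reachability dimensions.
cases = list(product(STATES, STATES))

for system, segment in cases:
    verdict = TRICKY.get((system, segment), "easy (not discussed)")
    print(f"system: {system:<18} | segment: {segment:<18} | {verdict}")
```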

I hope this rant is of use.

Cheers,
Juan Vuletich

Eliot Miranda-2

Re: [Pharo-dev] Image Segment semantics and weakness

In reply to this post by Eliot Miranda-2

On Mon, Oct 20, 2014 at 8:26 AM, stepharo <[hidden email]> wrote:

While I was a big fan of imageSegment and proposed to Mariano to work on imageSegment2 (it was the original idea for his PhD),
he convinced us that image segments were not worth their complexity.

I absolutely agree.

So why do you want to have imageSegment?

Because of backwards-compatibility. If Spur does not provide image segments then the barrier to entry for Terf, eToys and Squeak may be too high. Spur is supposed to be a plug-in replacement for Cog, not something that requires lots of effort to port to.

Stef

--
best,

Eliot

EstebanLM

Re: [Vm-dev] Re: [Pharo-dev] Image Segment semantics and weakness

On 20 Oct 2014, at 21:41, Eliot Miranda <[hidden email]> wrote:

On Mon, Oct 20, 2014 at 8:26 AM, stepharo <[hidden email]> wrote:

While I was a big fan of imageSegment and proposed to Mariano to work on imageSegment2 (it was the original idea for his PhD),
he convinced us that image segments were not worth their complexity.

I absolutely agree.

So why do you want to have imageSegment?

Because of backwards-compatibility. If Spur does not provide image segments then the barrier to entry for Terf, eToys and Squeak may be too high. Spur is supposed to be a plug-in replacement for Cog, not something that requires lots of effort to port to.

but… (and tell me if I’m saying something stupid), it would probably be better to ask the people using ImageSegments to spend some time writing an adaptor to use Fuel (which is already there, works fine, and is faster than ImageSegment itself). In the not-so-long term, that is a better investment than having you replicate a technology that we all agree is not the best option (also, I would bet your valuable time is better spent on other things).

It is not as if there is no alternative to IS… and also, the IS binary format for Spur will not be compatible with the older one, so… why not?

anyway, that’s my 2c

Esteban

Mariano Martinez Peck

Re: Image Segment semantics and weakness

Just a quick note I would like to share....

For my PhD, I did investigate ImageSegment very very deeply:

http://dl.acm.org/citation.cfm?id=2076323

http://www.slideshare.net/MarianoMartinezPeck/2010-smalltalkspeckobject-swapping

I didn't want to write Fuel just because. I took quite a lot of time to understand how ImageSegment primitives worked. From that effort, I remember a few conclusions:

1) I found only a few users of ImageSegment.

2) The few users I found were NOT using ImageSegment for its real purpose, that is, object swapping. It was used instead as an object serializer. For that, they used #writeForExportOn:, which ended up using SmartRefStream for the rest of the objects.

3) I noticed I could achieve the same performance or even better with an OO serializer built at the language side, with all the benefits this brings. Of course, having Cog helped here....

In the Fuel paper: http://rmod.lille.inria.fr/archives/papers/Dias12a-SPE-Fuel.pdf

you can find some benchmark comparisons against IS. Also in my PhD thesis: http://rmod.lille.inria.fr/archives/phd/PhD-2012-Martinez-Peck.pdf

Cheers,

--
Mariano
http://marianopeck.wordpress.com

Yoshiki Ohshima-3

Re: Image Segment semantics and weakness

In reply to this post by Eliot Miranda-2

I'm not fully following the discussion here, but I do remember seeing
the following email from Dan in 1999:

http://lists.squeakfoundation.org/pipermail/squeak-dev/1999-March.txt

and search for: "From DanI at wdi.disney.com Fri Mar 26 07:17:09 1999"

It does not require two bits to mark.

(Hopefully this email has some relevance to the discussion at hand...)

--
-- Yoshiki

David T. Lewis

Re: Image Segment semantics and weakness

On Tue, Oct 21, 2014 at 04:29:53PM -0700, Yoshiki Ohshima wrote:

> I'm not fully following the discussion here, but I do remember seeing
> the following email from Dan in 1999:
>
> http://lists.squeakfoundation.org/pipermail/squeak-dev/1999-March.txt
>
> and search for: "From DanI at wdi.disney.com Fri Mar 26 07:17:09 1999"
>
> It does not require two bits to mark.
>
> (Hopefully this email has some relevance to the discussion at hand...

I don't know if it is directly relevant to the specific discussion,
but it is definitely worth rereading this post from Dan. I remember
being amazed at its simplicity when I first read it fifteen years
ago, and what stands out to me in retrospect is that we seem to have
largely overlooked what he apparently considered to be its main potential
application: "Steps to Modularity - Incremental Snapshots"

Thanks for the pointer :-)

Dave

Here is a copy of the squeak-dev post from 1999:

From DanI at wdi.disney.com Fri Mar 26 07:17:09 1999
From: DanI at wdi.disney.com (Dan Ingalls)
Date: Sat Jan 28 04:56:45 2012
Subject: Steps to Modularity - Incremental Snapshots
Message-ID: <v0300780cb320dafbd7ab@[206.16.10.26]>

Folks -

A week or so ago, I sent out a message describing a technique for extracting segments from the Squeak image. What I want to know is,

Does anyone know of such a technique having been used previously?

I figure it must be known, but I have certainly never heard of it. Please reply directly to me.

To recap, here's how it works:

1. Mark the root (or roots) of the subtree desired.

2. Do a GC mark pass. Since this stops at any marked objects,
the subtree will be unmarked, "in the shadow" of its roots.

3. Copy the roots and the unmarked subtree into a byteArray (the image segment).
   Relocate internal pointers as you go.
   Copy external pointers into a table of outpointers.

Reinstalling a segment is incredibly simple -- all you do is remap any
pointers in one pass and throw away the byteArray header!
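
The three steps and the reinstall pass can be sketched in a few lines. This is a toy model in Python, not the Squeak primitives: the Obj class, the extract_segment and install_segment functions, and the graph are all hypothetical, and a list of (name, slots) records stands in for the byteArray. Internal pointers become indices into the segment; external pointers become indices into the outpointer table.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing, as for heap objects
class Obj:
    name: str
    refs: list = field(default_factory=list)
    marked: bool = False


def mark_from(roots):
    """GC-style mark pass; it stops at any object that is already marked."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)


def extract_segment(system_roots, segment_roots, heap):
    for r in segment_roots:          # step 1: mark the segment roots
        r.marked = True
    mark_from(system_roots)          # step 2: the subtree stays unmarked
    root_set, index, members = set(segment_roots), {}, []
    stack = list(segment_roots)
    while stack:                     # step 3: collect roots + unmarked subtree
        obj = stack.pop()
        if obj in index:
            continue
        index[obj] = len(members)
        members.append(obj)
        stack.extend(c for c in obj.refs if c in root_set or not c.marked)
    outpointers, out_index, segment = [], {}, []
    for obj in members:
        slots = []
        for child in obj.refs:
            if child in index:       # internal pointer: relocate to an index
                slots.append(("in", index[child]))
            else:                    # external pointer: goes to the table
                if child not in out_index:
                    out_index[child] = len(outpointers)
                    outpointers.append(child)
                slots.append(("out", out_index[child]))
        segment.append((obj.name, slots))
    for obj in heap:                 # unmark pass
        obj.marked = False
    return segment, outpointers


def install_segment(segment, outpointers):
    """Reinstall: create the objects, then remap every pointer in one pass."""
    objs = [Obj(name) for name, _ in segment]
    for obj, (_, slots) in zip(objs, segment):
        obj.refs = [objs[i] if kind == "in" else outpointers[i]
                    for kind, i in slots]
    return objs


# Hypothetical graph: a project that owns a morph and shares the display.
display, text = Obj("display"), Obj("text")
morph, project = Obj("morph"), Obj("project")
project.refs = [morph, display]
morph.refs = [project, text]         # back pointer into the segment root
smalltalk = Obj("Smalltalk", refs=[project, display])
heap = [smalltalk, project, morph, text, display]

segment, outpointers = extract_segment([smalltalk], [project], heap)
copy = install_segment(segment, outpointers)
print([name for name, _ in segment], [o.name for o in outpointers])
```

Here display is reached by the mark pass from the system root, so it ends up in the outpointer table, while morph and text sit in the shadow of project and are copied. The sketch ignores weak references entirely, which is exactly the question this thread is about.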

Thanks
- Dan

PS...
Ted and I have just completed an implementation and it is great. (It will be out in updates and release 2.4 within a week). It can trace and copy a 520kb tree of over 15000 objects in about 390ms. Used for deepCopy, it is about 20 times faster than what we do currently. Used to swap segments in and out, it finally offers a realistic vehicle for breaking down Squeak's monolithic images.

It's even faster than you would guess from the above. There is a fixed overhead for the full GC mark and unmark. This is 350ms on my machine (could surely be improved). It can then copy out the 520kb segment or reinstall it in about 40ms either way.
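
The timings quoted in the PS fit together as simple arithmetic. The figures below are the approximate ones from the message; the variable names are made up, and the derived values are only as precise as those rounded inputs.

```python
# Figures quoted in the PS (all approximate).
fixed_overhead_ms = 350   # full GC mark and unmark
copy_ms = 40              # copy out or reinstall the segment
segment_kb = 520          # size of the traced tree

# Mark/unmark plus copy gives the "about 390ms" end-to-end figure.
total_ms = fixed_overhead_ms + copy_ms

# Rate of the copy step alone, in kb per second.
copy_rate_kb_per_s = segment_kb * 1000 / copy_ms

# Fraction of the end-to-end time spent in the fixed mark/unmark overhead.
overhead_share = fixed_overhead_ms / total_ms

print(total_ms, copy_rate_kb_per_s, round(overhead_share, 2))
```

On these numbers roughly nine tenths of the time goes to the fixed mark and unmark overhead, which is why the copy itself looks so cheap.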

Casey Ransberger-2

Re: Image Segment semantics and weakness

In reply to this post by Mariano Martinez Peck

Hi Mariano,

I've stripped some context here (it can be found by reading the original thread on squeak-dev) but I did want to respond to one or two things you said, so...

> On Oct 20, 2014, at 6:55 PM, Mariano Martinez Peck <[hidden email]> wrote:
>
> 1) I found only few users of ImageSegment

If by users, you mean people (as opposed to something like senders or implementors,) did you count the people using EToys? I'm afraid my memory is telling me that it's used in saving and sharing projects (I'm not sure though.)

> 2) The few users I found, were NOT using the real purpose of ImageSegment, that is, object swapping. It was used instead as an object serializer. For that, they use #writeForExportOn: which ended up using SmartRefStream for the rest of the objects.

Seems legit. Fair enough.

> 3) I noticed I could achieve the same performance or even better with an OO serializer built at the language side, with all the benefits this means. Of course, having Cog helped here....

I suppose I'll be reading your paper then, won't I? :)

Cheers,

Casey

Bert Freudenberg

Re: [Vm-dev] [squeak-dev] Image Segment semantics and weakness

In reply to this post by Mariano Martinez Peck

On 20.10.2014, at 18:55, Mariano Martinez Peck <[hidden email]> wrote:

> The few users I found, were NOT using the real purpose of ImageSegment, that is, object swapping. It was used instead as an object serializer. For that, they use #writeForExportOn: which ended up using SmartRefStream for the rest of the objects.

Well, if you look closer, you will see that projects use image segments in two completely different ways. One is, as you say, for serialization, which is not the best use of image segments, agreed, especially with all the other logic wrapped around it.

But if you enable projectsSentToDisk then entering a project will swap the previous project to disk as an image segment, allowing you to have images with very large projects without having to hold all in main memory at the same time.

This uses a completely different code path and file format than regular project export. The same technique could be used to swap out arbitrary chunks of an image.

- Bert -

Mariano Martinez Peck

Re: Image Segment semantics and weakness

In reply to this post by Casey Ransberger-2

On Wed, Oct 22, 2014 at 1:42 AM, Casey Ransberger <[hidden email]> wrote:

Hi Mariano,

I've stripped some context here (it can be found by reading the original thread on squeak-dev) but I did want to respond to one or two things you said, so...

> On Oct 20, 2014, at 6:55 PM, Mariano Martinez Peck <[hidden email]> wrote:
>
> 1) I found only few users of ImageSegment

If by users, you mean people (as opposed to something like senders or implementors,) did you count the people using EToys? I'm afraid my memory is telling me that it's used in saving and sharing projects (I'm not sure though.)

By users I mean code...senders... and yeah, Etoys Projects was one of them.

--
Mariano
http://marianopeck.wordpress.com

Mariano Martinez Peck

Re: [Vm-dev] [squeak-dev] Image Segment semantics and weakness

In reply to this post by Bert Freudenberg

On Wed, Oct 22, 2014 at 2:05 AM, Bert Freudenberg <[hidden email]> wrote:

On 20.10.2014, at 18:55, Mariano Martinez Peck <[hidden email]> wrote:

> The few users I found, were NOT using the real purpose of ImageSegment, that is, object swapping. It was used instead as an object serializer. For that, they use #writeForExportOn: which ended up using SmartRefStream for the rest of the objects.

Well, if you look closer, you will see that projects use image segments in two completely different ways. One is, as you say, for serialization, which is not the best use of image segments, agreed, especially with all the other logic wrapped around it.

But if you enable projectsSentToDisk then entering a project will swap the previous project to disk as an image segment, allowing you to have images with very large projects without having to hold all in main memory at the same time.

This uses a completely different code path and file format than regular project export. The same technique could be used to swap out arbitrary chunks of an image.

Totally agree. So it seems we agree that the key and good part of ImageSegment is that one, swapping out, but not as a general object graph serializer.

--
Mariano
http://marianopeck.wordpress.com

ccrraaiigg

re: Image Segment semantics and weakness

> So it seems we agree that the key and good part of ImageSegment is
> that one, swapping out, but not as a general object graph serializer.

Even for swapping out objects, I think putting them in their own
normal object memory is a better idea, since object memories can be
minimal and small. This lets you perform more sophisticated reasoning
about what to do when class formats change between swap-out and swap-in
(as well as other meta-operations).

-C

--
Craig Latta
netjam.org
+31 6 2757 7177 (SMS ok)
+ 1 415 287 3547 (no SMS)

Bert Freudenberg

re: Image Segment semantics and weakness

On 22.10.2014, at 08:47, Craig Latta <[hidden email]> wrote:

>
>> So it seems we agree that the key and good part of ImageSegment is
>> that one, swapping out, but not as a general object graph serializer.
>
> Even for swapping out objects, I think putting them in their own
> normal object memory is a better idea, since object memories can be
> minimal and small. This lets you perform more sophisticated reasoning
> about what to do when class formats change between swap-out and swap-in
> (as well as other meta-operations).

That's a different thing. Swapping is strictly about cutting up a single image into multiple segments. Proper mutation code would have to walk the whole object memory, meaning it needs to swap in all segments in turn. Which fortunately is extremely efficient, but likely still hairy enough that we're not actually using it given today's abundance of main memory. It might, however, still make a lot of sense e.g. as a deployment mechanism on mobile platforms, which still are severely memory-limited. If used strictly for deployment you don't have to worry about mutation.

- Bert -


Stéphane Ducasse

Re: [Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

What I can tell you is that the instability caused by a single pointer, held outside the root objects, to an element in the segment, and the implication of that pointer for reloaded segments (yes, I do not want to have two objects in memory after loading), ensures that we will not use the IS primitive in Pharo in the future. For us this is a non-feature.

IS was a nice trick, but since having a pointer to an object is so cheap and is the basis of our computational model, it leads to unpredictable side effects. We saw that when Mariano worked on it during the first year of his PhD (which was a kind of LOOM revisit).

Stef

Eliot Miranda-2

Re: [Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

Hi Stephane, Hi All,

let me talk a little about the ParcPlace experience, which led to David Leibs' parcels, whose architecture Fuel uses.

In the late 80's/90's Peter Deutsch wrote BOSS (Binary Object Storage System), a traditional interpretive pickling system defined by a little bytecoded language. Think of a bytecode as something like "What follows is an object definition, which is its id followed by size info followed by the definitions or ids of its sub-parts, including its class", or "What follows is the id of an already defined object". So the loading interpreter looks at the next byte in the stream and that tells it what to do. So the storage is a recursive definition of a graph, much like a recursive grammar for a programming language.

This approach is slow (it's a bytecode interpreter) and fragile (structures in the process of being built aren't valid yet; imagine trying to take the hash of a Set that is only half-way through being materialized). But this architecture was very common at the time (I wrote something very similar). The advantage BOSS had was a clumsy hack for versioning. One could specify blocks that were supplied with the version and state of older objects, and these blocks could effect shape change etc. to bring loaded instances up-to-date.
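
To make the interpretive scheme concrete, here is a minimal Python sketch. It is not BOSS's actual format, which the thread does not describe: the one-byte opcodes, the function names dump and load, and the restriction to nested lists of integers are all assumptions made for illustration.

```python
import io
import struct

# Hypothetical one-byte opcodes, invented for this sketch (not BOSS's real encoding).
OP_DEFINE = 0x01   # a new object definition follows: id, size, then its parts
OP_REF = 0x02      # the id of an already defined object follows
OP_INT = 0x03      # an immediate integer leaf follows


def dump(obj, out, ids=None):
    """Recursively write obj (nested lists of ints) as an opcode stream."""
    ids = {} if ids is None else ids
    if isinstance(obj, int):
        out.write(struct.pack(">Bq", OP_INT, obj))
    elif id(obj) in ids:
        out.write(struct.pack(">BI", OP_REF, ids[id(obj)]))
    else:
        ids[id(obj)] = len(ids)
        out.write(struct.pack(">BII", OP_DEFINE, ids[id(obj)], len(obj)))
        for part in obj:
            dump(part, out, ids)


def load(inp, table=None):
    """Interpret the stream: each opcode byte says what to do next."""
    table = {} if table is None else table
    op = inp.read(1)[0]
    if op == OP_INT:
        return struct.unpack(">q", inp.read(8))[0]
    if op == OP_REF:
        return table[struct.unpack(">I", inp.read(4))[0]]
    if op == OP_DEFINE:
        oid, size = struct.unpack(">II", inp.read(8))
        obj = []
        table[oid] = obj                  # registered before its parts exist
        for _ in range(size):
            obj.append(load(inp, table))  # obj is only partly built at this point
        return obj
    raise ValueError(f"unknown opcode {op}")
```

The loop in load shows the fragility mentioned above: a back-reference can hand out obj while it is still being filled in, so any code that inspects it during loading sees an incomplete structure.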

David Leibs had an epiphany as, in the early 90's, ParcPlace was trying to decompose the VW image (chainsaw was the code name of the VW 2.5 release). If one groups instances by class, one can instantiate in bulk, creating all the instances of a particular class in one go, followed by all the instances of a different class, etc. Then the arc information (the pointers to objects to be stored in the loaded objects' inst vars) can follow the instance information. So now the file looks like header, names of classes that are referenced (not defined), definitions of classes, definitions of instances (essentially class id, count pairs), arc information. And materializing means finding the classes in the image, creating the classes in the file, creating the instances, stitching the graph together, and then performing any post-load actions (rehashing instances, etc).
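
The node/arc separation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Parcels or Fuel format: the Point and Line classes, the serialize and materialize functions, and the tuple layout are invented here, every class is assumed to already exist in the loading "image" (so there are no class definitions in the output), and integers stand in for immediate values.

```python
from collections import defaultdict


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Line:
    def __init__(self, start, end):
        self.start, self.end = start, end


def serialize(roots):
    """Return (clusters, arcs): instance counts per class, then the pointers."""
    # Trace the graph; ints are treated as immediate values, not as nodes.
    seen, order, stack = set(), [], list(roots)
    while stack:
        obj = stack.pop()
        if isinstance(obj, int) or id(obj) in seen:
            continue
        seen.add(id(obj))
        order.append(obj)
        stack.extend(vars(obj).values())
    # Group instances by class and number them cluster by cluster.
    by_class = defaultdict(list)
    for obj in order:
        by_class[type(obj).__name__].append(obj)
    clusters = [(name, len(objs)) for name, objs in by_class.items()]
    index = {}
    for objs in by_class.values():
        for obj in objs:
            index[id(obj)] = len(index)
    # Arcs come last: (source index, slot name, immediate int or ("ref", index)).
    arcs = []
    for objs in by_class.values():
        for obj in objs:
            for slot, value in vars(obj).items():
                target = value if isinstance(value, int) else ("ref", index[id(value)])
                arcs.append((index[id(obj)], slot, target))
    return clusters, arcs


def materialize(clusters, arcs, classes):
    """Bulk-create every instance first, then stitch the graph together."""
    instances = []
    for name, count in clusters:
        cls = classes[name]  # find the class in the loading "image"
        instances.extend(cls.__new__(cls) for _ in range(count))
    for source, slot, target in arcs:
        value = instances[target[1]] if isinstance(target, tuple) else target
        setattr(instances[source], slot, value)
    return instances
```

Because all instances exist before any arc is filled in, there is no recursion during loading, and post-load actions such as rehashing would run only after the whole graph is complete.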

Within months we merged with Digitalk (to form DarcPlace-Dodgytalk) and were introduced to TeamV's loading model which was very much like ImageSegments, being based on the VM's object format. Because an ImageSegment also has imports (references to classes and globals taken from the host system, not defined in the file) performance doesn't just depend on loading the segment into memory. It also depends on how long it takes to search the system to find imports, etc. In practice we found that a) Parcels were 4 times faster than BOSS, and b) they were no slower than Digitalk's image segments. But being independent of the VM's heap format Parcels had BOSS's flexibility and could support shape change on load, something ImageSegments *cannot do*. I went on to extend parcels with support for shape change, plus support for partial loading of code, but I won't describe that here. Too detailed, even though it's very important.

Mariano spent time talking with me and Fuel's basic architecture is that of parcels, but reimplemented to be nicer, more flexible etc. But essentially Parcels and Fuel are at their core David Leibs' invention. He came up with the ideas of a) grouping objects by class and b) separating the arcs from the nodes.

Now, where ImageSegments are faster than Parcels is *not* loading. Our experience with VW vs TeamV showed us that. But they are faster in collecting the graph of objects to be included. ImageSegments are dead simple. So IMO the right architecture is to use Parcels' segregation, and Parcels' "abstract" format (independent of the heap object format) with ImageSegment's computation of the object graph. Igor Stasenko has suggested providing the tracing part of ImageSegments (Dan Ingalls' cool invention of marking the segment root objects, then marking the heap, leaving the objects to be stored unmarked in the shadow of the marked segment roots) as a separate primitive. The result can then be quickly partitioned by class and written by Smalltalk code.
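
The marking trick can be illustrated with a small Python sketch over a toy heap. This is an assumption-laden model, not the VM primitive: the heap is a plain dict from object names to the names they point to, the function name segment_members is invented, and weak references are not modelled.

```python
def segment_members(heap, system_roots, segment_roots):
    """Return the segment roots plus everything reachable only through them."""
    marked = set(segment_roots)            # step 1: mark the segment roots
    stack = [r for r in system_roots if r not in marked]
    while stack:                           # step 2: mark from the system roots
        obj = stack.pop()
        if obj in marked:
            continue                       # already-marked roots stop the trace
        marked.add(obj)
        stack.extend(heap[obj])
    # Step 3: collect the roots plus the unmarked objects in their "shadow".
    members, stack = list(segment_roots), []
    for root in segment_roots:
        stack.extend(heap[root])
    while stack:
        obj = stack.pop()
        if obj in marked or obj in members:
            continue
        members.append(obj)
        stack.extend(heap[obj])
    return members


# Toy heap: "shared" is also reachable from the system root, so it stays out.
heap = {
    "Smalltalk": ["project", "shared"],
    "project": ["morph"],
    "morph": ["shared", "private"],
    "private": ["project"],
    "shared": [],
}
members = segment_members(heap, ["Smalltalk"], ["project"])
```

In this example the segment ends up holding project, morph, and private, while shared remains in the image and would become one of the segment's imports.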

The loader can then materialize objects using Smalltalk code, can deal with shape change, and not be significantly slower than image segments. Crucially this means that one has a portable, long-lived object storage format; freeing the VM to evolve its object format without breaking image segments with every change to the object format.

I'd be happy to help people working on Fuel by providing that primitive for anyone who wants to try and reimplement the ImageSegment functionality (project saving, class faulting, etc) above Fuel.

--
best,

Eliot

David T. Lewis

Re: [Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

Eliot,

Thanks for this background, it is very helpful and interesting.

I would also like to put in a good word for Fuel. It is well designed, well
documented, and well supported on Squeak and Pharo. Very high quality work.

I use Fuel in RemoteTask (in package CommandShell) for inter-image communication.
ReferenceStream also works, and both are supported in RemoteTask. But if you
want to have a serializer that you can read and understand, I'd say that Fuel
is hard to beat.

I am not advocating anything with respect to image segments, project saving,
and so forth, I'm just saying that Fuel is a very good thing. It works well
in Squeak, and I suspect that many folks may not be aware of this.

Dave

Eliot Miranda-2

Re: [Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

Hi David,

On Oct 22, 2014, at 5:52 PM, "David T. Lewis" <[hidden email]> wrote:

>
> Eliot,
>
> Thanks for this background, it is very helpful and interesting.
>
> I would also like to put in a good word for Fuel. It is well designed, well
> documented, and well supported on Squeak and Pharo. Very high quality work.
>
> I use Fuel in RemoteTask (in package CommandShell) for inter-image communication.
> ReferenceStream also works, and both are supported in RemoteTask. But if you
> want to have a serializer that you can read and understand, I'd say that Fuel
> is hard to beat.
>
> I am not advocating anything with respect to image segments, project saving,
> and so forth, I'm just saying that Fuel is a very good thing. It works well
> in Squeak, and I suspect that many folks may not be aware of this.

Oh I agree. If only ImageSegments weren't used... :-). We use an early version of Fuel at Cadence which is essential to our system. We haven't upgraded as it "just works".

Chris Muller-3

Re: [Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

In reply to this post by Eliot Miranda-2

> If one groups instances by class, one can instantiate in bulk, creating all the instances of a
> particular class in one go

What does "instantiate in bulk" mean? Doesn't that mean one still
must send #new (or #basicNew) to the class for each instance? Why
would that be faster?

Chris Muller-3

Re: [Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

In reply to this post by Eliot Miranda-2

>> I would also like to put in a good word for Fuel. It is well designed, well
>> documented, and well supported on Squeak and Pharo. Very high quality work.
>>
>> I use Fuel in RemoteTask (in package CommandShell) for inter-image communication.
>> ReferenceStream also works, and both are supported in RemoteTask. But if you
>> want to have a serializer that you can read and understand, I'd say that Fuel
>> is hard to beat.
>>
>> I am not advocating anything with respect to image segments, project saving,
>> and so forth, I'm just saying that Fuel is a very good thing. It works well
>> in Squeak, and I suspect that many folks may not be aware of this.
>
> Oh I agree. If only ImageSegments weren't used... :-). We use an early version of Fuel at Cadence which is essential to our system. We haven't upgraded as it "just works".

I'd just like to remind everyone, there is another stand-alone
serializer available for Squeak called "Ma-Object-Serializer". It was
developed from the ground up for _Squeak_ -- meaning, it already
supports all the same Squeak-specific preserialization and
postmaterialization pickling/unpickling behaviors, like for Project,
etc., which are used by ReferenceStream.

There is nothing I would *love* more than for interest from my
fellow Squeakers to lead to significant improvements in this
serializer from trying to incorporate it into your applications. I
think there is some low-hanging fruit (like the nascent
#addNewElement:!) to be had simply by everyone's different development
views and experience. Such improvements would be directly inherited
by Magma!

I looked at trying to incorporate Fuel as the serializer for Magma, to
take advantage of its purported speed. But one of the very first
things I found was the benchmarks for "the Magma serializer" in the
Fuel paper were totally bogus. I had asked Mariano to separate out
initialization from serialization and materialization, but since he
didn't, the numbers reported are a tiny fraction of their actual
speed.

I came to realize that Fuel is really targeted at just two primary
use-cases: 1) saving a complete-graph and 2) loading a
complete-graph. But Ma-Object-Serializer has the ability to
serialize/materialize *partial* graphs by letting the user specify a
TraversalStrategy, which is essential for Magma. Unfortunately, Fuel
cannot do this.

The other innovation of Ma-Object-Serializer is its first-class access
to the object-graph **in its serialized state** in the same ways
(partial or complete) as when they were Smalltalk objects.