Smalltalk › Squeak › Squeak - Dev

Storing Squeak Images in mercurial


keith1y

Storing Squeak Images in mercurial

It would be advantageous for storing binary diffs if images did not
change much between snapshots.

I seem to remember some mention that the image may have its pointers
updated and so this would not be the case.

can anyone fill me in on the details?

Would it be possible to save or post process a "sorted" image to take
full advantage of a repository which uses binary diffs?

best regards

Keith

Hans-Martin Mosner

Re: Storing Squeak Images in mercurial

Keith Hodges wrote:

> It would be advantageous for storing binary diffs if images did not
> change much between snapshots.
>
> I seem to remember some mention that the image may have its pointers
> updated and so this would not be the case.
>
> can anyone fill me in on the details?
>
> Would it be possible to save or post process a "sorted" image to take
> full advantage of a repository which uses binary diffs?

Assuming that most unchanged image content should be within the oldest
parts of an image, which accumulate at low memory addresses, there's a
possibility:
Images are mostly a snapshot of the object memory, with all addresses
kept as they are. When re-loading an image at a different memory base
address, all object pointers get updated by the difference between old
and new base. It should be relatively easy to move a saved image file to
memory address 0 (I think the interpreter simulator does this). Two
related images with the same base address probably have only small
differences in the lower memory addresses. I have not tried this, but it
could work.
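The relocation described above can be sketched on a toy model. This is not the VM's code: the word list, the `is_pointer` rule (low bit clear means pointer, low bit set means tagged integer) and the function names are assumptions made for illustration. A real image also contains object headers and raw byte data that must not be adjusted, so the VM walks objects rather than words.

```python
# Toy model only; names and the tagging rule are illustrative assumptions.

def is_pointer(word):
    """Assumption: tagged SmallIntegers have the low bit set; pointers do not."""
    return word & 1 == 0

def rebase(words, old_base, new_base=0):
    """Return a copy of `words` with every pointer shifted by new_base - old_base."""
    delta = new_base - old_base
    return [w + delta if is_pointer(w) else w for w in words]

def differing_words(a, b):
    """Count positions at which two equally long images differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# The same three-word heap saved by two sessions at different base addresses.
# Offsets 8 and 0 are pointers; 7 is a tagged integer (low bit set).
layout = [8, 7, 0]
saved_a = [w + 0x10000 if is_pointer(w) else w for w in layout]
saved_b = [w + 0x24000 if is_pointer(w) else w for w in layout]

raw_diff = differing_words(saved_a, saved_b)
normalized_diff = differing_words(rebase(saved_a, 0x10000), rebase(saved_b, 0x24000))
print(raw_diff, normalized_diff)
```

In this toy case every pointer word differs between the two raw snapshots, while the snapshots rebased to address 0 are identical.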

Cheers,
Hans-Martin

Bert Freudenberg

Re: Storing Squeak Images in mercurial

On Sep 28, 2007, at 9:15 , Hans-Martin Mosner wrote:

> Keith Hodges wrote:
>> It would be advantageous for storing binary diffs if images did not
>> change much between snapshots.
>>
>> I seem to remember some mention that the image may have its pointers
>> updated and so this would not be the case.
>>
>> can anyone fill me in on the details?
>>
>> Would it be possible to save or post process a "sorted" image to take
>> full advantage of a repository which uses binary diffs?
> Assuming that most unchanged image content should be within the oldest
> parts of an image, which accumulate at low memory addresses, there's a
> possibility:
> Images are mostly a snapshot of the object memory, with all addresses
> kept as they are. When re-loading an image at a different memory base
> address, all object pointers get updated by the difference between old
> and new base. It should be relatively easy to move a saved image file to
> memory address 0 (I think the interpreter simulator does this). Two
> related images with the same base address probably have only small
> differences in the lower memory addresses. I have not tried this, but it
> could work.

Interesting idea, yes, that should work.

Or, we could move to an object table. Awfully nice, these ;) Solve so
many issues - one of them being object references that do not have to
be relocated on startup.

Another idea I have been pondering for a while is making the lower
part of Squeak's object memory be "constant". There is a large number
of objects in an image that virtually never change but are only read.
This part does not have to be garbage-collected, making a full GC
much cheaper. When we fork off a new system process with the VM using
copy-on-write pages, this part could be shared between images,
reducing the over-all memory consumption significantly.

- Bert -

Giovanni Corriga

Re: Storing Squeak Images in mercurial

On Fri, 28/09/2007 at 10:00 +0200, Bert Freudenberg wrote:

> Another idea I have been pondering for a while is making the lower
> part of Squeak's object memory be "constant". There is a large number
> of objects in an image that virtually never change but are only read.
> This part does not have to be garbage-collected, making a full GC
> much cheaper. When we fork off a new system process with the VM using
> copy-on-write pages, this part could be shared between images,
> reducing the over-all memory consumption significantly.

Could this constant part be kept in a separate file, thus reducing also
the disk occupation of our images?

Ciao,

Giovanni

Bert Freudenberg

Re: Storing Squeak Images in mercurial

On Sep 28, 2007, at 11:02 , Giovanni Corriga wrote:

> On Fri, 28/09/2007 at 10:00 +0200, Bert Freudenberg wrote:
>
>> Another idea I have been pondering for a while is making the lower
>> part of Squeak's object memory be "constant". There is a large number
>> of objects in an image that virtually never change but are only read.
>> This part does not have to be garbage-collected, making a full GC
>> much cheaper. When we fork off a new system process with the VM using
>> copy-on-write pages, this part could be shared between images,
>> reducing the over-all memory consumption significantly.
>
> Could this constant part be kept in a separate file, thus reducing also
> the disk occupation of our images?

Haven't thought about that and I have a gut feeling this would be
impractical, but who knows ... get the brains rolling ;)

"Perm space" in VisualWorks is similar, but I don't know anything
about its implementation.

- Bert -

Hans-Martin Mosner

Re: Storing Squeak Images in mercurial

Bert Freudenberg wrote:
> "Perm space" in VisualWorks is similar, but I don't know anything
> about its implementation.
>
> - Bert -
>
Perm space differs from normal old spaces in VW in that it is not
subject to garbage collection and compaction, so the amount of memory
writes in perm space is reduced (the only writes can occur when slots in
perm space objects are changed.) This means that perm space is pretty
well sharable between multiple instances of an image (it's actually
shared when it can be memory-mapped upon loading into the same memory
region where it was when the image was saved, and AFAIK this is not
supported on all VW platforms).
The reduced garbage collection load is an advantage even for
single-image applications.
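The sharing via memory mapping mentioned above can be illustrated with a private (copy-on-write) file mapping. This is only a sketch of the operating-system mechanism using Python's `mmap` module, not of how VisualWorks or Squeak load images; the temporary file standing in for an image and the names `vm_a` and `vm_b` are assumptions.

```python
import mmap
import tempfile

# A temporary file stands in for the "constant" part of an image.
with tempfile.TemporaryFile() as f:
    f.write(b"constant objects" * 256)   # 4096 bytes
    f.flush()

    # Two private mappings of the same file, as two VM processes might hold.
    # ACCESS_COPY gives copy-on-write semantics: writes affect only the
    # mapping they are made through and never reach the file.
    vm_a = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    vm_b = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

    vm_a[0:8] = b"CHANGED!"              # one "VM" modifies a slot

    a_view = bytes(vm_a[0:8])
    b_view = bytes(vm_b[0:8])
    f.seek(0)
    on_disk = f.read(8)

    vm_a.close()
    vm_b.close()

print(a_view, b_view, on_disk)
```

Pages that are never written stay backed by the same file data, which is why a region with few writes is a good candidate for sharing.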

To achieve something similar in Squeak, we would have to add another
memory division on top of the oldspace/youngspace one.
The image file does not even have that division; all of it is considered
old space after loading.

Cheers,
Hans-Martin

johnmci

Re: Storing Squeak Images in mercurial

In reply to this post by Hans-Martin Mosner

Actually the VM has an optimization: on the Macintosh, for example, the
memory start address for the oops space is "usually" at the same virtual
memory address when the same VM is used. If so, and this matches the
remembered memory address, then no swizzling of the object pointers is
needed.

Otherwise all object pointers are swizzled by an offset, the difference
between the last start address and the new one, so that each object
pointer, which is a 32/64-bit memory address, points where expected.

I'll note that more secure operating systems, say OpenBSD, ensure that
the memory addresses allocated from application startup to startup don't
follow a pattern, so on that operating system it's likely the startup
address would never be the same as on any previous startup.

Therefore it's plausible that one *could* swizzle the addresses from
the starting offset to zero after the image has been fully GCed and
halted.

See ObjectMemory>>adjustAllOopsBy: for thoughts, and
Interpreter>>writeImageFile: for where to swizzle to zero and then back
to the original value.
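A rough sketch of that save path and of the load-time fast path, in Python rather than Slang. Everything here is a toy stand-in: the 8-byte header holding only the saved base, the low-bit pointer test, and the function names are assumptions, not the actual image format or the behaviour of the two methods named above.

```python
import struct

def adjust_all_oops(memory, delta):
    """Shift every pointer word in place (toy stand-in for adjustAllOopsBy:)."""
    for i, word in enumerate(memory):
        if word & 1 == 0:              # assumption: low bit 0 marks a pointer
            memory[i] = word + delta

def write_image(memory, base):
    """Serialize with pointers relative to address 0, then restore the heap."""
    adjust_all_oops(memory, -base)     # swizzle to zero
    try:
        header = struct.pack("<Q", 0)  # hypothetical header: saved base only
        body = struct.pack(f"<{len(memory)}Q", *memory)
    finally:
        adjust_all_oops(memory, base)  # swizzle back so the session can continue
    return header + body

def read_image(data, new_base):
    """Load an image, swizzling only if the new base differs from the saved one."""
    (saved_base,) = struct.unpack_from("<Q", data, 0)
    count = (len(data) - 8) // 8
    memory = list(struct.unpack_from(f"<{count}Q", data, 8))
    if new_base != saved_base:         # same base: no swizzling needed
        adjust_all_oops(memory, new_base - saved_base)
    return memory

heap = [0x10008, 7, 0x10000]           # two pointers and one tagged integer
blob = write_image(heap, 0x10000)
reloaded = read_image(blob, 0x30000)
print(reloaded)
```

With this scheme the bytes written do not depend on where the heap happened to live, at the cost of two extra passes over the heap on every save.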

PS I have a dim memory of someone wanting to abuse the low bits in an
object pointer because they decided, for example, that the oops space
could never be allocated below, say, 0x00000000XX, because operating
systems usually allocate memory after the VM binary...

On Sep 28, 2007, at 12:15 AM, Hans-Martin Mosner wrote:

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Jason Johnson-5

Re: Storing Squeak Images in mercurial

In reply to this post by keith1y

I seem to recall a thread on here some time back of someone having
exactly this issue. They were trying to have a way of getting to a
well defined point so they could enter the image in some system in
their organization or something.

If I get a chance I'll look it up. It may be related to this.

On 9/28/07, Keith Hodges <[hidden email]> wrote:

Hans-Martin Mosner

Re: Storing Squeak Images in mercurial

Jason Johnson wrote:
> I seem to recall a thread on here some time back of someone having
> exactly this issue. They were trying to have a way of getting to a
> well defined point so they could enter the image in some system in
> their organization or something.
>
>
If it's the same thread I am remembering, they wanted to achieve
complete identity of images given a defined set of source files.
That is impossible, in my opinion, due to the nondeterministic behavior
of some of the processes in an image.
The binary diff requirement is much weaker - for it to be satisfied, it
suffices if the diff is significantly smaller than either of the two
images being compared.
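One way to check that weaker requirement is to estimate how many bytes a copy/insert delta would have to store literally. The sketch below uses Python's `difflib` as a stand-in; it is not Mercurial's own delta algorithm, and the 0.5 threshold, the function names and the random test data are assumptions chosen for illustration.

```python
import random
from difflib import SequenceMatcher

def delta_size(old, new):
    """Bytes of `new` that a copy/insert delta would have to store literally."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    literal = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            literal += j2 - j1
    return literal

def diff_is_worthwhile(old, new, threshold=0.5):
    """Hypothetical criterion: delta below `threshold` of the smaller image."""
    return delta_size(old, new) < threshold * min(len(old), len(new))

# Stand-in "images": 2000 pseudo-random bytes, then a copy with two bytes changed.
rng = random.Random(2007)
old_image = bytes(rng.randrange(256) for _ in range(2000))
patched = bytearray(old_image)
patched[1000] ^= 0xFF
patched[1001] ^= 0xFF
new_image = bytes(patched)

print(delta_size(old_image, new_image), diff_is_worthwhile(old_image, new_image))
```

`SequenceMatcher` is slow on large inputs, so for real image files a dedicated binary diff tool would be the practical choice.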

Cheers,
Hans-Martin

Yoshiki Ohshima-2

Re: Storing Squeak Images in mercurial

In reply to this post by Giovanni Corriga

Giovanni,

> > Another idea I have been pondering for a while is making the lower
> > part of Squeak's object memory be "constant". There is a large number
> > of objects in an image that virtually never change but are only read.
> > This part does not have to be garbage-collected, making a full GC
> > much cheaper. When we fork off a new system process with the VM using
> > copy-on-write pages, this part could be shared between images,
> > reducing the over-all memory consumption significantly.
>
> Could this constant part be kept in a separate file, thus reducing also
> the disk occupation of our images?

That may be tricky, as others say, but by normalizing the start image
offset to zero upon saving, the resulting .image *would* be more
compressible by the LZ family of algorithms. Whether that is actually
true is an open question...

-- Yoshiki

Bert Freudenberg

Re: Storing Squeak Images in mercurial

On Oct 2, 2007, at 3:53 , Yoshiki Ohshima wrote:

> Giovanni,
>
>>> Another idea I have been pondering for a while is making the lower
>>> part of Squeak's object memory be "constant". There is a large number
>>> of objects in an image that virtually never change but are only read.
>>> This part does not have to be garbage-collected, making a full GC
>>> much cheaper. When we fork off a new system process with the VM using
>>> copy-on-write pages, this part could be shared between images,
>>> reducing the over-all memory consumption significantly.
>>
>> Could this constant part be kept in a separate file, thus reducing also
>> the disk occupation of our images?
>
> That may be tricky, as others say, but by normalizing the start image
> offset to zero upon saving, the resulting .image *would* be more
> compressible by the LZ family of algorithms. Whether that is actually
> true is an open question...

Why would it be more compressible? Because there are more zeros in oops?

- Bert -

Paolo Bonzini-2

Re: Storing Squeak Images in mercurial

>> That may be tricky, as others say, but by normalizing the start image
>> offset to zero upon saving, the resulting .image *would* be more
>> compressible by the LZ family of algorithms. Whether that is actually
>> true is an open question...
>
> Why would it be more compressible? Because there are more zeros in oops?

Presumably, but that's not true, at least for GNU Smalltalk images.

2.3.6 (normalizes offset to zero) => 72.4% gzip compression, 76.6% bzip2
2.95d (does not) => 72.2% gzip compression, 76.5% bzip2
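A comparison like the one above could be repeated on other images with a few lines of Python. This is only a sketch: it assumes the percentages mean space saved (1 minus compressed size over original size), and the helper names and the zero-filled sample data are made up for illustration.

```python
import bz2
import gzip

def space_saving(data, compress):
    """Fraction of the original size removed by `compress` (0.0 to 1.0)."""
    return 1 - len(compress(data)) / len(data)

def report(path):
    """Return gzip and bzip2 space savings for the file at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "gzip": space_saving(data, gzip.compress),
        "bzip2": space_saving(data, bz2.compress),
    }

# Stand-in data so the sketch runs without an image file at hand.
sample = bytes(4096)
sample_gzip = space_saving(sample, gzip.compress)
sample_bzip2 = space_saving(sample, bz2.compress)
print(f"gzip {sample_gzip:.1%}, bzip2 {sample_bzip2:.1%}")
```

Calling `report` on two image files, one saved with a zero base and one without, would give figures comparable to the two lines above.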

(As a side note, that change was made to have faster startup times -- if
you don't have to swizzle back object pointers, there is less work to be
done on image startup -- and as a prerequisite to implement a shared
memory space via copy-on-write).

Paolo