Storing Squeak Images in mercurial

12 messages

Storing Squeak Images in mercurial

keith1y
It would be advantageous for storing binary diffs if images did not
change much between snapshots.

I seem to remember some mention that the image may have its pointers
updated and so this would not be the case.

Can anyone fill me in on the details?

Would it be possible to save or post process a "sorted" image to take
full advantage of a repository which uses binary diffs?

best regards

Keith


Re: Storing Squeak Images in mercurial

Hans-Martin Mosner
Keith Hodges wrote:

> It would be advantageous for storing binary diffs if images did not
> change much between snapshots.
>
> I seem to remember some mention that the image may have its pointers
> updated and so this would not be the case.
>
> can anyone fill me in on the details?
>
> Would it be possible to save or post process a "sorted" image to take
> full advantage of a repository which uses binary diffs?
Assuming that most unchanged image content should be within the oldest
parts of an image, which accumulate at low memory addresses, there's a
possibility:
Images are mostly a snapshot of the object memory, with all addresses
kept as they are. When re-loading an image at a different memory base
address, all object pointers get updated by the difference between old
and new base. It should be relatively easy to move a saved image file to
memory address 0 (I think the interpreter simulator does this). Two
related images with the same base address probably have only small
differences in the lower memory addresses. I have not tried this, but it
could work.
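A minimal sketch of why rebasing to address 0 should help a binary diff. It assumes a toy model in which an image is a list of 32-bit words and the indices of the pointer words are already known (a real image would find them by walking object headers); `rebase` and `count_differing_words` are hypothetical names. Two saves of nearly the same image at different base addresses differ in every pointer word, but once both are rebased to 0 only the genuinely changed words remain.

```python
from typing import List, Set

MASK32 = 0xFFFFFFFF  # assumption: 32-bit object pointers


def rebase(words: List[int], pointer_slots: Set[int],
           old_base: int, new_base: int = 0) -> List[int]:
    """Copy of `words` with every pointer word moved by (new_base - old_base).

    Hypothetical helper: a real VM finds pointer slots by parsing object
    headers; here the caller passes their indices explicitly.
    """
    delta = new_base - old_base
    return [(w + delta) & MASK32 if i in pointer_slots else w
            for i, w in enumerate(words)]


def count_differing_words(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equally sized images differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Toy image saved at base 0x01000000: words 0 and 2 are pointers, 1 and 3 are data.
pointer_slots = {0, 2}
saved_a = [0x01000010, 42, 0x01000020, 7]
# Same objects saved later at base 0x02000000, with one data word changed.
saved_b = [0x02000010, 42, 0x02000020, 8]

raw_diff = count_differing_words(saved_a, saved_b)
normalized_diff = count_differing_words(
    rebase(saved_a, pointer_slots, 0x01000000),
    rebase(saved_b, pointer_slots, 0x02000000))
print(raw_diff, normalized_diff)  # 3 1
```

In this toy, the raw images differ in three words (two pointers plus the real change); after normalization only the real change is left for the diff to store.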

Cheers,
Hans-Martin


Re: Storing Squeak Images in mercurial

Bert Freudenberg
On Sep 28, 2007, at 9:15 , Hans-Martin Mosner wrote:

> Keith Hodges wrote:
>> It would be advantageous for storing binary diffs if images did not
>> change much between snapshots.
>>
>> I seem to remember some mention that the image may have its pointers
>> updated and so this would not be the case.
>>
>> can anyone fill me in on the details?
>>
>> Would it be possible to save or post process a "sorted" image to take
>> full advantage of a repository which uses binary diffs?
> Assuming that most unchanged image content should be within the oldest
> parts of an image, which accumulate at low memory addresses, there's a
> possibility:
> Images are mostly a snapshot of the object memory, with all addresses
> kept as they are. When re-loading an image at a different memory base
> address, all object pointers get updated by the difference between old
> and new base. It should be relatively easy to move a saved image file to
> memory address 0 (I think the interpreter simulator does this). Two
> related images with the same base address probably have only small
> differences in the lower memory addresses. I have not tried this, but it
> could work.

Interesting idea, yes, that should work.

Or we could move to an object table. Awfully nice, these ;) They solve so
many issues, one of them being that object references do not have to
be relocated on startup.
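To illustrate why an object table avoids relocation, here is a toy sketch (all names are hypothetical and it does not model any real Squeak VM): object fields hold indices into a table, and only the table stores machine addresses. Moving the heap then rewrites the table entries and never touches the object bodies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectTableHeap:
    """Toy heap in which references are object-table indices, not addresses."""
    base: int
    table: List[int] = field(default_factory=list)               # index -> address
    bodies: Dict[int, List[int]] = field(default_factory=dict)   # index -> field indices

    def allocate(self, offset: int, fields: List[int]) -> int:
        """Place an object at base + offset and return its table index."""
        index = len(self.table)
        self.table.append(self.base + offset)
        self.bodies[index] = list(fields)
        return index

    def relocate(self, new_base: int) -> None:
        """Move the whole heap: only the table entries change."""
        delta = new_base - self.base
        self.table = [addr + delta for addr in self.table]
        self.base = new_base


heap = ObjectTableHeap(base=0x1000)
a = heap.allocate(0x00, [])
b = heap.allocate(0x10, [a])   # b refers to a by table index
before = {i: list(f) for i, f in heap.bodies.items()}
heap.relocate(0x9000)
print([hex(addr) for addr in heap.table])  # ['0x9000', '0x9010']
print(heap.bodies == before)               # True
```

The trade-off, not shown here, is an extra indirection on every field access.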

Another idea I have been pondering for a while is making the lower
part of Squeak's object memory be "constant". There is a large number
of objects in an image that virtually never change but are only read.
This part does not have to be garbage-collected, making a full GC
much cheaper. When we fork off a new system process with the VM using
copy-on-write pages, this part could be shared between images,
reducing the overall memory consumption significantly.
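A minimal mark-and-sweep sketch of the "constant lower part" idea. It assumes a toy heap that maps each address to the addresses it references, plus a hypothetical boundary `const_limit`: objects below it are never marked or swept. Because constant objects may still hold references to ordinary objects, their fields are treated as extra roots; a real VM would more likely track such references with a remembered set than rescan the region on every collection.

```python
from typing import Dict, List, Set

Heap = Dict[int, List[int]]  # address -> addresses it references


def collect(heap: Heap, roots: List[int], const_limit: int) -> Set[int]:
    """Mark-and-sweep that never frees objects below `const_limit`.

    Removes unreachable collectable objects from `heap` and returns their addresses.
    """
    # Constant objects are always live; their outgoing references act as roots.
    stack = list(roots)
    for addr, refs in heap.items():
        if addr < const_limit:
            stack.extend(refs)

    marked: Set[int] = set()
    while stack:
        addr = stack.pop()
        if addr < const_limit or addr in marked or addr not in heap:
            continue  # constant objects need no marking
        marked.add(addr)
        stack.extend(heap[addr])

    # Sweep only the collectable region above the boundary.
    freed = {a for a in heap if a >= const_limit and a not in marked}
    for a in freed:
        del heap[a]
    return freed


heap = {
    0x10: [0x200],   # constant object pointing into collectable space
    0x20: [],        # constant and unreferenced, but survives anyway
    0x200: [],       # kept alive only by the constant object
    0x300: [0x310],  # reachable from the root set
    0x310: [],
    0x400: [],       # garbage
}
freed = collect(heap, roots=[0x300], const_limit=0x100)
print([hex(a) for a in sorted(freed)])  # ['0x400']
```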

- Bert -




Re: Storing Squeak Images in mercurial

Giovanni Corriga
On Fri, 28/09/2007 at 10.00 +0200, Bert Freudenberg wrote:

> Another idea I have been pondering for a while is making the lower  
> part of Squeak's object memory be "constant". There is a large number  
> of objects in an image that virtually never change but are only read.  
> This part does not have to be garbage-collected, making a full GC  
> much cheaper. When we fork off a new system process with the VM using  
> copy-on-write pages, this part could be shared between images,  
> reducing the over-all memory consumption significantly.

Could this constant part be kept in a separate file, thus reducing also
the disk occupation of our images?

        Ciao,

                Giovanni



Re: Storing Squeak Images in mercurial

Bert Freudenberg

On Sep 28, 2007, at 11:02 , Giovanni Corriga wrote:

> On Fri, 28/09/2007 at 10.00 +0200, Bert Freudenberg wrote:
>
>> Another idea I have been pondering for a while is making the lower
>> part of Squeak's object memory be "constant". There is a large number
>> of objects in an image that virtually never change but are only read.
>> This part does not have to be garbage-collected, making a full GC
>> much cheaper. When we fork off a new system process with the VM using
>> copy-on-write pages, this part could be shared between images,
>> reducing the over-all memory consumption significantly.
>
> Could this constant part be kept in a separate file, thus reducing also
> the disk occupation of our images?

Haven't thought about that and I have a gut feeling this would be  
impractical, but who knows ... get the brains rolling ;)

"Perm space" in VisualWorks is similar, but I don't know anything  
about its implementation.

- Bert -




Re: Storing Squeak Images in mercurial

Hans-Martin Mosner
Bert Freudenberg wrote:
> "Perm space" in VisualWorks is similar, but I don't know anything
> about its implementation.
Perm space differs from normal old spaces in VW in that it is not
subject to garbage collection and compaction, so there are far fewer
memory writes in perm space (writes occur only when slots of perm-space
objects are changed). This means that perm space is quite well shareable
between multiple instances of an image. It is actually shared when it can
be memory-mapped on loading into the same memory region where it was when
the image was saved, and AFAIK this is not supported on all VW platforms.
The reduced garbage collection load is an advantage even for
single-image applications.

To achieve something similar in Squeak, we would have to add another
memory division alongside the existing old space/young space one.
The image file does not even have that division; all of it is considered
old space after loading.
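Since the image file has no such division today, one hypothetical way to carry it would be an extra header field recording where the constant (perm) part of the object memory ends, so that a loader could treat that prefix specially. The layout below is invented for illustration and is not Squeak's actual image header format; `ImageHeader`, `const_end` and the magic value are all assumptions.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical header layout (NOT Squeak's real format): five little-endian
# unsigned 32-bit fields.
HEADER_FORMAT = "<5I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 bytes


class ImageHeader(NamedTuple):
    magic: int
    header_size: int
    data_size: int   # bytes of object memory following the header
    old_base: int    # address the memory was at when saved
    const_end: int   # object memory bytes [0, const_end) form the constant part


def pack_header(h: ImageHeader) -> bytes:
    return struct.pack(HEADER_FORMAT, *h)


def unpack_header(raw: bytes) -> ImageHeader:
    return ImageHeader(*struct.unpack_from(HEADER_FORMAT, raw))


def split_regions(raw: bytes) -> Tuple[ImageHeader, bytes, bytes]:
    """Parse the header, then return it with (constant part, ordinary old space)."""
    h = unpack_header(raw)
    body = raw[h.header_size:h.header_size + h.data_size]
    return h, body[:h.const_end], body[h.const_end:]


header = ImageHeader(magic=0xCAFE, header_size=HEADER_SIZE, data_size=8,
                     old_base=0x01000000, const_end=6)
image_file = pack_header(header) + b"CCCCCCOO"
parsed, constant, old_space = split_regions(image_file)
print(constant, old_space)  # b'CCCCCC' b'OO'
```

Keeping the constant part as a prefix also fits Hans-Martin's earlier observation that the oldest, least-changing objects sit at the lowest addresses.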

Cheers,
Hans-Martin


Re: Storing Squeak Images in mercurial

johnmci
In reply to this post by Hans-Martin Mosner
Actually, the VM has an optimization for this. On the Macintosh, for
example, the start address of the oops space is "usually" the same
virtual memory address from run to run when using the same VM. If so, and
it matches the address remembered in the image, no swizzling of the
object pointers is needed.

Otherwise, all object addresses are swizzled by an offset: each object
pointer is a 32/64-bit memory address, so it is adjusted by the
difference between the last (saved) offset and the new offset.
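The startup decision described above can be sketched as follows, again assuming a toy image of 32-bit words whose pointer positions are known. `load_image` is a hypothetical name, not the VM's actual routine. When the new base equals the remembered one, the swizzling pass is skipped entirely.

```python
from typing import List, Set, Tuple

MASK32 = 0xFFFFFFFF  # assumption: 32-bit object pointers


def load_image(words: List[int], pointer_slots: Set[int],
               saved_base: int, new_base: int) -> Tuple[List[int], int]:
    """Hypothetical loader: return (relocated words, number of words swizzled)."""
    if new_base == saved_base:
        # Memory landed where it was when saved: pointers are already valid.
        return list(words), 0
    delta = new_base - saved_base
    relocated = list(words)
    for i in pointer_slots:
        relocated[i] = (relocated[i] + delta) & MASK32
    return relocated, len(pointer_slots)


image = [0x01000008, 99, 0x01000000]
pointers = {0, 2}
_, touched = load_image(image, pointers, 0x01000000, 0x01000000)
print(touched)  # 0
words, touched = load_image(image, pointers, 0x01000000, 0x03000000)
print([hex(w) for w in words], touched)  # ['0x3000008', '0x63', '0x3000000'] 2
```

On an OS that randomizes allocation addresses, the fast path would rarely be taken and every startup would pay for the full pass.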

I'll note that more secure operating systems, such as OpenBSD, ensure
that the memory addresses allocated from one application startup to the
next don't follow a pattern, so on that operating system the startup
address is likely never to be the same as on any previous startup.

Therefore it's plausible that one *could* swizzle the addresses from
the starting offset to zero after the image has been fully GCed and
halted.

See ObjectMemory>>adjustAllOopsBy: for thoughts, and
Interpreter>>writeImageFile: for where to swizzle to zero and then back
to the original value.
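A toy sketch of that save-time procedure, assuming memory is a list of 32-bit words with known pointer positions. `shift_pointers` plays the role of adjustAllOopsBy: in this model but is a hypothetical stand-in, not a port of it. Pointers are swizzled to zero, the words are serialized, and the running memory is swizzled back to its live addresses.

```python
import struct
from typing import List, Set

MASK32 = 0xFFFFFFFF  # assumption: 32-bit object pointers


def shift_pointers(words: List[int], pointer_slots: Set[int], delta: int) -> None:
    """In-place relocation of every pointer word (toy stand-in for adjustAllOopsBy:)."""
    for i in pointer_slots:
        words[i] = (words[i] + delta) & MASK32


def write_normalized_image(words: List[int], pointer_slots: Set[int],
                           base: int) -> bytes:
    """Serialize memory as if it lived at address 0, then restore live pointers."""
    shift_pointers(words, pointer_slots, -base)        # swizzle to zero
    try:
        return struct.pack(f"<{len(words)}I", *words)  # a header would record base 0
    finally:
        shift_pointers(words, pointer_slots, base)     # back to the original value


memory = [0x01000004, 5, 0x01000000]
data = write_normalized_image(memory, {0, 2}, 0x01000000)
print(struct.unpack("<3I", data))             # (4, 5, 0)
print(memory == [0x01000004, 5, 0x01000000])  # True
```

The `try`/`finally` makes sure the live memory is restored even if serialization fails partway.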
       

PS I have a dim memory of someone wanting to abuse the low bits in an
object pointer because they decided, for example, that the oops space
could never be allocated below, say, 0x00000000XX, since operating
systems usually allocate memory after the VM binary...



--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




Re: Storing Squeak Images in mercurial

Jason Johnson-5
In reply to this post by keith1y
I seem to recall a thread on here some time back of someone having
exactly this issue.  They were trying to find a way of getting to a
well-defined point so they could enter the image in some system in
their organization or something.

If I get a chance I'll look it up.  It may be related to this.



Re: Storing Squeak Images in mercurial

Hans-Martin Mosner
Jason Johnson wrote:
> I seem to recall a thread on here some time back of someone having
> exactly this issue.  They were trying to have a way of getting to a
> well defined point so they could enter the image in some system in
> their organization or something.
If it's the same thread I am remembering, they wanted to get
byte-for-byte identical images from a defined set of source files.
In my opinion that is impossible, due to the nondeterministic behavior
of some of the processes in an image.
The binary diff requirement is much weaker: to satisfy it, it suffices
that the diff be significantly smaller than either of the two images
being compared.
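That weaker criterion can be made concrete with a crude stand-in for a binary diff: split both images into fixed-size blocks at the same offsets, count the bytes in blocks that differ, and compare that with the image size. This is not Mercurial's delta algorithm; the helper names, the 64-byte block size and the 10% threshold are arbitrary assumptions for illustration.

```python
def changed_bytes(old: bytes, new: bytes, block: int = 64) -> int:
    """Approximate delta size: bytes of `new` lying in blocks that differ from `old`.

    Bytes present only in `old` (a shrunk image) are ignored by this rough measure.
    """
    total = 0
    for start in range(0, len(new), block):
        chunk = new[start:start + block]
        if chunk != old[start:start + block]:
            total += len(chunk)
    return total


def diff_is_worthwhile(old: bytes, new: bytes, ratio: float = 0.1) -> bool:
    """True if the approximate delta is much smaller than either image."""
    return changed_bytes(old, new) < ratio * min(len(old), len(new))


old = bytes(4096)
new = bytearray(old)
new[100] = 1  # one changed byte dirties one 64-byte block
print(changed_bytes(old, bytes(new)), diff_is_worthwhile(old, bytes(new)))  # 64 True
```

Because blocks are compared at equal offsets, an image in which every pointer word has shifted by a relocation offset would fail this test even if almost nothing else changed, which is exactly the problem the thread is about.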

Cheers,
Hans-Martin


Re: Storing Squeak Images in mercurial

Yoshiki Ohshima-2
In reply to this post by Giovanni Corriga
  Giovanni,

> > Another idea I have been pondering for a while is making the lower  
> > part of Squeak's object memory be "constant". There is a large number  
> > of objects in an image that virtually never change but are only read.  
> > This part does not have to be garbage-collected, making a full GC  
> > much cheaper. When we fork off a new system process with the VM using  
> > copy-on-write pages, this part could be shared between images,  
> > reducing the over-all memory consumption significantly.
>
> Could this constant part be kept in a separate file, thus reducing also
> the disk occupation of our images?

  That may be tricky as others say, but by normalizing the start image
offset to zero upon saving, the resulting .image *would* be more
compressible by the LZ family of algorithms.  Whether that is actually
true is an open question...
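One way to check this empirically is to compress two saves of the same image, one normalized to base 0 and one not, and compare the results. The sketch below uses `zlib` (DEFLATE, the LZ77-based algorithm behind gzip) and, for comparison, `bz2` (which is not LZ-based). The `toy_image` generator is a synthetic assumption that only exercises the function; its numbers say nothing about real Squeak images, so no expected output is given.

```python
import bz2
import struct
import zlib
from typing import Dict


def savings(data: bytes) -> Dict[str, float]:
    """Percentage of bytes saved by zlib (gzip's algorithm) and by bz2."""
    return {
        "zlib": 100.0 * (1 - len(zlib.compress(data, 9)) / len(data)),
        "bz2": 100.0 * (1 - len(bz2.compress(data, 9)) / len(data)),
    }


def toy_image(base: int, n: int = 4096) -> bytes:
    """Synthetic image: every other 32-bit word is a pointer near `base`."""
    words = [(base + 8 * i) & 0xFFFFFFFF if i % 2 == 0 else i % 251
             for i in range(n)]
    return struct.pack(f"<{n}I", *words)


for base in (0, 0x0A3C5000):
    print(hex(base), savings(toy_image(base)))
```

Running the same `savings` function on two real images would answer the question directly.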

-- Yoshiki


Re: Storing Squeak Images in mercurial

Bert Freudenberg

On Oct 2, 2007, at 3:53 , Yoshiki Ohshima wrote:

>   Giovanni,
>
>>> Another idea I have been pondering for a while is making the lower
>>> part of Squeak's object memory be "constant". There is a large number
>>> of objects in an image that virtually never change but are only read.
>>> This part does not have to be garbage-collected, making a full GC
>>> much cheaper. When we fork off a new system process with the VM using
>>> copy-on-write pages, this part could be shared between images,
>>> reducing the over-all memory consumption significantly.
>>
>> Could this constant part be kept in a separate file, thus reducing also
>> the disk occupation of our images?
>
>   That may be tricky as others say, but by normalizing the start image
> offset to zero upon saving, the resulting .image *would* be more
> compressable by the LZ family of algorithm.  Whether it is true or not
> is a question...

Why would it be more compressible? Because there are more zeros in oops?

- Bert -




Re: Storing Squeak Images in mercurial

Paolo Bonzini-2

>>   That may be tricky as others say, but by normalizing the start image
>> offset to zero upon saving, the resulting .image *would* be more
>> compressable by the LZ family of algorithm.  Whether it is true or not
>> is a question...
>
> Why would it be more compressable? Because there are more zeros in oops?

Presumably, but that's not true at least for GNU Smalltalk images.

2.3.6 (normalizes offset to zero) => 72.4% gzip compression, 76.6% bzip2
2.95d (does not) => 72.2% gzip compression, 76.5% bzip2

(As a side note, that change was made to have faster startup times -- if
you don't have to swizzle back object pointers, there is less work to be
done on image startup -- and as a prerequisite to implement a shared
memory space via copy-on-write).

Paolo