It would be advantageous for storing binary diffs if images did not
change much between snapshots.

I seem to remember some mention that the image may have its pointers
updated and so this would not be the case.

can anyone fill me in on the details?

Would it be possible to save or post process a "sorted" image to take
full advantage of a repository which uses binary diffs?

best regards

Keith
Keith Hodges wrote:
> It would be advantageous for storing binary diffs if images did not
> change much between snapshots.
>
> I seem to remember some mention that the image may have its pointers
> updated and so this would not be the case.
>
> can anyone fill me in on the details?
>
> Would it be possible to save or post process a "sorted" image to take
> full advantage of a repository which uses binary diffs?

Assuming that most unchanged image content should be within the oldest
parts of an image, which accumulate at low memory addresses, there's a
possibility:

Images are mostly a snapshot of the object memory, with all addresses
kept as they are. When re-loading an image at a different memory base
address, all object pointers get updated by the difference between old
and new base. It should be relatively easy to move a saved image file to
memory address 0 (I think the interpreter simulator does this). Two
related images with the same base address probably have only small
differences in the lower memory addresses. I have not tried this, but it
could work.

Cheers,
Hans-Martin
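The rebasing described above can be illustrated with a small sketch. This is a toy model, not the Squeak VM's code: it assumes an image is just a list of 32-bit words, that a word with the low bit set is a tagged SmallInteger, and that every other word is an absolute object pointer. A real image also contains object headers and raw byte data that must be left alone. The names `rebase` and `is_pointer` are made up for the illustration.

```python
WORD_MASK = 0xFFFFFFFF  # toy model: 32-bit words


def is_pointer(word):
    """Toy rule: a word with the low bit clear is an object pointer."""
    return word & 1 == 0


def rebase(words, old_base, new_base):
    """Return a copy of `words` with each pointer shifted by new_base - old_base."""
    delta = new_base - old_base
    return [(w + delta) & WORD_MASK if is_pointer(w) else w for w in words]


# A three-word "image" that was in memory at base address 0x10000.
saved_base = 0x10000
image = [0x10008, 0x7, 0x10000]  # pointer, tagged integer, pointer

# Move the saved image to base address 0, then "load" it somewhere else.
normalized = rebase(image, saved_base, 0)
reloaded = rebase(normalized, 0, 0x2A000)
```

In this model the tagged integer survives both moves untouched, and only the pointer words change by the base difference.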
On Sep 28, 2007, at 9:15 , Hans-Martin Mosner wrote:
> Keith Hodges wrote:
>> It would be advantageous for storing binary diffs if images did not
>> change much between snapshots.
>>
>> I seem to remember some mention that the image may have its pointers
>> updated and so this would not be the case.
>>
>> can anyone fill me in on the details?
>>
>> Would it be possible to save or post process a "sorted" image to take
>> full advantage of a repository which uses binary diffs?
> Assuming that most unchanged image content should be within the oldest
> parts of an image, which accumulate at low memory addresses, there's a
> possibility:
> Images are mostly a snapshot of the object memory, with all addresses
> kept as they are. When re-loading an image at a different memory base
> address, all object pointers get updated by the difference between old
> and new base. It should be relatively easy to move a saved image file to
> memory address 0 (I think the interpreter simulator does this). Two
> related images with the same base address probably have only small
> differences in the lower memory addresses. I have not tried this, but it
> could work.

Interesting idea, yes, that should work.

Or, we could move to an object table. Awfully nice, these ;) Solve so
many issues - one of them being object references that do not have to be
relocated on startup.

Another idea I have been pondering for a while is making the lower part
of Squeak's object memory be "constant". There is a large number of
objects in an image that virtually never change but are only read. This
part does not have to be garbage-collected, making a full GC much
cheaper. When we fork off a new system process with the VM using
copy-on-write pages, this part could be shared between images, reducing
the over-all memory consumption significantly.

- Bert -
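To make the object-table remark concrete, here is a minimal sketch of why references would not need relocating. It is only an assumption about how such a table could look, not a description of any particular VM: object fields hold table indices, and the table (the hypothetical `ObjectTable` class) stores each object's offset from the heap base.

```python
class ObjectTable:
    """Toy object table: maps a stable index to an offset from the heap base."""

    def __init__(self, base):
        self.base = base
        self.offsets = []

    def add(self, offset):
        """Register an object body at `offset` and return its table index."""
        self.offsets.append(offset)
        return len(self.offsets) - 1

    def address_of(self, index):
        """Resolve a table index to an absolute address at the current base."""
        return self.base + self.offsets[index]


table = ObjectTable(base=0x10000)
a = table.add(0x00)
b = table.add(0x20)

# Object fields hold table indices, not addresses.
fields_of_a = [b]
fields_before = list(fields_of_a)
address_before = table.address_of(b)

# "Loading" at a different base changes only the table's base value.
table.base = 0x2A000
address_after = table.address_of(b)
```

The object's fields are identical before and after the move; only the address the table resolves to changes. The cost of this design is an extra indirection on every reference.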
On Fri, 28/09/2007 at 10.00 +0200, Bert Freudenberg wrote:
> Another idea I have been pondering for a while is making the lower
> part of Squeak's object memory be "constant". There is a large number
> of objects in an image that virtually never change but are only read.
> This part does not have to be garbage-collected, making a full GC
> much cheaper. When we fork off a new system process with the VM using
> copy-on-write pages, this part could be shared between images,
> reducing the over-all memory consumption significantly.

Could this constant part be kept in a separate file, thus reducing also
the disk occupation of our images?

Ciao,

Giovanni
On Sep 28, 2007, at 11:02 , Giovanni Corriga wrote:

> On Fri, 28/09/2007 at 10.00 +0200, Bert Freudenberg wrote:
>
>> Another idea I have been pondering for a while is making the lower
>> part of Squeak's object memory be "constant". There is a large number
>> of objects in an image that virtually never change but are only read.
>> This part does not have to be garbage-collected, making a full GC
>> much cheaper. When we fork off a new system process with the VM using
>> copy-on-write pages, this part could be shared between images,
>> reducing the over-all memory consumption significantly.
>
> Could this constant part be kept in a separate file, thus reducing also
> the disk occupation of our images?

Haven't thought about that and I have a gut feeling this would be
impractical, but who knows ... get the brains rolling ;)

"Perm space" in VisualWorks is similar, but I don't know anything about
its implementation.

- Bert -
Bert Freudenberg wrote:
> "Perm space" in VisualWorks is similar, but I don't know anything
> about its implementation.
>
> - Bert -

Perm space differs from normal old spaces in VW in that it is not
subject to garbage collection and compaction, so the amount of memory
writes in perm space is reduced (the only writes can occur when slots in
perm space objects are changed). This means that perm space is pretty
well sharable between multiple instances of an image (it's actually
shared when it can be memory-mapped upon loading into the same memory
region where it was when the image was saved, and AFAIK this is not
supported on all VW platforms). The reduced garbage collection load is
an advantage even for single-image applications.

To achieve something similar in Squeak, we would have to add a further
memory division in addition to the oldspace/youngspace one. The image
file does not even have that division; all of it is considered old space
after loading.

Cheers,
Hans-Martin
In reply to this post by Hans-Martin Mosner
Actually the VM has an optimization where, for example on the
Macintosh, the memory start value for the oops space is "usually" at the
same virtual memory address when using the same VM. If so, and this
matches the remembered memory address, then no swizzling of the object
pointers is needed. Otherwise all object addresses are swizzled with an
offset, so that each object pointer, which is the 32/64-bit memory
address, matches expectations, using the last offset versus the new
offset.

I'll note that more secure operating systems, say OpenBSD, ensure the
memory addresses allocated from application startup to startup don't
follow a pattern, so on that operating system it's likely the startup
address would never be the same as any previous startup.

Therefore it's plausible that one *could* swizzle the address from the
starting offset to zero after the image has been fully GCed and halted.
See ObjectMemory>>adjustAllOopsBy: for thoughts and
Interpreter>>writeImageFile: for placement to swizzle to zero, then back
to the original value.

PS I have a dim memory of someone wanting to abuse the low bits in an
object pointer because they decided, for example, that the oops space
could never be allocated below say 0x00000000XX because the operating
systems usually allocate from memory after the VM binary...

On Sep 28, 2007, at 12:15 AM, Hans-Martin Mosner wrote:

> Keith Hodges wrote:
>> It would be advantageous for storing binary diffs if images did not
>> change much between snapshots.
>>
>> I seem to remember some mention that the image may have its pointers
>> updated and so this would not be the case.
>>
>> can anyone fill me in on the details?
>>
>> Would it be possible to save or post process a "sorted" image to take
>> full advantage of a repository which uses binary diffs?
> Assuming that most unchanged image content should be within the oldest
> parts of an image, which accumulate at low memory addresses, there's a
> possibility:
> Images are mostly a snapshot of the object memory, with all addresses
> kept as they are. When re-loading an image at a different memory base
> address, all object pointers get updated by the difference between old
> and new base. It should be relatively easy to move a saved image file to
> memory address 0 (I think the interpreter simulator does this). Two
> related images with the same base address probably have only small
> differences in the lower memory addresses. I have not tried this, but it
> could work.
>
> Cheers,
> Hans-Martin

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================
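The two steps John describes (skip swizzling when the base matches, and swizzle to zero for the snapshot and then back) can be sketched as follows. This is a toy model and not the code of ObjectMemory>>adjustAllOopsBy: or Interpreter>>writeImageFile:. It assumes memory is a list of 32-bit words in which a word with the low bit clear is a pointer, and the function names are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # toy model: 32-bit words


def adjust_all_pointers(words, delta):
    """Shift every pointer word (low bit clear in this toy model) in place."""
    for i, w in enumerate(words):
        if w & 1 == 0:
            words[i] = (w + delta) & WORD_MASK


def load(words, saved_base, actual_base):
    """Swizzle only if the base moved; return True if swizzling was needed."""
    if saved_base == actual_base:
        return False
    adjust_all_pointers(words, actual_base - saved_base)
    return True


def snapshot_at_zero(words, current_base):
    """Swizzle to base 0, copy the words out, then swizzle back."""
    adjust_all_pointers(words, -current_base)
    snapshot = list(words)
    adjust_all_pointers(words, current_base)
    return snapshot


memory = [0x50010, 0x3, 0x50000]  # pointer, tagged integer, pointer
snap = snapshot_at_zero(memory, 0x50000)

same_base = list(snap)
swizzled_same = load(same_base, 0, 0)

moved = list(snap)
swizzled_moved = load(moved, 0, 0x50000)
```

Live memory is left exactly as it was after the snapshot, while the snapshot itself carries base-0 pointers. Loading it at base 0 needs no swizzling; loading it anywhere else costs one pass.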
In reply to this post by keith1y
I seem to recall a thread on here some time back of someone having
exactly this issue. They were trying to have a way of getting to a well
defined point so they could enter the image in some system in their
organization or something.

If I get a chance I'll look it up. It may be related to this.

On 9/28/07, Keith Hodges <[hidden email]> wrote:
> It would be advantageous for storing binary diffs if images did not
> change much between snapshots.
>
> I seem to remember some mention that the image may have its pointers
> updated and so this would not be the case.
>
> can anyone fill me in on the details?
>
> Would it be possible to save or post process a "sorted" image to take
> full advantage of a repository which uses binary diffs?
>
> best regards
>
> Keith
Jason Johnson wrote:
> I seem to recall a thread on here some time back of someone having
> exactly this issue. They were trying to have a way of getting to a
> well defined point so they could enter the image in some system in
> their organization or something.

If it's the same thread I am remembering, they wanted to achieve
complete identity of images given a defined set of source files. That is
impossible, in my opinion, due to the nondeterministic behavior of some
of the processes in an image.

The binary diff requirement is much weaker - for it to be satisfied, it
suffices if the diff is significantly smaller than either of the two
images being compared.

Cheers,
Hans-Martin
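As a rough illustration of that weaker requirement, the sketch below counts how many words differ between two snapshots, first as saved at different bases and then after both are normalized to base 0. It is a toy model (lists of 32-bit words, low bit clear means pointer) and the positional comparison is a stand-in for a real binary diff tool, which would also cope with insertions and shifted data.

```python
WORD_MASK = 0xFFFFFFFF  # toy model: 32-bit words


def rebase(words, old_base, new_base):
    """Return a copy with each pointer word shifted by new_base - old_base."""
    delta = new_base - old_base
    return [(w + delta) & WORD_MASK if w & 1 == 0 else w for w in words]


def differing_words(a, b):
    """Count positions at which two equal-length word lists differ."""
    if len(a) != len(b):
        raise ValueError("snapshots must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Same object graph saved at two bases; only the tagged integer changed.
snapshot_a = [0x10008, 0x7, 0x10000, 0x10004]  # base 0x10000
snapshot_b = [0x30008, 0x9, 0x30000, 0x30004]  # base 0x30000

raw_diff = differing_words(snapshot_a, snapshot_b)
normalized_diff = differing_words(
    rebase(snapshot_a, 0x10000, 0),
    rebase(snapshot_b, 0x30000, 0),
)
```

In this example every word differs between the raw snapshots, but only the one genuinely changed word differs once both are normalized.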
In reply to this post by Giovanni Corriga
Giovanni,

> > Another idea I have been pondering for a while is making the lower
> > part of Squeak's object memory be "constant". There is a large number
> > of objects in an image that virtually never change but are only read.
> > This part does not have to be garbage-collected, making a full GC
> > much cheaper. When we fork off a new system process with the VM using
> > copy-on-write pages, this part could be shared between images,
> > reducing the over-all memory consumption significantly.
>
> Could this constant part be kept in a separate file, thus reducing also
> the disk occupation of our images?

That may be tricky as others say, but by normalizing the start image
offset to zero upon saving, the resulting .image *would* be more
compressible by the LZ family of algorithms. Whether it is true or not
is a question...

-- Yoshiki
On Oct 2, 2007, at 3:53 , Yoshiki Ohshima wrote:

> Giovanni,
>
>>> Another idea I have been pondering for a while is making the lower
>>> part of Squeak's object memory be "constant". There is a large number
>>> of objects in an image that virtually never change but are only read.
>>> This part does not have to be garbage-collected, making a full GC
>>> much cheaper. When we fork off a new system process with the VM using
>>> copy-on-write pages, this part could be shared between images,
>>> reducing the over-all memory consumption significantly.
>>
>> Could this constant part be kept in a separate file, thus reducing also
>> the disk occupation of our images?
>
> That may be tricky as others say, but by normalizing the start image
> offset to zero upon saving, the resulting .image *would* be more
> compressible by the LZ family of algorithms. Whether it is true or not
> is a question...

Why would it be more compressible? Because there are more zeros in oops?

- Bert -
>> That may be tricky as others say, but by normalizing the start image
>> offset to zero upon saving, the resulting .image *would* be more
>> compressible by the LZ family of algorithms. Whether it is true or not
>> is a question...
>
> Why would it be more compressible? Because there are more zeros in oops?

Presumably, but that's not true at least for GNU Smalltalk images.

2.3.6 (normalizes offset to zero) => 72.4% gzip compression, 76.6% bzip2
2.95d (does not)                  => 72.2% gzip compression, 76.5% bzip2

(As a side note, that change was made to have faster startup times -- if
you don't have to swizzle back object pointers, there is less work to be
done on image startup -- and as a prerequisite to implement a shared
memory space via copy-on-write).

Paolo
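A comparison like the one above could be reproduced with a short script. The sketch below is an assumption about the method, not Paolo's actual procedure: it uses Python's standard `zlib` module (the deflate algorithm that gzip uses) and `bz2`, and it reports the fraction of bytes saved, since the post does not say whether its percentages mean space saved or compressed size relative to the original.

```python
import bz2
import zlib


def space_saving(data, compress):
    """Fraction of bytes saved: 1 - compressed_size / original_size."""
    if not data:
        raise ValueError("cannot measure compression of empty data")
    return 1.0 - len(compress(data)) / len(data)


def report(data):
    """Return the space saving of `data` under deflate and bzip2."""
    return {
        "deflate": space_saving(data, zlib.compress),
        "bzip2": space_saving(data, bz2.compress),
    }


# Stand-in for an image file's contents: 1000 zero bytes.
sample = bytes(1000)
result = report(sample)
```

To try it on a real image, read the file in binary mode, for example `report(open(path, "rb").read())` with `path` pointing at two builds of the same image, and compare the two reports.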