Hi All,
We have tried to produce the same (bit-identical) image file from two consecutive snapshots. We start from a base image, fileIn several files into it, and finally just evaluate SmalltalkImage current snapshot: true andQuit: true. We need this so that an image file generated by a third party, by executing a script, can be verified with a checksum.

After trying several ways to get it (even scripting the fileIn process and the snapshot), we found that the image files have, besides the timestamp differences, thousands of other differences, and sometimes the snapshots also differ in size.

We suppose that this kind of issue may be due to GC activity. Do these differences come from the way the GC dynamically changes the bytes in memory? Is there a way to inhibit this activity?

Attached are the scripts we use to produce the image files.

Many thanks in advance,
Martin Troielli

Attachment: deploy.zip (440 bytes)
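The checksum verification described in the question can be sketched as follows. This is a minimal illustration only: the choice of SHA-256 and the helper names `sha256_of` and `same_file_contents` are assumptions, not something the original scripts are known to use.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True only if both files have the same size and the same digest."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # a size difference alone already rules out identity
    return sha256_of(path_a) == sha256_of(path_b)
```

A check of this kind is all-or-nothing: a single changed byte, such as a stored timestamp, changes the digest, which is why the snapshots described above cannot pass it.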
Not to mention anything that records TimeStamps or clock values...
-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Martin Troielli
Sent: 30 July 2007 4:32 pm
To: [hidden email]
Subject: How to generate identically image file after snapshots
In reply to this post by Martin Troielli
cp image.im twinbrother.im
;)

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4
[hidden email]
In reply to this post by Martin Troielli
Hi Martin,
there are a lot of objects (for example, instances of ContextPart subclasses) that are allocated and deallocated without you having much control over them.

One corner from which this could be started is to enumerate (in two sister .images) all the objects you want to deploy. If that fails to produce comparable objects (for any reason, for example if you cannot order or compare object identities other than by identity hash, and the latter is assigned by the VM and not by you) then, hrm, it fails.

But if it succeeds, you could trace out all the objects you want (thereby discarding all the unwanted ones), and the resulting (two sister) .image files then have the same contents byte for byte, because you fix each object's position in the files. I've done that with other images and non-Smalltalk interpreters.

Having said that, your project doesn't look easy.

/Klaus
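The idea of fixing each object's position by enumerating objects in an order you control can be sketched as below. This is a toy model, not Squeak's image format: the record layout, the function `canonical_dump`, and the use of "Class>>selector" names as sort keys are all assumptions made for illustration.

```python
import struct


def canonical_dump(objects):
    """Serialize {name: payload} in an order independent of creation order.

    Records are sorted by name, which the builder controls, rather than by
    anything the VM assigns (memory address, identity hash).
    """
    out = bytearray()
    for name, payload in sorted(objects.items()):
        encoded_name = name.encode("utf-8")
        # Length-prefixed fields keep the layout unambiguous.
        out += struct.pack(">I", len(encoded_name)) + encoded_name
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


# Two "sister images" holding the same objects, created in a different order.
image_a = {"Foo>>bar": b"\x10\x7c", "Baz>>qux": b"\x20\x7c"}
image_b = {"Baz>>qux": b"\x20\x7c", "Foo>>bar": b"\x10\x7c"}

identical = canonical_dump(image_a) == canonical_dump(image_b)
```

The approach only works if every object has a key that is stable across builds, which is exactly the failure case mentioned above when the only available identity is a VM-assigned hash.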
In reply to this post by Martin Troielli
You have to cp the image. Once the engine interacts with it, it is never the "same": lots of objects get created and destroyed during image start-up, the garbage collector runs, and anything using a clock runs. So if it is a script, it has to be a shell script. Any deployment-specific stuff should go in some config or text file that is read on start-up.

Sean
In reply to this post by Klaus D. Witzel
Hi Klaus,
Thanks for the information. We have modified the VM to reduce GC activity, inhibiting it until the fileIn processes are done, but had no luck: the produced files were still different, only with fewer differences.

We think we have to follow an approach similar to yours. We thought of generating a serialized file with all the CompiledMethods we use, without changing the base image, and merging them only when Squeak starts up. We hope that this process does not take too much time, since we also have a lot of resources to bring up at that time :S

Regards,
Martin
Maybe I'm alone in being unclear on this, but what is the root goal here? Maybe there's a simpler way to achieve it.

Avi
Well, writing the image out means doing a full GC and some cleanup; then we write out some header bytes and do

    bytesWritten = sqImageFileWrite(pointerForOop(memory), sizeof (unsigned char), imageBytes, f);

which, depending on the platform, is

    #define sqImageFileWrite(ptr, sz, count, f) fwrite(ptr, sz, count, f)

or

    sqInt sqImageFileWrite(void *ptr, size_t elementSize, size_t count, sqImageFile f)
    {
        if (f != 0)
            return fwrite(ptr, elementSize, count, f);
        return 0;
    }

or

    size_t sqImageFileWrite(void *ptr, size_t sz, size_t count, sqImageFile h)
    {
        DWORD dwReallyWritten;
        WriteFile((HANDLE)(h-1), (LPVOID) ptr, count*sz, &dwReallyWritten, NULL);
        return (size_t) (dwReallyWritten / sz);
    }

So after we've shoved the entire oops memory space out to whatever the file handle points to, we start running the VM again, which instantly changes the bytes in memory because objects are created and destroyed as a result of executing bytecodes.

If you have some desire to make duplicate images, look at primitiveSnapshot and consider cloning it to perform writeImageFile() twice using different image names.

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
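The suggestion to write the same memory out twice before the VM resumes can be modelled roughly as follows. This is a sketch under stated assumptions, not the VM's code: `write_image`, `snapshot_twice`, and the 4-byte `IMG0` header are hypothetical stand-ins for the header-plus-memory write described above.

```python
import io


def write_image(stream, header, memory):
    """Write the header followed by the raw object memory; return bytes written."""
    written = stream.write(header)
    written += stream.write(memory)
    return written


def snapshot_twice(header, memory, stream_a, stream_b):
    """Write one frozen copy of the memory to two destinations."""
    frozen = bytes(memory)  # immutable copy taken before execution resumes
    write_image(stream_a, header, frozen)
    write_image(stream_b, header, frozen)


memory = bytearray(b"\x01\x02\x03\x04")  # stand-in for the object memory
header = b"IMG0"                          # hypothetical header, not the real format
first, second = io.BytesIO(), io.BytesIO()
snapshot_twice(header, memory, first, second)

memory[0] = 0xFF  # the VM resumes and immediately mutates the heap
```

Both streams hold identical bytes because they were written from the same frozen state, while the live memory has already diverged. This yields two equal copies of one snapshot; it does not make two separate build runs produce the same image.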
On Tuesday 31 July 2007 7:32 am, John M McIntosh wrote:
> If you have some desire to make duplicate images look at
> primitiveSnapshot
> and consider cloning that to perform the writeImageFile() twice using
> different image names.

It would be simpler to copy image files after they are written. But I don't think the issue was copying image files locally. The original poster wanted to update third-party images by shipping fileIns to be applied to a reference image instead of shipping the whole image itself. The poser, then, is how to verify that the resulting image is the same as intended.

I would simply use xdelta (see xdelta.org) for situations like this. E.g.

    xdelta delta ref.image thirdparty.image thirdparty.xd

and ship thirdparty.xd

    xdelta patch thirdparty.xd ref.image thirdparty.image

The downside is that xdelta is a memory-hungry utility. How big is the image?

Regards .. Subbu
Mmm, I wonder how well this would work. When you load an image, we first figure out how big it is, then allocate memory for it, load it, and then swizzle all the memory references by +/- an offset, which is calculated from the offset used when the image was saved versus the offset given by the memory location actually allocated.

Now, some operating systems might give you the same virtual memory address when you use the same VM on the same operating system. In this case we don't have to swizzle the references. Current (I believe), and certainly past, versions of OS X would do this.

However, in cases where the operating system does not give the same memory address (and I'll note the operating system might give you a random address each time on purpose, for security reasons), all the memory references become different at swizzle time. If this is the case, then on your next save all your memory reference values will be different from the last save. Needless to say, this would greatly affect how xdelta decides whether your images are the same or different.

--
John M. McIntosh
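The effect of swizzling on the saved bytes can be illustrated with a toy heap. Everything here is an assumption for illustration: 32-bit little-endian words, a heap made only of pointers, and the made-up base addresses; the real image layout is more involved.

```python
import struct

WORD = 4  # bytes per word in this toy model (assumed 32-bit)


def save(refs, base):
    """Write each reference (a slot index) as an absolute address from `base`."""
    return struct.pack(f"<{len(refs)}I", *(base + WORD * i for i in refs))


def swizzle(image, saved_base, new_base):
    """Rebase every stored address by the difference between the two bases."""
    delta = new_base - saved_base
    count = len(image) // WORD
    words = struct.unpack(f"<{count}I", image)
    return struct.pack(f"<{count}I", *(w + delta for w in words))


refs = [2, 0, 1]          # slot 0 points at slot 2, and so on
old_base = 0x10000000     # hypothetical address at save time
new_base = 0x20000000     # hypothetical address the OS hands out at load time

saved = save(refs, old_base)
same_place = swizzle(saved, old_base, old_base)   # no relocation needed
relocated = swizzle(saved, old_base, new_base)    # every pointer changes
```

The object graph is the same in both cases, yet `relocated` differs from `saved` in every pointer word. That is why two saves made after loading at different addresses can differ throughout the file even when nothing else changed.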
In reply to this post by Avi Bryant-2
Hi Avi,
The main goal is to certify a software development. The certifier must check that a set of source files produces a given binary output.

We give them:
1 - The final image and VM
2 - The Smalltalk source files (fileOuts of our development)
3 - The VM C source files
4 - The base image
5 - A make script that compiles the VM, files the Smalltalk source files into the base image, and produces a final image and VM.

They need to check that the two images, the one we give (1) and the one generated by our script (5), are the same. They check the differences by doing a binary diff plus a hash over the files. They can only allow changes related to timestamps. They don't know anything about Smalltalk...

Best regards,
Martin

--
Ing. Martín Troielli - [hidden email]
psiware | desarrollo de software
tel. +54 (341) 411-3966, 448-8572
Rosario S2000CVV, Santa Fe, ARGENTINA
www.psiware.com.ar
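A binary diff that tolerates only timestamp changes, as the certifiers require, could look roughly like this. The functions `diff_ranges` and `only_allowed_differences` are hypothetical, and the allowed byte ranges are placeholders: where timestamps actually live in an image file is not stated in this thread.

```python
def diff_ranges(a, b):
    """Return [start, end) ranges where two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("size mismatch")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, len(a)))  # run extends to the end of the data
    return ranges


def only_allowed_differences(a, b, allowed):
    """True if every differing range lies inside one of the allowed ranges."""
    return all(
        any(lo <= s and e <= hi for lo, hi in allowed)
        for s, e in diff_ranges(a, b)
    )


# Toy files: a 4-byte tag, a 4-byte "timestamp" field, then the body.
delivered = b"HEAD" + b"\x00\x00\x00\x01" + b"body"
rebuilt = b"HEAD" + b"\x00\x00\x00\x02" + b"body"
tampered = b"HEAD" + b"\x00\x00\x00\x01" + b"bodx"
timestamp_field = [(4, 8)]  # placeholder range, assumed for this example
```

Here `only_allowed_differences(delivered, rebuilt, timestamp_field)` holds, while the `tampered` comparison fails because its difference falls outside the allowed range. A size mismatch is rejected outright, which matches the observation that snapshots sometimes differ in size.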
12 years back I had a client like this. Let's see if I remember... you could try doing

    "Collect the decompiled form of every method in the image."
    | m decompiled |
    m := OrderedCollection new.
    SystemNavigation default allBehaviorsDo: [ :behavior |
        behavior selectors do: [ :sel |
            decompiled := Decompiler new decompile: sel in: behavior.
            m add: decompiled]].
    ^m

where you sort the behaviors by class name, then sort the selectors, and, instead of collecting the decompiled value, you stream its print string out to a stream. This should give you all the source code for the image in sorted order, which you can then compare as text files.

Think of it as decompiling the binary to see if the assembly instructions are the same.

What's missing is the globals and class variable values, but you might not need those... ?

Perhaps even a file out of all the methods in the image after the build might help?

--
John M. McIntosh
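The comparison step for such sorted source dumps can be sketched on the certifier's side as below. The nested-dictionary input, the function names, and the "delivered"/"rebuilt" labels are assumptions; in practice the text would come from the stream written inside each image.

```python
import difflib


def sorted_source_dump(image):
    """Turn {class_name: {selector: source}} into text in sorted order."""
    lines = []
    for class_name in sorted(image):
        for selector in sorted(image[class_name]):
            lines.append(f"{class_name}>>{selector}")
            lines.extend(image[class_name][selector].splitlines())
            lines.append("")  # blank line between methods
    return "\n".join(lines)


def dump_diff(image_a, image_b):
    """Return unified-diff lines between the two dumps; empty if identical."""
    a = sorted_source_dump(image_a).splitlines()
    b = sorted_source_dump(image_b).splitlines()
    return list(difflib.unified_diff(a, b, "delivered", "rebuilt", lineterm=""))


# Same methods, enumerated in a different order by each image.
delivered = {"Foo": {"bar": "bar\n\t^1", "baz": "baz\n\t^2"}, "Abc": {"x": "x\n\t^0"}}
rebuilt = {"Abc": {"x": "x\n\t^0"}, "Foo": {"baz": "baz\n\t^2", "bar": "bar\n\t^1"}}
changed = {"Abc": {"x": "x\n\t^0"}, "Foo": {"baz": "baz\n\t^3", "bar": "bar\n\t^1"}}
```

With these inputs `dump_diff(delivered, rebuilt)` is empty, while `dump_diff(delivered, changed)` reports the altered method. As noted above, this covers method source only, not globals or class variable values.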
Hi John,
Yes, I think that way we'll show that the "source code" extracted from different image files is exactly the same. I hope the certifiers can understand this...

Thanks anyway,
Martin