How to generate identically image file after snapshots

How to generate identically image file after snapshots

Martin Troielli
Hi All,

We are trying to produce the same (bit-identical) image file from two
consecutive snapshots. We start from a base image, fileIn several
files into it, and finally evaluate SmalltalkImage current snapshot: true
andQuit: true. We need this so that a third party can verify, by running a
script and comparing checksums, the image file it generates.
After trying several ways to get there (even scripting the fileIn process
and the snapshot), we found that, besides the timestamp differences, the
image files have thousands of other differences, and sometimes the
snapshots also differ in size.
We suppose that these issues may be due to GC activity.
Do they come from the way the GC dynamically changes the bytes in
memory? Is there a way to inhibit this activity?
Attached are the scripts we use to produce the image files.

Many thanks in advance,
Martin Troielli
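
For reference, the kind of check the third party would run can be sketched in shell as below. This is only a sketch: the function name `compare_images` and the file names in the usage note are made up for illustration, and it assumes a POSIX `cmp` is available.

```shell
#!/bin/sh
# compare_images A B
# Prints "identical" when the two files match bit for bit; otherwise prints
# how many byte positions differ and the two file sizes.
compare_images() {
    if cmp -s "$1" "$2"; then
        echo "identical"
        return 0
    fi
    # cmp -l prints one line per differing byte within the common length;
    # the "EOF" notice for files of unequal size goes to stderr.
    ndiff=$(cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' ')
    size_a=$(wc -c < "$1" | tr -d ' ')
    size_b=$(wc -c < "$2" | tr -d ' ')
    echo "different: $ndiff bytes differ, sizes $size_a vs $size_b"
    return 1
}
```

Usage would look like `compare_images first.image second.image` (hypothetical file names). A hash comparison gives the same yes/no answer; the byte count only helps to see how far apart two snapshots are.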


deploy.zip (440 bytes) Download Attachment
RE: How to generate identically image file after snapshots

Gary Chambers-4
Not to mention anything that records TimeStamps or clock values...

RE: How to generate identically image file after snapshots

Boris Popov, DeepCove Labs (SNN)
In reply to this post by Martin Troielli
cp image.im twinbrother.im

;)

-Boris

Re: How to generate identically image file after snapshots

Klaus D. Witzel
In reply to this post by Martin Troielli
Hi Martin,

there are a lot of objects (for example, instances of ContextPart
subclasses) that are allocated and deallocated without you having much
control over them.

One place to start would be to enumerate (in two sister .images) all the
objects you want to deploy. If that fails to produce comparable objects
(for any reason, for example if you cannot order/compare object identities
other than by identity hash, and the latter is assigned by the VM and not
by you) then, hrm, it fails.

But if it works, you could trace out all the objects you want (thereby
discarding all the unwanted ones), and the resulting (two sister) .image
files would then have the same contents byte for byte, because you fix each
object's position in the files. I've done that with other images and
non-Smalltalk interpreters.

Having said that, your project doesn't look easy.

/Klaus

RE: How to generate identically image file after snapshots

Sean Glazier-3
In reply to this post by Martin Troielli
You have to cp the image. Once the engine interacts with it, it is never the
"same": lots of objects get created and destroyed during image start-up, the
garbage collector runs, and anything using a clock runs. So if it is a script,
it has to be a shell script. Any deployment-specific stuff should go in some
config or text file that is read on start-up.

Sean

Re: How to generate identically image file after snapshots

Martin Troielli
In reply to this post by Klaus D. Witzel
Hi Klaus,

Thanks for the information. We have modified the VM to reduce GC activity,
inhibiting it until the fileIn processes are done, but had no luck. The
files produced were still different, just with fewer differences.
We think we have to follow an approach similar to yours. We are thinking of
generating a serialized file with all the CompiledMethods we use, without
changing the base image, and merging them in only when Squeak starts up. We
hope that this process does not take too much time, since we also have a
lot of resources to bring up at that point :S

Regards,
Martin

Re: How to generate identically image file after snapshots

Avi Bryant-2
Maybe I'm alone in being unclear on this, but what is the root goal
here?  Maybe there's a simpler way to achieve it.

Avi

Re: How to generate identically image file after snapshots

johnmci
Well, writing the image out means doing a full GC and some cleanup; then
we write out some header bytes and do
       
bytesWritten = sqImageFileWrite(pointerForOop(memory), sizeof(unsigned char), imageBytes, f);

which, depending on the platform, is

#define sqImageFileWrite(ptr, sz, count, f)  fwrite(ptr, sz, count, f)

or

sqInt sqImageFileWrite(void *ptr, size_t elementSize, size_t count, sqImageFile f) {
    if (f != 0)
        return fwrite(ptr, elementSize, count, f);
    return 0;
}

or

size_t sqImageFileWrite(void *ptr, size_t sz, size_t count, sqImageFile h)
{
    DWORD dwReallyWritten;
    WriteFile((HANDLE)(h-1), (LPVOID) ptr, count*sz, &dwReallyWritten, NULL);
    return (size_t) (dwReallyWritten / sz);
}


So after we've shoved the entire oops memory space out to whatever
the file handle points to, we start running the VM, which instantly
changes the bytes in memory because objects are created/destroyed
as a result of executing bytecodes.


If you have some desire to make duplicate images, look at

primitiveSnapshot

and consider cloning that to perform the writeImageFile() twice using
different image names.


--
========================================================================
===
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
===



Re: How to generate identically image file after snapshots

K. K. Subramaniam
On Tuesday 31 July 2007 7:32 am, John M McIntosh wrote:
> If you have some desire to make duplicate images look at
>
> primitiveSnapshot
>
> and consider cloning that to perform the writeImageFile() twice using
> different image names.
It would be simpler to copy image files after they are written. But I don't
think the issue was copying image files locally. The original poster wanted to
update third-party images by shipping fileIns for a reference image instead of
the whole image itself. The poser, then, is how to verify that the resulting
image is the same as intended.

I would simply use xdelta (see xdelta.org) for situations like this. E.g.

 xdelta delta ref.image thirdparty.image thirdparty.xd
and ship thirdparty.xd
 xdelta patch thirdparty.xd ref.image thirdparty.image

The downside is xdelta is a memory hungry utility. How big is the image?
Regards .. Subbu

Re: How to generate identically image file after snapshots

johnmci
Mmm, I wonder how well this would work, since when you load an image
we first figure out how big it is, then allocate memory for it, load it,
and then swizzle all the memory references by +/- an offset, which is
calculated from the base address used when the image was saved versus
the memory location actually allocated.

Now, some operating systems might give you the same virtual memory
address when you use the same VM on the same operating system.
In this case we don't have to swizzle the references. Current (I
believe), and certainly past, versions of OSX would do this.

However, in cases where the operating system does not give the same
memory address (and I'll note the operating system might give you
a random address each time on purpose, for security reasons), all
the memory references become different at swizzle time. Of course, if
this is the case, then on your next save
all your memory reference values will be different from the last
save. Needless to say, this would greatly affect whether xdelta thinks
your images are the same or different.
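
To make the swizzling step concrete, here is a toy illustration in shell arithmetic. It is not the VM's code: the function name `swizzle` and all addresses are invented, and a real image stores pointers inside object words rather than as a list of arguments. It only shows that the same object graph yields different stored pointer values once the base address changes.

```shell
#!/bin/sh
# swizzle OLD_BASE NEW_BASE POINTER...
# Relocates each pointer by (NEW_BASE - OLD_BASE), the way an image loaded
# at a different address than it was saved from must adjust its references.
swizzle() {
    old_base=$1
    new_base=$2
    shift 2
    delta=$(( $new_base - $old_base ))
    for ptr in "$@"; do
        printf '0x%x\n' $(( $ptr + $delta ))
    done
}
```

For example, `swizzle 0x1000 0x5000 0x1010 0x1020` prints 0x5010 and 0x5020: two pointers saved relative to base 0x1000 take new values when the image lands at 0x5000, so a later snapshot writes different bytes even though nothing in the object graph changed.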

Re: How to generate identically image file after snapshots

Martin Troielli
In reply to this post by Avi Bryant-2
Hi Avi,

The main goal is to certify a software development. The certifier must
check that a set of source files produces a given binary output.

We give them:
1 - The final image and VM
2 - The Smalltalk source files (fileOuts of our development)
3 - The VM C source files
4 - The base image
5 - A make script that compiles the VM, files the Smalltalk source files
into the base image, and produces a final image and VM.

They need to check that the two images, the one we give them (1) and the
one generated by our script (5), are the same. They check for differences
by doing a binary diff plus a hash over the files. They can only allow
changes related to timestamps. They don't know anything about Smalltalk...
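
One way the "only timestamp changes allowed" rule could be checked mechanically is sketched below. Nothing in this thread says where timestamp bytes live in the image file, so the offset window is a purely hypothetical parameter, as is the function name `diffs_outside_window`.

```shell
#!/bin/sh
# diffs_outside_window A B LO HI
# Prints the 1-based offset of every differing byte that falls outside the
# inclusive window [LO, HI]; prints nothing when all differences are inside.
# Only the common length of the two files is examined (cmp -l behaviour).
diffs_outside_window() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v lo="$3" -v hi="$4" '($1 + 0) < (lo + 0) || ($1 + 0) > (hi + 0) { print $1 }'
}
```

An empty result would mean every difference sits inside the window the certifier agreed to ignore; any printed offset is a difference that still needs explaining.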

Best regards,
Martin


--
Ing. Martín Troielli - [hidden email]

psiware | desarrollo de software
tel. +54 (341) 411-3966, 448-8572
Rosario S2000CVV, Santa Fe, ARGENTINA
www.psiware.com.ar

Re: How to generate identically image file after snapshots

johnmci

12 years back I had a client like this. Let's see if I remember...

you could try doing

"Collect the decompiled form of every method of every class in the image."
| m decompiled |
m := OrderedCollection new.
SystemNavigation default allBehaviorsDo: [ :behavior |
        behavior selectors do: [ :sel |
                decompiled := Decompiler new decompile: sel in: behavior.
                m add: decompiled]].
^m

where you sort the behaviors by class name, then sort the
selectors, and instead of collecting the decompiled value you
write its print string out to a stream. This should give you all the
source code for the image in a sorted order, which you can then
compare as text files.

Think of it as decompiling the binary to see if the assembly
instructions are the same.

What's missing is the values of the globals and class variables, but you
might not need those... ?


Perhaps even a fileOut of all the methods in the image after the
build might help?
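
Once both images have dumped their sorted source to text files, the comparison itself is a plain text diff. A minimal sketch, assuming the dumps use CR line ends (Squeak's usual in-image convention; adjust if your streams write LF) and using the made-up function name `compare_dumps`:

```shell
#!/bin/sh
# compare_dumps A B
# Converts CR line ends to LF in temporary copies, then diffs them.
# Prints the differing lines, if any. Exit status is 0 when the dumps
# match, 1 when they differ, 2 when a temporary file cannot be created.
compare_dumps() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    tr '\r' '\n' < "$1" > "$tmp_a"
    tr '\r' '\n' < "$2" > "$tmp_b"
    if diff "$tmp_a" "$tmp_b"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp_a" "$tmp_b"
    return $status
}
```

Unlike a binary diff of the images, a mismatch here points at a specific line of a specific method, which is something a certifier who does not know Smalltalk can still read.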

Re: How to generate identically image file after snapshots

Martin Troielli
Hi John,

Yes, I think that way we'll show that the "source code" extracted from
different image files is exactly the same. I hope the certifiers can
understand this...

Thanks anyway,
Martin
