How to generate identically image file after snapshots

Martin Troielli

How to generate identically image file after snapshots

Hi All,

We have been trying to produce the same (bit-identical) image file from two
consecutive snapshots. We start from a base image, fileIn several
files into it, and finally evaluate SmalltalkImage current snapshot: true
andQuit: true. We need this so that an image file generated by a third
party running a script can be verified with a checksum.
After trying several approaches (even scripting the fileIn process
and the snapshot), we found that the image files have, besides the
timestamp differences, thousands of other differences, and sometimes the
snapshots also differ in size.
We suppose that these issues may be due to GC activity.
Do they come from the way the GC dynamically changes the bytes in memory?
Is there a way to inhibit this activity?
Attached are the scripts we use to produce the image files.

Many thanks in advance,
Martin Troielli

Attachment: deploy.zip (440 bytes)

Gary Chambers-4

RE: How to generate identically image file after snapshots

Not to mention anything that records TimeStamps or clock values...

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Martin
Troielli
Sent: 30 July 2007 4:32 pm
To: [hidden email]
Subject: How to generate identically image file after snapshots


Boris Popov, DeepCove Labs (SNN)

RE: How to generate identically image file after snapshots

In reply to this post by Martin Troielli

cp image.im twinbrother.im

;)

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]


Klaus D. Witzel

Re: How to generate identically image file after snapshots

In reply to this post by Martin Troielli

Hi Martin,

there are a lot of objects (for example, subinstances of ContextPart) that
are allocated and deallocated without you having much control over them.

One place to start is to enumerate (in two sister .images) all the objects
you want to deploy. If that fails to produce comparable objects (for any
reason, for example if you cannot order/compare object identities other
than by identity hash, and the latter is assigned by the VM and not by you)
then, hrm, it fails.

But if it works, you could trace out all the objects you want (thereby
discarding all the unwanted ones), and the resulting (two sister) .image
files would then have the same contents byte for byte, because you fix each
object's position in the file. I've done that with other images and
non-Smalltalk interpreters.

Having said that, your project doesn't look easy.

/Klaus
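
As a rough illustration of the idea of fixing each object's position, here is a minimal C sketch. It is not Squeak code: the record struct, its key field, and serialize_canonical are hypothetical names, and the key stands in for whatever stable ordering you can define yourself instead of a VM-assigned identity hash. The sketch sorts the records by key and writes them back to back, so two runs that created the objects in a different order should still produce the same bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an object to deploy: a stable key chosen by you
   (not a VM-assigned identity hash) plus a fixed-size payload. Unused key
   bytes must be zero so that the raw bytes are reproducible. */
struct record {
    char key[16];
    unsigned char payload[8];
};

/* qsort comparator: order records by their key. */
static int compare_by_key(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;
    return strncmp(ra->key, rb->key, sizeof ra->key);
}

/* Sort records by key and lay them out back to back in out[].
   Returns the number of bytes written. Keys are assumed to be unique. */
size_t serialize_canonical(struct record *records, size_t n,
                           unsigned char *out, size_t out_size)
{
    assert(out_size >= n * sizeof(struct record));
    qsort(records, n, sizeof records[0], compare_by_key);
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(out + pos, &records[i], sizeof records[i]);
        pos += sizeof records[i];
    }
    return pos;
}
```

The output depends only on the set of records, not on the order in which they were created, which is the property a byte-for-byte comparison needs.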


Sean Glazier-3

RE: How to generate identically image file after snapshots

In reply to this post by Martin Troielli

You have to cp the image. Once the engine interacts with it, it is never the
"same": lots of objects get created and destroyed during image start-up, the
garbage collector runs, and anything using a clock runs. So if it is a
script, it has to be a shell script. Any deployment-specific stuff should go
in some config or text file that is read on start-up.

Sean


Martin Troielli

Re: How to generate identically image file after snapshots

In reply to this post by Klaus D. Witzel

Hi Klaus,

Thanks for the information. We have modified the VM to reduce GC activity,
inhibiting it until the fileIn processes are done, but had no luck. The
produced files were still different, just with fewer differences.
We think we have to follow an approach similar to yours. We thought of
generating a serialized file with all the CompiledMethods we use, without
changing the base image, and merging them in only when Squeak starts up. We
hope that this process does not take too much time, since we also have a lot
of resources to bring up at that time :S

Regards,
Martin


Avi Bryant-2

Re: How to generate identically image file after snapshots

On 7/30/07, Martin Troielli <[hidden email]> wrote:

> Hi Klaus,
>
> Thanks for the information. We have modified the VM in order to reduce the
> GC activity inhibiting it until the fileIn processes are done, but had no
> luck. The produced files were different with less differences.
> We think we have to follow an approach similar to yours. We thought to
> generate a serialized file with all the CompiledMethods we use, without
> change the base image, merging them only when squeak starts up. We hope
> that this process does not demand too much time, since we have also a lot
> of resources to bring up at that time :S

Maybe I'm alone in being unclear on this, but what is the root goal
here? Maybe there's a simpler way to achieve it.

Avi

johnmci

Re: How to generate identically image file after snapshots

Well, writing the image out means doing a full GC and some cleanup; then
we write out some header bytes and do

    bytesWritten = sqImageFileWrite(pointerForOop(memory),
        sizeof(unsigned char), imageBytes, f);

which, depending on the platform, is

    #define sqImageFileWrite(ptr, sz, count, f) fwrite(ptr, sz, count, f)

or

    sqInt sqImageFileWrite(void *ptr, size_t elementSize, size_t count,
                           sqImageFile f) {
        if (f != 0)
            return fwrite(ptr, elementSize, count, f);
        return 0;
    }

or

    size_t sqImageFileWrite(void *ptr, size_t sz, size_t count,
                            sqImageFile h)
    {
        DWORD dwReallyWritten;
        WriteFile((HANDLE)(h-1), (LPVOID) ptr, count*sz, &dwReallyWritten,
                  NULL);
        return (size_t) (dwReallyWritten / sz);
    }

So after we've shoved the entire oops memory space out to whatever
the file handle points to, we start running the VM, which instantly
changes the bytes in memory because objects are created/destroyed
as a result of executing bytecodes.

If you have some desire to make duplicate images look at

primitiveSnapshot

and consider cloning that to perform the writeImageFile() twice using
different image names.
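
The suggestion above amounts to writing the same memory region out twice before the VM resumes. A minimal C sketch of that idea, outside the VM, might look as follows. write_twin_snapshots and files_identical are hypothetical helpers, not functions from the Squeak VM sources, and plain fopen/fwrite stand in for sqImageFileWrite.

```c
#include <assert.h>
#include <stdio.h>

/* Write the same in-memory buffer to two files before anything else runs.
   Returns 0 on success, -1 on failure. All names are hypothetical. */
int write_twin_snapshots(const void *memory, size_t n_bytes,
                         const char *path_a, const char *path_b)
{
    assert(memory != NULL && path_a != NULL && path_b != NULL);
    const char *paths[2] = { path_a, path_b };
    for (int i = 0; i < 2; i++) {
        FILE *f = fopen(paths[i], "wb");
        if (f == NULL)
            return -1;
        size_t written = fwrite(memory, 1, n_bytes, f);
        if (fclose(f) != 0 || written != n_bytes)
            return -1;
    }
    return 0;
}

/* Return 1 if both files have identical contents, 0 otherwise. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int same = (fa != NULL && fb != NULL);
    while (same) {
        int ca = fgetc(fa);
        int cb = fgetc(fb);
        if (ca != cb)
            same = 0;       /* content or length mismatch */
        else if (ca == EOF)
            break;          /* both files ended together */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}
```

The two files match only because nothing touches the buffer between the two writes; once the interpreter runs again, a later snapshot will differ.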

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

K. K. Subramaniam

Re: How to generate identically image file after snapshots

On Tuesday 31 July 2007 7:32 am, John M McIntosh wrote:
> If you have some desire to make duplicate images look at
>
> primitiveSnapshot
>
> and consider cloning that to perform the writeImageFile() twice using
> different image names.
It would be simpler to copy image files after they are written. But I don't
think the issue was copying image files locally. The original poster wanted
to update third-party images by shipping fileIns to a reference image
instead of the whole image itself. The poser, then, is how to verify that
the resulting image is the same as intended.

I would simply use xdelta (see xdelta.org) for situations like this. E.g.

xdelta delta ref.image thirdparty.image thirdparty.xd
and ship thirdparty.xd
xdelta patch thirdparty.xd ref.image thirdparty.image

The downside is xdelta is a memory hungry utility. How big is the image?
Regards .. Subbu

johnmci

Re: How to generate identically image file after snapshots

mmm, I wonder how well this would work, since when you load an image
we first figure out how big it is, then allocate memory for it, load it,
then swizzle all the memory references by +/- an offset which is
calculated based on the offset used when the image was saved versus
the offset given by the memory location allocated.

Now some operating systems might give you the same virtual memory
address when you use the same VM on the same operating system.
In this case we don't have to swizzle the references. Current (I
believe), and certainly past, versions of OSX would do this.

However, in cases where the operating system does not give the same
memory address (and I'll note the operating system might give you
a random address each time on purpose for security reasons), all
the memory references become different at swizzle time. Of course, if
this is the case, then on your next save
all your memory reference values will be different from the last
save. Needless to say, this would greatly affect whether xdelta thinks
your images are the same or different.
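
To make the effect concrete, here is a toy C sketch of swizzling. It is a simplified model, not the VM's loader: the flat word array, the is_pointer map, and both function names are invented for illustration, and the base addresses are arbitrary. It relocates every pointer word from the base recorded at save time to the base obtained at load time, then counts how many words would differ in the next snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "image": each word is either a plain value or a pointer
   stored as an absolute address inside the heap; is_pointer[] marks which. */

/* Relocate every pointer word from old_base to new_base. */
void swizzle(uintptr_t *words, const unsigned char *is_pointer, size_t n,
             uintptr_t old_base, uintptr_t new_base)
{
    assert(words != NULL && is_pointer != NULL);
    for (size_t i = 0; i < n; i++) {
        if (is_pointer[i])
            words[i] = words[i] - old_base + new_base;
    }
}

/* Count the words that differ between two snapshots of the same length. */
size_t count_different_words(const uintptr_t *a, const uintptr_t *b, size_t n)
{
    size_t diff = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            diff++;
    return diff;
}
```

With an unchanged base no word changes; with a different base every pointer word changes even though the object graph is the same, which is why a byte-level tool sees widespread differences.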


Martin Troielli

Re: How to generate identically image file after snapshots

In reply to this post by Avi Bryant-2

Hi Avi,

The main goal is to certify a software development. The certifier must
check that a set of source files produces a given binary output.

We give them:
1 - The final image and VM
2 - The Smalltalk source files (fileOuts of our development)
3 - The VM C source files
4 - The base image
5 - A make script that compiles the VM, files the Smalltalk source files
into the base image, and produces a final image and VM.

They need to check that the two images, the one we give them (1) and the
one generated by our script (5), are the same. They check for differences by
doing a binary diff plus a hash over the files. They can only allow
differences related to timestamps. They don't know anything about Smalltalk...

Best regards,
Martin
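
A binary diff that tolerates only timestamp differences could be sketched in C along these lines. The struct range type, the function names, and the idea that timestamps sit at known byte offsets are all assumptions made for illustration; the thread does not say where (or whether) such fields can be isolated in a real image file.

```c
#include <assert.h>
#include <stddef.h>

/* A byte range [start, end) where differences are tolerated, e.g. a
   timestamp field. The offsets are hypothetical; real ones would have to
   be determined for the actual file format. */
struct range { size_t start, end; };

/* Return 1 if off falls inside any allowed range, 0 otherwise. */
static int in_allowed_range(size_t off, const struct range *allowed,
                            size_t n_allowed)
{
    for (size_t i = 0; i < n_allowed; i++)
        if (off >= allowed[i].start && off < allowed[i].end)
            return 1;
    return 0;
}

/* Count positions where a and b differ outside the allowed ranges.
   A size mismatch counts every extra byte as a difference. */
size_t count_unexpected_diffs(const unsigned char *a, size_t a_len,
                              const unsigned char *b, size_t b_len,
                              const struct range *allowed, size_t n_allowed)
{
    assert(a != NULL && b != NULL);
    size_t common = a_len < b_len ? a_len : b_len;
    size_t longer = a_len < b_len ? b_len : a_len;
    size_t diffs = longer - common;
    for (size_t off = 0; off < common; off++)
        if (a[off] != b[off] && !in_allowed_range(off, allowed, n_allowed))
            diffs++;
    return diffs;
}
```

A result of zero would mean the two files differ only inside the declared ranges. As the rest of the thread explains, two independently built images are unlikely to pass such a check.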


--
Ing. Martín Troielli - [hidden email]

psiware | desarrollo de software
tel. +54 (341) 411-3966, 448-8572
Rosario S2000CVV, Santa Fe, ARGENTINA
www.psiware.com.ar

johnmci

Re: How to generate identically image file after snapshots

On Jul 31, 2007, at 6:52 AM, Martin Troielli wrote:

> Hi Avi,
>
> The main goal is to certify a software development. The certifier
> must to check that a set of source files produce a binary output.
>
> We give them:
> 1 - The final image and VM
> 2 - The Smalltalk source files (fileOuts of our development)
> 3 - The VM C source files
> 4 - The base image
> 5 - A make script that compiles the VM, filesIn the smalltalk
> source files on the base image and produces a final images and VM.
>
> They need to check that the two images, the one we give (1) and the
> generated by our script (5) are the same. They check the
> differences by doing a binary diff plus a hash over the files. They
> only could allow changes refered to timestamps. They don't know
> anything about Smalltalk...
>
> Best regards,
> Martin

12 years back I had a client like this. Let's see if I remember...

you could try doing

| m decompiled |
m := OrderedCollection new.
SystemNavigation default allBehaviorsDo: [ :behavior |
    behavior selectors do: [ :sel |
        decompiled := Decompiler new decompile: sel in: behavior.
        m add: decompiled]].
^m

where you sort the behaviors by the class name, then sort the
selectors and instead of collecting the decompiled value you
stream the print string out to a stream. This should give you all the
source code for the image in a sorted order which you then can
compare as text files.

Think of it as decompiling the binary to see if the assembly
instructions are the same.

What's missing is the globals and class variable values, but you
might not need those... ?

Perhaps even a file out of all the methods in the image after the
build might help?


Martin Troielli

Re: How to generate identically image file after snapshots

Hi John,

Yes, I think that way we can show that the "source code" extracted from
different image files is exactly the same. I hope the certifiers will
understand this...

Thanks anyway,
Martin
