[squeak-dev] A few more arguments to instantiating object memory based on another one


[squeak-dev] A few more arguments to instantiating object memory based on another one

Igor Stasenko
Hi folks,

Soon Hydra will support instantiating a new interpreter instance from
the current object memory, i.e. not from an image residing on the
file system.

The main focus of this feature is creating tiny images with limited
behavior, for off-loading processing from the main interpreter to a
separate worker interpreter.
Since Hydra already has mechanisms to transfer data between
interpreters, the need to initially pack new image(s) with data is
minimal.
The most important (and interesting) part, IMO, is to define and
transfer the behavior (classes & their methods) that is minimal for
solving some problem in its domain.

Now, why do I think it's more convenient than having separate images?
First, it is easier to support and distribute: you have a single
'bloated' main image which carries all the necessary code & data, and
you don't need to build a bunch of small images and manage them in the
distribution.

Firing up a new interpreter instance by copying data from the base
heap to a new heap could be even faster than reading & running an
image from a file, because there is no disk I/O and all operations are
performed in memory.

The primitive which does the copy & run takes two arguments: an array
of object refs to be cloned into the new heap, and an array of stubs in
the form of pairs (oop + index of the oop in the first array which will
replace references to the original oop).
Before doing anything, the primitive checks that the given arguments
form a closed object memory graph, i.e. there are no references
outside of it.
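
A minimal sketch of the closure check and copy described above, written
in Python as a toy model rather than as VM code. It assumes that a stub
pair (oop, index) means "any reference to oop becomes a reference to the
clone of the object at that index in the first array"; that reading, and
the names Obj, is_closed and clone_graph, are illustrative assumptions
and not part of Hydra.

```python
class Obj:
    """Toy heap object: a name plus a list of references to other objects."""
    def __init__(self, name, slots=None):
        self.name = name
        self.slots = slots if slots is not None else []


def is_closed(to_clone, stubs):
    """True if every reference held by an object in to_clone points either
    into to_clone itself or at an oop that some stub pair replaces."""
    inside = {id(o) for o in to_clone}
    stubbed = {id(oop) for oop, _ in stubs}
    return all(id(ref) in inside or id(ref) in stubbed
               for o in to_clone for ref in o.slots)


def clone_graph(to_clone, stubs):
    """Copy to_clone into a 'new heap' (a fresh list), rewiring references."""
    if not is_closed(to_clone, stubs):
        raise ValueError("object graph is not closed")
    copies = [Obj(o.name) for o in to_clone]
    # Map each original object (by identity) to its copy.
    target = {id(o): copies[i] for i, o in enumerate(to_clone)}
    # A stub (oop, index) redirects references to oop to the copy at index.
    for oop, index in stubs:
        target[id(oop)] = copies[index]
    for orig, copy in zip(to_clone, copies):
        copy.slots = [target[id(ref)] for ref in orig.slots]
    return copies


# Hypothetical example: the worker refers to a big dictionary that stays
# in the base image; a tiny replacement is shipped in its place.
big_dict = Obj("BigDictionary")
tiny_dict = Obj("TinyDictionary")
worker = Obj("Worker", [big_dict, tiny_dict])

closed_without_stub = is_closed([worker, tiny_dict], [])
new_heap = clone_graph([worker, tiny_dict], [(big_dict, 1)])
```

Without the stub the graph is not closed, because the worker still
points at the big dictionary; with the stub both of the worker's
references end up pointing at the cloned tiny dictionary.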

These two arrays can be pre-generated and sit in the base image, so you
may have different sets of precalculated graphs for different needs
and then simply spawn new interpreter(s) at system startup.
Also, as long as you control the development & distribution cycle, you
can keep such arrays within the image and recalculate them when
needed.
And you can always include error-handling mechanisms in the
mini-images which can tell you if anything goes wrong (like handling
unknown messages, catching bugs etc).

Also, I'm looking forward to integration with Spoon's main feature,
behavior imprinting, where a consumer image asks a provider image to
deliver the behavior required to run some code.

--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] A few more arguments to instantiating object memory based on another one

Joshua Gargus-2
Igor Stasenko wrote:

> Hi folks,
>
> soon Hydra will provide a support to instantiate new interpreter
> instance from current object memory, e.g. not based on images which
> residing on file system.
>
> The main focus for this feature is to create a tiny images, with
> limited behavior for off-loading processing from main interpreter to
> separate worker interpreter.
> Since Hydra already has mechanisms to transfer data between
> interpreters, the need in initially packing new image(s) with data is
> minimal.
> The most important (and interesting) IMO is to define and transfer a
> behavior (classes & their methods) which is minimal for solving some
> problem in its domain.
>
> Now, why i think its more convenient than having separate images?
> First it is easier to support and distribute: you having a single
> 'bloated' main image which carrying all necessary code & data and
> don't need to build a bunch of small images and manage them in
> distribution.
>
> Firing new interpreter instance through copying data from base heap to
> new heap could be even faster than reading & running image from file,
> because no disk i/o and all operations performed in memory.
>
> A primitive, which doing copy & run takes two arguments: an array of
> object refs to be cloned into new heap and array of stubs in a form of
> pairs (oop + index of oop in first array which will replace reference
> to original oop).
> Before doing anything, the primitive check if given arguments forming
> a closed object memory graph e.g there is no references outside of it.
>
> These two arrays can be pre-generated and sit in base image, so you
> may have different sets of precalculated graphs for different needs
> and then simply spawn new interpreter(s) at system startup.
> Also, as far as you controlling development & distribution cycle, you
> can keep such arrays within image and recalculate them when it needs
> to.
> And you can always include any mechanisms for error handling in
> mini-images which could tell if anything goes wrong (like handling
> unknown messages, catching bugs etc).
>
> Also, i'm looking forward for integration with Spoon main feature -
> behavior imprinting, when consumer image asks provider image to
> deliver behavior required to run some code.
>
>  



Re: [squeak-dev] A few more arguments to instantiating object memory based on another one

Joshua Gargus-2
Oops, that last send was just a leeeeetle bit premature.  The real
response will be along shortly.

Sheepishly,
Josh





Re: [squeak-dev] A few more arguments to instantiating object memory based on another one

Joshua Gargus-2
In reply to this post by Igor Stasenko
(Wishing myself more success with writing this email before sending it :-) )

Igor Stasenko wrote:

> Hi folks,
>
> soon Hydra will provide a support to instantiate new interpreter
> instance from current object memory, e.g. not based on images which
> residing on file system.
>
> The main focus for this feature is to create a tiny images, with
> limited behavior for off-loading processing from main interpreter to
> separate worker interpreter.
> Since Hydra already has mechanisms to transfer data between
> interpreters, the need in initially packing new image(s) with data is
> minimal.
> The most important (and interesting) IMO is to define and transfer a
> behavior (classes & their methods) which is minimal for solving some
> problem in its domain.
>  
Not that it's immediately relevant, but keep in mind that we'll
eventually want to be able to share behavior between images.  Possibly
the most important thing is how much happier this will make the L2
caches on future multi-core chips.
> Now, why i think its more convenient than having separate images?
>  
If you had to pick between one or the other, I can see how your argument
makes sense.  However, I don't see why you can't trivially have both.  
Furthermore, there are use-cases where the ability to load an image from
disk (or from the network) might be useful.  More on both of these
points below.
> First it is easier to support and distribute: you having a single
> 'bloated' main image which carrying all necessary code & data and
> don't need to build a bunch of small images and manage them in
> distribution.
>
> Firing new interpreter instance through copying data from base heap to
> new heap could be even faster than reading & running image from file,
> because no disk i/o and all operations performed in memory.
>  
This seems like an unfair comparison.  A better comparison would be
comparing your method to running an image once it has already been
loaded from a file (since, of course, you can store an image as a
ByteArray in memory just as easily as you can store your
object-graph-array).

> A primitive, which doing copy & run takes two arguments: an array of
> object refs to be cloned into new heap and array of stubs in a form of
> pairs (oop + index of oop in first array which will replace reference
> to original oop).
> Before doing anything, the primitive check if given arguments forming
> a closed object memory graph e.g there is no references outside of it.
>
> These two arrays can be pre-generated and sit in base image, so you
> may have different sets of precalculated graphs for different needs
> and then simply spawn new interpreter(s) at system startup.
> Also, as far as you controlling development & distribution cycle, you
> can keep such arrays within image and recalculate them when it needs
> to.
> And you can always include any mechanisms for error handling in
> mini-images which could tell if anything goes wrong (like handling
> unknown messages, catching bugs etc).
>
> Also, i'm looking forward for integration with Spoon main feature -
> behavior imprinting, when consumer image asks provider image to
> deliver behavior required to run some code.
>
>  
The technical details of your approach sound good to me (without having
thought deeply enough to provide truly constructive criticism).  However...

My main concern is that your argument against separate images is
disingenuous.  They won't be slower if you store them as ByteArrays
within the main image.  In fact, I believe that the opposite would be
true; don't you agree?  From a performance standpoint, it seems like
separate images are the better option.

Separate images allow (security implementations aside) nifty things like
mobile code... I can download an image from a server or a P2P network
and run it in my image.  I don't yet know what I would do with this
ability, but as we get more experience with the object-capability
security model (hello Newspeak!) I'm sure that there will be no shortage
of good ideas.

Of course, these separate images need to be built somehow, and it seems
to me that this is where your ideas fit in (for development more than
deployment).

Cheers,
Josh



Re: [squeak-dev] A few more arguments to instantiating object memory based on another one

Igor Stasenko
2008/8/15 Joshua Gargus <[hidden email]>:

> (Wishing myself more success with writing this email before sending it :-) )
>
> Igor Stasenko wrote:
>>
>> Hi folks,
>>
>> soon Hydra will provide a support to instantiate new interpreter
>> instance from current object memory, e.g. not based on images which
>> residing on file system.
>>
>> The main focus for this feature is to create a tiny images, with
>> limited behavior for off-loading processing from main interpreter to
>> separate worker interpreter.
>> Since Hydra already has mechanisms to transfer data between
>> interpreters, the need in initially packing new image(s) with data is
>> minimal.
>> The most important (and interesting) IMO is to define and transfer a
>> behavior (classes & their methods) which is minimal for solving some
>> problem in its domain.
>>
>
> Not that it's immediately relevant, but keep in mind that we'll eventually
> want to be able to share behavior between images.  Possibly the most
> important thing is how much happier this will make the L2 caches on future
> multi-core chips.
>>
>> Now, why i think its more convenient than having separate images?
>>
>
> If you had to pick between one or the other, I can see how your argument
> makes sense.  However, I don't see why you can't trivially have both.
>  Furthermore, there are use-cases where the ability to load an image from
> disk (or from the network) might be useful.  More on both of these points
> below.

You can load & run an image from disk. This is the main feature of the
initial release of Hydra.

>>
>> First it is easier to support and distribute: you having a single
>> 'bloated' main image which carrying all necessary code & data and
>> don't need to build a bunch of small images and manage them in
>> distribution.
>>
>> Firing new interpreter instance through copying data from base heap to
>> new heap could be even faster than reading & running image from file,
>> because no disk i/o and all operations performed in memory.
>>
>
> This seems like an unfair comparison.  A better comparison would be
> comparing your method to running an image once it has already been loaded
> from a file (since, of course, you can store an image as a a ByteArray in
> memory just as easily as you can store your object-graph-array).

Hmm, why is it unfair? The difference lies only in where the VM
gets the info for creating a new interpreter instance:
a) by loading an image from disk
b) by cloning a provided set of objects from the existing heap into a
new one.

The rest of the operations for creating and initializing a new
interpreter instance are the same.

Of course, producing a new image is a process based on some heuristics
applied to the base image, and this could take much more time. But as I
said, that can be compensated for by keeping precalculated data in the
base image.
The extra memory required for the two arrays (which represent the new
image) is far smaller than the full object memory snapshot which you
would need to keep separately on disk.

>>
>> A primitive, which doing copy & run takes two arguments: an array of
>> object refs to be cloned into new heap and array of stubs in a form of
>> pairs (oop + index of oop in first array which will replace reference
>> to original oop).
>> Before doing anything, the primitive check if given arguments forming
>> a closed object memory graph e.g there is no references outside of it.
>>
>> These two arrays can be pre-generated and sit in base image, so you
>> may have different sets of precalculated graphs for different needs
>> and then simply spawn new interpreter(s) at system startup.
>> Also, as far as you controlling development & distribution cycle, you
>> can keep such arrays within image and recalculate them when it needs
>> to.
>> And you can always include any mechanisms for error handling in
>> mini-images which could tell if anything goes wrong (like handling
>> unknown messages, catching bugs etc).
>>
>> Also, i'm looking forward for integration with Spoon main feature -
>> behavior imprinting, when consumer image asks provider image to
>> deliver behavior required to run some code.
>>
>>
>
> The technical details of your approach sound good to me (without having
> thought deeply enough to provide truly constructive criticism).  However...
>
> My main concern is that your argument against separate images is
> disingenuous.  They won't be slower if you store them as ByteArrays within
> the main image.  In fact, I believe that the opposite would be true; don't
> you agree?  From a performance standpoint, it seems like separate images are
> the better option.

Well, if you want to spawn & kill dozens of interpreters in real time,
then I have to disappoint you:
initializing a new interpreter instance involves a lot of things
besides loading a new image into memory, and these could make the
process really slow. First of all, there is the initialization of
plugins & interpreter state.
Of course, this could be improved by postponing plugin initialization
until the point where it is really needed, but I think that will be
hard to do with the current VMMaker design.

>
> Separate images allow (security implementations aside) nifty things like
> mobile code... I can download an image from a server or a P2P network and
> run it in my image.  I don't yet know what I would do with this ability, but
> as we ge more experience with the object-capability security model (hello
> Newspeak!) I'm sure that there will be no shortage of good ideas.
>

Surely, one could use a ByteArray to instantiate a new image.
Even now, you can just write the new image to a temp file first and
then instantiate a new interpreter from that image.
And later we can add a primitive which simply takes the new
image from a ByteArray.
This could be useful, but it is not very valuable to my thinking, since
it adds nothing new to the ways in which new images can be produced.

Also, don't forget about possible future use cases, when we may
arrive at a model that supports cross-heap references by making images
interconnected with each other using far references.
With this hypothetical model, you will not need to form a closed graph
of objects; you just define a set of objects which will be cloned into
the new heap, while the remaining references will be treated by the VM
as far references to the base heap.
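
As a rough illustration of that hypothetical model (it does not exist in
Hydra as described in this thread), the sketch below clones a chosen set
of objects and wraps every reference that leaves the set in a far
reference back to the base heap. Python is used as a toy model; Obj,
FarRef and clone_with_far_refs are made-up names.

```python
class Obj:
    """Toy heap object: a name plus a list of references to other objects."""
    def __init__(self, name, slots=None):
        self.name = name
        self.slots = slots if slots is not None else []


class FarRef:
    """Placeholder for a reference that still points into the base heap."""
    def __init__(self, target):
        self.target = target


def clone_with_far_refs(to_clone):
    """Copy to_clone; references outside the set become FarRef wrappers
    instead of making the whole operation fail."""
    copies = {id(o): Obj(o.name) for o in to_clone}
    for o in to_clone:
        copies[id(o)].slots = [
            copies[id(ref)] if id(ref) in copies else FarRef(ref)
            for ref in o.slots
        ]
    return [copies[id(o)] for o in to_clone]


# Hypothetical example: 'shared' stays in the base heap.
shared = Obj("SharedGlobal")
helper = Obj("Helper")
task = Obj("Task", [shared, helper])

new_heap = clone_with_far_refs([task, helper])
```

Here the clone of the task keeps an ordinary reference to the cloned
helper, while its reference to the shared global becomes a far reference
whose target is still the original object in the base heap.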

> Of course, these separate images need to be built somehow, and it seems to
> me that this is where your ideas fit in (for development more than
> deployment).
>

Yes, what I'm actually proposing is a way to build an image &
run it without doing any file/stream based I/O, while also conserving
memory by reusing/copying objects that already exist in the original
image.

> Cheers,
> Josh
>


--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] A few more arguments to instantiating object memory based on another one

Igor Stasenko
2008/8/15 Igor Stasenko <[hidden email]>:

> 2008/8/15 Joshua Gargus <[hidden email]>:
>> (Wishing myself more success with writing this email before sending it :-) )
>>
>> Igor Stasenko wrote:
>>>
>>> Hi folks,
>>>
>>> soon Hydra will provide a support to instantiate new interpreter
>>> instance from current object memory, e.g. not based on images which
>>> residing on file system.
>>>
>>> The main focus for this feature is to create a tiny images, with
>>> limited behavior for off-loading processing from main interpreter to
>>> separate worker interpreter.
>>> Since Hydra already has mechanisms to transfer data between
>>> interpreters, the need in initially packing new image(s) with data is
>>> minimal.
>>> The most important (and interesting) IMO is to define and transfer a
>>> behavior (classes & their methods) which is minimal for solving some
>>> problem in its domain.
>>>
>>
>> Not that it's immediately relevant, but keep in mind that we'll eventually
>> want to be able to share behavior between images.  Possibly the most
>> important thing is how much happier this will make the L2 caches on future
>> multi-core chips.
>>>
>>> Now, why i think its more convenient than having separate images?
>>>
>>
>> If you had to pick between one or the other, I can see how your argument
>> makes sense.  However, I don't see why you can't trivially have both.
>>  Furthermore, there are use-cases where the ability to load an image from
>> disk (or from the network) might be useful.  More on both of these points
>> below.
>
> You can load & run image from disk. This feature is main one in
> initial release of Hydra.
>
>>>
>>> First it is easier to support and distribute: you having a single
>>> 'bloated' main image which carrying all necessary code & data and
>>> don't need to build a bunch of small images and manage them in
>>> distribution.
>>>
>>> Firing new interpreter instance through copying data from base heap to
>>> new heap could be even faster than reading & running image from file,
>>> because no disk i/o and all operations performed in memory.
>>>
>>
>> This seems like an unfair comparison.  A better comparison would be
>> comparing your method to running an image once it has already been loaded
>> from a file (since, of course, you can store an image as a a ByteArray in
>> memory just as easily as you can store your object-graph-array).
>
> Hmm, why its unfair? the difference lies only in the place where VM
> getting info for creating new instance of interpreter:
> a) by loading image from disk
> b) by cloning provided set of objects to new heap from existing one.
>
> the rest of operations for creating and initializing new interpreter
> instance is same.
>
> Of course, producing new image is process based on some heuristics
> using base image, this could take much more time, of course. But as i
> said, it could be compensated by keeping precalculated data in base
> image.
> The extra memory requirements for two arrays (which representing new
> image) can't be compared with full object memory snapshot which you
> need to keep separately on disk.
>
>>>
>>> A primitive, which doing copy & run takes two arguments: an array of
>>> object refs to be cloned into new heap and array of stubs in a form of
>>> pairs (oop + index of oop in first array which will replace reference
>>> to original oop).
>>> Before doing anything, the primitive check if given arguments forming
>>> a closed object memory graph e.g there is no references outside of it.
>>>
>>> These two arrays can be pre-generated and sit in base image, so you
>>> may have different sets of precalculated graphs for different needs
>>> and then simply spawn new interpreter(s) at system startup.
>>> Also, as far as you controlling development & distribution cycle, you
>>> can keep such arrays within image and recalculate them when it needs
>>> to.
>>> And you can always include any mechanisms for error handling in
>>> mini-images which could tell if anything goes wrong (like handling
>>> unknown messages, catching bugs etc).
>>>
>>> Also, i'm looking forward for integration with Spoon main feature -
>>> behavior imprinting, when consumer image asks provider image to
>>> deliver behavior required to run some code.
>>>
>>>
>>
>> The technical details of your approach sound good to me (without having
>> thought deeply enough to provide truly constructive criticism).  However...
>>
>> My main concern is that your argument against separate images is
>> disingenuous.  They won't be slower if you store them as ByteArrays within
>> the main image.  In fact, I believe that the opposite would be true; don't
>> you agree?  From a performance standpoint, it seems like separate images are
>> the better option.
>
> Well, if you want to do a real-time spawn  & kill dozens interpreters,
> then i need to disappoint you:
> for initializing new interpreter instance there a lot of things
> besides loading new image in memory which could make this process
> really slow. First of all - this is initialization of plugins &
> interpreter states.
> Of couse, this could be improved by postponing plugin initialization
> up to point where it really needed, but i think it will be hard to do
> with current VMMaker design.
>
... or maybe I'm not being fair on this point. I think I need to
investigate this option more deeply, because it really would be better
to initialize plugin state lazily for new interpreter instance(s),
since a new interpreter may use only a few of the plugins.
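
The lazy initialization idea can be sketched roughly as follows. This is
a toy Python model, not VMMaker code: the LazyPlugins class and the way
plugins are registered are assumptions made for illustration, and the
plugin names are used only as labels.

```python
class LazyPlugins:
    """Holds one initializer per plugin and runs it on first use only."""
    def __init__(self, initializers):
        self._initializers = dict(initializers)  # name -> zero-arg function
        self._states = {}                        # name -> initialized state

    def state(self, name):
        # Initialize the plugin the first time its state is requested.
        if name not in self._states:
            self._states[name] = self._initializers[name]()
        return self._states[name]

    def initialized(self):
        """Names of the plugins that have actually been initialized."""
        return sorted(self._states)


init_log = []

def make_init(name):
    def init():
        init_log.append(name)          # record that the initializer ran
        return {"plugin": name}
    return init

plugins = LazyPlugins({n: make_init(n)
                       for n in ("FilePlugin", "SocketPlugin", "B2DPlugin")})

# A worker interpreter that only ever touches files:
plugins.state("FilePlugin")
plugins.state("FilePlugin")
```

In this sketch only FilePlugin is initialized, and only once; the other
two plugins cost nothing until some primitive actually asks for them.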

>>
>> Separate images allow (security implementations aside) nifty things like
>> mobile code... I can download an image from a server or a P2P network and
>> run it in my image.  I don't yet know what I would do with this ability, but
>> as we ge more experience with the object-capability security model (hello
>> Newspeak!) I'm sure that there will be no shortage of good ideas.
>>
>
> Surely, one could use a bytearray to instantiate new image.
> Even now, you can just write new image to temp file first and then
> instantiate new interpreter from that image.
> And then, later we can add a primitive which could simply take new
> image from bytearray.
> This could be useful, but not very valuable to my thinking, since its
> not adding anything new in the ways how new images could be produced.
>
> Also, don't forget about possible future use cases, when we possible
> meet with model how to support cross-heap references by making images
> interconnected with each other using far referencing.
> With this hypothetical model, you will not need to form a closed graph
> of objects, you just define a set of objects which will be cloned into
> new heap, while rest references will be threated by VM as far
> references to base heap.
>
>> Of course, these separate images need to be built somehow, and it seems to
>> me that this is where your ideas fit in (for development more than
>> deployment).
>>
>
> Yes, what i actually proposing is the way how you can build image &
> run it without doing any file/stream based i/o, also conserving memory
> space by reusing/copying already existing objects in original image.
>
>> Cheers,
>> Josh
>>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



--
Best regards,
Igor Stasenko AKA sig.


[squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Klaus D. Witzel
In reply to this post by Joshua Gargus-2
On Fri, 15 Aug 2008 09:11:21 +0200, Joshua Gargus wrote:

> (Wishing myself more success with writing this email before sending it  
> :-) )
>
> Igor Stasenko wrote:
>> Hi folks,
>>
>> soon Hydra will provide a support to instantiate new interpreter
>> instance from current object memory, e.g. not based on images which
>> residing on file system.
>>
>> The main focus for this feature is to create a tiny images, with
>> limited behavior for off-loading processing from main interpreter to
>> separate worker interpreter.
>> Since Hydra already has mechanisms to transfer data between
>> interpreters, the need in initially packing new image(s) with data is
>> minimal.
>> The most important (and interesting) IMO is to define and transfer a
>> behavior (classes & their methods) which is minimal for solving some
>> problem in its domain.
>>
> Not that it's immediately relevant, but keep in mind that we'll  
> eventually want to be able to share behavior between images.

This is my part of Igor's enterprise ;) we discuss my crazy cross-heap  
pointerage approaches with Igor as sparring partner ;)

GC got the most attention; it has (among others) these aspects:

o distributing object allocation is very promising
  (Guillermo Adrián Molina: distributing alloc is the main
   application of parallel processing [native threads in
   Huemul], email communication)

o 90% of the references of each Process (remember, 1:1 to a
   thread) are within the same chunk of process-generated
   objects, 9% of the references are to globals, and 1% to
   objects generated by other processes (Guillermo Adrián
   Molina, native threads in Huemul, email communication)

o local GC often gets in the way when doing things in parallel

o garbage can become cyclic across heaps and so unreclaimable
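
The last point can be illustrated with a toy model in Python. It assumes
that a purely local collector must treat every reference coming in from
another heap as a root, since it cannot tell whether the remote referrer
is still alive; the names and the two-heap setup are hypothetical.

```python
class Obj:
    """Toy object that knows which heap it lives in."""
    def __init__(self, name, heap):
        self.name, self.heap, self.refs = name, heap, []


def local_survivors(heap, objects, roots):
    """Names of objects in `heap` kept by a local-only collection:
    reachable from local roots or referenced from any other heap."""
    incoming = [r for o in objects if o.heap != heap
                for r in o.refs if r.heap == heap]
    stack = [o for o in roots if o.heap == heap] + incoming
    seen = set()
    while stack:
        o = stack.pop()
        if id(o) in seen or o.heap != heap:
            continue                    # never trace into another heap
        seen.add(id(o))
        stack.extend(o.refs)
    return {o.name for o in objects if id(o) in seen}


def global_survivors(objects, roots):
    """Names of objects kept by a trace that can cross heap boundaries."""
    stack, seen = list(roots), set()
    while stack:
        o = stack.pop()
        if id(o) not in seen:
            seen.add(id(o))
            stack.extend(o.refs)
    return {o.name for o in objects if id(o) in seen}


# Two dead objects that reference each other across heaps.
a = Obj("a", "heapA")
b = Obj("b", "heapB")
a.refs, b.refs = [b], [a]

kept_in_a = local_survivors("heapA", [a, b], roots=[])
kept_in_b = local_survivors("heapB", [a, b], roots=[])
kept_globally = global_survivors([a, b], roots=[])
```

Each local collection keeps its half of the cycle because the other heap
still points at it, while a trace across both heaps would find that
nothing is reachable.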

> Possibly the most important thing is how much happier this will make the  
> L2 caches on future multi-core chips.

:)

>> Now, why i think its more convenient than having separate images?
>>
> If you had to pick between one or the other, I can see how your argument  
> makes sense.  However, I don't see why you can't trivially have both.  
> Furthermore, there are use-cases where the ability to load an image from  
> disk (or from the network) might be useful.  More on both of these  
> points below.
>> First it is easier to support and distribute: you having a single
>> 'bloated' main image which carrying all necessary code & data and
>> don't need to build a bunch of small images and manage them in
>> distribution.
>>
>> Firing new interpreter instance through copying data from base heap to
>> new heap could be even faster than reading & running image from file,
>> because no disk i/o and all operations performed in memory.
>>
> This seems like an unfair comparison.  A better comparison would be  
> comparing your method to running an image once it has already been  
> loaded from a file (since, of course, you can store an image as a a  
> ByteArray in memory just as easily as you can store your  
> object-graph-array).

The focus here is, what is needed for a new parallel computational task to  
be offloaded *on*the*fly* for running in another thread+heap; if one needs  
a harddisk for that, that part must be purchased and installed and  
formatted and populated (joking ;)

>> A primitive, which doing copy & run takes two arguments: an array of
>> object refs to be cloned into new heap and array of stubs in a form of
>> pairs (oop + index of oop in first array which will replace reference
>> to original oop).
>> Before doing anything, the primitive check if given arguments forming
>> a closed object memory graph e.g there is no references outside of it.
>>
>> These two arrays can be pre-generated and sit in base image, so you
>> may have different sets of precalculated graphs for different needs
>> and then simply spawn new interpreter(s) at system startup.
>> Also, as far as you controlling development & distribution cycle, you
>> can keep such arrays within image and recalculate them when it needs
>> to.
>> And you can always include any mechanisms for error handling in
>> mini-images which could tell if anything goes wrong (like handling
>> unknown messages, catching bugs etc).
>>
>> Also, i'm looking forward for integration with Spoon main feature -
>> behavior imprinting, when consumer image asks provider image to
>> deliver behavior required to run some code.
>>
>>
> The technical details of your approach sound good to me (without having  
> thought deeply enough to provide truly constructive criticism).  
> However...
>
> My main concern is that your argument against separate images is  
> disingenuous.  They won't be slower if you store them as ByteArrays  
> within the main image.

But then they are always in the way when GC comes around :( This would  
invalidate all the pointers of the parallel thread and require global  
synchronization :(

Not a good idea :( we want things to run in parallel independent of each  
other's GC.

> In fact, I believe that the opposite would be true; don't you agree?  
> From a performance standpoint, it seems like separate images are the  
> better option.

When creation of bytearray versus creation of separate heap can be  
ignored, there would be no difference in terms of performance (it's all  
oops all the way down, anyways). Only that bytearrays are not usable for  
parallel processing.

> Separate images allow (security implementations aside) nifty things like  
> mobile code...

Yes, sure, every thread+heap in Hydra represents an .image that can be
snapshotted and used however you like. There can even be .images,
created in the way that Igor describes, which won't need the HydraVM
power, just the stock Squeak VM power.

> I can download an image from a server or a P2P network and run it in my  
> image.  I don't yet know what I would do with this ability, but as we get
> more experience with the object-capability security model (hello  
> Newspeak!) I'm sure that there will be no shortage of good ideas.
>
> Of course, these separate images need to be built somehow, and it seems  
> to me that this is where your ideas fit in (for development more than  
> deployment).

No, there's no limit for deployment, it depends on what application the  
separate .image contains. Some will require HydraVM power when deployed,  
others not.

/Klaus

> Cheers,
> Josh
>
>
>




Re: [squeak-dev] A few more arguments to instantiating object memory based on another one

Joshua Gargus-2
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
2008/8/15 Joshua Gargus [hidden email]:
  
(Wishing myself more success with writing this email before sending it :-) )

Igor Stasenko wrote:
    
Hi folks,

soon Hydra will provide support for instantiating a new interpreter
instance from the current object memory, i.e. not based on images
residing on the file system.

The main focus of this feature is to create tiny images with
limited behavior, for off-loading processing from the main interpreter to
a separate worker interpreter.
Since Hydra already has mechanisms to transfer data between
interpreters, the need to initially pack new image(s) with data is
minimal.
The most important (and interesting) part IMO is to define and transfer
behavior (classes & their methods) which is minimal for solving some
problem in its domain.

      
Not that it's immediately relevant, but keep in mind that we'll eventually
want to be able to share behavior between images.  Possibly the most
important thing is how much happier this will make the L2 caches on future
multi-core chips.
    
Now, why do I think it's more convenient than having separate images?

      
If you had to pick between one or the other, I can see how your argument
makes sense.  However, I don't see why you can't trivially have both.
 Furthermore, there are use-cases where the ability to load an image from
disk (or from the network) might be useful.  More on both of these points
below.
    

You can load & run an image from disk. This is the main feature of the
initial release of Hydra.

  
First, it is easier to support and distribute: you have a single
'bloated' main image which carries all necessary code & data, and you
don't need to build a bunch of small images and manage them in
distribution.

Firing up a new interpreter instance by copying data from the base heap to
a new heap could be even faster than reading & running an image from file,
because there is no disk i/o and all operations are performed in memory.

      
This seems like an unfair comparison.  A better comparison would be
comparing your method to running an image once it has already been loaded
from a file (since, of course, you can store an image as a ByteArray in
memory just as easily as you can store your object-graph-array).
    

Hmm, why is it unfair? The difference lies only in the place where the VM
gets the info for creating a new interpreter instance:
a) by loading an image from disk
b) by cloning a provided set of objects to a new heap from an existing one.
  
We both agree that loading from disk is slow, and that b) can be faster than a) because of it.  However, you are the one who is arguing that this speed difference is in itself a reason to prefer cloning a set of objects to loading an image.  Since the speed difference is erased when the image starts off as a ByteArray rather than on disk, this part of your argument is not valid.
The rest of the operations for creating and initializing a new interpreter
instance are the same.

Of course, producing a new image is a process based on some heuristics
using the base image, and this could take much more time. But as I
said, it could be compensated by keeping precalculated data in the base
image.
The extra memory requirements for two arrays (which represent the new
image) can't be compared with a full object memory snapshot which you
need to keep separately on disk.
  
That's not true.  In a deployment image (ignoring development), the main image could contain only ByteArrays representing the images to be spawned; the arrays you describe (and the objects they contain) would have been garbage-collected.  Therefore, the memory required is the same.
  
A primitive which does the copy & run takes two arguments: an array of
object refs to be cloned into the new heap, and an array of stubs in the
form of pairs (oop + index of the oop in the first array which will replace
references to the original oop).
Before doing anything, the primitive checks whether the given arguments form
a closed object memory graph, i.e. there are no references outside of it.
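
The closed-graph check described above can be sketched roughly as follows. This is only an illustration, not Hydra's primitive: the heap is modelled as a Python dict from oop to the list of oops it references, and the names `is_closed_graph`, `heap`, `to_clone` and `stubs` are hypothetical.

```python
def is_closed_graph(heap, to_clone, stubs):
    """Return True if the objects to clone form a closed graph.

    heap     -- hypothetical model: dict mapping oop -> list of referenced oops
    to_clone -- list of oops to be copied into the new heap
    stubs    -- list of (oop, index) pairs; a reference to `oop` is replaced
                by a reference to to_clone[index] in the new heap
    """
    inside = set(to_clone)
    stubbed = {oop for oop, _ in stubs}
    # Every stub must name a valid slot in the first array.
    if any(not 0 <= index < len(to_clone) for _, index in stubs):
        return False
    # Every outgoing reference must be either cloned or stubbed.
    return all(ref in inside or ref in stubbed
               for oop in to_clone
               for ref in heap[oop])
```

For example, with `heap = {1: [2, 3], 2: [1], 3: [9], 9: []}`, cloning `[1, 2, 3]` is only accepted if the reference to oop 9 is covered by a stub.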

These two arrays can be pre-generated and sit in base image, so you
may have different sets of precalculated graphs for different needs
and then simply spawn new interpreter(s) at system startup.
Also, as long as you control the development & distribution cycle, you
can keep such arrays within the image and recalculate them when needed.
And you can always include any mechanisms for error handling in
mini-images which could tell if anything goes wrong (like handling
unknown messages, catching bugs etc).

Also, I'm looking forward to integration with Spoon's main feature -
behavior imprinting, where a consumer image asks a provider image to
deliver the behavior required to run some code.


      
The technical details of your approach sound good to me (without having
thought deeply enough to provide truly constructive criticism).  However...

My main concern is that your argument against separate images is
disingenuous.  They won't be slower if you store them as ByteArrays within
the main image.  In fact, I believe that the opposite would be true; don't
you agree?  From a performance standpoint, it seems like separate images are
the better option.
    

Well, if you want to spawn & kill dozens of interpreters in real time,
then I need to disappoint you:
when initializing a new interpreter instance there are a lot of things
besides loading a new image into memory which could make this process
really slow. First of all, there is the initialization of plugins &
interpreter state.
Of course, this could be improved by postponing plugin initialization
until the point where it is really needed, but I think it will be hard to do
with the current VMMaker design.
  
You would know better than I.  If it turns out that image-loading/object-cloning is not the bottleneck, then the performance issue is moot (and is therefore still not a good argument for preferring cloning vs. loading a ByteArray image ;-) )

Besides, as you hint at, it makes sense to spawn/kill interpreters quite infrequently.  Instead of firing up an interpreter to perform one task and then killing it, it makes more sense to treat it as a work-queue processor that is idle when there is nothing to do.
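
A minimal sketch of the work-queue idea, using an ordinary OS thread as a stand-in for a spawned Hydra interpreter (an assumption made purely for illustration; `worker` and `run_tasks` are hypothetical names). The worker is created once, blocks while the queue is empty, and is shut down only by an explicit sentinel.

```python
import queue
import threading

def worker(tasks, results):
    """Process tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()        # blocks (idle) while the queue is empty
        if task is None:          # sentinel: shut the worker down
            break
        results.put(task())

def run_tasks(jobs):
    """Spawn one long-lived worker, feed it all jobs, collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, results))
    thread.start()                # pay the start-up cost once
    for job in jobs:
        tasks.put(job)
    tasks.put(None)
    thread.join()
    return [results.get() for _ in jobs]
```

With a single worker the results come back in submission order, since both queues are FIFO.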

  
Separate images allow (security implementations aside) nifty things like
mobile code... I can download an image from a server or a P2P network and
run it in my image.  I don't yet know what I would do with this ability, but
as we get more experience with the object-capability security model (hello
Newspeak!) I'm sure that there will be no shortage of good ideas.

    

Surely, one could use a bytearray to instantiate a new image.
Even now, you can just write the new image to a temp file first and then
instantiate a new interpreter from that image.
And then, later we can add a primitive which could simply take a new
image from a bytearray.
This could be useful, but not very valuable to my thinking, since it's
not adding anything new to the ways in which new images can be produced.

  
If your main goal is to explore new ways to produce new images, then you're right, this is not valuable at all.

However, if your goals also include exploring new ways to deploy images, then I think that it is worth thinking about.
Also, don't forget about possible future use cases, when we may
come up with a model for supporting cross-heap references by making images
interconnected with each other using far references.
With this hypothetical model, you will not need to form a closed graph
of objects; you just define a set of objects which will be cloned into
the new heap, while the remaining references will be treated by the VM as
far references to the base heap.
  
That's interesting.  I wasn't thinking of that, thanks.

I'm curious about what the object-capability folks would think of this, as opposed to loading images from ByteArrays and explicitly "injecting" the capabilities (far-refs) that you want the spawned image to have.  Seems like it would be easy to accidentally grant more authority than you intend.

Nevertheless, it's a very cool thought.
  
Of course, these separate images need to be built somehow, and it seems to
me that this is where your ideas fit in (for development more than
deployment).

    

Yes, what I'm actually proposing is a way to build an image &
run it without doing any file/stream based i/o, while also conserving memory
space by reusing/copying already existing objects in the original image.

  
Pardon me for missing some context whilst jumping into the middle of this discussion.  :-)

Cheers,
Josh



  
Cheers,
Josh

    


  




Re: [squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Joshua Gargus-2
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:

> On Fri, 15 Aug 2008 09:11:21 +0200, Joshua Gargus wrote:
>
>> (Wishing myself more success with writing this email before sending
>> it :-) )
>>
>> Igor Stasenko wrote:
>>> Hi folks,
>>>
>>> soon Hydra will provide support for instantiating a new interpreter
>>> instance from the current object memory, i.e. not based on images
>>> residing on the file system.
>>>
>>> The main focus of this feature is to create tiny images with
>>> limited behavior, for off-loading processing from the main interpreter to
>>> a separate worker interpreter.
>>> Since Hydra already has mechanisms to transfer data between
>>> interpreters, the need to initially pack new image(s) with data is
>>> minimal.
>>> The most important (and interesting) part IMO is to define and transfer
>>> behavior (classes & their methods) which is minimal for solving some
>>> problem in its domain.
>>>
>> Not that it's immediately relevant, but keep in mind that we'll
>> eventually want to be able to share behavior between images.
>
> This is my part of Igor's enterprise ;) we discuss my crazy cross-heap
> pointerage approaches with Igor as sparring partner ;)
>
> The main attention got GC, which has (among others) these aspects:
>
> o distributing object allocation is very promising
>  (Guillermo Adrián Molina: distributing alloc is the main
>   application of parallel processing [native threads in
>   Huemul], email communication)
>
> o 90% of each Process (remember 1:1 to a thread) references
>   are in the same chunk of process generated objects, 9% of
>   the references are to globals, and 1% to other process
>   generated objects (Guillermo Adrián Molina, native threads
>   in Huemul, email communication)
>
> o local GC gets often in the way when doing things in parallel
>
> o garbage can become cyclic cross-heap and so unreclaimable

That's all very interesting (especially the measurements about which
objects a Process references).  I'll reluctantly resist the temptation to
take the conversation in 10 different directions :-)

(big snip)

>> The technical details of your approach sound good to me (without
>> having thought deeply enough to provide truly constructive
>> criticism).  However...
>>
>> My main concern is that your argument against separate images is
>> disingenuous.  They won't be slower if you store them as ByteArrays
>> within the main image.
>
> But then they are always in the way when GC comes around :( This would
> invalidate all the pointers of the parallel thread and require global
> synchronization :(
>
> Not a good idea :( we want things to run in parallel independent of
> each other's GC.
>
I think that there is a misunderstanding.  I'm saying that you can store
a prototype of an image as a ByteArray in the main image, but you
wouldn't actually run a spawned interpreter using the ByteArray as the
object memory!  You would use it to populate a separate, newly-spawned
HydraVM object memory.

It would actually be pretty funny to implement it the way you thought I
meant, in the same way that Intercal and Lolcode are funny (except this
would be more of an inside joke).  But certainly not practical!

(hmm, maybe we could combine them... you could spawn a new interpreter
with the command "I can has new interpreter?"... what do you think?)

>> In fact, I believe that the opposite would be true; don't you agree?  
>> From a performance standpoint, it seems like separate images are the
>> better option.
>
> When creation of bytearray versus creation of separate heap can be
> ignored, there would be no difference in terms of performance (it's
> all oops all the way down, anyways). Only that bytearrays are not
> usable for parallel processing.
Now that the confusion above has been cleared up...

Wouldn't it be faster to spawn a new object-memory from an image in a
ByteArray (which requires a memcpy() and a single pass through the image
to relocate oops by a fixed amount) compared to the scheme that Igor
describes?
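
To make the "memcpy() plus one relocation pass" concrete, here is a toy sketch. It assumes a much simplified word layout in which odd words are tagged small integers and even words are pointers; a real Squeak image also contains object headers and raw byte data that such a pass would have to skip. The function name and parameters are hypothetical.

```python
def load_snapshot(image_words, old_base, new_base):
    """Copy a snapshot and relocate every pointer by a fixed amount.

    image_words -- list of ints; odd = tagged small integer, even = pointer
                   (a toy layout assumed for illustration)
    old_base    -- address the snapshot was saved at
    new_base    -- address of the newly allocated heap
    """
    heap = list(image_words)              # the memcpy() step
    delta = new_base - old_base
    for i, word in enumerate(heap):       # the single relocation pass
        if word % 2 == 0:                 # pointer: shift by the fixed offset
            heap[i] = word + delta
    return heap
```

The cost is linear in the size of the snapshot, and no object graph is traversed.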

(snip the rest, where we are in agreement)

Cheers,
Josh


[squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Klaus D. Witzel
On Fri, 15 Aug 2008 19:24:02 +0200, Joshua Gargus wrote:

> Klaus D. Witzel wrote:
...
>> The main attention got GC, which has (among others) these aspects:
>>
...
>
> That's all very interesting (especially the measurements about which  
> objects a Process references).  I'll reluctantly resist the temptation to
> take the conversation in 10 different directions :-)

Feel free to send them by email :)

> (big snip)
>
>>> The technical details of your approach sound good to me (without  
>>> having thought deeply enough to provide truly constructive  
>>> criticism).  However...
>>>
>>> My main concern is that your argument against separate images is  
>>> disingenuous.  They won't be slower if you store them as ByteArrays  
>>> within the main image.
>>
>> But then they are always in the way when GC comes around :( This would  
>> invalidate all the pointers of the parallel thread and require global  
>> synchronization :(
>>
>> Not a good idea :( we want things to run in parallel independent of  
>> each other's GC.
>>
> I think that there is a misunderstanding.  I'm saying that you can store  
> a prototype of an image as a ByteArray in the main image, but you  
> wouldn't actually run a spawned interpreter using the ByteArray as the  
> object memory!  You would use it to populate a separate, newly-spawned  
> HydraVM object memory.

I thought about that but didn't find it interesting; this is what already  
happens when a snapshot is written and read in again.  
InterpreterSimulator, which holds the bytearray that you want (in a  
Bitmap) does this with help of its "real work" superclasses. No need to  
develop that again, IMHO.

> It would actually be pretty funny to implement it the way you thought I  
> meant, in the same way that Intercal and Lolcode are funny (except this  
> would be more of an inside joke).  But certainly not practical!
>
> (hmm, maybe we could combine them... you could spawn a new interpreter  
> with the command "I can has new interpreter?"... what do you think?)

Snapshit can't baby has? More humor and more imperatives, please :)  
Lolcode and Intercal are not easy for people sans English mother tongue :)

>>> In fact, I believe that the opposite would be true; don't you agree?  
>>> From a performance standpoint, it seems like separate images are the  
>>> better option.
>>
>> When creation of bytearray versus creation of separate heap can be  
>> ignored, there would be no difference in terms of performance (it's all  
>> oops all the way down, anyways). Only that bytearrays are not usable  
>> for parallel processing.
> Now that the confusion above has been cleared up...
>
> Wouldn't it be faster to spawn a new object-memory from an image in a  
> ByteArray (which requires a memcpy() and a single pass through the image  
> to relocate oops by a fixed amount)
> compared to the scheme that Igor describes?

No, there are a lot of disadvantages with this. Let's say that computing the
desired object graph takes a minute, +100 milliseconds for your single
pass. And thereafter your ByteArray is unusable, because for every change
(or is it bug-free? and maintenance-free?) you have to go through the whole
process again.

So what is wrong with holding the desired object graph in an array
(sometimes two arrays)? If you really want bytes (a Bitmap) out of this
then you can put things into a new and idle Hydra thread+heap and push the
button with the "snapshot" label on it, there you go.

Perhaps we misunderstand each other on what the content of the object  
graph / your bytearray is?

/Klaus

> (snip the rest, where we are in agreement)
>
> Cheers,
> Josh
>



Re: [squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Igor Stasenko
2008/8/15 Klaus D. Witzel <[hidden email]>:

> On Fri, 15 Aug 2008 19:24:02 +0200, Joshua Gargus wrote:
>
>> Klaus D. Witzel wrote:
>
> ...
>>>
>>> The main attention got GC, which has (among others) these aspects:
>>>
> ...
>>
>> That's all very interesting (especially the measurements about which
>> objects a Process references).  I'll reluctantly resist the temptation to
>> take the conversation in 10 different directions :-)
>
> Feel free to send them by email :)
>
>> (big snip)
>>
>>>> The technical details of your approach sound good to me (without having
>>>> thought deeply enough to provide truly constructive criticism).  However...
>>>>
>>>> My main concern is that your argument against separate images is
>>>> disingenuous.  They won't be slower if you store them as ByteArrays within
>>>> the main image.
>>>
>>> But then they are always in the way when GC comes around :( This would
>>> invalidate all the pointers of the parallel thread and require global
>>> synchronization :(
>>>
>>> Not a good idea :( we want things to run in parallel independent of each
>>> other's GC.
>>>
>> I think that there is a misunderstanding.  I'm saying that you can store a
>> prototype of an image as a ByteArray in the main image, but you wouldn't
>> actually run a spawned interpreter using the ByteArray as the object memory!
>>  You would use it to populate a separate, newly-spawned HydraVM object
>> memory.
>
> I thought about that but didn't find it interesting; this is what already
> happens when a snapshot is written and read in again. InterpreterSimulator,
> which holds the bytearray that you want (in a Bitmap) does this with help of
> its "real work" superclasses. No need to develop that again, IMHO.
>
>> It would actually be pretty funny to implement it the way you thought I
>> meant, in the same way that Intercal and Lolcode are funny (except this
>> would be more of an inside joke).  But certainly not practical!
>>
>> (hmm, maybe we could combine them... you could spawn a new interpreter
>> with the command "I can has new interpreter?"... what do you think?)
>
> Snapshit can't baby has? More humor and more imperatives, please :) Lolcode
> and Intercal are not easy for people sans English mother tongue :)
>
>>>> In fact, I believe that the opposite would be true; don't you agree?
>>>>  From a performance standpoint, it seems like separate images are the better
>>>> option.
>>>
>>> When creation of bytearray versus creation of separate heap can be
>>> ignored, there would be no difference in terms of performance (it's all oops
>>> all the way down, anyways). Only that bytearrays are not usable for parallel
>>> processing.
>>
>> Now that the confusion above has been cleared up...
>>
>> Wouldn't it be faster to spawn a new object-memory from an image in a
>> ByteArray (which requires a memcpy() and a single pass through the image to
>> relocate oops by a fixed amount)
>> compared to the scheme that Igor describes?
>
> No, there are a lot of disadvantages with this. Let's say that computing the
> desired object graph takes a minute, +100 milliseconds for your single pass.
> And thereafter your ByteArray is unusable, because for every change (or is
> it bug-free? and maintenance-free?) you have to go through the whole process
> again.
>
> So what is wrong with holding the desired object graph in an array
> (sometimes two arrays)? If you really want bytes (a Bitmap) out of this then
> you can put things into a new and idle Hydra thread+heap and push the button
> with the "snapshot" label on it, there you go.
>
> Perhaps we misunderstand each other on what the content of the object graph
> / your bytearray is?
>

My 2 cents. The main difference is in how you treat an
object memory:
- do you want to treat it as a big blob of dead bytes
- or do you want to treat it as a collection of live objects
interconnected with each other

The first approach gives us little freedom in how we can
operate on it: it's just a bunch of dead bytes, which can come
alive only in a running interpreter.
With the second, you stay with objects all the time - you don't
need to care about object formats, file formats etc - this is the VM's
responsibility; it provides us an abstraction layer and we don't need
to care about it anymore. So with proper tools written, you will
have full control over what is going on and how to form a new object
memory without going deep into understanding VM internals.

For some people my words may sound like heresy, but I think I'm not
alone with this POV - why operate on dead, rigid and hard-to-maintain
data placed in files when we have much more beautiful
and powerful concepts in Smalltalk?

--
Best regards,
Igor Stasenko AKA sig.


[squeak-dev] Re: a few more arguments to instantiating an object memory based on another one

ccrraaiigg
In reply to this post by Joshua Gargus-2

 > ...you could spawn a new interpreter with the command "I can has new
 > interpreter?"...

      Hey, that would make a great Quoth[1] utility. :)


-C

[1] http://netjam.org/quoth




Re: [squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Joshua Gargus-2
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:

> On Fri, 15 Aug 2008 19:24:02 +0200, Joshua Gargus wrote:
>> I think that there is a misunderstanding.  I'm saying that you can
>> store a prototype of an image as a ByteArray in the main image, but
>> you wouldn't actually run a spawned interpreter using the ByteArray
>> as the object memory!  You would use it to populate a separate,
>> newly-spawned HydraVM object memory.
>
> I thought about that but didn't find it interesting; this is what
> already happens when a snapshot is written and read in again.
> InterpreterSimulator, which holds the bytearray that you want (in a
> Bitmap) does this with help of its "real work" superclasses. No need
> to develop that again, IMHO.
>
Not develop it again, just use it.


>> It would actually be pretty funny to implement it the way you thought
>> I meant, in the same way that Intercal and Lolcode are funny (except
>> this would be more of an inside joke).  But certainly not practical!
>>
>> (hmm, maybe we could combine them... you could spawn a new
>> interpreter with the command "I can has new interpreter?"... what do
>> you think?)
>
> Snapshit can't baby has? More humor and more imperatives, please :)
> Lolcode and Intercal are not easy for people sans English mother
> tongue :)
>
:-)

>>>> In fact, I believe that the opposite would be true; don't you
>>>> agree?  From a performance standpoint, it seems like separate
>>>> images are the better option.
>>>
>>> When creation of bytearray versus creation of separate heap can be
>>> ignored, there would be no difference in terms of performance (it's
>>> all oops all the way down, anyways). Only that bytearrays are not
>>> usable for parallel processing.
>> Now that the confusion above has been cleared up...
>>
>> Wouldn't it be faster to spawn a new object-memory from an image in a
>> ByteArray (which requires a memcpy() and a single pass through the
>> image to relocate oops by a fixed amount)
>> compared to the scheme that Igor describes?
>
> No, there are a lot of disadvantages with this. Let's say that computing
> the desired object graph takes a minute, +100 milliseconds for your
> single pass. And thereafter your ByteArray is unusable, because for
> every change (or is it bug-free? and maintenance-free?) you have to go
> through the whole process again.
>
I'm trying to be clear that I think that Igor's idea is a promising way
to develop and create new object-memories.  I'm simply suggesting that
once you've created an object memory (using Igor's method, or via a
declarative specification, or whatever), then it is possibly better to
have production code spawn new interpreters from a "snapshot image".
> So what is wrong with holding the desired object graph in an array
> (sometimes two arrays)?
Nothing, when you're developing.  But when I have a running production
system and I decide that I need to spawn a new interpreter, I want it to
happen as fast as possible.  I don't want to be doing fancy traversals
of an object graph, I'd rather memcpy() a pre-created image and run
through it once to fix offsets.
> If you really want bytes (a Bitmap) out of this then you can put
> things into a new and idle Hydra thread+heap and push the button with the
> "snapshot" label on it, there you go.
Sure, that's a fine way to implement it.  I don't care how it's
implemented, although simpler and reusing existing code is better.
>
> Perhaps we misunderstand each other on what the content of the object
> graph / your bytearray is?
I think that you misunderstand the contents of my bytearray; it's just a
"snapshot image" (which could, as you suggest, be created by pushing the
"snapshot" button).

It's possible that I'm overestimating the speed advantage of starting up
from a snapshot compared to creating an object-memory via Igor's method,
but I believe that I understand his general approach.

Cheers,
Josh

>
> /Klaus
>
>> (snip the rest, where we are in agreement)
>>
>> Cheers,
>> Josh
>>
>
>



[squeak-dev] Re: A few more arguments to instantiating object memory based on another one

Klaus D. Witzel
On Sat, 16 Aug 2008 09:46:51 +0200, Joshua Gargus wrote:

> Klaus D. Witzel wrote:
>> On Fri, 15 Aug 2008 19:24:02 +0200, Joshua Gargus wrote:
>>> I think that there is a misunderstanding.  I'm saying that you can  
>>> store a prototype of an image as a ByteArray in the main image, but  
>>> you wouldn't actually run a spawned interpreter using the ByteArray as  
>>> the object memory!  You would use it to populate a separate,  
>>> newly-spawned HydraVM object memory.
>>
>> I thought about that but didn't find it interesting; this is what  
>> already happens when a snapshot is written and read in again.  
>> InterpreterSimulator, which holds the bytearray that you want (in a  
>> Bitmap) does this with help of its "real work" superclasses. No need to  
>> develop that again, IMHO.
>>
> Not develop it again, just use it.

But the source code in Interpreter+ObjectMemory is not usable for that  
unless you pass it an OS specific file handle.

Therefore this (passing of a bytearray which represents some .image) is  
not of interest to me.

>
>>> It would actually be pretty funny to implement it the way you thought  
>>> I meant, in the same way that Intercal and Lolcode are funny (except  
>>> this would be more of an inside joke).  But certainly not practical!
...

>>>>> In fact, I believe that the opposite would be true; don't you  
>>>>> agree?  From a performance standpoint, it seems like separate images  
>>>>> are the better option.
>>>>
>>>> When creation of bytearray versus creation of separate heap can be  
>>>> ignored, there would be no difference in terms of performance (it's  
>>>> all oops all the way down, anyways). Only that bytearrays are not  
>>>> usable for parallel processing.
>>> Now that the confusion above has been cleared up...
>>>
>>> Wouldn't it be faster to spawn a new object-memory from an image in a  
>>> ByteArray (which requires a memcpy() and a single pass through the  
>>> image to relocate oops by a fixed amount)
>>> compared to the scheme that Igor describes?
>>
>> No, there are a lot of disadvantages with this. Let's say that computing
>> the desired object graph takes a minute, +100 milliseconds for your
>> single pass. And thereafter your ByteArray is unusable, because for
>> every change (or is it bug-free? and maintenance-free?) you have to go
>> through the whole process again.
>>
> I'm trying to be clear that I think that Igor's idea is a promising way  
> to develop and create new object-memories.  I'm simply suggesting that  
> once you've created an object memory (using Igor's method, or via a  
> declarative specification, or whatever), then it is possibly better to  
> have production code spawn new interpreters from a "snapshot image".

Yes, from within the running .image is meant by both of us, to be clear.

The only difference we have is about the format of things passed to the  
routine which populates the new heap.

>> So what is wrong with holding the desired object graph in an array  
>> (sometimes two arrays)?
> Nothing, when you're developing.  But when I have a running production  
> system and I decide that I need to spawn a new interpreter, I want it to  
> happen as fast as possible.

Sure, but the input to that imaginary routine cannot be accepted  
unverified and this extra work cannot be done with less work than that for  
allocating objects in the new heap (in fact: cloning ;)

I've taken the best parts out of ObjectMemory>>#clone: and  
ObjectMemory>>#allocateChunk: for that, with the following assumptions:

o fromOop is valid oop
o heap was allocated sufficiently large
o freeBlock is valid
o headerTypeBytes is valid
o lastHash is valid

Essentially this does: (freeBlock := freeBlock + numBytes "occupied by the  
clone") and copies the words (newOop[i] := fromOop[i]) thereby mapping  
references to new locations.

But this routine doesn't have to validate any oop, since the fromOop's are  
still alive and in good shape, and participating in cell division happily  
;)
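
As a rough illustration of the cloning step (not the actual routine derived from ObjectMemory>>#clone: and ObjectMemory>>#allocateChunk:), the sketch below bump-allocates each clone by advancing a free pointer and then copies the slots while mapping references to their new locations. It uses two passes and a forwarding table for clarity, which is a simplification of the description above. The heap model (dict of oop to referenced oops), the one-word header, the 4-byte word size and all names are assumptions.

```python
WORD = 4  # assumed word size in bytes

def clone_graph(heap, to_clone, free_block=0):
    """Clone a closed object graph into a fresh heap.

    heap       -- hypothetical model: dict oop -> list of referenced oops
    to_clone   -- oops to copy; must form a closed graph
    free_block -- first free address in the new heap
    Returns (new_heap, free_block after allocation).
    """
    # Pass 1: bump-allocate, i.e. freeBlock := freeBlock + numBytes.
    new_address = {}
    for oop in to_clone:
        new_address[oop] = free_block
        free_block += WORD * (1 + len(heap[oop]))   # header word + slots
    # Pass 2: copy slots, mapping each reference to its new location.
    new_heap = {new_address[oop]: [new_address[ref] for ref in heap[oop]]
                for oop in to_clone}
    return new_heap, free_block
```

Because the source objects are live and already well formed, the only failure mode in this sketch is a reference that leaves the graph, which surfaces as a `KeyError`.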

> I don't want to be doing fancy traversals of an object graph,

o.k. NP. Once you have the object graph, it can be used time and again,
like a blueprint for a new thing.

> I'd rather memcpy() a pre-created image and run through it once to fix  
> offsets.

Please don't. Please do a Smalltalk-typical sanity check, validate the input
(your bytearray) as best you can. It will pay off (Murphy says: sooner ;)
or later.

And if validation fails, inform the user instead of crashing the system  
(please).

Believe me, proper validation of unknown oops cannot be faster than proper
cloning of living objects ;)
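
A minimal sketch of such a sanity check, assuming the same toy layout as a flat list of words (odd = tagged small integer, even = pointer, 4-byte aligned objects). `InvalidImage` and `validate_image` are hypothetical names; a real validator would also have to check headers, sizes and class references.

```python
class InvalidImage(Exception):
    """Raised instead of letting a bad snapshot crash the system."""

def validate_image(image_words, base):
    """Check that every pointer in the snapshot stays inside the snapshot.

    image_words -- list of ints; odd = small integer, even = pointer (assumed)
    base        -- address the snapshot was saved at
    """
    limit = base + 4 * len(image_words)
    for i, word in enumerate(image_words):
        if word % 2 == 1:
            continue                      # tagged integer, nothing to check
        if not base <= word < limit or word % 4 != 0:
            raise InvalidImage(f"word {i} is not a valid pointer: {word}")
```

A caller can catch `InvalidImage` and report the problem to the user before any interpreter is spawned.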

>> If you really want bytes (a Bitmap) out of this then you can put
>> things into a new and idle Hydra thread+heap and push the button with the
>> "snapshot" label on it, there you go.
> Sure, that's a fine way to implement it.  I don't care how it's  
> implemented, although simpler and reusing existing code is better.

I take your word that you "don't care" ;) and propose to go the way that  
was suggested earlier :)

>>
>> Perhaps we misunderstand each other on what the content of the object  
>> graph / your bytearray is?
> I think that you misunderstand the contents of my bytearray; it's just a  
> "snapshot image" (which could, as you suggest, be created by pushing the  
> "snapshot" button).
>
> It's possible that I'm overestimating the speed advantage of starting up  
> from a snapshot compared to creating an object-memory via Igor's method,

From gut feeling it looks to be the same (#reverseBytesInImage is not
needed because endianness is the same; also what #adjustAllOopsBy: would
have to do).

But extra validation and crash prevention will count for the difference,  
IMO.

> but I believe that I understand his general approach.

Okay, happiness :)

/Klaus

> Cheers,
> Josh
>>
>> /Klaus
>>
>>> (snip the rest, where we are in agreement)
>>>
>>> Cheers,
>>> Josh
>>>
>>
>