[squeak-dev] Creating an image from first principles


Re: [squeak-dev] Creating an image from first principles

timrowledge

On 8-Jul-08, at 1:01 PM, Eliot Miranda wrote:
>
> - a system built in this way doesn't have to be a complete  
> Smalltalk.  It can, for example, omit a compiler, or omit reflective  
> parts of the system without which one couldn't inspect, make  
> modifications to the class hierarchy, decompile, etc, etc.  So a  
> system built from the ground up is much more easily made secure.
It could also be a very minimalist image intended to perform a single  
specific task and then die. As a (probably dumb) example, spawn out an  
image with just enough capability to load a webpage from a specific  
URL and write the content somewhere for later use. If you have a  
machine with thousands of cores available this might be a useful way
of fetching net content.
Instead of starting up a thread (or ten) and scheduling it and having
worries about semaphores and global access etc., leave it to an image
that can only do the job you want. Remove all the multi-thread stuff  
from the system and use one image, one thread of control.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: FA: Failsafe Armed




Re: [squeak-dev] Creating an image from first principles

Igor Stasenko
In reply to this post by Eliot Miranda-2
2008/7/8 Eliot Miranda <[hidden email]>:

>
>
> On Mon, Jul 7, 2008 at 7:49 PM, Andreas Raab <[hidden email]> wrote:
>>
>> Folks -
>>
>> Eliot and I had a great lunch conversation today and it convinced me that
>> I really should write up an idea that I had earlier and that is actually
>> pretty simple: How to create your own image from scratch.
>> Here is how it goes.
>>
>> Start with the interpreter simulator and a (literally) empty object
>> memory. Read a series of class definitions (you can use either MC class defs
>> or simply parse simple class definitions from sources) that are sufficient
>> to define all of the kernel structures that are required by the running VM
>> (incl. Object, Behavior, Class, Integer, Array, Process, CompiledMethod,
>> ContextPart, Semaphore etc. etc. etc.). Create those by calling the
>> allocators explicitly and set them up such that the structure is correct
>> (format, superclasses, metaclasses etc). Create nil, true and false based on
>> these definitions.
>>
>> At this point we have a skeleton of classes that we can use to instantiate
>> all behaviors required by a running image.
>>
>> Next, make a modification to the compiler that allows one to create a
>> compiled method in the simulator from a MethodNode (which should be
>> straightforward since the simulator exposes all of the good stuff for
>> creating new objects and instances).
>>
>> Now we can create new compiled methods in the new image as long as they
>> don't refer to any globals.
>>
>> Next, find a way of dealing with two issues: a) adding the compiled method
>> "properly" (e.g., deal with symbol interning and modifying
>> MethodDictionaries) and b) global name lookups performed by the compiler
>> (since the image is prototypical we can't have it send actual messages; not
>> even simulated ones ;-)
>>
>> The latter issue is the only one that doesn't seem completely obvious
>> which is why I would advocate that a bootstrap kernel mustn't use class
>> variables or shared pools (in which case the lookup is again trivial since
>> you know all the possible names from compiling the original structure).
>
> I don't understand why this is difficult.  Here's how I think it works.
> Every time the compiler to simulated objects creates an object that is a
> global it also creates an association for the global in the simulator's heap
> and adds the global to a suitable scope dictionary it maintains.  So it
> maintains shadow scopes for Smalltalk (or namespaces when we have them) and
> class pools etc.  Then the scope lookup mechanism uses these scopes when
> compiling methods.  Lookups for globals will find the right associations
> even though the dictionaries holding those associations don't yet exist in
> the simulator's heap.  Once enough of the bootstrap is complete the
> compiler can then create the globals (Smalltalk, non-empty class pools) and
> populate them using the associations.  The creation and hashing of the
> dictionaries is done by the simulator, but the compiler generates the
> invocations of the dictionary creation code using sequences of associations
> it extracts from its shadow scope dictionaries.
>
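
The shadow-scope scheme described above could be sketched roughly as follows. This is a minimal Python illustration and not code from Squeak's compiler or simulator: `SimHeap`, `ShadowScope`, and the dict-based object representation are all hypothetical stand-ins. In particular, `materialize` just stores a list of associations, whereas the scheme above has the simulator run the real dictionary creation and hashing code.

```python
# Hypothetical sketch: every name here is invented for illustration.

class SimHeap:
    """Stand-in for the simulator's object memory; an oop is just a list index."""

    def __init__(self):
        self.objects = []

    def allocate(self, **slots):
        self.objects.append(dict(slots))
        return len(self.objects) - 1  # the new object's oop


class ShadowScope:
    """Host-side map from a global's name to the oop of its association."""

    def __init__(self, heap):
        self.heap = heap
        self.bindings = {}

    def define(self, name, value_oop):
        # The association lives in the simulated heap from the start,
        # even though no dictionary holding it exists there yet.
        assoc = self.heap.allocate(kind="Association", key=name, value=value_oop)
        self.bindings[name] = assoc
        return assoc

    def lookup(self, name):
        return self.bindings[name]

    def materialize(self):
        # Late in the bootstrap: create the real dictionary from the
        # associations recorded so far (simplified to a plain list here).
        return self.heap.allocate(kind="SystemDictionary",
                                  associations=list(self.bindings.values()))


heap = SimHeap()
smalltalk = ShadowScope(heap)

object_class = heap.allocate(kind="Class", name="Object")
assoc = smalltalk.define("Object", object_class)

# A method compiled before the dictionary exists still gets the right literal.
method = heap.allocate(kind="CompiledMethod",
                       literals=[smalltalk.lookup("Object")])

dictionary_oop = smalltalk.materialize()
```

The point of the sketch is that the method's literal and the dictionary entry end up being the same association object, so nothing has to be patched once the dictionary is finally created.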

Right. That is exactly how I did things in CorruptVM, and it works just fine.
I start the bootstrap by sending a single message, which leads to
creating the first object, which then tries to fill all its slots, such as
vtable (class), which triggers creating a class, which triggers
creating a bunch of other objects, compiling and installing methods
in new classes, interning symbols, etc.

I think the same could be done for Hydra easily. Even more easily, because you
don't have to deal with a different object format.
If you need to, you may use my code as a reference, or as a reference for how
not to do things :)
http://www.squeaksource.com/CorruptVM
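
A minimal Python sketch of the demand-driven idea described above, where asking for one object triggers creation of everything its slots need. It is not taken from CorruptVM: the `BLUEPRINTS` table, the `Bootstrap` class, and the much-simplified class structure are assumptions made for illustration.

```python
# Hypothetical blueprint table: object name -> {slot name: name of the object
# that slot must refer to}. The class structure is deliberately simplified and
# is not the real Squeak metaclass layout.
BLUEPRINTS = {
    "nil":             {"class": "UndefinedObject"},
    "UndefinedObject": {"class": "Metaclass", "superclass": "Object"},
    "Object":          {"class": "Metaclass", "superclass": "nil"},
    "Metaclass":       {"class": "Metaclass", "superclass": "Object"},
}


class Bootstrap:
    def __init__(self, blueprints):
        self.blueprints = blueprints
        self.created = {}   # name -> object (a plain dict of slots)
        self.order = []     # creation order, kept for inspection

    def obtain(self, name):
        """Return the named object, creating it and its dependencies on demand."""
        if name in self.created:
            return self.created[name]
        obj = {"name": name}
        # Register before filling slots so circular references terminate.
        self.created[name] = obj
        self.order.append(name)
        for slot, target in self.blueprints[name].items():
            obj[slot] = self.obtain(target)
        return obj


boot = Bootstrap(BLUEPRINTS)
nil = boot.obtain("nil")   # one request pulls in the whole skeleton
```

Registering each object before its slots are filled is what lets cycles such as Object's superclass being nil, whose class in turn inherits from Object, resolve without infinite recursion.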

>
>> Now we can load all the source we want to be in our bootstrap image.
>>
>> Lastly, do the bootstrap: Instantiate the first process, its first
>> context, the first message. Run it in the simulator to set up the remaining
>> parts of the kernel image (Delay, ProcessorScheduler etc).
>>
>> Voila, at this point we have a fully functioning kernel image, created
>> completely from first principles.
>>
>> Once you have the kernel image there is no end to the fun: Since you can
>> now start sending messages "into" the image (by way of the simulator) you
>> can compile any code you want (incl. pools and class vars) and lookup the
>> names properly by sending a message to the interpreter simulator. And then
>> you just save the image and are ready to go.
>>
>> Anyone interested?
>
> Oh no.  No.  Not at all.  Not in the least.  No, really, no.  Um, ah, no.
>
>
>> Cheers,
>>  - Andreas
>>
>> PS. Oh, and I'd be also interested in defining a good interface to do this
>> by means of Hydra, i.e., instead of having to run the simulator run the
>> compiled VM on an "empty image" to do all of this "for real" instead of in
>> the simulator.
>>
>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


[squeak-dev] Re: Re: Creating an image from first principles

Klaus D. Witzel
In reply to this post by Eliot Miranda-2
On Tue, 08 Jul 2008 19:41:44 +0200, Eliot Miranda wrote:

> On Tue, Jul 8, 2008 at 12:08 AM, Klaus D. Witzel wrote:
...

>> Even when doing the blueprint trick, what remains to be done is simulating
>> additions to the method dictionary.
>
>
> Let me try again (clearly too early in the a.m.)
>
>
> In the image (not in the simulator) create proxy objects for the selectors
> in the simulator heap that answer the identity hashes of the objects in the
> simulator heap.  Create a method dictionary in the image populated with the
> proxy objects.  The order of the objects in the image method dictionary is
> the correct order for the method dictionary in the simulator.
>
> In general if you want to perform a computation on objects in the simulator
> heap before you've bootstrapped the code to perform the computation, create
> proxy objects in the image and run the computation on them.  Right?

Yeah, right. This is the Smalltalk way of doing things, thank you :)
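
The proxy trick can be sketched in Python as follows. This is a hedged illustration only: `SelectorProxy` and `HostMethodDictionary` are hypothetical names, and the fixed-size linear-probing table is an assumed simplification, not the actual Squeak MethodDictionary algorithm.

```python
class SelectorProxy:
    """Host-side stand-in for a selector that lives in the simulator heap."""

    def __init__(self, oop, identity_hash):
        self.oop = oop                      # address of the simulated selector
        self.identity_hash = identity_hash  # hash the simulated object answers

    def __hash__(self):
        return self.identity_hash

    def __eq__(self, other):
        return isinstance(other, SelectorProxy) and self.oop == other.oop


class HostMethodDictionary:
    """Fixed-size open-addressing table with linear probing (assumed scheme)."""

    def __init__(self, capacity):
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def at_put(self, key, value):
        size = len(self.keys)
        index = hash(key) % size
        for _ in range(size):
            if self.keys[index] is None or self.keys[index] == key:
                self.keys[index] = key
                self.values[index] = value
                return
            index = (index + 1) % size
        raise RuntimeError("table full")

    def layout(self):
        """Slot-by-slot oops, ready to be copied into the simulated dictionary."""
        return [None if key is None else key.oop for key in self.keys]


table = HostMethodDictionary(8)
table.at_put(SelectorProxy(oop=100, identity_hash=3), "method A")
table.at_put(SelectorProxy(oop=104, identity_hash=11), "method B")  # 11 % 8 == 3, collides
table.at_put(SelectorProxy(oop=108, identity_hash=6), "method C")
slots = table.layout()
```

Because the proxies answer the simulated objects' hashes, the host table resolves collisions exactly as the simulated one would under the same probing scheme, so the slot layout can be copied across directly.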



Re: [squeak-dev] Creating an image from first principles

stephane ducasse
In reply to this post by Eliot Miranda-2
> But the most compelling reason for me is to do with the system's  
> architecture.  The Smalltalk system should be architected as an  
> onion, each layer of the onion being composed of a set of components  
> (like tectonic plates).  If it is architected like this, with  
> components at the centre not using anything in outer layers (*) then  
> removing components becomes much easier. So the normal mode of  
> development is to develop an application in a large well-facilitated  
> development image.  Once developed the programmer clones the image  
> and unloads the components they think they don't need and tests the  
> resulting application.  That differs from the current stripping  
> approach in that one is removing coarse-grained components with  
> well-understood functionalities and boundaries instead of trying to infer  
> the subset of code still used by the system.  So starting from a  
> system that is built from the ground up should enable Squeak to  
> evolve into a properly modular system.

Yes, we need a nice bootstrappable onion.


[squeak-dev] Re: Creating an image from first principles

Paolo Bonzini-2
In reply to this post by Colin Putney
> Something like this is going to be full of subtleties, so maybe
> simulation offers benefits that gestation doesn't. Any thoughts?

I like the simulation approach a lot; it is very similar to what GNU
Smalltalk does to bootstrap its images, except that gst obviously uses C
code rather than Smalltalk -- but that does not matter, it's just a
different programming language.  If you're interested I have a few
documents on how this is done; they're written in normal English so no
GPL woes! >:->

As Craig pointed out, the devil is in the details of starting up all
threads; but if this is done, I think it would benefit all of Squeak.

Note that you can have "doits" well before you finish bootstrapping the
system.  You can run each doit in a separate Process and simulate until
that process exits.

> It's class and pool variables that are tricky because even
> Dictionary>>at: may not exist in the image that you're trying to compile
> so doing the lookup directly in there would be quite tricky. It would be
> doable if one assumed a particular organization of the classes (i.e.,
> the n-th iVar is the dictionary of class vars) and then interpreted it
> externally but it seems like an unnecessary complication for an initial
> bootstrap.

That's what gst does, and it's not very complicated after all.
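
For illustration, interpreting the class layout from outside might look like the Python sketch below. It is not gst's code: the slot indices, the heap-as-dict representation, and the flat list of associations used in place of a hashed dictionary are all assumptions.

```python
# Assumed fixed layout of a class object's slots (hypothetical indices).
SUPERCLASS_INDEX = 0
CLASS_POOL_INDEX = 1
NIL = 0  # oop of nil in this toy heap


def lookup_class_variable(heap, class_oop, name):
    """Find a class variable's association by reading slots directly.

    No message is sent to the image under construction, so this works even
    when Dictionary>>at: does not exist there yet.
    """
    while class_oop != NIL:
        slots = heap[class_oop]
        pool_oop = slots[CLASS_POOL_INDEX]
        if pool_oop != NIL:
            # The pool is modelled as a flat list of association oops.
            for assoc_oop in heap[pool_oop]:
                key, _value = heap[assoc_oop]
                if key == name:
                    return assoc_oop
        class_oop = slots[SUPERCLASS_INDEX]  # continue up the hierarchy
    return None


# Toy heap: oop -> list of slots.
heap = {
    0: [],               # nil
    1: [NIL, 3],         # a root class with a class pool
    2: [1, NIL],         # a subclass with no pool of its own
    3: [4],              # the pool: one association
    4: ["Counter", 42],  # association: key, value
}

found = lookup_class_variable(heap, 2, "Counter")
missing = lookup_class_variable(heap, 2, "Missing")
```

The lookup relies only on knowing which slot holds the superclass and which holds the pool, which is the "particular organization of the classes" assumption mentioned in the quoted text.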

Paolo
