Folks -
Eliot and I had a great lunch conversation today and it convinced me that I really should write up an idea that I had earlier and that is actually pretty simple: How to create your own image from scratch. Here is how it goes.

Start with the interpreter simulator and a (literally) empty object memory. Read a series of class definitions (you can use either MC class defs or simply parse simple class definitions from sources) that are sufficient to define all of the kernel structures that are required by the running VM (incl. Object, Behavior, Class, Integer, Array, Process, CompiledMethod, ContextPart, Semaphore etc. etc. etc.). Create those by calling the allocators explicitly and set them up such that the structure is correct (format, superclasses, metaclasses etc). Create nil, true and false based on these definitions. At this point we have a skeleton of classes that we can use to instantiate all behaviors required by a running image.

Next, make a modification to the compiler that allows one to create a compiled method in the simulator from a MethodNode (which should be straightforward since the simulator exposes all of the good stuff for creating new objects and instances). Now we can create new compiled methods in the new image as long as they don't refer to any globals.

Next, find a way of dealing with two issues: a) adding the compiled method "properly" (e.g., deal with symbol interning and modifying MethodDictionaries) and b) global name lookups performed by the compiler (since the image is prototypical we can't have it send actual messages; not even simulated ones ;-) The latter issue is the only one that doesn't seem completely obvious which is why I would advocate that a bootstrap kernel mustn't use class variables or shared pools (in which case the lookup is again trivial since you know all the possible names from compiling the original structure). Now we can load all the source we want to be in our bootstrap image.

Lastly, do the bootstrap: Instantiate the first process, its first context, the first message. Run it in the simulator to set up the remaining parts of the kernel image (Delay, ProcessorScheduler etc). Voila, at this point we have a fully functioning kernel image, created completely from first principles.

Once you have the kernel image there is no end to the fun: Since you can now start sending messages "into" the image (by way of the simulator) you can compile any code you want (incl. pools and class vars) and lookup the names properly by sending a message to the interpreter simulator. And then you just save the image and are ready to go.

Anyone interested?

Cheers,
  - Andreas

PS. Oh, and I'd be also interested in defining a good interface to do this by means of Hydra, i.e., instead of having to run the simulator run the compiled VM on an "empty image" to do all of this "for real" instead of in the simulator.
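As an illustration of the skeleton step described above, here is a minimal Python sketch. It is a toy model, not Squeak code and not anyone's actual implementation: `ObjectMemory`, `KERNEL_DEFS` and `build_skeleton` are hypothetical names, the class list is a tiny subset, the hierarchy is simplified, and Python's `None` stands in for a not-yet-patched pointer. It only shows the two-pass idea: allocate every class and metaclass first, then patch the superclass and metaclass links, then create nil, true and false.

```python
class ObjectMemory:
    """Toy stand-in for an empty simulated heap; an oop is just a list index."""

    def __init__(self):
        self.heap = []

    def allocate(self, class_oop, slots):
        self.heap.append({"class": class_oop, "slots": list(slots)})
        return len(self.heap) - 1


# (class name, superclass name); a simplified, assumed subset of the kernel.
KERNEL_DEFS = [
    ("Object", None), ("Behavior", "Object"), ("Class", "Behavior"),
    ("Metaclass", "Behavior"), ("UndefinedObject", "Object"),
    ("Boolean", "Object"), ("True", "Boolean"), ("False", "Boolean"),
]


def build_skeleton(defs):
    mem = ObjectMemory()
    classes = {}
    # Pass 1: allocate each metaclass and class; slot 0 (superclass) is unset.
    for name, _ in defs:
        meta = mem.allocate(None, [None, name + " class"])
        classes[name + " class"] = meta
        classes[name] = mem.allocate(meta, [None, name])
    # Pass 2: every oop now exists, so the structure can be patched.
    for name, super_name in defs:
        cls, meta = classes[name], classes[name + " class"]
        mem.heap[meta]["class"] = classes["Metaclass"]
        if super_name is None:
            # Root class: its superclass stays None (standing in for nil);
            # its metaclass inherits from Class.
            mem.heap[meta]["slots"][0] = classes["Class"]
        else:
            mem.heap[cls]["slots"][0] = classes[super_name]
            mem.heap[meta]["slots"][0] = classes[super_name + " class"]
    # The special objects are plain instances of skeleton classes.
    specials = {
        "nil": mem.allocate(classes["UndefinedObject"], []),
        "true": mem.allocate(classes["True"], []),
        "false": mem.allocate(classes["False"], []),
    }
    return mem, classes, specials
```

A real bootstrap would also have to set object formats, instance variable layouts and the VM's special objects array, none of which this sketch attempts.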
Hi Andreas--

> Anyone interested?

Of course. :) I tried this approach in 2003. There ended up being so much trial and error producing something that the VM would run that I decided to try the "one last shrink" approach first (with a network interface for adding already-compiled methods retained, since I wanted to end up with that in there anyway). In particular:

> ...do the bootstrap: Instantiate the first process, its first context,
> the first message. Run it in the simulator to set up the remaining
> parts of the kernel image (Delay, ProcessorScheduler etc).

I guess I found that last sentence easier said than done. :) But I figured that once I'd proven what those essential components were, I would come back to this approach, now that I knew what I was doing. :)

I don't even think the global lookup issue is all that hard. (I advocate having no system dictionary; keep each class literal in the name slot of the class itself, and have a means of traversing the class tree from a well-known starting point. Scanning through memory for the class and pool literals you want is straightforward.)

-C

--
Craig Latta
improvisational musical informaticist
www.netjam.org
Smalltalkers do: [:it | All with: Class, (And love: it)]
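The "no system dictionary" lookup suggested above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: classes are plain dicts with a hypothetical `name` slot and a `subclasses` list, and `find_class` is an invented helper, not part of any Squeak or Spoon API.

```python
def find_class(root, name):
    """Depth-first walk of the class tree from a well-known root class,
    answering the class whose name slot matches, or None if there is none."""
    stack = [root]
    while stack:
        cls = stack.pop()
        if cls["name"] == name:
            return cls
        stack.extend(cls["subclasses"])
    return None


def make_class(name, *subclasses):
    """Build a toy class record; the field names are assumptions."""
    return {"name": name, "subclasses": list(subclasses)}
```

For example, with `root = make_class("Object", make_class("Collection", make_class("Array")))`, the call `find_class(root, "Array")` walks down from `Object` without consulting any global dictionary.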
In reply to this post by Andreas.Raab
On 7-Jul-08, at 7:49 PM, Andreas Raab wrote:

> Voila, at this point we have a fully functioning kernel image,
> created completely from first principles.
>
> Once you have the kernel image there is no end to the fun: Since you
> can now start sending messages "into" the image (by way of the
> simulator) you can compile any code you want (incl. pools and class
> vars) and lookup the names properly by sending a message to the
> interpreter simulator. And then you just save the image and are
> ready to go.
>
> Anyone interested?
>
> PS. Oh, and I'd be also interested in defining a good interface to
> do this by means of Hydra, i.e., instead of having to run the
> simulator run the compiled VM on an "empty image" to do all of this
> "for real" instead of in the simulator.

Interesting approach.

It seems a little more complicated than Alejandro's "gestation" approach, though. With gestation, the child image is created *inside* the parent image, carefully avoiding out-pointers. Then the child image is written out to disk with a system tracer.

Something like this is going to be full of subtleties, so maybe simulation offers benefits that gestation doesn't. Any thoughts?

Colin
In reply to this post by ccrraaiigg
Craig Latta wrote:
> > ...do the bootstrap: Instantiate the first process, its first context,
> > the first message. Run it in the simulator to set up the remaining
> > parts of the kernel image (Delay, ProcessorScheduler etc).
>
> I guess I found that last sentence easier said than done. :)

It doesn't strike me as particularly difficult but then I haven't tried it yet. What problems were you running into?

> I don't even think the global lookup issue is all that hard. (I
> advocate having no system dictionary; keep each class literal in the
> name slot of the class itself, and have a means of traversing the class
> tree from a well-known starting point. Scanning through memory for the
> class and pool literals you want is straightforward.)

Actually, true *globals* (which for the bootstrap means only classes) are trivial to deal with: Since the skeleton is created first you have the oops for all the globals right there and then, so setting up a mapping that the compiler uses for these globals is utterly trivial.

It's class and pool variables that are tricky because even Dictionary>>at: may not exist in the image that you're trying to compile so doing the lookup directly in there would be quite tricky. It would be doable if one assumed a particular organization of the classes (i.e., the n-th iVar is the dictionary of class vars) and then interpreted it externally but it seems like an unnecessary complication for an initial bootstrap.

Cheers,
  - Andreas
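The name-to-oop mapping for true globals described above might look like the following Python sketch. It is a hedged illustration only: `BootstrapScope` and `bind_globals` are hypothetical names, oops are modelled as plain integers, and the real Squeak compiler's scope protocol is not reproduced here. Class and pool variables are deliberately rejected, mirroring the restriction proposed for the initial bootstrap.

```python
class BootstrapScope:
    """Host-side mapping from global (class) names to oops in the new image."""

    def __init__(self, class_oops):
        # class_oops: name -> oop, recorded while the skeleton was allocated.
        self._globals = dict(class_oops)

    def lookup(self, name):
        try:
            return self._globals[name]
        except KeyError:
            raise LookupError(
                f"{name!r} is not a skeleton class; class and pool "
                "variables are not resolved during the initial bootstrap"
            ) from None


def bind_globals(referenced_names, scope):
    """Resolve the global names a method refers to into literal oops."""
    return [scope.lookup(name) for name in referenced_names]
```

Because the mapping lives entirely on the host side, no message ever has to be sent into the half-built image to resolve a name.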
In reply to this post by Andreas.Raab
On 7-Jul-08, at 7:49 PM, Andreas Raab wrote:

> Start with the interpreter simulator and a (literally) empty object
> memory.

[snip]

Well as Craig mentioned this is pretty much what he was talking about in 03 or so. And Alejandro was of course doing his Fenix project where the build happened in 'live' memory. And I did it *by hand* - in *hex* - in 1987 for a suite of unit tests for an ARM assembler VM. It's a very plausible idea.

I think doing it in a simulated OM is better than the Fenix technique precisely because it is totally contained and no leaks are possible. Craig's visualisation work includes some tools for handling images in simulation that ought to help. It ought to be possible to build the image foetus and analyse it remotely before ever risking running the simulator stepper. And of course it is all repeatable until you get it right, at which point you snapshot.

> Now we can create new compiled methods in the new image as long as
> they don't refer to any globals.

Why not build a globals dictionary into the new image? Surely it wouldn't add much to the effort? You've already had to add Arrays, go one step further for Dictionary. Pool vars would work with our 'new' pool dictionary mechanism (how many years ago was that?). Maybe globals can be implemented in a similar fashion.

> Anyone interested?

Duh. Of course.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
A hacker does for love what others would not do for money.
In reply to this post by Andreas.Raab
> > > ...do the bootstrap: Instantiate the first process, its first
> > > context, the first message. Run it in the simulator to set up the
> > > remaining parts of the kernel image (Delay, ProcessorScheduler
> > > etc).
> >
> > I guess I found that last sentence easier said than done. :)
>
> It doesn't strike me as particularly difficult but then I haven't
> tried it yet. What problems were you running into?

Well, it just wasn't clear what all should go into the target object memory. That's what I consider the meaningful "first principles" here. Certainly the Fenix team had an idea of this for particular dedicated applications, but I'm after what's needed for a memory that can become a full development environment, without any further assistance from the tools that created it. (This is why I started from one.)

> [globals]

Ah, I misunderstood you.

thanks,

-C

--
Craig Latta
improvisational musical informaticist
www.netjam.org
Smalltalkers do: [:it | All with: Class, (And love: it)]
In reply to this post by Colin Putney
Colin Putney wrote:
> It seems a little more complicated than Alejandro's "gestation"
> approach, though. With gestation, the child image is created *inside*
> the parent image, carefully avoiding out-pointers. Then the child image
> is written out to disk with a system tracer.
>
> Something like this is going to be full of subtleties, so maybe
> simulation offers benefits that gestation doesn't. Any thoughts?

I don't know enough about that "gestation" approach. How does it work? Can I try it? It doesn't seem immediately obvious to me what the process is like - as an example, where do nil, true, false come from and how are they made to be instances of the class that is inside the child image instead of the parent? How is process initialization handled? If anyone has ever produced an image with the approach I'd be curious to hear more about the experience, what was easy and what was difficult.

Cheers,
  - Andreas
In reply to this post by Andreas.Raab
On Tue, 08 Jul 2008 07:19:45 +0200, Andreas Raab wrote:
> Craig Latta wrote:
>> > ...do the bootstrap: Instantiate the first process, its first context,
>> > the first message. Run it in the simulator to set up the remaining
>> > parts of the kernel image (Delay, ProcessorScheduler etc).
>> I guess I found that last sentence easier said than done. :)
>
> It doesn't strike me as particularly difficult but then I haven't tried
> it yet. What problems were you running into?
>
>> I don't even think the global lookup issue is all that hard. (I
>> advocate having no system dictionary; keep each class literal in the
>> name slot of the class itself, and have a means of traversing the class
>> tree from a well-known starting point. Scanning through memory for the
>> class and pool literals you want is straightforward.)
>
> Actually, true *globals* (which for the bootstrap means only classes)
> are trivial to deal with: Since the skeleton is created first you have
> the oops for all the globals right there and then, so setting up a
> mapping that the compiler uses for these globals is utterly trivial.
> It's class and pool variables that are tricky

I suggest that vars are the same as in the "blueprint" classes (thank you Michael for mentioning that term elsewhere :) so the methods can be compiled against ordinary classes which serve as blueprints. And globals can be in a single SharedPool for the time of creating the image from first principles (see my patch to SharedPool>>#bindingOf:).

Having said that, globals could alternatively be treated as if they were class variables of the Object blueprint. Both variants can avoid having a Smalltalk global dictionary.

> because even Dictionary>>at: may not exist in the image that you're
> trying to compile so doing the lookup directly in there would be quite
> tricky. It would be doable if one assumed a particular organization of
> the classes (i.e., the n-th iVar is the dictionary of class vars) and
> then interpreted it externally but it seems like an unnecessary
> complication for an initial bootstrap.

Even when doing the blueprint trick, what remains to be done is simulating additions to the method dictionary.

> Cheers,
>   - Andreas
Andreas, the principles you have described are exactly how I bootstrap the object memory in CorruptVM :)

At first I even had special methods, SomeClass>>defineIn: anObjectMemory, for each class of interest. But then I removed them and put all the code in CVMachineSimulator. That is because, to use these methods from where they were located, I would need to create new instances of these classes (or class objects) in the host image (which holds the simulator/bootstrapper). I did this mainly to avoid invoking unwanted code (who knows what #new in some class does).

But in fact I think it would be better to keep #defineIn: for each class. It just needs some more grounded thought. And for creating instances of some class, we can do something similar:

SomeClass>>defineInstanceIn: anObjectMemory arguments: array

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Andreas.Raab
Dear Andreas,
I have used the gestation technique with success in situations where I needed to change the shape of core classes, and to build new Smalltalk platforms before a VM implementation was complete (we parasitized the SqueakVM and generated Squeak-compatible images to be run and tested in a customized SqueakVM while the high-performance VM was in development).

We used a refactored (modified) version of the System Tracer to build the image, and once the image is built you can run it with a valid VM (e.g. a modified SqueakVM) or with a simulator as usual. The system tracer was adapted to cut/adapt the links from the host image when saving the image of the gestated system.

It was very simple and easy to build an image by gestation. We used fileIns in log format, but with a minimal change to let us evaluate chunks in the context of the builder instead of compiling the chunk in "UndefinedObject", to make it possible to express in chunk format the booting expressions of the image in gestation. Doing it that way, you do not need to express any behavior/code in the host machine; all code that guides the boot process is read from Smalltalk files.

We also built some modified browsers and inspectors to let us test and modify an image during gestation (before saving the image) and to fileOut changes. We also built some tools to trace activation of methods (in the VM) and run them while the image "is in the womb", tracing the methods that are required to run, and we implemented a facility to add the classes and methods activated from the hosting image (a kind of nutrition from the parent while in gestation).

I have put some (old) resources on my wiki; there is also an image there to play with. The image is an old Sq 1.8 or so. I have not made efforts to port the work to new versions of Squeak, but I know that in the past Edgar has used the framework for his work (or at least for learning with it) and he probably has something near the current version.

The URL of my work is http://www.aleReimondo.com.ar/ImageGestation

all the best,
Ale.

----- Original Message -----
From: "Andreas Raab" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Tuesday, July 08, 2008 2:40 AM
Subject: [squeak-dev] Re: Creating an image from first principles

> Colin Putney wrote:
>> It seems a little more complicated than Alejandro's "gestation" approach,
>> though. With gestation, the child image is created *inside* the parent
>> image, carefully avoiding out-pointers. Then the child image is written
>> out to disk with a system tracer.
>>
>> Something like this is going to be full of subtleties, so maybe
>> simulation offers benefits that gestation doesn't. Any thoughts?
>
> I don't know enough about that "gestation" approach. How does it work? Can
> I try it? It doesn't seem immediately obvious to me what the process is
> like - as an example, where do nil, true, false come from and how are they
> made to be instances of the class that is inside the child image instead
> of the parent? How is process initialization handled? If anyone has ever
> produced an image with the approach I'd be curious to hear more about the
> experience, what was easy and what was difficult.
>
> Cheers,
>   - Andreas
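To make the "cut the links from the host image" step of gestation more concrete, here is a small Python sketch of a tracer. It is an assumption-laden toy, not the modified System Tracer described above: `trace_child_image`, `references` and `belongs_to_child` are hypothetical names, and out-pointers are simply replaced by `None` (standing in for nil), whereas the real tracer is said to cut or adapt them.

```python
from collections import deque


def trace_child_image(root, references, belongs_to_child):
    """Breadth-first trace from root over objects of the child system.

    references(obj) answers the objects that obj points to;
    belongs_to_child(obj) says whether obj is part of the gestated system.
    Answers (objects, edges): objects in write-out order, and for each one
    the list of its pointers as indices into objects, with every pointer
    into the host image cut to None.
    """
    index_of = {id(root): 0}
    objects = [root]
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for ref in references(obj):
            if belongs_to_child(ref) and id(ref) not in index_of:
                index_of[id(ref)] = len(objects)
                objects.append(ref)
                queue.append(ref)
    edges = [
        [index_of[id(ref)] if belongs_to_child(ref) else None
         for ref in references(obj)]
        for obj in objects
    ]
    return objects, edges
```

Because host objects are never enqueued, nothing from the parent image can leak into the written-out child, which is the property the "carefully avoiding out-pointers" remark is about.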
In reply to this post by Colin Putney
Hi,
> Something like this is going to be full of subtleties, so maybe
> simulation offers benefits that gestation doesn't. Any thoughts?

As far as I know, simulation works on an image (a snapshot). One of the problems in Andreas's proposal (as I understood it) is that you can't work on the system you are building while it does not have the minimal tools (e.g. to send messages, or to add/remove/debug methods...).

With gestation you can use all the tools of the hosting system, because both systems are "working" and all the objects of the system in gestation are defined as normal objects in the host system. The system in gestation is part of the parent system and all the machinery and tools are normal (but modified) tools of the parent.

cheers,
Ale.

----- Original Message -----
From: "Colin Putney" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Tuesday, July 08, 2008 1:41 AM
Subject: Re: [squeak-dev] Creating an image from first principles

> On 7-Jul-08, at 7:49 PM, Andreas Raab wrote:
>
>> Voila, at this point we have a fully functioning kernel image, created
>> completely from first principles.
>>
>> Once you have the kernel image there is no end to the fun: Since you can
>> now start sending messages "into" the image (by way of the simulator)
>> you can compile any code you want (incl. pools and class vars) and
>> lookup the names properly by sending a message to the interpreter
>> simulator. And then you just save the image and are ready to go.
>>
>> Anyone interested?
>>
>> PS. Oh, and I'd be also interested in defining a good interface to do
>> this by means of Hydra, i.e., instead of having to run the simulator run
>> the compiled VM on an "empty image" to do all of this "for real" instead
>> of in the simulator.
>
> Interesting approach.
>
> It seems a little more complicated than Alejandro's "gestation" approach,
> though. With gestation, the child image is created *inside* the parent
> image, carefully avoiding out-pointers. Then the child image is written
> out to disk with a system tracer.
>
> Something like this is going to be full of subtleties, so maybe
> simulation offers benefits that gestation doesn't. Any thoughts?
>
> Colin
In reply to this post by Andreas.Raab
On Mon, Jul 7, 2008 at 7:49 PM, Andreas Raab <[hidden email]> wrote:

> Folks -

I don't understand why this is difficult. Here's how I think it works. Every time the compiler to simulated objects creates an object that is a global it also creates an association for the global in the simulator's heap and adds the global to a suitable scope dictionary it maintains. So it maintains shadow scopes for Smalltalk (or namespaces when we have them) and class pools etc. Then the scope lookup mechanism uses these scopes when compiling methods. Lookups for globals will find the right associations even though the dictionaries holding those associations don't yet exist in the simulator's heap.

Once enough of the bootstrap is complete the compiler can then create the globals (Smalltalk, non-empty class pools) and populate them using the associations. The creation and hashing of the dictionaries is done by the simulator, but the compiler generates the invocations of the dictionary creation code using sequences of associations it extracts from its shadow scope dictionaries.

> Now we can load all the source we want to be in our bootstrap image.

Oh no. No. Not at all. Not in the least. No, really, no. Um, ah, no.

Cheers,
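The shadow-scope idea above can be sketched in Python as follows. This is a toy model under loud assumptions: `SimHeap` and `ShadowScope` are hypothetical names, heap objects are plain dicts, and `materialize` builds the dictionary directly on the host side, whereas the post says the real creation and hashing would be done by code running in the simulator.

```python
class SimHeap:
    """Toy simulated heap; an oop is an index into self.objects."""

    def __init__(self):
        self.objects = []

    def allocate(self, kind, **fields):
        self.objects.append({"kind": kind, **fields})
        return len(self.objects) - 1


class ShadowScope:
    """Host-side scope whose Association objects already live in the heap."""

    def __init__(self, heap):
        self.heap = heap
        self.bindings = {}  # name -> oop of the Association in the heap

    def declare(self, name, value_oop):
        assoc = self.heap.allocate("Association", key=name, value=value_oop)
        self.bindings[name] = assoc
        return assoc

    def binding_of(self, name):
        # What the compiler stores as a method literal for a global reference.
        return self.bindings[name]

    def materialize(self):
        # Later in the bootstrap: build the real dictionary from the very
        # same associations, so existing method literals stay valid.
        return self.heap.allocate(
            "Dictionary", associations=list(self.bindings.values())
        )
```

The design point is that methods compiled early and the dictionary created late share the same association objects, so nothing needs to be recompiled or relinked once the dictionary finally exists.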
In reply to this post by Klaus D. Witzel
On Tue, Jul 8, 2008 at 12:08 AM, Klaus D. Witzel <[hidden email]> wrote:
In the image (not in the simulator) create proxy objects for the selectors in the simulator heap that answer the identity hashes of the objects in the simulator hash. Create a method dictionary in the image populated with the proxy objects. The order of the objects in the image method dictionary is the correct order for the method dictionary in the simulator.

In general if you want to perform a computation on objects in the simulator heap before you've bootstrapped the code to perform the computation, create proxy objects in the image and run the computation on them. Right?
In reply to this post by Andreas.Raab
On Mon, Jul 7, 2008 at 10:19 PM, Andreas Raab <[hidden email]> wrote:
Oops. Late to the party. Ignore my earlier post. Yes, it's trivial.
In reply to this post by Klaus D. Witzel
On Tue, Jul 8, 2008 at 12:08 AM, Klaus D. Witzel <[hidden email]> wrote:

In the image (not in the simulator) create proxy objects for the selectors in the simulator heap that answer the identity hashes of the objects in the simulator heap. Create a method dictionary in the image populated with the proxy objects. The order of the objects in the image method dictionary is the correct order for the method dictionary in the simulator.

Let me try again (clearly too early in the a.m.)

In general if you want to perform a computation on objects in the simulator heap before you've bootstrapped the code to perform the computation, create proxy objects in the image and run the computation on them. Right?
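The proxy trick above can be illustrated with a short Python sketch. It is only a model: `SelectorProxy` and `layout_method_dictionary` are hypothetical names, and the placement rule used here (identity hash modulo capacity, with linear probing) is an assumption for illustration rather than a statement of how the VM's MethodDictionary actually hashes.

```python
class SelectorProxy:
    """Host-side stand-in for a selector that lives in the simulator heap."""

    def __init__(self, oop, identity_hash):
        self.oop = oop
        self.identity_hash = identity_hash  # hash of the simulated object


def layout_method_dictionary(proxies, capacity):
    """Run the dictionary-insertion computation on the host, using proxies.

    Answers the slot array (simulator oops or None) whose order can then be
    copied into the method dictionary in the simulator heap.
    """
    if len(proxies) > capacity:
        raise ValueError("capacity too small for the given selectors")
    slots = [None] * capacity
    for proxy in proxies:
        index = proxy.identity_hash % capacity
        while slots[index] is not None:  # assumed linear probing
            index = (index + 1) % capacity
        slots[index] = proxy.oop
    return slots
```

The host never needs the simulated image to be able to run its own hashing code; it only needs each proxy to answer the identity hash of the object it stands for.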
In reply to this post by Andreas.Raab
At Mon, 07 Jul 2008 19:49:18 -0700,
Andreas Raab wrote:

> Anyone interested?

Sure. Even just getting the kernel part going will help us a lot. I wrote up some much less precise ideas, but using InterpreterSimulator is a good one.

http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-June/129571.html
http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-June/129554.html

-- Yoshiki
<newbie alert>
Ok, I have a question. What is this for? The idea kind of makes sense to me, but I am not sure I understand what to use it for, which is I guess why I don't get it :( Sorry for interrupting your high-level discussion with my simple question ;)
On Tue, Jul 8, 2008 at 2:36 PM, Yoshiki Ohshima <[hidden email]> wrote: At Mon, 07 Jul 2008 19:49:18 -0700, -- David Zmick /dz0004455\ http://dz0004455.googlepages.com http://dz0004455.blogspot.com |
On 08.07.2008, at 21:21, David Zmick wrote:

> <newbie alert>
> Ok, I have a question. What is this for? The idea kind of makes
> sense to me, but I am not sure I understand what to use it for,
> which is I guess why I don't get it :( Sorry for interrupting your
> high-level discussion with my simple question ;)

http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-May/128753.html

- Bert -
On Jul 8, 2008, at 12:35 PM, Bert Freudenberg wrote:

> On 08.07.2008, at 21:21, David Zmick wrote:
>
>> <newbie alert>
>> Ok, I have a question. What is this for? The idea kind of makes
>> sense to me, but I am not sure I understand what to use it for,
>> which is I guess why I don't get it :( Sorry for interrupting your
>> high-level discussion with my simple question ;)
>
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2008-May/128753.html

That's one use. Another is to build up small images so that many images can be launched within HydraVM without using up too much memory.

Josh

> - Bert -
In reply to this post by Bert Freudenberg
On Tue, Jul 8, 2008 at 12:35 PM, Bert Freudenberg <[hidden email]> wrote:
That's a reason. But being able to create a system from sources provides more benefit than that.

- it allows more speed and flexibility in deciding how the system is implemented since the bootstrapped system doesn't necessarily have to be an evolution of the current system.
- it is much easier to build systems up from components than find out what can be taken away to keep something running, so producing a minimal deployment of some application is much easier using a system built from the ground up.
- a system built in this way doesn't have to be a complete Smalltalk. It can, for example, omit a compiler, or omit reflective parts of the system without which one couldn't inspect, make modifications to the class hierarchy, decompile, etc, etc. So a system built from the ground up is much more easily made secure.

But the most compelling reason for me is to do with the system's architecture. The Smalltalk system should be architected as an onion, each layer of the onion being composed of a set of components (like tectonic plates). If it is architected like this, with components at the centre not using anything in outer layers (*) then removing components becomes much easier.

So the normal mode of development is to develop an application in a large well-facilitated development image. Once developed the programmer clones the image and unloads the components they think they don't need and tests the resulting application. That differs from the current stripping approach in that one is removing coarse-grained components with well-understood functionalities and boundaries instead of trying to infer the subset of code still used by the system.

So starting from a system that is built from the ground up should enable Squeak to evolve into a properly modular system.

(*) I am assuming that components do include class extensions so that e.g. when one adds the "Tools-Inspector" component it does extend classes in the kernel layers beneath it with appropriate extensions to permit their inspection.
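The onion rule stated above, that components nearer the centre must not use anything in outer layers, is easy to check mechanically. The following Python sketch is illustrative only: `layering_violations` is a hypothetical helper, layer numbers grow outward from 0 at the centre, and apart from "Tools-Inspector" the component names in the usage note are invented.

```python
def layering_violations(layer_of, depends_on):
    """Answer the (component, dependency) pairs that break the onion rule.

    layer_of maps a component name to its layer (0 is the centre);
    depends_on maps a component name to the components it uses.
    A dependency is a violation when it points to an outer layer.
    """
    return sorted(
        (component, dependency)
        for component, dependencies in depends_on.items()
        for dependency in dependencies
        if layer_of[dependency] > layer_of[component]
    )
```

For instance, if a hypothetical "Kernel" component in layer 0 used "Tools-Inspector" in layer 2, that pair would be reported, while "Tools-Inspector" using "Kernel" would not. A system whose violation list is empty can have its outer components unloaded without breaking the inner ones.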