On Feb 11, 2008 11:42 AM, tim Rowledge <[hidden email]> wrote:
But the execution of that block (a BlockContext?) will be added to the top of the current execution stack thus preserving the linearity of the stack, right? At this stage, I'm not sure what "stack linearity" is either... I'm assuming that Paul was referring to a stack being a linked list rather than a tree? Gulik. -- http://people.squeakfoundation.org/person/mikevdg http://gulik.pbwiki.com/ |
On 10-Feb-08, at 2:53 PM, Michael van der Gulik wrote:
A stack is linear last-in first-out. In C you branch to a subroutine, a stackframe is built on the stack and you execute code based on that. When you return from the subroutine all the memory in that stackframe is known to be free to reuse immediately if wanted by the next subroutine call. If you can get a handle to the actual stackframe and pass it to some other code then you *cannot* reuse that memory until you have some way of knowing it is no longer needed. So now you have a sandbar in your stack and a tricky problem to solve. In Squeak we do it the simple-minded way and don't have a contiguous stack but instead have actual explicitly allocated objects in the heap. That completely avoids the problem, at a cost in performance. VW uses a lot of very clever code to allow the system to live with a linearised, machine-compatible stack and yet present a 'proper' object to the programmer. It works, it's fast, but the complexity is at times bewildering. I'm reasonably sure this is explained in the Blue Book and certainly in various seminal papers such as the Deutsch-Schiffman 84 paper 'Efficient implementation of the smalltalk-80 system' (http://portal.acm.org/citation.cfm?id=800542) and the Miranda '87 paper 'BrouHaHa- A portable Smalltalk Interpreter'. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim A hacker does for love what others would not do for money. |
On Feb 11, 2008 12:29 PM, tim Rowledge <[hidden email]> wrote:
Oh - that's what he meant by a linear stack - stack frames are contiguous in memory. Gulik. -- http://people.squeakfoundation.org/person/mikevdg http://gulik.pbwiki.com/ |
In reply to this post by Igor Stasenko
Igor-
You suggested "enable multiple versions of same package in same image and keep track of package dependency". That's been an inspirational suggestion for me, and I've been thinking about how to implement it for a Squeak/JVM.

I don't have a definite solution yet, but here are some thoughts on it.

I feel it may come down to picking one of two paths.

We could make a complex system for supporting multiple global system dictionaries (or the equivalent) to allow multiple applications with different dependencies to live together in one memory image. That's really just an extension of the status quo in some ways, packing ever more stuff into one bigger and bigger image.

Or, we can break the monolithic image into small images which each just support one application well (call them "mini-images"). Each mini-image might in turn depend upon some other common mini-images for defining common classes. This alternative would probably require Spoon-like http://netjam.org/spoon/ remote development and remote-debugging support to work best (but it doesn't absolutely have to, as there easily could be a development tools mini-image included by reference even in the tiniest mini-image).

Personally, I think the second approach is ultimately simpler and more elegant, and does a better job of bringing Smalltalk forward in a now network-oriented world. See: "Principles of Design -- Tim Berners-Lee" http://www.w3.org/DesignIssues/Principles.html "Principles such as simplicity and modularity are the stuff of software engineering; decentralization and tolerance are the life and breath of Internet."

You may well know all these issues, but I just thought I'd put it down for others' comments as I understand it (in case I was wrong or missed something). I have probably outlined some approaches that people here already know about, created for Squeak or other systems, and anyone should feel free to point me to them.
Anyway, feel free to stop reading here, but what follows is more detail on how I came to think about this and arrive at those two possible paths.

=============== how it is now, and a simple approach

The biggest aspect of this is resolving globals. For review, if I recall correctly this is traditionally done in Squeak by the VM knowing about a SystemDictionary called "Smalltalk" (the VM needs to know about it absolutely to resolve a circular dependency of not being able to look up the global "Smalltalk". :-). When a CompiledMethod being executed does something like make a new instance of a class, it fetches the current instance (typically of a class) associated with the name of the global and sends it a message or stores it in a variable. Using named globals allows late binding of classes by the compiled method.

If you didn't care about late binding, like in Forth referring to a previously defined word, you could just make a hard link as a pointer to the class at compile time in the compiled method. But then you could not replace or remove the class in its entirety later.

There is room for only one version of a class at a time this normal way -- just one key in the Smalltalk system dictionary with one value.

The simplest way around this might be to have system dictionary values for keys be dictionaries. Then you could tag each item with a version. But the executing code would still need to resolve which one it wanted. And I don't see how that would be easy. But maybe it might be?

And then there is a deeper problem related to composites of objects which might include instances pointing to two or more different versions of the same class. But we can ignore that for now. :-)

== A deeper analysis (or, "owww, my brain hurts". :-)

Python has a straightforward way to resolve this -- it supports a sea of objects, and when you load code, the old classes get overridden in the equivalent of a system dictionary with new classes, but the existing instances still point to the old classes so those still hang around but are not accessible by name. This makes it difficult to do development in a live system, and you end up issuing special code to load things in differently (not making new classes) if you want to do Smalltalk-style dynamic development. But there is no reason you cannot simply load two versions of the same module (source file) and hang on to them somehow. Squeak could certainly do something similar if it had modules or classes which could exist without names.

When I try to generalize this global idea, there are other approaches. In PataPata (in Python/Jython, trying to retrofit them with Squeak-like capacities) I gave each object (typically Morphic-like GUI components) a "world" instance variable. That pointed to what was essentially the equivalent of a Smalltalk system dictionary to store globals or key functionality. In practice, each major window was in its own world, although that wasn't strictly required. Then I could have several worlds in the same process, where each was somewhat self-consistent.

But objects could still slip from one to another, typically when opening an inspector/browser tool (itself in its own world) on another world and maybe copying an object from one place to another. Beyond globals, another reason for each object to have a pointer to its "world" was that when I serialized a world I just wanted the objects from that world to be written out and no others, so I could check that pointer to make sure the serialization wasn't wandering into writing out objects from other worlds (I didn't pursue the concept of nested worlds, which might have been possible).
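The per-object "world" pointer described above can be sketched roughly as follows. This is not PataPata's actual code; `World`, `WorldObject`, and `serialize` are hypothetical names chosen for illustration, and the record format is an assumption. The sketch only shows the two uses mentioned: looking up globals through the object's world, and refusing to wander into another world while serializing.

```python
class World:
    """Hypothetical stand-in for a per-world system dictionary."""

    def __init__(self, name):
        self.name = name
        self.globals = {}   # global name -> object, like a small system dictionary
        self.objects = []   # every object created in this world

    def new_object(self, **slots):
        obj = WorldObject(self, slots)
        self.objects.append(obj)
        return obj


class WorldObject:
    """An object that carries a pointer back to the world it belongs to."""

    def __init__(self, world, slots):
        self.world = world
        self.slots = dict(slots)

    def lookup_global(self, name):
        # Object-focused late binding: globals are resolved via the object's world.
        return self.world.globals[name]


def serialize(world):
    """Write out only the objects owned by `world`.

    References to objects in other worlds become ("foreign", world_name)
    markers instead of being followed.
    """
    records = []
    for obj in world.objects:
        record = {}
        for key, value in obj.slots.items():
            if isinstance(value, WorldObject):
                if value.world is world:
                    record[key] = ("local", world.objects.index(value))
                else:
                    record[key] = ("foreign", value.world.name)
            else:
                record[key] = ("value", value)
        records.append(record)
    return records
```

An inspector living in a separate "tools" world can then hold a reference to a clock object, and vice versa, without either world's serialization pulling in the other's objects.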
I was planning to use unnamed references to parents from prototypes (for inherited behavior and constants) in PataPata, based on how Self did prototypes and links, but I decided in the end to reference prototypes representing parents by name, for the purpose of documenting intent. But that left a global lookup problem, resolved by having *every* prototype have a "World" pointer. And there were predictable problems when worlds pointed to themselves which I had to work around (especially when loading worlds). [Self has a fancier way of getting names for unnamed prototypes, based on determining paths from a root, which I did not want to try pursuing.]

Anyway, generalizing on this "object-focused late binding lookup" approach, objects can point to a global system dictionary, or they can point to other objects in some consistently structured way (typically "parent" or "container" or "class") which might in turn allow a path to find a global (that process might even percolate up and then back down, say to *search* for an object with a certain value; I supported this in PataPata to find widgets with a certain name in the same window as a widget executing some behavior).

But there is another way to do this, which is to have the thread, process, stack frame, or virtual machine hold onto a global system dictionary object somehow. This is closer to how Squeak does it with a system dictionary, except there might be one system dictionary per process or thread or stack frame. The difference is that the entity executing the code knows where to look for globals even if the objects being used for executing code do not (which presumably saves on memory, and provides a more consistent notion of what versions of classes a process wants to see, assuming that is a good idea :-). In a most extreme case, the user running the program might know the object ID or memory location of the global system dictionary and pass it in as needed (this might happen in a debugger session).
I might call this an "execution-focused late binding lookup" approach.

For completeness, there is another approach, which is to have globals stored in relation to the memory where the objects are stored (or processes executed) if memory is partitioned somehow. So if you have an object or process memory location, you can find the global system dictionary that goes with it by looking somewhere special in that memory chunk (beginning, end, standard offset). Deep in the reality of a virtual machine, it might even be using this approach in various ways (like making sure the pointer to the system dictionary is, say, the first handle in an object memory table).

Probably someone who has a PhD in computer science could tell me the proper terms for these approaches towards late binding? :-)

And of course, you can use more than one at a time. NewtonScript, for example, found variables by having two different types of lookup, based on a parent slot and visual containment. Maybe you could use all of the approaches at once in some system just for fun. I don't think I'd want to debug anything in it though. :-)

Anyway, this doesn't answer how specifically to do what you propose, but it does suggest some possible points of intervention -- mainly instances or processes.

But this leads to a deeper point. A Smalltalk VM (or any OO VM system like it, like the JVM objects or Python objects) has a problem with multiple global objects if objects sharing the same VM in different global spaces can point at each other directly. Essentially, if you can have multiple global system dictionaries, you end up in a situation where an object from a "module" in one set of interconnected versions of modules can be referenced by an object in a "module" in another interconnected set of different module versions. At that point, what governs the object's behavior, especially late binding lookup of globals? Should it be governed by the module the object came from?
Or should it be governed by the module which it is now connected to? Or should it be governed by the process executing and calling a method of the object (and that process might look up its globals in yet another way)? And similarly, when you absorb an instance from another module, should its class still point to the old class or should it point to the class in the new module?

In general, this issue is a variant of a deeper problem related to OO: http://mail.python.org/pipermail/edu-sig/2007-April/007852.html as I feel the idea that objects can stand alone and be somehow meaningful is at the root of a lot of evil in the Smalltalk universe (e.g. "bitrot". :-)

Anyway, just from random comments here over the years, I get the feeling that in their hearts the original Squeak Central people (Dan Ingalls especially) understand this and use heavily customized images in practice as coherent wholes, but perhaps they have never had the time to generalize this idea to a philosophical principle. Certainly just fighting for objects at all, as well as messages and VMs and good tools, must have taken up lots of energy.

Part of this issue may depend on whether you think of an object as a single-celled creature like an amoeba, or whether instead you think of an object as part of a biological entirety, like a protein molecule in a cell, or a highly regulated cell in a large multicellular entity.

If objects can't meaningfully stand alone, then it seems like we need some coherent philosophical approach to how they fit together into modules or images. Loading multiple versions of the same classes seems to strain this possible coherence, as useful as it might be. It's not that it won't work, it's just that the mental complexity starts increasing to the point where you may have to be really clever (and really alert) to keep track of it all. :-)

=== two competing approaches

Because of all these difficulties and complexity, I'm inclined to lean towards suggesting that images should be smaller, :-) and VMs could either be lightweight or perhaps could support multiple open images at once. Then you can load one version of a module into a larger set of other modules, and maintain that set for one application. This total image defines an ecology of objects, and the objects and their classes all make sense in relation to each other (as well as whatever I/O they choose to do through the VM to the rest of the world). This is sort of like a living cell.

And you could then load a different version of code modules into another *different* image and maintain that set for a different application. And when these applications want to communicate, it will be from one image to another, through their different VMs, presumably via sockets or shared memory or files or whatever, via some common serialization process. There are already several approaches for distributed objects in Smalltalk, so I doubt this will be much of a problem, and the JVM and Java offer other possibilities for remote procedure calls and such.

I think that a minimal image ("mini-image") approach might come closest to bringing some sanity to the idea of personal images (like Dan Ingalls seems to like). Every image would be a custom mix of module versions and hacked-up base class code. The image would know with a little developer help which objects belonged to which modules. To help with this, one would need easy tools to export module versions and configurations.
An important aspect of such an approach might be Spoon-like remote debugging, and remote development of minimal images so you could have, say, one image open with your favorite debugging tools and over a socket just plug those tools into other images you wanted to modify or debug; this isn't strictly necessary -- but conceptually it makes things more elegant, especially since then the development tools can have different versions of base classes than the system being debugged or developed. I get the feeling the Squeak ecosystem has most of the parts of all of this, they just haven't been all put together and polished toward this end.

Still, for the JVM, which is what interests me right now, all the objects do live in one world, and the JVM has a big memory footprint. So, given memory footprint and startup time, even with the newer JVMs sharing some memory across VM instances, I think we might have to end up living with multiple system dictionaries in one JVM unless JVMs improve further? Or maybe if we discover they are good enough now? In that case, I end up wondering if a "world" instance variable added to every underlying Java object is such a bad idea after all. :-) Or the alternative of a "world" instance variable stored in each thread (or process) is also possible. Of course, globals are rarely looked up, so more indirect ways of storing them might be more efficient, trading off time for memory. So this is a second alternative approach which is closer to the direction you outline.

== best solution long term?

After considering two paths in the previous two paragraphs, I think using lightweight images with only one system dictionary is a better way to go long term. They are just simpler and already well understood. If you, say, want a little clock up on your screen implemented in Squeak (instead of Lively Kernel :-), you just have a clock image. Ideally, that's all it does -- it's a clock.
If you want to inspect the clock, you fire up your development image in another JVM and connect to that clock JVM (maybe using a universal debugging registry service). Maybe your development image even gives you a copy of the image of the clock window with drag-and-drop overlays on another screen. Or it might put annotations over the original window by temporarily inserting a "glasspane" if the clock application was using Swing widgets, or by the usual Squeak ways if the Clock application used Morphic widgets.

To save space and maybe help with upgrades, perhaps the Clock application image depends on another larger base image. I did that in PataPata where worlds could require other worlds to be loaded first. Since I stored images as textual Python code which could rebuild a world of objects procedurally, that worked out OK. Here is an example of a simple PataPata world; I would expect a Squeak clock image built in a similar fashion would be about the same tiny size and also written out as textual source: http://patapata.svn.sourceforge.net/viewvc/patapata/tags/PataPata_v204/WorldDandelionGarden.py?revision=315&view=markup (One fudge: the bitmap was stored outside the image in a file.) Note the line: world.worldLibraries = [world.newWorldFromFile("WorldCommon.py")] which is what defines the other worlds this world depends on.

So, for Squeak, this would be like saying your small image depends on other images which load first. Obviously you have to have any supporting images around or you can't load your dependent one, but for the most part you just typically depend on common downloaded images. If images are stored as text (essentially, a Smalltalk program needed to rebuild the image) dependencies are a lot less scary since you could always just go in and start cutting and pasting in a text editor (but hopefully there would be better tools for this).
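The "supporting worlds load first" rule can be sketched as a small dependency walk. This is only an illustration, not PataPata's loader: `load_world` is a hypothetical helper, and the `definitions` mapping stands in for reading each world's `worldLibraries` line from its file. The cycle check reflects the problems with self-referencing worlds mentioned earlier.

```python
def load_world(name, definitions, loaded=None, loading=None):
    """Return world names in the order they must be loaded to get `name`.

    `definitions` maps a world file name to the list of worlds it requires
    (an in-memory stand-in for parsing each file's dependencies).
    """
    if loaded is None:
        loaded = []
    if loading is None:
        loading = set()
    if name in loaded:
        return loaded  # already available, nothing to do
    if name in loading:
        raise ValueError(f"circular world dependency at {name}")
    if name not in definitions:
        raise KeyError(f"missing supporting world: {name}")
    loading.add(name)
    for required in definitions[name]:
        load_world(required, definitions, loaded, loading)
    loading.discard(name)
    loaded.append(name)  # dependencies are in place, so this world can load
    return loaded


# Example mirroring the world mentioned above, which requires WorldCommon.py.
definitions = {
    "WorldDandelionGarden.py": ["WorldCommon.py"],
    "WorldCommon.py": [],
}
load_order = load_world("WorldDandelionGarden.py", definitions)
```

A real loader would rebuild objects from each file rather than just compute an order, but the failure modes are the same: a missing supporting image or a cycle stops the load.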
How to track and merge changes to base classes in supporting images is obviously an issue, and it is not one PataPata tried to solve (beyond the fact that prototypes made it easy to override base class behavior for most things). But, since at runtime the supporting packages will be loaded, you can easily modify a base class in the live image and then write out a modified version of the base image again with a different version number, and hope somebody down the road can reconcile your changes if you want them to move forward with the supporting image.

In this lightweight approach, images might also become modules stored in some source code repository if desired, or really, they might become more like (ENVY-ish?) configuration maps on top of available stored modules. So, to try to provide an example, you might save your running Clock image as module Clock-1.1.4 which also depends on BaseClasses-3.4.2. (This would require a worldwide way to identify Squeak modules uniquely.) Of course you might not store Clock-1.1.4 on a server; it might be stored on a local drive (perhaps in a Jar file, leading to Java classpath problems, but nothing is perfect :-). You might open up Clock-1.1.4, modify it using Spoon-like remote tools, and maybe even save it back under the same version number if no one else depended on it (perhaps with an automatic minor sequential internal revision number bump just in case). These names and version numbers might also be more like human-readable suggestions than absolutes -- for example each "image" "module" could have a unique UUID (plus perhaps save sequence) and dependencies could be expressed as lists of acceptable UUIDs as well as names, with some sort of sophisticated matching algorithm to try to resolve dependency issues and search for modules in various places.

For this Clock example, when you work on the clock you might pull up another image of development tools (browser, debugger, inspector, and so on).
But the versions of these (or the base classes they depend on) don't really matter to the clock application. All that matters is that somehow the two JVMs (or JVM processes) agree on how to talk to each other to add new methods, return results, single step code, follow object references, and so on. Presumably one could have a fairly standard protocol for that -- maybe even an extensible one (perhaps Spoon has this?).

Let's say something odd is happening with the Clock. You want to see how an older version works. Well, you just open up that older clock image. Then you might even open up an "image comparing" utility image :-) which lets you connect to both the running Clock images simultaneously and compare versions of all the classes looking for differences. Still unsatisfied, maybe you clone the older image (to start a third clock running) and bit by bit copy classes or modules from the new image to the copy of the old until you find where the clock starts to behave oddly. Then you make a change (remotely) to the first clock image and see if it fixes the problem.

Perhaps it turns out your code is perfect but the anomaly is due to a really deep problem in code supporting Squeak/JVM -- so you drop down a level conceptually and pull up a JVM debugger image, or maybe even just Eclipse, :-) http://www.eclipsezone.com/eclipse/forums/t53459.html connect to the JVM supporting that Clock image directly, and start swearing as you try to figure out what the Squeak/JVM maintainers did wrong this time. :-) If you wish, all of your actions with the multiple Squeak-ish VMs could have been logged to some common history repository somewhere to replay the entire multi-VM development session back to everyone who doesn't believe you that it's a JVM level issue. :-)

Presumably one could build testing tools for this architecture as well. And Squeak in C could go down this mini-image route too.
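As a rough idea of what the "image comparing" utility might compute once it has fetched class definitions from both running images, here is a minimal sketch. The representation is an assumption made for illustration: each image is modeled as a plain mapping from class name to a mapping of selector to method source, and `compare_images` is a hypothetical name. How the data would actually be pulled over a remote connection is not covered.

```python
def compare_images(old, new):
    """Compare two images modeled as {class_name: {selector: source}}.

    Returns which classes were added or removed, and for classes present
    in both images, which selectors differ (added, removed, or edited).
    """
    report = {"added": [], "removed": [], "changed": {}}
    for class_name in sorted(set(old) | set(new)):
        if class_name not in old:
            report["added"].append(class_name)
        elif class_name not in new:
            report["removed"].append(class_name)
        else:
            selectors = set(old[class_name]) | set(new[class_name])
            differing = sorted(
                selector
                for selector in selectors
                if old[class_name].get(selector) != new[class_name].get(selector)
            )
            if differing:
                report["changed"][class_name] = differing
    return report
```

The bit-by-bit bisection described above would then amount to copying one entry of `changed` at a time from the new image into the clone of the old one and re-running the clock.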
As I think a little more about this, I am still perhaps stuck with the problem that even in these mini-images, there would need to be some way to link specific objects back to specific modules so a modified module could be written back out with all its related objects. This is because a mini-image is not just code, it is code plus live objects. And so when objects are created, they would have to be assigned somehow to a specific module or source mini-image. So, perhaps this mini-image solution needs to have a "world" field (or "module" or "segment") in every object anyway, just so the modified objects can be written back out into the right mini-image or module? Or, if this was implemented in C, the image would be carved up into memory segments, with new objects allocated to the chunk of memory going with the specific mini-image that was loaded.

Squeak already has an image segment effort: http://wiki.squeak.org/squeak/1213 "ImageSegments and project swapping are still in the experimental stage" But it is binary, not textual source. And it is based on specific roots, not some sort of tag for each object. I guess both might take about the same amount of space -- instead of tagging each item with its segment (world), you have a big array which points to each object in the segment. Maybe you might want both? So objects know their segment and segments know their objects? And I find it a little amusing I am putting up windows in PataPata defined by textual mini-image files of 3688 bytes (assuming a bitmap loads off the network or from a local file :-) while they are talking about binary image segments of 10s of megabytes.

And as I read more on modular Squeak, I'm realizing that with mini-images the idea of a "project" would probably go away entirely.

And any tool which compared mini-images would have to have some way of representing objects in two different mini-images so it could look for similarities and differences.
At the very least, maybe like Les Tyrrell's OASIS project: http://wiki.squeak.org/squeak/1056 But there is a big difference between loading representations of objects (instances or their classes) to look at them and loading objects to use them.

Anyway, no easy solution. But I still think this second mini-image approach is simpler conceptually than attempting to keep different versions of the same things in the same VM. Both are possible, of course.

===

Anyway, maybe someone reading this might have a better suggestion or a better (simpler, clearer) way of looking at this issue.

--Paul Fernhout

Igor Stasenko wrote:
> Ken Causey wrote:
>> [snip]
>> Within this community I've come to feel that the only day to day practical solution is to do it and then ask for forgiveness when it goes all pear shaped (badly). Of course when that happens it really helps when it is something that can be readily reversed with no harm done. And that's where it seems we have a problem because the current release management schemes don't well-support removing something readily and in such a way that few if any are inconvenienced. I don't have a ready solution to that, it is something I find myself thinking about more and more.
>> [snip]
>
> There is a solution: enable multiple versions of same package in same image and keep track of package dependency.
> So, when you loading an updated package, all code which worked before, continues to work in same way as it was before.
> We need a way to be able developer to choose, what parts of system can use new version and what should use older version due to incompatibility reasons by simply checking dependencies and updating dependency links.
>
> Also, this would help a lot in maintaining packages: a package author can easily keep track of his package dependencies, and may or may not wish to release his package with updated dependencies, which use latest versions of packages, his package depends from.
>
> Of course, this is somewhat idealistic, and there is many caveats, but if done well, will allow us to mix things without fear that something will not work due to incompatibilities. |
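To make the quoted proposal a little more concrete, here is a minimal sketch of a registry that keeps several versions of a package side by side and resolves globals through each package's own dependency links. Nothing here corresponds to an existing Squeak mechanism: `PackageRegistry`, `install`, and `resolve` are hypothetical names, the package names in the example are invented, and only direct dependencies are searched to keep the sketch short.

```python
class PackageRegistry:
    """Holds multiple versions of the same package in one 'image'."""

    def __init__(self):
        # (name, version) -> {"globals": {...}, "requires": {dep_name: dep_version}}
        self.packages = {}

    def install(self, name, version, package_globals, requires=None):
        self.packages[(name, version)] = {
            "globals": dict(package_globals),
            "requires": dict(requires or {}),
        }

    def resolve(self, from_package, global_name):
        """Look up `global_name` as seen from `from_package` = (name, version)."""
        package = self.packages[from_package]
        if global_name in package["globals"]:
            return package["globals"][global_name]
        # Fall back to the exact dependency versions this package was linked to.
        for dep_name, dep_version in package["requires"].items():
            dependency = self.packages[(dep_name, dep_version)]
            if global_name in dependency["globals"]:
                return dependency["globals"][global_name]
        raise NameError(global_name)


registry = PackageRegistry()
registry.install("Collections", "1.0", {"Bag": "Bag (old layout)"})
registry.install("Collections", "2.0", {"Bag": "Bag (new layout)"})
registry.install("OldApp", "1.0", {}, requires={"Collections": "1.0"})
registry.install("NewApp", "1.0", {}, requires={"Collections": "2.0"})
```

Loading `Collections` 2.0 leaves `OldApp` untouched until someone updates its dependency link, which is the behavior the proposal asks for. The harder questions raised in the reply above, such as what happens when an instance created under one version is handed to code linked against the other, are not addressed by this sketch.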
In reply to this post by Michael van der Gulik-2
Michael van der Gulik schrieb:
> On Feb 11, 2008 12:29 PM, tim Rowledge <[hidden email]> wrote:
>
>> ...
>> A stack is linear last-in first-out. In C you branch to a subroutine, a stackframe is built on the stack and you execute code based on that. When you return from the subroutine all the memory in that stackframe is known to be free to reuse immediately if wanted by the next subroutine call.
>> ...
>
> Oh - that's what he meant by a linear stack - stack frames are contiguous in memory.

Actually, it's pretty simple to use a linear stack in memory without giving up the semantics of Smalltalk blocks. There are two uses of the creating context in a block:

1. Access to variables shared between the block and the context or other blocks.
2. Non-local return (i.e. a method return from a block)

For the first kind of use, variables could be placed into a separate array whose lifetime is independent of that of the creating context. The non-local return is a bit more tricky. One possibility is to keep a pointer to the stack segment and the position of the frame within that segment in the array of variables, and point to that array from the stack frame. When a non-local return is attempted, the return code must check whether the block is executed in the same process as the creating context, and whether the frame corresponding to the creating context still exists (by just checking whether it points to the array of variables). VA Smalltalk does something like that.

For non-reflective block semantics, that's enough. I think it is not too difficult to do continuations as well using such a stack architecture. In the case of VA, things get complicated mostly because it is not possible to manipulate the stack frame array directly without causing all sorts of havoc. I think Instantiations is working on this to allow Seaside to run on VA ST.

One nice side effect of a "linear" stack without separate context objects is that you can overlap stack frames, i.e. objects pushed onto the sender's stack frame in preparation can become part of the called method's stack frame without copying. Of course, you need to get the frame linkage slots out of the way somehow, but that's not too difficult either.

The implementations in BrouHaHa and VW which try to hide the "cheating" are ingenious, but I think that a linear stack implementation can be exposed to the image without causing too much trouble, making the VM design much simpler.

Cheers, Hans-Martin |
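The scheme described in this message can be modeled in a few lines. This is a toy model under stated assumptions, not how VA Smalltalk or any real VM is written: the linear stack is a Python list, the heap-allocated variable array is the hypothetical `SharedVars` class, and unwinding for a non-local return is simulated with a Python exception. The part worth looking at is the liveness check in `return_from_home`, which tests that the block runs on the same stack and that the frame at the recorded position still points back at the block's variable array.

```python
class BlockCannotReturn(Exception):
    """Raised when a block tries to return from a home frame that is gone."""


class NonLocalReturn(Exception):
    """Models unwinding the stack back to the block's home frame."""

    def __init__(self, shared, value):
        super().__init__("non-local return")
        self.shared = shared
        self.value = value


class SharedVars:
    """Heap array of variables shared by an activation and its blocks."""

    def __init__(self, stack, frame_index, values):
        self.stack = stack              # which stack (process) the home frame is on
        self.frame_index = frame_index  # position of the home frame in that stack
        self.values = values


class Frame:
    def __init__(self):
        self.shared = None  # set only if the method creates blocks


class Stack:
    def __init__(self):
        self.frames = []  # the "linear" stack: frames are simply reused slots

    def call(self, method, *args):
        frame = Frame()
        self.frames.append(frame)
        index = len(self.frames) - 1
        try:
            return method(self, frame, index, *args)
        except NonLocalReturn as unwinding:
            if frame.shared is unwinding.shared:
                return unwinding.value  # this is the home frame: return from it
            raise                       # keep unwinding through this frame
        finally:
            self.frames.pop()


class Block:
    def __init__(self, shared, body):
        self.shared = shared
        self.body = body

    def value(self, stack, *args):
        return self.body(self, stack, *args)

    def return_from_home(self, stack, value):
        s = self.shared
        alive = (
            s.stack is stack
            and s.frame_index < len(stack.frames)
            and stack.frames[s.frame_index].shared is s
        )
        if not alive:
            raise BlockCannotReturn("home context has already returned")
        raise NonLocalReturn(s, value)


def find_first_even(stack, frame, index, numbers):
    """A method whose block does a non-local return on the first even number."""
    shared = SharedVars(stack, index, {"seen": 0})
    frame.shared = shared

    def body(block, stack, n):
        block.shared.values["seen"] += 1
        if n % 2 == 0:
            block.return_from_home(stack, n)

    block = Block(shared, body)
    for n in numbers:
        # Evaluate the block in its own frame, as a callee would.
        stack.call(lambda st, fr, ix: block.value(st, n))
    return None
```

A block that escapes its method keeps working for variable access, because `SharedVars` outlives the frame, but an attempted non-local return from it fails the liveness check instead of corrupting a reused frame slot.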
In reply to this post by Paul D. Fernhout
On Feb 11, 2008 7:14 PM, Paul D. Fernhout <[hidden email]> wrote: Igor- <sniiiiiiiiiiiiiiiiiip> For the record, that email was 382 lines long or about 7 pages if printed, and it's just one of many of your posts! How /do/ you manage to write so much stuff in a day, Paul!? I wish I could be as productive. Gulik. |
In reply to this post by Paul D. Fernhout
paul
cut your emails into chunk else really few people will read them. Stef
The difference is that the entity executing the code knows > where to > look for globals even if the objects being used for executing code > do not > (which presumably saves on memory, and provides a more consistent > notion of > what versions of classes a process want to see, assuming that is a > good idea > :-). In a most extreme case, the user running the program might know > the > object ID or memory location of the global system dictionary and > pass it in > as needed (this might happen in a debugger session). I might call > this an > "execution-focused late binding lookup" approach. > > For completeness, there is another approach which is to have globals > stored > in relation to the memory where the objects are stored (or processes > executed) if memory is partitioned somehow. So if you have an object > or > process memory location, you can find the global system dictionary > that goes > with it by looking somewhere special in that memory chunk > (beginning, end, > standard offset). Deep in the reality of a virtual machine, it might > even be > using this approach in various ways (like making sure the pointer to > the > system dictionary is, say, the first handle in an object memory > table). > > Probably someone who has a PhD in computer science could tell me the > proper > terms for these approaches towards late binding? :-) > > And of course, you can use more than one at a time. NewtonScript, for > example, found variables by having two different types of lookup, > based on a > parent slot and visual containment. Maybe you could use all of the > approaches at once in some system just for fun. I don't think I'd > want to > debug anything in it through. :-) > > Anyway, this doesn't answer how specifically to do what you propose, > but it > does suggest some possible points of intervention -- mainly > instances or > processes. > > But this leads to a deeper point. 
> A Smalltalk VM (or any OO VM system like it, such as the JVM or Python with their objects) has a problem with multiple global objects if objects sharing the same VM in different global spaces can point at each other directly.
>
> Essentially, if you can have multiple global system dictionaries, you end up in a situation where an object from a "module" in one set of interconnected versions of modules can be referenced by an object in a "module" in another interconnected set of different module versions. At that point, what governs the object's behavior, especially late binding lookup of globals? Should it be governed by the module the object came from? Or should it be governed by the module which it is now connected to? Or should it be governed by the process executing and calling a method of the object (and that process might look up its globals in yet another way)? And similarly, when you absorb an instance from another module, should its class still point to the old class or should it point to the class in the new module?
>
> In general, this issue is a variant of a deeper problem related to OO:
> http://mail.python.org/pipermail/edu-sig/2007-April/007852.html
> as I feel the idea that objects can stand alone and be somehow meaningful is at the root of a lot of evil in the Smalltalk universe (e.g. "bitrot"). :-)
>
> Anyway, just from random comments here over the years, I get the feeling that in their hearts the original Squeak Central people (Dan Ingalls especially) understand this and use heavily customized images in practice as coherent wholes, but perhaps they have never had the time to generalize this idea to a philosophical principle. Certainly just fighting for objects at all, as well as messages and VMs and good tools, must have taken up lots of energy.
>
> Part of this issue may depend on whether you think of an object as a single-celled creature like an amoeba, or whether instead you think of an object as part of a biological entirety, like a protein molecule in a cell, or a highly regulated cell in a large multicellular entity. If objects can't meaningfully stand alone, then it seems like we need some coherent philosophical approach to how they fit together into modules or images.
>
> Loading multiple versions of the same classes seems to strain this possible coherence, as useful as it might be. It's not that it won't work, it's just that the mental complexity starts increasing to the point where you may have to be really clever (and really alert) to keep track of it all. :-)
>
> === two competing approaches
>
> Because of all these difficulties and complexity, I'm inclined to lean towards suggesting that images should be smaller, :-) and VMs could either be lightweight or perhaps could support multiple open images at once. Then you can load one version of a module into a larger set of other modules, and maintain that set for one application. This total image defines an ecology of objects, and the objects and their classes all make sense in relation to each other (as well as whatever I/O they choose to do through the VM to the rest of the world). This is sort of like a living cell. And you could then load a different version of code modules into another *different* image and maintain that set for a different application. And when these applications want to communicate, it will be from one image to another, through their different VMs, presumably via sockets or shared memory or files or whatever, via some common serialization process.
> There > are already several approaches for distributed objects in Smalltalk, > so I > doubt this will be much of a problem, and the JVM and Java offer other > possibilities for remote procedure calls and such. I think that a > minimal > image ("mini-image") approach might come closest to bringing some > sanity to > the idea of personal images (like Dan Ingalls seems to like). Every > image > would be a custom mix of module versions and hacked up base class > code. The > image would know with a little developer help which objects belonged > to > which modules. To help with this, one would need easy tools to > export module > versions and configurations. An important aspect of such an approach > might > be Spoon-like remote debugging, and remote development of minimal > images so > you could have, say, one image open with your favorite debugging > tools and > over a socket just plug those tools into other images you wanted to > modify > or debug; this isn't strictly necessary -- but conceptually it makes > things > more elegant, especially since then the development tools can have > different versions of base classes than thee system being debugged > or > developed. I get the feeling the Squeak ecosystem has most of the > parts of > all of this, they just haven't been all put together and polished > toward > this end. > > Still, for the JVM, which is what interests me right now, all the > objects do > live in one world, and the JVM has a big memory footprint. So, given > memory > footprint and startup time, even with the newer JVM's sharing some > memory > across VM instances, I think we might have to end up living with > multiple > system dictionaries in one JVM unless JVMs improve further? Or maybe > if we > discover they are good enough now? In that case, I end up wondering > if a > "world" instance variable added to every underlying Java object is > such a > bad idea after all. 
:-) Or the alternative of a "world" instance > variable > stored in each thread (or process) is also possible. Of course, > globals are > rarely looked up, so more indirect ways of storing them might be more > efficient trading off time for memory. So this is a second alternative > approach which is closer to the direction you outline. > > == best solution long term? > > After considering two paths in the previous two paragraphs, I think > using > lightweight images with only one system dictionary are a better way > to go > long term. They are just simpler and already well understood. > > If you, say, want a little clock up on your screen implemented in > Squeak > (instead of Lively Kernel :-), you just have a clock image. Ideally, > that's > all it does -- it's a clock. If you want to inspect the clock, you > fire up > your development image in another JVM and connect to that clock JVM > (maybe > using a universal debugging registry service). Maybe your > development image > even gives you a copy of the image of the clock window with drag-and- > drop > overlays on another screen. Or it might put annotations over the > original > window by temporarily inserting a "glasspane" if the clock > application was > using Swing widgets, or by the usual Squeak ways if the Clock > application > used Morphic widgets. > > To save space and maybe help with upgrades, perhaps the Clock > application > image depends on another larger base image. I did that in PataPata > where > worlds could require other worlds to be loaded first. Since I stored > images > as textual Python code which could rebuild a world of objects > procedurally, > that worked out OK. 
Here is an example of simple PataPata world; I > would > expect a Squeak clock image built in a similar fashion would be > about the > same tiny size and also written out as textual source: > http://patapata.svn.sourceforge.net/viewvc/patapata/tags/PataPata_v204/WorldDandelionGarden.py?revision=315&view=markup > (One fudge, the bitmap was store outside the image in a file.) > Note the line: > world.worldLibraries = [world.newWorldFromFile("WorldCommon.py")] > which is what defines the other worlds this world depends on. So, for > Squeak, this would be like saying your small image depends on other > images > which load first. > > Obviously you have to have any supporting images around or you can't > load > your dependent one, but for the most part you just typically depend on > common downloaded images. If images are stored as text (essentially, a > Smalltalk program needed to rebuild the image) dependencies are a > lot less > scary since you could always just go in and start cutting and > pasting in a > text editor (but hopefully there would be better tools for this). > > How to track and merge changes to base classes in supporting images is > obviously an issue, and it is not one PataPata tried to solve > (beyond the > fact that prototypes made it easy to override base class behavior > for most > things). But, since at runtime the supporting packages will be > loaded, you > can easily modify it in the live image and then write out a modified > version > of the base image again with a different version number, and hope > somebody > down the road can reconcile your changes if you want them to move > forward > with the supporting image. > > In this lightweight approach, images might also become modules > stored in > some source code repository if desired, or really, they might become > more > like (ENVY-ish?) configuration maps on top of available stored > modules. 
So, > to try to provide an example, you might save your running Clock > image as > module Clock-1.1.4 which also depends on BaseClasses-3.4.2. (This > would > require a worldwide way to identify Squeak modules uniquely.) Of > course you > might not store Clock-1.1.4 on a server; it might be stored on a > local drive > (perhaps in a Jar file, leading to Java classpath problems, but > nothing is > perfect :-). You might open up Clock-1.1.4, modify it using Spoon-like > remote tools, and maybe even save it back under the same version > number if > no one else depended on it (perhaps with an automatic minor sequential > internal revision number bump just in case). These names and version > numbers > might also be more like human readable suggestions than absolutes -- > for > example each "image" "module" could have a unique UUID (plus perhaps > save > sequence) and dependencies could be expressed as lists of acceptable > UUIDs > as well as names, with some sort of sophisticated matching algorithm > to > trying resolve dependency issues and search for modules various > places. > > For this Clock example, when you work on the clock you might pull up > another > image of development tools (browser, debugger, inspector, and so > on). But > the versions of these (or the base classes they depend on) don't > really > matter to the clock application. All that matters is that somehow > the two > JVMs (or JVM processes) agree on how to talk to each other to add new > methods, return results, single step code, follow object references, > and so > on. Presumably one could have a fairly standard protocol for that -- > maybe > even an extensible one (perhaps Spoon has this?). Let's say > something odd is > happening with the Clock. You want to see how an older version > works. Well, > you just open up that older clock image. 
Then you might even open up a > "image comparing" utility image :-) which lets you connect to both the > running Clock images simultaneously and compare versions of all the > classes > looking for differences. Still unsatisfied, maybe you clone the > older image > (to start a third clock running) and bit by bit copy classes or > modules from > the new image to the copy of the old until you find where the clock > starts > to behave oddly. Then you make a change (remotely) to the first > clock image > and see if it fixes the problem. Perhaps it turns out your code is > perfect > but the anomaly is due to a really deep problem in code supporting > Squeak/JVM -- so you drop down a level conceptually and pull up a JVM > debugger image, or maybe even just Eclipse, :-) > http://www.eclipsezone.com/eclipse/forums/t53459.html > connect to the JVM supporting that Clock image directly, and start > swearing > as you try to figure out what the Squeak/JVM maintainers did wrong > this > time. :-) If you wish, all of your actions with the multiple Squeak- > ish VMs > could have been logged to some common history repository somewhere > to replay > the entire multi-VM development session back to everyone who doesn't > believe > you that it's a JVM level issue. :-) Presumably one could build > testing > tools for this architecture as well. > > And Squeak in C could go down this mini-image route too. > > As I think a little more about this, I am still perhaps stuck with the > problem that even in these mini-images, there would need to be some > way to > link specific objects back to specific modules so a modified module > could be > written back out with all its related objects. This is because a > mini-image > is not just code, it is code plus live objects. And so when objects > are > created, they would have to be assigned somehow to a specific module > or > source mini-image. 
So, perhaps this mini-image solution needs to > have a > "world" field (or "module" or "segment") in every object anyway, > just so the > modified objects can be written back out into the right mini-image or > module? Or, if this was implemented in C, the image would be carved > up into > memory segments, with new objects allocated to the chunk of memory > going > with the specific min-image that was loaded. > > Squeak already has an image segment effort: > http://wiki.squeak.org/squeak/1213 > "ImageSegments and project swapping are still in the experimental > stage" > But it is binary, not textual source. And it is based on specific > roots, not > some sort of tag for each object. I guess both might take about the > same > amount of space -- instead of tagging each item with its segment > (world), > you have a big array which points to each object in the segment. > Maybe you > might want both? So objects know their segment and segments know their > objects? And I find it a little amusing I am putting up windows in > PataPata > defined by textual mini-image files of 3688 bytes (assuming a bitmap > loads > off the network or from a local file :-) while they are talking > about binary > image segments of 10s of megabytes. > > And as I read more on modular Squeak, I'm realizing that with mini- > images > the idea of a "project" would probably go away entirely. > > And any tool which compared mini-images would have to have some way of > representing objects in two different mini-images so it could look for > similarities and differences. At the very least, maybe like Les > Tyrrell's > OASIS project: > http://wiki.squeak.org/squeak/1056 > But there is a big difference between loading representations of > objects > (instances or their classes) to look at them and loading objects to > use them. > > Anyway, no easy solution. 
But I still think this second mini-image > approach > is simpler conceptually than attempting to keep different versions > of the > same things in the same VM. Both are possible, of course. > > === > > Anyway, maybe someone reading this might have a better suggestion or a > better (simpler, clearer) way of looking at this issue. > > --Paul Fernhout > > Igor Stasenko wrote: >> Ken Causey wrote: >>> [snip] >>> Within this community I've come to feel that the only day to day >>> practical solution is to do it and then ask for forgiveness when >>> it goes >>> all pear shaped (badly). Of course when that happens it really >>> helps >>> when it is something that can be readily reversed with no harm done. >>> And that's where it seems we have a problem because the current >>> release >>> management schemes don't well-support removing something readily >>> and in >>> such a way that few if any are inconvenienced. I don't have a ready >>> solution to that, it is something I find myself thinking about >>> more and >>> more. >>> [snip] >> >> There is a solution: enable multiple versions of same package in same >> image and keep track of package dependency. >> So, when you loading an updated package, all code which worked >> before, >> continues to work in same way as it was before. >> We need a way to be able developer to choose, what parts of system >> can >> use new version and what should use older version due to >> incompatibility reasons by simply checking dependencies and updating >> dependency links. >> >> Also, this would help a lot in maintaining packages: a package author >> can easily keep track of his package dependencies, and may or may not >> wish to release his package with updated dependencies, which use >> latest versions of packages, his package depends from. >> >> Of course, this is somewhat idealistic, and there is many caveats, >> but >> if done well, will allow us to mix things without fear that something >> will not work due to incompatibilities. > > |
I'll try to be short.
1. No, the Smalltalk VM (at least Squeak's) doesn't care about globals (in most cases). It uses a special objects table, which can be replaced on the fly. This is simply because the VM doesn't need to access globals when doing method lookup. All objects refer to their classes directly.
2. To get rid of globals you have to change only a few lines in the compiler code :) Of course, you should provide something else in exchange. Btw, if you search the mail archives, you'll find a discussion about that.
3. The main barrier to letting multiple versions of the same class/package live together is support in the dev tools (browser/compiler). The VM doesn't require groundbreaking changes to support this. The exceptions are tagged oops (SmallIntegers) and the well-known singletons: the nil/true/false objects. Even if you have multiple SmallInteger classes, instances will be able to use only one of them. This is a sacrifice... Well, but you can always make boxed integers :)

To Paul: most of these ideas can find their way into the world when Michael van der Gulik releases his SecureSqueak project. So I suggest you discuss the details with him first, since he is the person most interested in this area. My idea of having multiple versions of packages was just a fruit of discussion with him :) Also, I noticed that Mike's views on many things in different areas are very similar to mine, which is good :) Who knows, maybe we'll join our efforts someday.
--
Best regards, Igor Stasenko AKA sig. |
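Point 3 can be made concrete with a toy model. The sketch below is a simplified illustration, not the real Squeak object memory: the one-bit tag, the `VM` class and the `special_objects` dictionary are assumptions made for the example. It shows why heap objects can belong to different versions of a class while every tagged integer is forced onto the single class registered with the VM.

```python
class HeapObject:
    """A boxed object: it carries its own class pointer."""

    def __init__(self, klass):
        self.klass = klass


class VM:
    def __init__(self, small_integer_class):
        # Stand-in for the special objects table: one slot, one class.
        self.special_objects = {"SmallInteger": small_integer_class}
        self.heap = {}  # even "addresses" -> HeapObject

    def allocate(self, klass):
        address = 2 * (len(self.heap) + 1)  # low bit clear: a heap pointer
        self.heap[address] = HeapObject(klass)
        return address

    def class_of(self, oop):
        if oop & 1:
            # Tagged immediate: there is no header to read a class from.
            return self.special_objects["SmallInteger"]
        return self.heap[oop].klass


def tag_integer(value):
    return (value << 1) | 1


def untag_integer(oop):
    return oop >> 1


vm = VM(small_integer_class="SmallInteger v1")
point_old = vm.allocate("Point v1")
point_new = vm.allocate("Point v2")
forty_two = tag_integer(42)
```

Here `vm.class_of(point_old)` and `vm.class_of(point_new)` can differ, but `vm.class_of(forty_two)` is always whatever the VM holds in its table. A boxed integer would be an ordinary heap object and so would escape that limit, at the cost of an allocation.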
In reply to this post by Michael van der Gulik-2
Michael-
I guess it is a matter of priorities. This is important to me. Plus we don't have broadcast TV. :-)

Also, which is more productive? Writing a seven-page email which gets ignored by most and *hopefully* trashed as advocating a design which is uninformed or redundant or massively incomplete by a few people who are really clued in on these issues (like yourself or Igor or whoever), [thanks for your comments, Igor] or spending a person-year making such a system and only then finding out after the fact it is uninformed, redundant, or massively incomplete? After a dozen years, the Squeak ecosystem of projects and people is so diverse it is hard to know what everyone is up to or has done (and I've been away from it for quite a while in Python-land).

Obviously, writing code is more productive, if it gets used. But the problem I am concerned about here is in part people writing code for Squeak (e.g. with or without traits) and it being lost. You can have a lot of time to write and read long emails if you don't write a lot of code which just ends up getting thrown away instead. :-)

Of course, for most programmers reading and writing code is more enjoyable than reading and writing design documents (or related documents). I do write code (eventually. :-) And I am at this point of needing design feedback precisely because of some code (PataPata) I was ultimately unhappy with (though I thought it was a productive experiment, since you learn from experiments whether they succeed or fail).

Still, I'll concede, as in a previously supplied link relating to Chandler, that designs are always fraught with the peril that they missed some key idea you only find out deep in implementation which makes the whole project pointless. Still, an experienced designer is able to a limited extent to simulate a paper design in his or her head and get a feel for it, at least to the point of seeing obvious gaps.
But ultimately, it is true, the proof of a design idea is in working and useful code. --Paul Fernhout Michael van der Gulik wrote: > For the record, that email was 382 lines long or about 7 pages if printed, > and it's just one of many of your posts! > > How /do/ you manage to write so much stuff in a day, Paul!? I wish I could > be as productive. |
In reply to this post by Igor Stasenko
Igor-
Thanks for the feedback. I'll look more into what Mike is doing with SecureSqueak.

Your thoughts also help me clarify something for myself as to VMs. I realize now that I am mostly concerned with writing out sets of code and objects (like in PataPata), whether they are called "image segments", "modules", "parcels", or whatever, as opposed to writing out images. And much of this issue of globals for me revolves around how to make it possible to track what objects go in what image segment (or module or whatever). And when an object is created is the most obvious time to make that assignment to a segment/module/parcel/whatever.

I'll agree good tool support for this whole process is essential. For example, if you were using an inspector, you may often want to know what module the object was supposed to belong with (or even what image, if it was browsed remotely like with Spoon), as well as maybe change that relationship. This is probably true as well for loading multiple versions of the same class into the same image (assuming you wanted to write the class and its instances back out later). And this class/instance relationship is another way to keep objects separated into modules or some classes of objects. But ultimately, if I write out a live window along with the class which defines it, I need to write out all the small integers or collections which define the window and its behavior and state as well, and those instances of core classes will usually be defined elsewhere.

As I wrote here:
http://patapata.sourceforge.net/critique.html
there were many disappointments with PataPata, but the small text-based images were something I was pleased with (like the 4K example I linked to which defines a live window).
See also: "Power Of Plain Text"
http://www.c2.com/cgi/wiki/quickDiff?PowerOfPlainText

Anyway, I see as a matter of emphasis that what I should be focusing on in a Squeak/JVM is indeed good tool support, as well as whatever it takes within the underlying infrastructure to be able to round up objects and say they belong together in some package to be written out. And naturally, the objects might want to know what module they belong to too, if they need to use that information in some way (like module-specific globals).

--Paul Fernhout |
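The idea in this reply, assigning each object to a module at creation time so the module can later be written out with exactly its own objects, can be sketched in a few lines of Python. This is only a sketch under assumptions: `Module`, `new` and `write_out` are made-up names, `Window` is a placeholder class, and the textual output format is invented. The module names reuse the Clock-1.1.4 / BaseClasses-3.4.2 example from the earlier message.

```python
class Module:
    """Hypothetical module/segment that owns the objects created through it."""

    def __init__(self, name):
        self.name = name
        self.objects = []  # the module knows its objects

    def new(self, cls, *args, **kwargs):
        obj = cls(*args, **kwargs)
        obj.module = self  # the object knows its module
        self.objects.append(obj)
        return obj

    def write_out(self):
        # Only objects created in this module are serialized.
        return [f"{self.name}: {obj.describe()}" for obj in self.objects]


class Window:
    def __init__(self, title):
        self.title = title

    def describe(self):
        return f"Window({self.title!r})"


base = Module("BaseClasses-3.4.2")
clock = Module("Clock-1.1.4")

inspector = base.new(Window, "Inspector")
face = clock.new(Window, "Clock face")

clock_text = clock.write_out()
```

Keeping both directions (object to module and module to objects) costs a little memory, but it lets an inspector answer "which module is this object in?" and lets the writer avoid wandering into other modules. Instances of core classes such as small integers cannot carry the extra field, so a real design would still need a rule for them.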
In reply to this post by Igor Stasenko
On Feb 12, 2008 12:38 AM, Igor Stasenko <[hidden email]> wrote:
Sorry, I didn't catch the parent post to this.

My namespaces design[1] will allow the developer to load different versions of the same package into the image, and instantiate classes from either. The issues are:
- Class comparison, because although two classes might have the same name, they'll be different classes.
- Objects in the VM's special objects array.
These are discussed in the last section of [1].

In a few weeks, once I've got my package management system[2] working, I'll be making a new Metaclass hierarchy which will exist in the image alongside the existing Metaclass hierarchy. This will allow me to make radical changes to it without worrying about breaking the image.

Note that these are brain dumps rather than documentation, and that they constantly change:
[1] http://gulik.pbwiki.com/Namespaces
[2] http://gulik.pbwiki.com/Packages

> Also, i noticed that Mike's view on many things in different areas are

My plans are to design a secure kernel myself and then invite other developers to help when it is usable. I'm only working on this a couple of hours a week as a hobby, so don't hold your breath waiting. I'm usually on IRC when I'm working on it (yay broadband!).

Gulik.
--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/ |
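The class comparison issue listed above can be reproduced in plain Python, which may help anyone who has not hit it before. This is only an analogy, not the namespaces design from [1]: `load_package` is a hypothetical loader that returns a fresh namespace each time, so two loads of "the same" package yield two distinct `Clock` classes.

```python
def load_package(version):
    """Hypothetical loader: returns a fresh namespace holding a new Clock class."""

    class Clock:
        package_version = version

    return {"Clock": Clock}


namespace_v1 = load_package("1.0")
namespace_v2 = load_package("2.0")

ClockV1 = namespace_v1["Clock"]
ClockV2 = namespace_v2["Clock"]

old_clock = ClockV1()

same_name = ClockV1.__name__ == ClockV2.__name__   # names match
same_class = ClockV1 is ClockV2                    # identities do not
old_is_new_kind = isinstance(old_clock, ClockV2)   # so type tests fail
```

Any code that compares classes by identity, or checks that an object is an instance of "the" Clock, has to know which version it means. Comparing by name alone would hide the difference rather than resolve it.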