On Thursday 20 Aug 2009 8:05:14 am Jecel Assumpcao Jr wrote:
> > This is what making a huge difference, for instance, between
> > applications with open source code and applications shipped in binary
> > form - you can only report bugs, but can't realy make any suggestions
> > about what happening.
>
> All of the tools that created the bits in the first place, as well as
> the tools to change them are inside the same image as the bits. So I
> don't agree with your analogy.

I think a better analogy is the way public key cryptography certificates are constituted. What matters is not whether a certificate is encoded in ASCII or binary but whether there is a chain of trust. Should anyone lose their certificate, it can be reconstituted from its parent cert in the chain. But if you happen to lose a root cert then no new chains can be reconstituted. You are stuck with the existing chains originating from this root.

The key tools for a 'binary' encoding are the equality and diff tools. Given two images A and B, check if they are equivalent. If not, find the difference D that will reconstitute B from A.

Subbu
|
K. K. Subramaniam wrote on Thu, 20 Aug 2009 18:19:24 +0530
> On Thursday 20 Aug 2009 8:05:14 am Jecel Assumpcao Jr wrote:
> > All of the tools that created the bits in the first place, as well as
> > the tools to change them are inside the same image as the bits. So I
> > don't agree with your analogy.
> I think a better analogy is the way public key cryptography certificates are
> constituted. What matters is not whether a certificate is encoded in ASCII or
> binary but whether there is a chain of trust. Should anyone lose their
> certificate, it can be reconstituted from its parent cert in the chain. But if
> you happen to lose a root cert then no new chains can be reconstituted. You
> are stuck with the existing chains originating from this root.

Hmm... I didn't understand this analogy very well. Don't certificates in the middle of the chain also involve a pair of public/private keys? If so, it seems to me that losing the private key in the middle would be as fatal as losing the root one (though it would affect fewer people).

Was the analogy about how a chain of certificates is like a chain of images (starting all the way back from Smalltalk-76)?

> The key tools for a 'binary' encoding are the equality and diff tools. Given
> two images A and B, check if they are equivalent. If not, find the difference D
> that will reconstitute B from A.

This is what the Debian guy was asking for, but the idea was to convert images into some kind of XML and then use traditional text diff to deal with that. I used to have great success with diff but in the past few years its results have become useless for me (probably some default setting has been changed and I would have to force it to work the old way), so I am not sure about the value of this approach. A tool that actually understood images, as you proposed, might work better. It wouldn't be too easy to write, however (see other thread about object identity).

-- Jecel
|
In reply to this post by K. K. Subramaniam
On Monday 24 Aug 2009 3:08:00 am Jecel Assumpcao Jr wrote:
> Was the analogy about how a chain of certificates is like a chain of
> images (starting all the way back from Smalltalk-76)?

Yes.

> > The key tools for a 'binary' encoding are the equality and diff tools.
> > Given two images A and B, check if they are equivalent. If not, find the
> > difference D that will reconstitute B from A.
>
> This is what the Debian guy was asking for, but the idea was to convert
> images into some kind of XML and then use traditional text diff to deal
> with that.

I meant a difference operator, not the diff(1) program. The change set browser is a good tool but it is incomplete: it does not track and log all changes (e.g. class variables).

Subbu
|
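A minimal sketch of the "difference operator" idea, in Python rather than Smalltalk. It assumes, purely for illustration, that each image can be flattened into a dictionary from a stable name to a value; real images have no such stable names, which is exactly the identity problem raised elsewhere in this thread. The names `image_diff`, `apply_delta` and `equivalent` are hypothetical.

```python
def image_diff(a, b):
    """Return a delta D such that apply_delta(a, D) == b."""
    removed = sorted(key for key in a if key not in b)
    changed = {key: value for key, value in b.items()
               if key not in a or a[key] != value}
    return {"removed": removed, "changed": changed}


def apply_delta(a, delta):
    """Reconstitute image B from image A and the delta D."""
    result = {key: value for key, value in a.items()
              if key not in delta["removed"]}
    result.update(delta["changed"])
    return result


def equivalent(a, b):
    """Two images are equivalent when their delta is empty."""
    delta = image_diff(a, b)
    return not delta["removed"] and not delta["changed"]


# Toy "images": two methods plus one class variable.
image_a = {"Point>>x": "^x", "Point>>y": "^y", "Counter.total": 3}
image_b = {"Point>>x": "^x", "Point>>r": "^(x*x + (y*y)) sqrt", "Counter.total": 4}

delta = image_diff(image_a, image_b)
```

Note that the class variable `Counter.total` shows up in the delta like any other entry, which is the kind of change the message above says a change set does not log.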
In reply to this post by Colin Putney
On 19-Aug-09, at 5:45 PM, Jecel Assumpcao Jr wrote:

> http://wiki.squeak.org/squeak/584
>
> The idea is to be more like the Etoys users which can load binary
> projects containing not only the code they need but also hand crafted
> objects which have no source (like a drawing, some nested Morphs or
> even some text). This is very simplistic compared to Spoon, and my proposal
> was even more simplistic. In particular, this doesn't handle the case
> where any changes to bytecodes or object format are needed.

Interesting. I note, though, that the wiki page you mention doesn't actually say much about development. It's mostly concerned with efficient ways of moving objects between images. Reliably reconstructing part of one image in another is certainly a crucial part of collaborative development, but it's not everything.

The other key feature of Monticello is merging. If you and I have the same chunk in different images, and we make differing but compatible changes, how can we create a chunk that contains both sets of changes?

I submit that any tool that can do that will have explicit knowledge of the semantics of objects it's merging, whether Smalltalk code, Etoys projects or something else.

So the wonderful generality of the Chunky Images idea only gets you so far, and you still need a tool like Monticello to actually create collaboratively. In Monticello 2 I've tried to address this idea explicitly: the core versioning engine knows nothing about the semantics of the objects it's versioning, but it does rely on pluggable domain models that do.

Colin
|
We're bumping up against the homoiconicity of the system, aren't we? That code is really just a kind of data. Has anyone ever done a diff tool for whole images, not just source methods? It would be fantabulous if I didn't have to write an installer script for my package, instead having the necessary objects brought over directly.
Seems like the mother of all problems is: moving things around that way between images of different formats. Would some future descendant of SystemTracer perhaps be of use? - Ron
On Sun, Aug 23, 2009 at 9:14 PM, Colin Putney <[hidden email]> wrote:
|
In reply to this post by Colin Putney
Colin Putney wrote:
> I note, though, that the wiki page you mention doesn't actually say
> much about development.

That is left up to other tools in this proposal.

> It's mostly concerned with efficient ways of
> moving objects between images.

It can't even do that: an object can only be reloaded into the exact same image from which it was extracted (unlike ImageSegments). The idea is that by letting several "images" live side by side in memory and on disk without taking up too much space, you won't mind dedicating an "image" to each Squeak application you use.

> Reliably reconstructing part of one
> image in another is certainly a crucial part of collaborative
> development, but it's not everything.

You are right, but as I mentioned above my proposal doesn't even try to do that much.

> The other key feature of Monticello is merging. If you and I have the
> same chunk in different images, and we make differing but compatible
> changes, how can we create a chunk that contains both sets of changes?

You would have to use Monticello or something similar (in which case you would be limited to source code). For "merging" generic objects my idea was to use Croquet, but then you would be limited to the equivalent of instant messaging rather than email.

> I submit that any tool that can do that will have explicit knowledge
> of the semantics of objects it's merging, whether Smalltalk code,
> Etoys projects or something else.

Yes. And I have given up on automatic conflict resolution after working many years on the problem (for the more general Neo Smalltalk modules, since for Chunky Squeak you can't even get conflicts in the first place as it is so limited).

> So the wonderful generality of the Chunky Images idea only gets you so
> far, and you still need a tool like Monticello to actually create
> collaboratively. In Monticello 2 I've tried to address this idea
> explicitly: the core versioning engine knows nothing about the
> semantics of the objects it's versioning, but it does rely on
> pluggable domain models that do.

I should have mentioned http://wiki.squeak.org/squeak/5637 (Neo Smalltalk "groups") as well since Chunky Squeak is just a very stripped down version of it. In that system you do have merging in something that is similar to "commit" in transactional systems.

-- Jecel
|
In reply to this post by K. K. Subramaniam
K. K. Subramaniam wrote:
> I meant a difference operator, not the diff(1) program. Change set browser is a
> good tool but is incomplete. It does not track and log all changes (e.g. class
> variables).

I meant the same thing, but mentioned the text diff program as an example of what some people would like to be able to use.

Back when Smalltalk-80 used an object table it wouldn't have been that hard to create a difference operator for binary images since objects never changed their "oop". With direct pointers it is far more complicated to decide that two objects in separate images are actually the same. The best strategy is probably to start out with classes and processes and do a breadth-first search.

-- Jecel
|
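The breadth-first strategy suggested above can be sketched as follows. This is an illustration in Python under simplifying assumptions, not a tool for real images: each image is modelled as a dictionary from an object id (a stand-in for an oop or address) to the ordered list of ids in its slots, and the well-known roots (classes, processes) are assumed to be already paired by name. `match_objects` is a hypothetical name.

```python
from collections import deque


def match_objects(graph_a, graph_b, roots):
    """Pair up objects of two graphs by walking both breadth first.

    roots is a list of (id_in_a, id_in_b) pairs for objects known to
    correspond. Returns (pairs, mismatches), where pairs maps ids in A
    to ids in B and mismatches lists the places where the walk disagreed.
    """
    pairs, reverse, mismatches = {}, {}, []
    queue = deque(roots)
    while queue:
        a, b = queue.popleft()
        if a in pairs or b in reverse:
            # Already matched: only a problem if matched to something else.
            if pairs.get(a) != b:
                mismatches.append((a, b))
            continue
        pairs[a], reverse[b] = b, a
        slots_a, slots_b = graph_a[a], graph_b[b]
        if len(slots_a) != len(slots_b):
            mismatches.append((a, b))
            continue
        # Slot i of one object is assumed to correspond to slot i of the other.
        queue.extend(zip(slots_a, slots_b))
    return pairs, mismatches


# Same structure, different "addresses" in each image.
graph_a = {"a1": ["a2", "a3"], "a2": ["a3"], "a3": []}
graph_b = {"b7": ["b9", "b8"], "b9": ["b8"], "b8": []}

pairs, mismatches = match_objects(graph_a, graph_b, [("a1", "b7")])
```

The slot-by-slot pairing only works for ordered slots; unordered collections would need the more expensive treatment discussed later in the thread.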
In reply to this post by Casey Ransberger
Ronald Spengler wrote:
> We're bumping up against the homoiconicity of the system, aren't we?
> That code is really just a kind of data. Has anyone ever done a diff tool
> for whole images, not just source methods?

Like I said in another reply, given that Squeak objects don't have a fixed identity (see other thread) it isn't very easy. But I don't think it is impossible in practice since objects mostly exist in very stereotyped patterns.

> It would be fantabulous if I didn't have to write an installer script for
> my package, instead having the necessary objects brought over directly.

That is what I want. And I partly had it in Smalltalk V/Win (also released as Smalltalk Express). In that system your image started out as an essentially empty v.exe file plus a bunch of .dll files with objects and code. Some of these had to be shipped with the application while many (with all the development tools) couldn't (due to the license). The .dll files had lots of stuff you wouldn't need but you had to ship them even if you only needed a single object. And there were no tools to tell you whether you were using any objects from a .dll, so you might ship it even if you didn't need it. But I don't think all these problems would be too hard to fix.

> Seems like the mother of all problems is: moving things around that way
> between images of different formats. Would some future descendant of
> SystemTracer perhaps be of use?

If you want to send messages between these different images and you want to be able to send over some of the arguments (instead of just a far reference back to the sending image) then this problem has to be solved anyway. Perhaps we are talking about something more like the Corba serialization format than the SystemTracer, but it certainly is related to the latter.

-- Jecel
|
In reply to this post by K. K. Subramaniam
On Mon, Aug 24, 2009 at 7:35 PM, Jecel Assumpcao Jr <[hidden email]> wrote:
That only works for things that are created in exactly the same order with no intervening operations. If I were to compile the source for method A followed by compiling the source for method B I would end up with different oops for methods A and B than if I were to first compile method B's source followed by method A's source. Things would also be different if in between compiling I performed some other arbitrary action that caused allocations whose results were discarded.
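The order dependence described above can be shown with a toy object table. This is a sketch in Python with a hypothetical allocator that hands out consecutive indices and never reuses them; a real object table would recycle entries, which only makes the resulting oops less predictable.

```python
class ObjectTable:
    """Toy object table: each allocation gets the next free index as its oop."""

    def __init__(self):
        self.next_oop = 0
        self.entries = {}

    def allocate(self, name):
        oop = self.next_oop
        self.next_oop += 1
        self.entries[name] = oop
        return oop


def build(order):
    """Allocate the named objects in the given order and return name -> oop."""
    table = ObjectTable()
    for name in order:
        table.allocate(name)
    return table.entries


ab = build(["methodA", "methodB"])
ba = build(["methodB", "methodA"])
# An intervening allocation whose result is later discarded still shifts oops.
with_garbage = build(["methodA", "temp", "methodB"])
```

The same two methods end up with different oops in `ab` and `ba`, and `methodB` differs again in `with_garbage`, so comparing oops alone cannot establish that two objects are "the same".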
The problem is orthogonal to direct pointers.

> With direct pointers it is far more complicated to decide that two objects in separate images are actually the same.

What is needed is generic structural comparison. One problem in this is that certain collections are unordered and therefore comparing structure reachable from unordered collections may involve a combinatoric explosion (compare all possible pair-wise combinations, succeeding if a match is found). Another problem is what I'll call incidental concrete difference. Are these two equivalent for the purposes of comparison or not: (1 to: 3) and #(1 2 3)? (etc.)
The schema for code representation in the system is well-defined and several ordering operations exist to allow comparison; selector-method-pairs in method dictionaries can be ordered by lexicographic order of selectors and sibling subclasses can be ordered by lexicographic order of class names. Hence structural comparison of Smalltalk code is straight-forward. Generalising to arbitrary object structures isn't at all straight-forward unless analogous schema are introduced.
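The canonical ordering described above can be sketched like this. It is an illustration in Python with an assumed, flattened model of code (a dictionary from class name to a dictionary from selector to source); the class hierarchy, which the message orders by sibling class name, is left out for brevity. `canonical_code` is a hypothetical name.

```python
def canonical_code(classes):
    """Order classes by name and methods by selector.

    classes maps a class name to {selector: source}, in any insertion order.
    The result is a nested list whose order no longer depends on how the
    dictionaries were populated, so two images can be compared directly.
    """
    return [
        (class_name, sorted(methods.items()))
        for class_name, methods in sorted(classes.items())
    ]


# The same code, entered in a different order in each image.
image_1 = {
    "Point": {"y": "^y", "x": "^x"},
    "Circle": {"radius": "^radius"},
}
image_2 = {
    "Circle": {"radius": "^radius"},
    "Point": {"x": "^x", "y": "^y"},
}

same_code = canonical_code(image_1) == canonical_code(image_2)
```

Because selectors are unique within a class and class names are unique within the system, the sort keys never tie, which is what makes code easier to compare than arbitrary object structures.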
Writing a recursive structural equality tester in Smalltalk is straight-forward (I have code written for the Newspeak project I could post if you're interested) but it fails for unordered collections and for incidental concrete difference.
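For illustration only, here is what such a tester might look like in Python. It is not the Newspeak code mentioned above, and all names are hypothetical. It handles cycles by remembering which pairs are already being compared, treats a difference in concrete type as a mismatch (the "incidental concrete difference" case), and shows why unordered collections are costly: `bag_equal` has to try pairings until one works.

```python
def structurally_equal(a, b, seen=None):
    """Recursive structural equality that tolerates cyclic object graphs."""
    seen = set() if seen is None else seen
    key = (id(a), id(b))
    if key in seen:          # pair already under comparison: break the cycle
        return True
    seen.add(key)
    if type(a) is not type(b):
        return False         # e.g. a range versus a list with the same elements
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(
            structurally_equal(x, y, seen) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(
            structurally_equal(a[k], b[k], seen) for k in a)
    if hasattr(a, "__dict__"):
        return structurally_equal(vars(a), vars(b), seen)
    return a == b


def bag_equal(xs, ys):
    """Compare two unordered collections (given as lists) by trying pairings.

    Worst case this explores a factorial number of pairings.
    """
    if len(xs) != len(ys):
        return False
    if not xs:
        return True
    first, rest = xs[0], xs[1:]
    return any(
        structurally_equal(first, candidate) and bag_equal(rest, ys[:i] + ys[i + 1:])
        for i, candidate in enumerate(ys))


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


def ring(names):
    """Build a cyclic linked list and return its first node."""
    nodes = [Node(name) for name in names]
    for left, right in zip(nodes, nodes[1:] + nodes[:1]):
        left.next = right
    return nodes[0]
```

With these definitions, two separately built rings with the same names compare equal, while `range(1, 4)` and `[1, 2, 3]` do not, mirroring the `(1 to: 3)` versus `#(1 2 3)` question.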
|
In reply to this post by Casey Ransberger
On Mon, Aug 24, 2009 at 7:48 PM, Jecel Assumpcao Jr <[hidden email]> wrote:
and our experience at ParcPlace-Digitalk when we compared SLLs (Smalltalk V's system of object DLLs requiring VM support, quite similar to image segments) with Parcels (VW's system of a conventional but optimized object pickling format) was that parcels were as fast, if not faster, and far less brittle. Making the file format identical to the object format and linking in objects to the heap instead of parsing them seems like a really cool idea but it ties the representation far too closely to a particular implementation of the memory manager and object model. Going with a "soft" pickling format allows one to concentrate on important issues like naming assumed structure (what are the prerequisites of a component) and how to update it in the presence of schema changes, e.g. what if a component includes an instance of some class which has gained or lost instance variables in the loading image when compared to the image that published the component?
Again, you should check out the VW parcel system and the communications framework OpenTalk. The OpenTalk marshaller is influenced by the parcel object marshaller but the two have differences.
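The schema-change question raised above can be made concrete with a small sketch. This is Python and an invented record layout, not the Parcel or OpenTalk format: it assumes the publishing image stored each instance's fields by name, so the loading image can map them onto whatever shape the class has now. `load_instance` and the field names are hypothetical.

```python
def load_instance(record, current_fields, default=None):
    """Map a pickled instance onto the class shape of the loading image.

    record: {"class": name, "fields": {field_name: value}} as published.
    current_fields: ordered instance-variable names in the loading image.
    Returns (slots, dropped): slot values in the current order, plus the
    names of stored fields that the class no longer has.
    """
    stored = record["fields"]
    # Fields gained since publication get the default value.
    slots = [stored.get(name, default) for name in current_fields]
    # Fields lost since publication cannot be placed; report them.
    dropped = sorted(set(stored) - set(current_fields))
    return slots, dropped


published = {"class": "Point3D", "fields": {"x": 3, "y": 4, "z": 5}}
# In the loading image the class lost 'z' and gained 'label'.
slots, dropped = load_instance(published, ["x", "y", "label"])
```

A format that copied raw memory would have no field names to consult at this point, which is one way to read the "far less brittle" remark.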
|
In reply to this post by Casey Ransberger
On Mon, Aug 24, 2009 at 6:15 PM, Ronald Spengler <[hidden email]> wrote:

> We're bumping up against the homoiconicity of the system, aren't we? That code is really just a kind of data. Has anyone ever done a diff tool for whole images, not just source methods? It would be fantabulous if I didn't have to write an installer script for my package, instead having the necessary objects brought over directly.

Why is this such a mother of a problem? In VW we used a single parcel format that could represent compiled code that could be loaded into either a 32-bit or a 64-bit image even when the size of SmallInteger, the number of tag bits, the existence or not of SmallDouble, the size of identity hashes, the way class references are encoded in instances, etc. all differed between the 32-bit and 64-bit systems.
|
In reply to this post by Igor Stasenko
On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]> wrote:
> 2009/8/20 Eliot Miranda <[hidden email]>:

We didn't disallow representation of arbitrary data but we also didn't support it. The only thing the Parcel system supports (as in the tool set, rather than what one can extend the framework to do in specific circumstances) is to represent code, which it does very well.
What are these mistakes? Can you be specific? I think the parcel system has been a major success. VW is now deployed as a system of components, the base image and a much larger suite of parcels. Parcels are not tied to a particular version or implementation and yet are still fast to publish and load. What's not to like?
|
2009/8/25 Eliot Miranda <[hidden email]>:
> On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]> wrote:
>> 2009/8/20 Eliot Miranda <[hidden email]>:
>> > Hi Igor,
>> >
>> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <[hidden email]> wrote:
>> >> 2009/8/20 Jecel Assumpcao Jr <[hidden email]>:
>> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
>> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
>> >> >>
>> >> >> > For example, I would far prefer to see Squeak move to a binary based development model (I would mention Projects and Etoys here) than the current source based things we are doing (trunk, bob or whatever).
>> >> >>
>> >> >> Forgive me for seizing on a throw-away comment like this, but would you mind expanding on this a bit? Are you saying you prefer something spoonish, where CompiledMethods are passed directly from image to image? Something else?
>> >> >
>> >> > Heh, I got asked about this on IRC as well. Though I had actually started to explain this a little in the original email, I ended up deleting it to keep on topic. With a new subject line I don't feel I have to worry about that. Some details about this (with a few drawings) can be found in the Chunky Squeak wiki page:
>> >> >
>> >> > http://wiki.squeak.org/squeak/584
>> >> >
>> >> > The idea is to be more like the Etoys users which can load binary projects containing not only the code they need but also hand crafted objects which have no source (like a drawing, some nested Morphs or even some text). This is very simplistic compared to Spoon, and my proposal was even more simplistic. In particular, this doesn't handle the case where any changes to bytecodes or object format are needed.
>> >> >
>> >>
>> >> The central question, which arising immediately is, what is the credible way(s) to reproduce such artifacts?
>> >> When we having a source code, we could (re)compile it on a different system. But what you propose to do with pure binary data, a soup of objects, in respect that it is incredibly hard to understand, what bits you need and what's not, in case if you need to do clean-up , refactor, rewrite and simply analyze what is happening.
>> >> This is what making a huge difference, for instance, between applications with open source code and applications shipped in binary form - you can only report bugs, but can't realy make any suggestions about what happening.
>> >> I don't think that developers of Squeak should be victims of such situation(s).
>> >
>> > it is possible to have your cake and eat it too. One can create a binary format that includes source and includes the meta-source for its creation. But including a binary representation allows much faster loading, loading without a compiler, and source hiding if one chooses not to include the source.
>> >
>> > There are other advantages, such as not cluttering up the changes file when one loads a package.
>> >
>> > In the VW parcel system, to which I added source management, we replaced the SourceFiles with a SourceFileManager whose job was to manage the sources and changes file and an arbitrary number of source files for parcels, the binary format. In the parcel file the source pointers of compiled methods are the positions of their source in the parcel source file. When one loads a parcel the SourceFileManager adds the file to its set of managed files and assigns an index for the source file. The parcel loader then swizzles all the source pointers so that they include the source file index along with the position.
>> > So accessing the source for a method loaded from a parcel accesses that parcel's source file. We used a floating-point like format for source pointers, where the exponent was the source file index, and the mantissa was the position in the file.
>> > We didn't create a single file format, having two separate files for binary and source, which is probably a mistake. A format with a short header, followed by source, followed by binary, followed by metasource, would be easier to manage than three separate files.
>> > We didn't include any metasource, but we did include pre-read, load and unload actions. I did a very bad job on version numbering and prerequisite selection.
>> > That's not the whole story but enough to start answering your question. If there is a well-defined definition of the objects in a package and that definition is included in the package as metasource, then one can comprehend the binary package's contents by examining the metasource and can reproduce creating the package, provided that the tools are careful to impose ordering, etc.
>> > best
>> > Eliot
>>
>> I think you inevitably made wrong decisions, because you went this way by allowing an arbitrary binary data , held by package.
>> In such situations it is much more easier to make a mistakes.
>> But sure, one who's making no mistakes is one who doing nothing :)
>
> We didn't disallow representation of arbitrary data but we also didn't support it. The only thing the Parcel system supports (as in the tool set, rather than what one can extend the framework to do in specific circumstances) is to represent code, which it does very well.
> What are these mistakes? Can you be specific? I think the parcel system has been a major success. VW is now deployed as a system of components, the base image and a much larger suite of parcels.
> Parcels are not tied to a particular version or implementation and yet are still fast to publish and load. What's not to like?

I referred mainly to your own statements about mistake(s).

I don't know enough about parcels to tell exactly where the flaws are. I'm still wondering how you could unload a parcel when it is no longer needed but there are still objects used or created by the parcel sitting in the image.
A basic use case is: a developer needs some specific tool (like a UI design tool) while working on an application. But at the moment he ships the application, it is no longer needed.

>> Obviously one of the side of such problem is uniform object memory, where each object could reference any other object and limited only by a imagination of people.
>> There is no layers or any other means which could establish a certain barriers (which we calling a modules) in smalltalk.
>> It means, that once you integrated the parcel into image, and started using it, you may have a hard times trying to unload it.
>> It is possible to develop an image as an artifact, which contains both binary & sources , but such approach having a drawbacks, which we, by the way, trying to overcome nowadays.
>> Practice shows that such approach is credible only for a small group of individuals, but becomes a bottleneck if you adopt such scheme for a wider community.
>>
>> So, i think , that before entering this domain (allowing binary data), first we should solve more basic problems of smalltalk & its design - modularity, name spaces, layering & etc etc.. Only the we could return to original question and solve it.
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.

--
Best regards,
Igor Stasenko AKA sig.
|
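The source-pointer scheme described in the quoted message (file index in the "exponent", file position in the "mantissa") can be sketched as simple bit packing. This is a Python illustration with an assumed fixed field width; the thread does not give the real layout used in VisualWorks, so `POSITION_BITS`, `swizzle` and `unswizzle` are hypothetical.

```python
POSITION_BITS = 24  # assumed mantissa width, chosen only for this sketch


def swizzle(position, file_index):
    """Combine a position within a source file with that file's index."""
    if not 0 <= position < (1 << POSITION_BITS):
        raise ValueError("position does not fit in the assumed mantissa width")
    if file_index < 0:
        raise ValueError("file index must be non-negative")
    return (file_index << POSITION_BITS) | position


def unswizzle(pointer):
    """Split a source pointer back into (file_index, position)."""
    return pointer >> POSITION_BITS, pointer & ((1 << POSITION_BITS) - 1)


# A loader would assign an index when it registers a parcel's source file,
# then rewrite every method's stored position to include that index.
managed_files = ["sources", "changes"]
managed_files.append("MyParcel.pst")      # hypothetical file name
parcel_index = managed_files.index("MyParcel.pst")

pointer = swizzle(1234, parcel_index)
```

Decoding `pointer` yields the file index and the original position, which is all a browser needs to fetch a method's source from the right file.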
On Tue, Aug 25, 2009 at 3:57 AM, Igor Stasenko <[hidden email]> wrote:
> 2009/8/25 Eliot Miranda <[hidden email]>:

Ah, ok, Sorry :)
Smalltalk has this problem with or without binary loading; they're called obsolete classes :) However, the problem of knowing what to remove when the user says "unload" means that a loaded parcel requires a data structure that names the classes and methods it loaded. In addition we maintain overrides, the older versions of methods and class definitions, in a stack, so that these can be restored when unloading a parcel. I made lots of mistakes here (not allowing the tools to publish a parcel that has code overridden by others, not integrating source management and browsing queries with overridden code, not compressing the changes correctly with overridden code, etc, etc). Tests would have helped :/
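The bookkeeping described above (a record of what each parcel loaded, plus a stack of overridden definitions to restore on unload) might look roughly like this. It is a Python sketch with hypothetical names, covers methods only, and assumes parcels are unloaded in the reverse of their load order; it is not how VisualWorks actually implements overrides.

```python
class MethodTable:
    """Tracks current method sources and what to restore on unload."""

    def __init__(self):
        self.current = {}    # selector -> source now in effect
        self.overrides = {}  # selector -> stack of (parcel, previous source)
        self.loaded = {}     # parcel -> selectors it defined or overrode

    def load(self, parcel, methods):
        self.loaded[parcel] = list(methods)
        for selector, source in methods.items():
            previous = self.current.get(selector)  # None: newly defined
            self.overrides.setdefault(selector, []).append((parcel, previous))
            self.current[selector] = source

    def unload(self, parcel):
        selectors = self.loaded[parcel]
        # Check first so a refused unload leaves the table untouched.
        for selector in selectors:
            if self.overrides[selector][-1][0] != parcel:
                raise RuntimeError(f"{selector} is overridden by a later parcel")
        for selector in selectors:
            _, previous = self.overrides[selector].pop()
            if previous is None:
                del self.current[selector]
            else:
                self.current[selector] = previous
        del self.loaded[parcel]


table = MethodTable()
table.load("Base", {"printOn:": "base version"})
table.load("Tools", {"printOn:": "tools version", "inspect": "open inspector"})
during = dict(table.current)
table.unload("Tools")
```

After the unload, `printOn:` is back to the base version and `inspect` is gone.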
VW did (does?) test for open instances of applications when we unload a parcel so that if the parcel contains a subclass(s) of ApplicationModel (VW's top-level GUI app class) all open applications are tested to see if they contain instances of the class(es) and a warning is issued.
> A basic use case is: developer needs some specific tool (like UI design tool) when he working on application. But at the moment when he ships the application, it is no longer needed.

Right. I don't know of an automatic solution, but a good convention is to split all packages into a development and deployment pair where the deployment half is a prerequisite of the development half. Sticking to the convention and using good names makes it easier to remember to remove development components and to guess which parts of someone else's components are development only.
I added a bulk instancesOf primitive that answered all instances of an Array of classes that my colleague Steve Dahl wanted to use in instance migration on class redefinition. This could be used to look for all instances of the classes defined by a parcel prior to unload. Do a GC, collect all instances of classes defined (rather than redefined) by a parcel and warn if non-empty (if in a dev image).
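The pre-unload check outlined above (collect garbage, gather all instances of the parcel's classes, warn if any remain) can be sketched in Python using the standard `gc` module as a stand-in for the bulk instancesOf primitive. The function names and the `UIPainter` class are hypothetical.

```python
import gc


def live_instances(classes):
    """Return all live instances of the given classes, after a full GC."""
    gc.collect()
    wanted = tuple(classes)
    # type() is used instead of isinstance() so that proxy objects elsewhere
    # in the heap are never dereferenced during the scan.
    return [obj for obj in gc.get_objects() if issubclass(type(obj), wanted)]


def unload_warning(parcel_name, classes):
    """Return a warning string if instances remain, else None."""
    count = len(live_instances(classes))
    if count:
        return f"{parcel_name}: {count} live instance(s) of its classes remain"
    return None


class UIPainter:  # hypothetical development-only class defined by a parcel
    pass


painter = UIPainter()
before = unload_warning("UIPainterTools", [UIPainter])
del painter
after = unload_warning("UIPainterTools", [UIPainter])
```

Here `before` carries a warning because one instance is still referenced, and `after` is `None` once the last reference is dropped.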
|
Eliot,
thanks for all your wonderful comments and insights about the Parcel system in VisualWorks. My experience with it is extremely limited (I once loaded idass, the chip simulation system, as a parcel into VW 5i NC) and so I cited V/Win as an existence proof.

You are correct that an object table wouldn't help in general when comparing two images - I was thinking of the specific case of when one is known to be directly derived from the other like Squeak 3.8 from 3.7. This was from the discussion of doing a security audit to allow Squeak to be included in Debian. Comparing unordered collections has all the complications you mentioned and in Smalltalk this is supposed to be solved in #=, which experience tells us not to trust too much.

For Neo Smalltalk I didn't do pure memory dumps but had a binary format that was reasonably compressed. And it didn't have small integers but only variable sized ones and these became SmallIntegers or LargeIntegers when read in. That made the binary format compatible between the 16 and 36 bit versions of Neo Smalltalk.

One idea for Neo modules that Dan thought a bit excessive was to divide each into four related modules: the actual objects (I'll call this the "deployment module" to use your term), the sources (just a bunch of String objects), the documentation (nicely formatted text, with possibly pictures or even movies) and the tests. In different situations you might want different subsets of these.

For example, while browsing through SqueakMap you might click on "see more..." and get the full docs in your machine. Then you might click on an example and create a new object (in a new module) that would bring in the deployment module to support it. If you ever try to look at the code for this new object in the system browser or the debugger then the sources module would get loaded. There would be links to the tests in the documentation but they might also get loaded through the SUnit tool. If you close all windows with the documentation, the doc module will eventually be unloaded.

Of course, I am supposing that objects in a module can point to objects in a separate module in the above description. And that module loading/unloading is a kind of crude virtual memory.

-- Jecel
|
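The variable-sized integer idea mentioned above can be illustrated as follows. This is a Python sketch, not the Neo Smalltalk format: it uses a zig-zag, base-128 encoding as one possible width-independent representation, and it assumes one tag bit per word when deciding whether a value would become a SmallInteger or a LargeInteger in the loading image. All names are hypothetical.

```python
def encode_int(value):
    """Encode a signed integer in a width-independent, variable-sized form."""
    zigzag = value * 2 if value >= 0 else -value * 2 - 1
    out = bytearray()
    while True:
        byte = zigzag & 0x7F
        zigzag >>= 7
        if zigzag:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_int(data):
    """Inverse of encode_int."""
    zigzag = 0
    for shift, byte in enumerate(data):
        zigzag |= (byte & 0x7F) << (7 * shift)
    return zigzag >> 1 if zigzag % 2 == 0 else -(zigzag + 1) // 2


def classify(value, word_bits, tag_bits=1):
    """Decide which class the value would take in an image of the given word size."""
    payload = word_bits - tag_bits
    low, high = -(1 << (payload - 1)), (1 << (payload - 1)) - 1
    return "SmallInteger" if low <= value <= high else "LargeInteger"


stored = encode_int(20000)
on_16_bit = classify(decode_int(stored), 16)
on_36_bit = classify(decode_int(stored), 36)
```

The same stored bytes become a LargeInteger when read into the 16-bit system and a SmallInteger in the 36-bit one, so the file itself never commits to a word size.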
In reply to this post by Eliot Miranda-2
2009/8/25 Eliot Miranda <[hidden email]>:
>> >> But sure, one who's making no mistakes is one who doing nothing :) >> > >> > We didn't disallow representation of arbitrary data but we also didn't >> > support it. The only thing the Parcel system supports (as in the tool >> > set, >> > rather than what one can extend the framework to do in specific >> > circumstances) is to represent code, which it does very well. >> > What are these mistakes? Can you be specific? I think the parcel >> > system >> > has been a major success. VW is now deployed as a system of components, >> > the >> > base image and a much larger suite of parcels. Parcels are not tied to >> > a >> > particular version or implementation and yet are still fast to publish >> > and >> > load. What's not to like? >> >> I referred mainly to your own statements about mistake(s). > > Ah, ok, Sorry :) > >> >> I don't know about parcels so much to tell exactly where is the flaws. >> I'm still wondering, how you could unload a parcel if its not longer >> needed, but >> there are still object(s) which used/created by parcel sitting in image. > > Smalltalk has this problem with or without binary loading; they're called > obsolete classes :) However, the problem of knowing what to remove when the > user says "unload" means that a loaded parcel requires a data structure that > names the classes and methods it loaded. In addition we maintain overrides, > the older versions of methods and class definitions, in a stack, so that > these can be restored when unloading a parcel. I made lots of mistakes here > (not allowing the tools to publish a parcel that has code overridden by > others, not integrating source management and browsing queries with > overridden code, not compressing the changes correctly with overridden code, > etc, etc). Tests would have helped :/ > VW did (does?) 
> test for open instances of applications when we unload a parcel, so that if the parcel contains one or more subclasses of ApplicationModel (VW's top-level GUI app class) all open applications are tested to see if they contain instances of the class(es) and a warning is issued.

>> A basic use case is: a developer needs some specific tool (like a UI design tool) while he is working on an application. But at the moment when he ships the application, it is no longer needed.

> Right. I don't know of an automatic solution, but a good convention is to split all packages into a development and deployment pair where the deployment half is a prerequisite of the development half. Sticking to the convention and using good names makes it easier to remember to remove development components and to guess which parts of someone else's components are development only.

Yes, and this is what I really miss in Smalltalk-80 based environments: a distinction between development and deployment modes and models. It would be cool to have some basic things behave differently in deployed mode (like preventing access and data overrides).

The main problem in an open system (such as a Smalltalk object memory) is that when something goes wrong, you often have two choices: reboot the system, or debug and fix the problem in the living environment. Often neither choice is acceptable: if we are talking about an end-user application, we don't expect the user to be able to debug and fix the issue, and rebooting an image means loss of data and/or interruption of other jobs being served. But if the system were modelled in modular layers, like kernel -> services -> interfaces -> working set, then things would be much easier to handle.

> I added a bulk instancesOf primitive that answered all instances of an Array of classes that my colleague Steve Dahl wanted to use in instance migration on class redefinition. This could be used to look for all instances of the classes defined by a parcel prior to unload. Do a GC, collect all instances of classes defined (rather than redefined) by a parcel and warn if non-empty (if in a dev image).

I think that independent tiny layers (isles/vats) are the future of system organization in Smalltalk-like VMs. First, they give a strong answer to the question of what belongs to what. There is no way to reference a foreign object other than by a far ref. You can count/enumerate them easily, and this approach also makes it possible to run code in vats concurrently.

The problem here is how to handle shared behavior, like Arrays, Collections, etc., in order to avoid duplication. Since in Smalltalk everything is an object, methods and classes included, they can belong only to a single island/vat, and therefore only the owning island can manipulate them. This creates a major bottleneck for an effective implementation of concurrently (and independently) running code. Trade space for speed? Allow each island to have its own Array class with its own implementation? This question remains open for me.

>> >> Obviously one side of this problem is the uniform object memory, where each object can reference any other object, limited only by people's imagination. There are no layers or any other means which could establish certain barriers (which we call modules) in Smalltalk. It means that once you have integrated the parcel into the image and started using it, you may have a hard time trying to unload it.
>> >> It is possible to develop an image as an artifact which contains both binary and sources, but such an approach has drawbacks, which we, by the way, are trying to overcome nowadays. Practice shows that such an approach is credible only for a small group of individuals, but becomes a bottleneck if you adopt such a scheme for a wider community.
>> >> So, I think that before entering this domain (allowing binary data), we should first solve more basic problems of Smalltalk and its design: modularity, name spaces, layering, etc. Only then could we return to the original question and solve it.

--
Best regards,
Igor Stasenko AKA sig. |
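As an aside on the parcel source pointers Eliot describes in the quoted text above (a "floating-point like format" where the exponent is the source file index and the mantissa is the position in the file), here is a minimal sketch of how such a pointer could be packed and swizzled at load time. The thread gives no field widths or method names, so `POSITION_BITS`, `encode_source_pointer`, `decode_source_pointer` and `swizzle` are all hypothetical, and the fixed bit split is an assumption rather than the actual VisualWorks layout.

```python
# Hypothetical width of the "mantissa" (file position) field.
# The real VisualWorks layout is not given in the thread.
POSITION_BITS = 24
POSITION_MASK = (1 << POSITION_BITS) - 1


def encode_source_pointer(file_index, position):
    """Pack a source file index and a file position into one integer."""
    if file_index < 0:
        raise ValueError("file index must be non-negative")
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("position does not fit in the position field")
    return (file_index << POSITION_BITS) | position


def decode_source_pointer(pointer):
    """Split a packed pointer back into (file_index, position)."""
    return pointer >> POSITION_BITS, pointer & POSITION_MASK


def swizzle(parcel_positions, file_index):
    """Turn parcel-relative positions into image-wide source pointers.

    This mirrors the load step described in the thread: the source file
    manager assigns an index to the newly loaded parcel's source file and
    every source pointer is rewritten to carry that index.
    """
    return [encode_source_pointer(file_index, p) for p in parcel_positions]
```

With this layout, looking up the source of a loaded method amounts to decoding the pointer, picking the managed file by index, and seeking to the position.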
On Tue, Aug 25, 2009 at 12:35 PM, Igor Stasenko <[hidden email]> wrote:
Yes, I agree. One of the things the headless support in VW allows which is quite nice is taking a snapshot which can then be restarted in a headless mode for debugging. This can easily be mailed or ftp'ed back for analysis.
Not quite the same, but very neat: The other day at Qwaq Craig Latta had a VM crash while running in a Parallels Linux VM under gdb. He was able to give me a copy of the VM snapshot at the point where gdb stopped the process, giving me the opportunity to debug the live app at my leisure. A cool idea.
Yes, yes, yes!! The system should be like an onion where each layer of the onion is a set of interlocking tectonic plates of modules of functionality.
Yes, this is a cool radical idea that I haven't got my head around yet. I need to think about this at length. The obvious approach to the duplication is copy-on-write where any modifications to the root Array class get propagated to the copies, assuming there is some hierarchical control organization. I think this approach is taken in Alex's worlds where modifications to a parent world are seen by its children. But then the merge problem rears its head when trying to propagate modifications to a child that has made its own local modifications in the same region.
|
On Aug 25, 2009, at 2:28 PM, Eliot Miranda wrote:

> One of the things the headless support in VW allows which is quite nice is taking a snapshot which can then be restarted in a headless mode for debugging.

Eliot - Was this VM or image-side support? Can you describe how it worked?

> Not quite the same, but very neat: The other day at Qwaq Craig Latta had a VM crash while running in a Parallels Linux VM under gdb. He was able to give me a copy of the VM snapshot at the point where gdb stopped the process, giving me the opportunity to debug the live app at my leisure. A cool idea.

In other words, he was already running the Qwaq VM under gdb, so when the Qwaq VM crashed (and left him at a gdb prompt), he simply suspended the Parallels Linux VM and sent you a copy of the suspended Parallels Linux VM. Is that right?

David |
In reply to this post by Eliot Miranda-2
Eliot Miranda wrote:
> Yes, this is a cool radical idea that I haven't got my head around yet. I need to think about this at length. The obvious approach to the duplication is copy-on-write where any modifications to the root Array class get propagated to the copies, assuming there is some hierarchical control organization. I think this approach is taken in Alex's worlds where modifications to a parent world are seen by its children. But then the merge problem rears its head when trying to propagate modifications to a child that has made its own local modifications in the same region.

I have some papers about this (1992 and 1993) but since they are in Portuguese it makes no sense for me to point them out. The basic idea is the MESI cache coherence protocol from bus based multiprocessors (network based multiprocessors normally use directory based schemes, which are closer to what we want but harder to explain, so I will start out with MESI). Any given cache line, or object in our case, can be in one of four states: Invalid (meaning the local node doesn't have a copy), Exclusive (the local node has a copy and knows that nobody else does), Shared (there is a local copy and possibly other nodes also have copies) or Modified (the local copy has been changed and must be saved to main memory).

- You can go from I to either E or S (which one depends on what the other caches say) by fetching a copy.
- You can go from E to S if you see anybody else fetch a copy.
- You can go from S to E by asking everybody else to go from S to I and inform you they have done so.
- You can go from E to M by writing to your copy.
- You can go from M to E by saving your copy to main memory.
- You can go from E or S to I if you need to reuse the cache line for other data.

No other transitions are allowed (perhaps this would be far easier to understand as a drawing?). This scheme doesn't need to merge since there is at most one changed copy at any given time. This restricts parallelism compared to multiple worlds, but is compatible with our current semantics.

Note that David Ungar showed something very similar to this actually running in Squeak on 56 processors at last year's OOPSLA and the movie of his demo (thanks, Göran!) is available online (at http://siliconsqueak.org among other places).

-- Jecel |
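The six rules in Jecel's post can also be written down as a lookup table, which makes the "no other transitions are allowed" part mechanical. The following is only a sketch of the transitions as stated in the post, not of any running Squeak or hardware implementation; the event names (`fetch_alone`, `fetch_shared`, `other_fetch`, `invalidate_others`, `write`, `save`, `reuse`) and the helpers `step` and `run` are made up for illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    EXCLUSIVE = "E"
    SHARED = "S"
    MODIFIED = "M"


# Event names are hypothetical labels for the six rules in the post.
TRANSITIONS = {
    (State.INVALID, "fetch_alone"): State.EXCLUSIVE,    # nobody else has a copy
    (State.INVALID, "fetch_shared"): State.SHARED,      # somebody else has a copy
    (State.EXCLUSIVE, "other_fetch"): State.SHARED,     # another node fetched
    (State.SHARED, "invalidate_others"): State.EXCLUSIVE,
    (State.EXCLUSIVE, "write"): State.MODIFIED,
    (State.MODIFIED, "save"): State.EXCLUSIVE,
    (State.EXCLUSIVE, "reuse"): State.INVALID,
    (State.SHARED, "reuse"): State.INVALID,
}


def step(state, event):
    """Return the next state, or raise ValueError for a disallowed move."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"no transition from {state.name} on {event!r}")
    return TRANSITIONS[key]


def run(events, start=State.INVALID):
    """Apply a sequence of events, starting from the given state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["fetch_shared", "invalidate_others", "write"])` ends in `State.MODIFIED`, while `step(State.SHARED, "write")` raises, because under these rules a shared copy has to become exclusive before it can be written. That is the property that keeps at most one changed copy alive at a time.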
In reply to this post by Eliot Miranda-2
>>>>> "Jecel" == Jecel Assumpcao <[hidden email]> writes:
Jecel> - You can go from I to either E or S (which one depends on what the
Jecel> other caches say) by fetching a copy.
Jecel> - You can go from E to S if you see anybody else fetch a copy.
Jecel> - You can go from S to E by asking everybody else go from S to I and
Jecel> inform you they have done so.
Jecel> - You can go from E to M by writing to your copy.
Jecel> - You can go from M to E by saving your copy to main memory.
Jecel> - You can go from E or S to I if you need to reuse the cache line for
Jecel> other data.

digraph Jecel {
  I -> {E; S} [label = "fetch"];
  E -> S [label = "other fetch"];
  S -> E [label = "force other S->I"];
  E -> M [label = "write"];
  M -> E [label = "save"];
  {E; S} -> I [label = "reuse"];
}

Name it "jecel.dot", read it into OmniGraffle or any Graphviz Tool. :)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion |