[squeak-dev] 3.11 and the trunk

Re: [squeak-dev] binary development (was: 3.11 and the trunk)

K. K. Subramaniam
On Thursday 20 Aug 2009 8:05:14 am Jecel Assumpcao Jr wrote:
> > This is what making a huge difference, for instance, between
> > applications with open source code and applications shipped in binary
> > form - you can only report bugs, but can't really make any suggestions
> > about what happening.
>
> All of the tools that created the bits in the first place, as well as
> the tools to change them are inside the same image as the bits. So I
> don't agree with your analogy.
I think a better analogy is the way public key cryptography certificates are
constituted. What matters is not whether a certificate is encoded in ASCII or
binary but whether there is a chain of trust. Should anyone lose their
certificate, it can be reconstituted from its parent cert in the chain. But if
you happen to lose a root cert then no new chains can be reconstituted. You
are stuck with the existing chains originating from this root.

The key tools for a 'binary' encoding are the equality and diff tools. Given
two images A and B, check if they are equivalent. If not, find the difference D
that will reconstitute B from A.
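
A minimal Python sketch of such a difference operator, under a strong simplifying assumption: each image is flattened into a dictionary from stable names to values. Real images have no such ready-made naming, and the function names here (`image_diff`, `apply_diff`, `equivalent`) are hypothetical.

```python
# Toy model: an "image" is a dict mapping a stable name to a value.
# This is an assumption for illustration; real images are object graphs.

_MISSING = object()  # sentinel marking "name absent from the image"


def image_diff(a, b):
    """Return D such that apply_diff(a, D) == b."""
    removed = sorted(name for name in a if name not in b)
    changed = {name: value for name, value in b.items()
               if a.get(name, _MISSING) != value}
    return {"removed": removed, "changed": changed}


def apply_diff(a, d):
    """Reconstitute B from A and the difference D."""
    result = {name: value for name, value in a.items()
              if name not in d["removed"]}
    result.update(d["changed"])
    return result


def equivalent(a, b):
    """Two images are equivalent when their difference is empty."""
    d = image_diff(a, b)
    return not d["removed"] and not d["changed"]
```

The interesting part of the real problem is hidden in the assumption: deciding which object in A corresponds to which object in B.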

Subbu


Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Jecel Assumpcao Jr
K. K. Subramaniam  wrote on Thu, 20 Aug 2009 18:19:24 +0530

> On Thursday 20 Aug 2009 8:05:14 am Jecel Assumpcao Jr wrote:
> > All of the tools that created the bits in the first place, as well as
> > the tools to change them are inside the same image as the bits. So I
> > don't agree with your analogy.
> I think a better analogy is the way public key cryptography certificates are
> constituted. What matters is not whether a certificate is encoded in ASCII or
> binary but whether there is a chain of trust. Should anyone lose their
> certificate, it can be reconstituted from its parent cert in the chain. But if
> you happen to lose a root cert then no new chains can be reconstituted. You
> are stuck with the existing chains originating from this root.

Hmm... I didn't understand this analogy very well. Don't certificates in
the middle of the chain also involve a pair of public/private keys? If
so, it seems to me that losing the private key in the middle would be as
fatal as losing the root one (though it would affect fewer people).

Was the analogy about how a chain of certificates is like a chain of
images (starting all the way back from Smalltalk-76)?

> The key tools for a 'binary' encoding are the equality and diff tools. Given
> two images A and B, check if they are equivalent. If not, find the difference D
> that will reconstitute B from A.

This is what the Debian guy was asking for, but the idea was to convert
images into some kind of XML and then use traditional text diff to deal
with that. I used to have great success with diff but in the past few
years its results have become useless for me (probably some default
setting has been changed and I would have to force it to work the old
way), so I am not sure about the value of this approach. A tool that
actually understood images, as you proposed, might work better. It
wouldn't be too easy to write, however (see other thread about object
identity).

-- Jecel



Re: [squeak-dev] binary development (was: 3.11 and the trunk)

K. K. Subramaniam
In reply to this post by K. K. Subramaniam
On Monday 24 Aug 2009 3:08:00 am Jecel Assumpcao Jr wrote:
> Was the analogy about how a chain of certificates is like a chain of
> images (starting all the way back from Smalltalk-76)?
Yes.
> > The key tools for a 'binary' encoding are the equality and diff tools.
> > Given two images A and B, check if they are equivalent. If not, find the
> > difference D that will reconstitute B from A.
>
> This is what the Debian guy was asking for, but the idea was to convert
> images into some kind of XML and then use traditional text diff to deal
> with that.
I meant a difference operator, not the diff(1) program. Change set browser is a
good tool but is incomplete. It does not track and log all changes (e.g. class
variables).

Subbu


Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Colin Putney
In reply to this post by Colin Putney

On 19-Aug-09, at 5:45 PM, Jecel Assumpcao Jr wrote:

> http://wiki.squeak.org/squeak/584
>
> The idea is to be more like the Etoys users which can load binary
> projects containing not only the code they need but also hand crafted
> objects which have no source (like a drawing, some nested Morphs or even
> some text). This is very simplistic compared to Spoon, and my proposal
> was even more simplistic. In particular, this doesn't handle the case
> where any changes to bytecodes or object format are needed.

Interesting.

I note, though, that the wiki page you mention doesn't actually say  
much about development. It's mostly concerned with efficient ways of  
moving objects between images. Reliably reconstructing part of one  
image in another is certainly a crucial part of collaborative  
development, but it's not everything.

The other key feature of Monticello is merging. If you and I have the  
same chunk in different images, and we make differing but compatible
changes, how can we create a chunk that contains both sets of changes?  
I submit that any tool that can do that will have explicit knowledge  
of the semantics of objects it's merging, whether Smalltalk code,  
Etoys projects or something else.

So the wonderful generality of the Chunky Images idea only gets you so  
far, and you still need a tool like Monticello to actually create  
collaboratively. In Monticello 2 I've tried to address this idea  
explicitly: the core versioning engine knows nothing about the
semantics of the objects it's versioning, but it does rely on  
pluggable domain models that do.
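
A rough Python sketch of that split, not Monticello's actual API: a generic three-way merge engine that knows nothing about the content it merges and hands every real conflict to a pluggable resolver. The names `three_way_merge`, `MergeConflict` and the `resolve` callback are hypothetical.

```python
# Hypothetical sketch, not Monticello's API: a generic three-way merge
# engine that delegates any semantic decisions to a pluggable resolver.

class MergeConflict(Exception):
    pass


def three_way_merge(base, mine, yours, resolve=None):
    """Merge two sets of changes made against a common ancestor.

    base, mine, yours: dicts mapping a definition name to its content;
    a missing name means the definition does not exist in that version.
    resolve: optional callable (name, base, mine, yours) -> merged value,
    supplied by a domain model that understands the content.
    """
    merged = {}
    for name in sorted(set(base) | set(mine) | set(yours)):
        b, m, y = base.get(name), mine.get(name), yours.get(name)
        if m == y:            # both sides agree (including both deleted)
            value = m
        elif m == b:          # only "yours" changed it
            value = y
        elif y == b:          # only "mine" changed it
            value = m
        elif resolve is not None:
            value = resolve(name, b, m, y)
        else:
            raise MergeConflict(name)
        if value is not None:
            merged[name] = value
    return merged
```

Compatible changes merge without help; only when both sides changed the same definition differently does the engine need a resolver that understands methods, Etoys projects, or whatever the content happens to be.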

Colin


Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Casey Ransberger
We're bumping up against the homoiconicity of the system, aren't we? That code is really just a kind of data. Has anyone ever done a diff tool for whole images, not just source methods? It would be fantabulous if I didn't have to write an installer script for my package, instead having the necessary objects brought over directly. 

Seems like the mother of all problems is: moving things around that way between images of different formats. Would some future descendant of SystemTracer perhaps be of use?

 - Ron



Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Jecel Assumpcao Jr
In reply to this post by Colin Putney
Colin Putney wrote:
> I note, though, that the wiki page you mention doesn't actually say  
> much about development.

That is left up to other tools in this proposal.

> It's mostly concerned with efficient ways of  
> moving objects between images.

It can't even do that either - an object can only be reloaded into the
exact same image from which it was extracted (unlike ImageSegments). The
idea is that by letting several "images" live side by side in memory and
disk without taking up too much space then you won't mind dedicating an
"image" for each Squeak application you use.

> Reliably reconstructing part of one  
> image in another is certainly a crucial part of collaborative  
> development, but it's not everything.

You are right, but as I mentioned above my proposal doesn't even try to
do that much.

> The other key feature of Monticello is merging. If you and I have the  
> same chunk in different images, and we make differing but compatible
> changes, how can we create a chunk that contains both sets of changes?  

You would have to use Monticello or something similar (in which case you
would be limited to source code). For "merging" generic objects my idea
was to use Croquet, but then you would be limited to the equivalent of
instant messaging rather than email.

> I submit that any tool that can do that will have explicit knowledge  
> of the semantics of objects it's merging, whether Smalltalk code,  
> Etoys projects or something else.

Yes. And I have given up on automatic conflict resolution after working
many years on the problem (for the more general Neo Smalltalk modules,
since for Chunky Squeak you can't even get conflicts in the first place
as it is so limited).

> So the wonderful generality of the Chunky Images idea only gets you so  
> far, and you still need a tool like Monticello to actually create  
> collaboratively. In Monticello 2 I've tried to address this idea  
> explicitly: the core versioning engine knows nothing about the
> semantics of the objects it's versioning, but it does rely on  
> pluggable domain models that do.

I should have mentioned http://wiki.squeak.org/squeak/5637 (Neo
Smalltalk "groups") as well since Chunky Squeak is just a very stripped
down version of it. In that system you do have merging in something that
is similar to "commit" in transactional systems.

-- Jecel



Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Jecel Assumpcao Jr
In reply to this post by K. K. Subramaniam
K. K. Subramaniam wrote:
> I meant a difference operator, not the diff(1) program. Change set browser is a
> good tool but is incomplete. It does not track and log all changes (e.g. class
> variables).

I meant the same thing, but mentioned the text diff program as an
example of what some people would like to be able to use. Back when
Smalltalk-80 used an object table it wouldn't have been that hard to
create a difference operator for binary images since objects never
changed their "oop". With direct pointers it is far more complicated to
decide that two objects in separate images are actually the same. The
best strategy is probably to start out with classes and processes and do
a breadth first search.
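
A minimal Python sketch of that strategy, assuming each image has already been reduced to a graph of object ids with named slots, and that some roots (classes, say) can be found by name in both images. The function `match_objects` and this graph representation are hypothetical.

```python
from collections import deque


def match_objects(graph_a, roots_a, graph_b, roots_b):
    """Pair up objects of two images, breadth first from named roots.

    graph_x: dict mapping an object id to a dict {slot name: target id}.
    roots_x: dict mapping a well-known name (say, a class name) to an id.
    Returns (pairs, mismatches), where pairs maps ids in A to ids in B.
    """
    pairs, mismatches = {}, []
    queue = deque()
    for name in sorted(set(roots_a) & set(roots_b)):
        queue.append((roots_a[name], roots_b[name]))
    while queue:
        a, b = queue.popleft()
        if a in pairs:
            if pairs[a] != b:          # A's object reached two B objects
                mismatches.append((a, b))
            continue
        pairs[a] = b
        slots_a, slots_b = graph_a[a], graph_b[b]
        for slot in sorted(set(slots_a) & set(slots_b)):
            queue.append((slots_a[slot], slots_b[slot]))
    return pairs, mismatches
```

The ids in the two graphs never need to agree; correspondence is derived purely from how objects are reached from the named roots.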

-- Jecel



Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Jecel Assumpcao Jr
In reply to this post by Casey Ransberger
Ronald Spengler wrote:
> We're bumping up against the homoiconicity of the system, aren't we?
> That code is really just a kind of data. Has anyone ever done a diff tool
> for whole images, not just source methods?

Like I said in another reply, given that Squeak objects don't have a
fixed identity (see other thread) it isn't very easy. But I don't think
it is impossible in practice since objects mostly exist in very
stereotyped patterns.

> It would be fantabulous if I didn't have to write an installer script for
> my package, instead having the necessary objects brought over directly. 

That is what I want. And I partly had it in Smalltalk V/Win (also
released as Smalltalk Express). In that system your image started out as
an essentially empty v.exe file plus a bunch of .dll files with objects
and code. Some of these had to be shipped with the application while
many (with all the development tools) couldn't (due to the license). The
.dll files had lots of stuff you wouldn't need but you had to ship them
even if you only needed a single object. And there were no tools to tell
you that you were using any objects from a .dll so you might ship it
even if you didn't need it. But I don't think all these problems would
be too hard to fix.

> Seems like the mother of all problems is: moving things around that way
> between images of different formats. Would some future descendant of
> SystemTracer perhaps be of use?

If you want to send messages between these different images and you want
to be able to send over some of the arguments (instead of just a far
reference back to the sending image) then this problem has to be solved
anyway. Perhaps we are talking about something more like the Corba
serialization format than the SystemTracer, but it certainly is related
to the latter.

-- Jecel



Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Eliot Miranda-2
In reply to this post by K. K. Subramaniam


On Mon, Aug 24, 2009 at 7:35 PM, Jecel Assumpcao Jr <[hidden email]> wrote:
> K. K. Subramaniam wrote:
> > I meant a difference operator, not the diff(1) program. Change set browser is a
> > good tool but is incomplete. It does not track and log all changes (e.g. class
> > variables).
>
> I meant the same thing, but mentioned the text diff program as an
> example of what some people would like to be able to use. Back when
> Smalltalk-80 used an object table it wouldn't have been that hard to
> create a difference operator for binary images since objects never
> changed their "oop".

That only works for things that are created in exactly the same order with no intervening operations.  If I were to compile the source for method A followed by compiling the source for method B I would end up with different oops for methods A and B than if I were to first compile method B's source followed by method A's source.  Things would also be different if in between compiling I performed some other arbitrary action that caused allocations whose results were discarded.

The problem is orthogonal to direct pointers.
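
A toy Python model of the point about allocation order. It assumes an object table that simply hands out the next free index as the oop and never reuses entries, which is a simplification; `ObjectTable` and `compile_methods` are hypothetical names.

```python
class ObjectTable:
    """Toy object table: an oop is simply the next free table index."""

    def __init__(self):
        self.entries = []

    def allocate(self, obj):
        self.entries.append(obj)
        return len(self.entries) - 1   # the new object's oop


def compile_methods(names, scratch_allocations=0):
    """'Compile' methods in the given order and return {name: oop}."""
    table = ObjectTable()
    oops = {}
    for name in names:
        for _ in range(scratch_allocations):
            table.allocate(None)       # temporary garbage still consumes an oop
        oops[name] = table.allocate("method " + name)
    return oops
```

Compiling A then B gives A oop 0 and B oop 1; compiling B then A swaps them, and any intervening allocation shifts both. Equal oops therefore cannot be used to identify "the same" object across two independently built images.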
 
> With direct pointers it is far more complicated to
> decide that two objects in separate images are actually the same. The
> best strategy is probably to start out with classes and processes and do
> a breadth first search.

What is needed is generic structural comparison.  One problem in this is that certain collections are unordered and therefore comparing structure reachable from unordered collections may involve a combinatorial explosion (compare all possible pair-wise combinations, succeeding if a match is found).  Another problem is what I'll call incidental concrete difference.  Are these two equivalent for the purposes of comparison or not: (1 to: 3) and #(1 2 3)?  (etc.)

The schema for code representation in the system is well-defined and several ordering operations exist to allow comparison; selector-method-pairs in method dictionaries can be ordered by lexicographic order of selectors and sibling subclasses can be ordered by lexicographic order of class names.  Hence structural comparison of Smalltalk code is straight-forward.  Generalising to arbitrary object structures isn't at all straight-forward unless analogous schema are introduced.
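
A small Python sketch of why code is the easy case, assuming a class is described by a name, a dictionary of selectors to method bodies, and a collection of subclasses. Sorting by selector and by class name gives a canonical form that ordinary equality can compare. The function `canonical_class` is hypothetical.

```python
def canonical_class(name, methods, subclasses=()):
    """Return a nested tuple that is independent of dictionary order.

    methods: dict mapping selector -> method source (or bytecode).
    subclasses: iterable of already canonicalised subclass tuples.
    """
    ordered_methods = tuple(sorted(methods.items()))     # by selector
    ordered_subclasses = tuple(sorted(subclasses))       # by class name first
    return (name, ordered_methods, ordered_subclasses)
```

Two class trees built in different orders canonicalise to equal tuples, because every unordered collection in the schema has a natural sort key.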

Writing a recursive structural equality tester in Smalltalk is straight-forward (I have code written for the Newspeak project I could post if you're interested) but it fails for unordered collections and for incidental concrete difference.
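
A minimal Python sketch of such a tester and its two failure modes. This is not the Newspeak code mentioned above. It assumes plain Python values stand in for objects, with `range(1, 4)` versus `[1, 2, 3]` playing the role of `(1 to: 3)` versus `#(1 2 3)`; both function names are hypothetical.

```python
def structurally_equal(a, b):
    """Naive recursive structural comparison of plain Python values."""
    if type(a) is not type(b):
        return False                   # "incidental concrete difference"
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(
            structurally_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(
            structurally_equal(a[k], b[k]) for k in a)
    return a == b


def unordered_equal(xs, ys):
    """Compare two unordered bags of unhashable structures.

    Without a canonical order, each element of xs must be tried against
    the remaining elements of ys; this greedy pairing is quadratic, and a
    fully general matcher may need to backtrack over all pairings.
    """
    remaining = list(ys)
    for x in xs:
        for i, y in enumerate(remaining):
            if structurally_equal(x, y):
                del remaining[i]
                break
        else:
            return False
    return not remaining
```

The naive tester rejects `range(1, 4)` against `[1, 2, 3]` even though they enumerate the same elements, and it treats `[[1], [2]]` and `[[2], [1]]` as different unless the caller knows the collection is unordered and pays for pairwise matching.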




> -- Jecel






Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Eliot Miranda-2
In reply to this post by Casey Ransberger


On Mon, Aug 24, 2009 at 7:48 PM, Jecel Assumpcao Jr <[hidden email]> wrote:
> Ronald Spengler wrote:
> > We're bumping up against the homoiconicity of the system, aren't we?
> > That code is really just a kind of data. Has anyone ever done a diff tool
> > for whole images, not just source methods?
>
> Like I said in another reply, given that Squeak objects don't have a
> fixed identity (see other thread) it isn't very easy. But I don't think
> it is impossible in practice since objects mostly exist in very
> stereotyped patterns.
>
> > It would be fantabulous if I didn't have to write an installer script for
> > my package, instead having the necessary objects brought over directly.
>
> That is what I want. And I partly had it in Smalltalk V/Win (also
> released as Smalltalk Express). In that system your image started out as
> an essentially empty v.exe file plus a bunch of .dll files with objects
> and code. Some of these had to be shipped with the application while
> many (with all the development tools) couldn't (due to the license). The
> .dll files had lots of stuff you wouldn't need but you had to ship them
> even if you only needed a single object. And there were no tools to tell
> you that you were using any objects from a .dll so you might ship it
> even if you didn't need it. But I don't think all these problems would
> be too hard to fix.

And our experience at ParcPlace-Digitalk when we compared SLLs (Smalltalk V's system of object DLLs requiring VM support, quite similar to image segments) with Parcels (VW's system of a conventional but optimized object pickling format) was that parcels were as fast, if not faster, and far less brittle.  Making the file format identical to the object format and linking in objects to the heap instead of parsing them seems like a really cool idea but it ties the representation far too closely to a particular implementation of the memory manager and object model.  Going with a "soft" pickling format allows one to concentrate on important issues like naming assumed structure (what are the prerequisites of a component) and how to update it in the presence of schema changes, e.g. what if a component includes an instance of some class which has gained or lost instance variables in the loading image when compared to the image that published the component.
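
A minimal Python sketch of how a "soft" format can tolerate the schema change just mentioned. This is not the parcel format. It assumes instances are published as a class name plus named slot values, and that each class in the loading image lists its current instance variables in a `slots` attribute; `publish`, `load` and `slots` are hypothetical.

```python
def publish(obj):
    """Pickle an instance 'softly': class name plus named slot values."""
    return {"class": type(obj).__name__, "slots": dict(vars(obj))}


def load(record, classes, defaults=None):
    """Rebuild an instance in an image whose class shape may differ.

    classes: dict mapping class name -> class in the loading image; each
    class lists its current instance variables in a `slots` attribute.
    defaults: values for instance variables the published record lacks.
    """
    cls = classes[record["class"]]
    defaults = defaults or {}
    obj = cls.__new__(cls)
    for slot in cls.slots:
        # Slots gained since publication fall back to a default;
        # slots lost since publication are silently dropped.
        setattr(obj, slot, record["slots"].get(slot, defaults.get(slot)))
    return obj
```

Because slots are matched by name rather than by offset, the loader can make an explicit decision for each added or removed variable; a format that mirrors the heap layout has no such hook.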



> > Seems like the mother of all problems is: moving things around that way
> > between images of different formats. Would some future descendant of
> > SystemTracer perhaps be of use?
>
> If you want to send messages between these different images and you want
> to be able to send over some of the arguments (instead of just a far
> reference back to the sending image) then this problem has to be solved
> anyway. Perhaps we are talking about something more like the Corba
> serialization format than the SystemTracer, but it certainly is related
> to the latter.

Again, you should check out the VW parcel system and the communications framework OpenTalk.  The OpenTalk marshaller is influenced by the parcel object marshaller but the two have differences.


> -- Jecel






Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Eliot Miranda-2
In reply to this post by Casey Ransberger


On Mon, Aug 24, 2009 at 6:15 PM, Ronald Spengler <[hidden email]> wrote:
> We're bumping up against the homoiconicity of the system, aren't we? That code is really just a kind of data. Has anyone ever done a diff tool for whole images, not just source methods? It would be fantabulous if I didn't have to write an installer script for my package, instead having the necessary objects brought over directly.
>
> Seems like the mother of all problems is: moving things around that way between images of different formats. Would some future descendant of SystemTracer perhaps be of use?

Why is this such a mother of a problem?  In VW we used a single parcel format that could represent compiled code that could be loaded into either a 32-bit or a 64-bit image even when the size of SmallInteger, the number of tag bits, the existence or not of SmallDouble, the size of identity hashes, the way class references are encoded in instances, etc. all differed between the 32-bit and 64-bit systems.
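
A small Python sketch of one ingredient of such a format, using integers as the example. It is not the parcel encoding: the record layout and the tag-bit counts passed in are assumptions chosen for illustration, and all three function names are hypothetical. The file stores the value abstractly, and the loader picks the representation that suits the loading image.

```python
def small_integer_range(word_bits, tag_bits):
    """Range of a tagged immediate integer; both parameters are assumptions."""
    value_bits = word_bits - tag_bits
    return -(1 << (value_bits - 1)), (1 << (value_bits - 1)) - 1


def encode_integer(value):
    """Format-neutral record: sign and magnitude bytes, no word size."""
    magnitude = abs(value)
    length = max(1, (magnitude.bit_length() + 7) // 8)
    return {"negative": value < 0, "bytes": magnitude.to_bytes(length, "big")}


def decode_integer(record, word_bits, tag_bits):
    """Pick the loading image's representation for the decoded value."""
    value = int.from_bytes(record["bytes"], "big")
    if record["negative"]:
        value = -value
    low, high = small_integer_range(word_bits, tag_bits)
    kind = "SmallInteger" if low <= value <= high else "LargeInteger"
    return kind, value
```

With these illustrative parameters, `2**40` loads as a large integer into a 32-bit image and as an immediate into a 64-bit one, from the same bytes on disk.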




Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Eliot Miranda-2
In reply to this post by Igor Stasenko


On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]> wrote:
> 2009/8/20 Eliot Miranda <[hidden email]>:
> > Hi Igor,
> >
> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <[hidden email]> wrote:
> >>
> >> 2009/8/20 Jecel Assumpcao Jr <[hidden email]>:
> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
> >> >>
> >> >> > For example, I would far prefer to
> >> >> > see Squeak move to a binary based development model (I would mention
> >> >> > Projects and Etoys here) than the current source based things we are
> >> >> > doing (trunk, bob or whatever).
> >> >>
> >> >> Forgive me for seizing on a throw-away comment like this, but would
> >> >> you mind expanding on this a bit? Are you saying you prefer something
> >> >> spoonish, where CompiledMethods  are passed directly from image to
> >> >> image? Something else?
> >> >
> >> > Heh, I got asked about this on IRC as well. Though I had actually
> >> > started to explain this a little in the original email, I ended up
> >> > deleting it to keep on topic. With a new subject line I don't feel I
> >> > have to worry about that. Some details about this (with a few drawings)
> >> > can be found in the Chunky Squeak wiki page:
> >> >
> >> > http://wiki.squeak.org/squeak/584
> >> >
> >> > The idea is to be more like the Etoys users which can load binary
> >> > projects containing not only the code they need but also hand crafted
> >> > objects which have no source (like a drawing, some nested Morphs or even
> >> > some text). This is very simplistic compared to Spoon, and my proposal
> >> > was even more simplistic. In particular, this doesn't handle the case
> >> > where any changes to bytecodes or object format are needed.
> >> >
> >>
> >> The central question, which arising immediately is, what is the
> >> credible way(s) to reproduce such artifacts?
> >> When we having a source code, we could (re)compile it on a different
> >> system. But what you propose to do with pure binary data, a soup of
> >> objects, in respect that it is incredibly hard to understand, what
> >> bits you need and what's not, in case if you need to do clean-up ,
> >> refactor, rewrite and simply analyze what is happening.
> >> This is what making a huge difference, for instance, between
> >> applications with open source code and applications shipped in binary
> >> form - you can only report bugs, but can't really make any suggestions
> >> about what happening.
> >> I don't think that developers of Squeak should be victims of such
> >> situation(s).
> >
> >     it is possible to have your cake and eat it too.  One can create a
> > binary format that includes source and includes the meta-source for its
> > creation.  But including a binary representation allows much faster loading,
> > loading without a
> > compiler, and source hiding if one chooses not to include the source.
> > There are other advantages, such as not cluttering up the changes file when one loads a package. In the VW parcel system, to which I added source management, we replaced the SourceFiles with a SourceFileManager whose job was to manage the sources and changes file and an arbitrary number of source files for parcels, the binary format.  In
> > the parcel file the source pointers of compiled methods are the positions of
> > their source in the parcel source file.  When one loads a parcel the
> > SourceFileManager adds the file to its set of managed files and assigns an
> > index for the source file.  The parcel loader then swizzles all the source
> > pointers so that they include the source file index along with the position.
> >  So accessing the source for a method loaded from a parcel accesses that
> > parcel's source file.  We used a floating-point like format for source
> > pointers, where the exponent was the source file index, and the mantissa was
> > the position in the file.
> > We didn't create a single file format, having two separate files for binary
> > and source, which is probably a mistake.  A format with a short header,
> > followed by source, followed by binary, followed by metasource, would be
> > easier to manage than three separate files.
> > We didn't include any metasource, but we did include pre-read, load and
> > unload actions.  I did a very bad job on version numbering and prerequisite
> > selection.
> > That's not the whole story but enough to start answering your question.  If
> > there is a well-defined definition of the objects in a package and that
> > definition is included in the package as metasource, then one can comprehend
> > the binary package's contents by examining the metasource and can reproduce
> > creating the package, provided that the tools are careful to impose
> > ordering, etc.
> > best
> > Eliot
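
A minimal Python sketch of the source-pointer scheme described in the quoted message above: a file index in the high bits (the "exponent") and a file position in the low bits (the "mantissa"). The fixed 24-bit split and all names here are assumptions for illustration; the message does not give the actual VW layout.

```python
POSITION_BITS = 24   # hypothetical width of the "mantissa" (file position)


def make_source_pointer(file_index, position):
    """Combine a source file index and a file position into one integer."""
    if not 0 <= position < (1 << POSITION_BITS):
        raise ValueError("position does not fit in the mantissa")
    return (file_index << POSITION_BITS) | position


def split_source_pointer(pointer):
    """Recover (file_index, position) from a combined source pointer."""
    return pointer >> POSITION_BITS, pointer & ((1 << POSITION_BITS) - 1)


def swizzle(positions, file_index):
    """What a loader might do: tag bare positions with the file's index."""
    return [make_source_pointer(file_index, p) for p in positions]
```

On disk a parcel only needs bare positions; the file index is assigned at load time, which is why the loader has to rewrite every pointer once.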

> I think you inevitably made wrong decisions, because you went this way
> by allowing arbitrary binary data to be held by a package.
> In such situations it is much easier to make mistakes.
> But sure, the only one who makes no mistakes is the one who does nothing :)

We didn't disallow representation of arbitrary data but we also didn't support it.  The only thing the Parcel system supports (as in the tool set, rather than what one can extend the framework to do in specific circumstances) is to represent code, which it does very well.

What are these mistakes?  Can you be specific?  I think the parcel system has been a major success.  VW is now deployed as a system of components, the base image and a much larger suite of parcels.  Parcels are not tied to a particular version or implementation and yet are still fast to publish and load.  What's not to like?



> Obviously one side of this problem is the uniform object memory,
> where each object can reference any other object, limited only by
> people's imagination.
> There are no layers or any other means of establishing the kind of
> barriers (which we call modules) in Smalltalk.
> It means that once you have integrated a parcel into an image and started
> using it, you may have a hard time trying to unload it.
> It is possible to develop an image as an artifact which contains both
> binary and sources, but such an approach has drawbacks, which we are,
> by the way, trying to overcome nowadays.
> Practice shows that such an approach is credible only
> for a small group of individuals, but becomes a bottleneck if you
> adopt such a scheme for a wider community.
>
> So I think that before entering this domain (allowing binary data),
> we should first solve more basic problems of Smalltalk and its design:
> modularity, name spaces, layering, etc. Only then could we return
> to the original question and solve it.
>
> --
> Best regards,
> Igor Stasenko AKA sig.





Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Igor Stasenko
2009/8/25 Eliot Miranda <[hidden email]>:

>
>
> On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/8/20 Eliot Miranda <[hidden email]>:
>> > Hi Igor,
>> >
>> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <[hidden email]>
>> > wrote:
>> >>
>> >> 2009/8/20 Jecel Assumpcao Jr <[hidden email]>:
>> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
>> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
>> >> >>
>> >> >> > For example, I would far prefer to
>> >> >> > see Squeak move to a binary based development model (I would
>> >> >> > mention
>> >> >> > Projects and Etoys here) than the current source based things we
>> >> >> > are
>> >> >> > doing (trunk, bob or whatever).
>> >> >>
>> >> >> Forgive me for seizing on a throw-away comment like this, but would
>> >> >> you mind expanding on this a bit? Are you saying you prefer
>> >> >> something
>> >> >> spoonish, where CompiledMethods  are passed directly from image to
>> >> >> image? Something else?
>> >> >
>> >> > Heh, I got asked about this on IRC as well. Though I had actually
>> >> > started to explain this a little in the original email, I ended up
>> >> > deleting it to keep on topic. With a new subject line I don't feel I
>> >> > have to worry about that. Some details about this (with a few
>> >> > drawings)
>> >> > can be found in the Chunky Squeak wiki page:
>> >> >
>> >> > http://wiki.squeak.org/squeak/584
>> >> >
>> >> > The idea is to be more like the Etoys users which can load binary
>> >> > projects containing not only the code they need but also hand crafted
>> >> > objects which have no source (like a drawing, some nested Morphs or
>> >> > even
>> >> > some text). This is very simplistic compared to Spoon, and my
>> >> > proposal
>> >> > was even more simplistic. In particular, this doesn't handle the case
>> >> > where any changes to bytecodes or object format are needed.
>> >> >
>> >>
>> >> The central question, which arising immediately is, what is the
>> >> credible way(s) to reproduce such artifacts?
>> >> When we having a source code, we could (re)compile it on a different
>> >> system. But what you propose to do with pure binary data, a soup of
>> >> objects, in respect that it is incredibly hard to understand, what
>> >> bits you need and what's not, in case if you need to do clean-up ,
>> >> refactor, rewrite and simply analyze what is happening.
>> >> This is what making a huge difference, for instance, between
>> >> applications with open source code and applications shipped in binary
>> >> form - you can only report bugs, but can't realy make any suggestions
>> >> about what happening.
>> >> I don't think that developers of Squeak should be victims of such
>> >> situation(s).
>> >
>> >     it is possible to have your cake and eat it too.  One can create a
>> > binary format that includes source and includes the meta-source for its
>> > creation.  But including a binary representation allows much faster
>> > loading,
>> > loading without a
>> > compiler, and source hiding if one choses not to include the source.
>> >
>> > There are other advantages, such as not cluttering up the changes file when one loads a package  In the VW parcel system, to which I added source management, we replaced the SourceFiles with a SourceFileManager whose job was to manage the sources and changes file and an arbitrary number of source files for parcels, the binary format.  In
>> > the parcel file the source pointers of compiled methods are the
>> > positions of
>> > their source in the parcel source file.  When one loads a parcel the
>> > SourceFileManager adds the file to its set of managed files and assigns
>> > an
>> > index for the source file.  The parcle loader then swizzles all the
>> > source
>> > pointers so that they include the source file index along with the
>> > position.
>> >  So accessing the source for a method loaded form a parcel accesses that
>> > parcel's source file.  We used a floating-point like format for source
>> > pointers, where the exponent was the source file index, and the mantissa
>> > was
>> > the position in the file.
>> > We didn't create a single file format, having two separate files for
>> > binary
>> > and source, which is probably a mistake.  A format with a short header,
>> > followed by source, followed by binary, followed by metasource, would be
>> > easier to manage than three separate files.
>> > We didn't include any metasource, but we did include pre-read, load and
>> > unload actions.  I did a very bad job on version numbering and
>> > prerequisite
>> > selection.
>> > That's not the whole story but enough to start answering your question.
>> >  If
>> > there is a well-defined definition of the objects in a package and that
>> > definition is included in the package as metasource, then one can
>> > comprehend
>> > the binary package's contents by examining the metasource and can
>> > reproduce
>> > creating the package, provided that the tools are careful to impose
>> > ordering, etc.
>> > best
>> > Eliot
>>
>> I think you inevitably made wrong decisions, because you went this way
>> by allowing an
>> arbitrary binary data , held by package.
>> In such situations it is much more easier to make a mistakes.
>> But sure, one who's making no mistakes is one who doing nothing :)
>
> We didn't disallow representation of arbitrary data but we also didn't
> support it.  The only thing the Parcel system supports (as in the tool set,
> rather than what one can extend the framework to do in specific
> circumstances) is to represent code, which it does very well.
> What are these mistakes?  Can you be specific?  I think the parcel system
> has been a major success.  VW is now deployed as a system of components, the
> base image and a much larger suite of parcels.  Parcels are not tied to a
> particular version or implementation and yet are still fast to publish and
> load.  What's not to like?

I referred mainly to your own statements about mistake(s).
I don't know enough about parcels to say exactly where the flaws are.
I'm still wondering how you can unload a parcel that is no longer needed while
objects used or created by it are still sitting in the image.
A basic use case: a developer needs some specific tool (like a UI
design tool) while working
on an application. But by the time he ships the application, the tool is
no longer needed.

>>
>>
>> Obviously one of the side of such problem is uniform object memory,
>> where each object could
>> reference any other object and limited only by a imagination of people.
>> There is no layers or any other means which could establish a certain
>> barriers (which we calling a modules)
>> in smalltalk.
>> It means, that once you integrated the parcel into image, and started
>> using it, you may have a hard times trying to unload it.
>> It is possible to develop an image as an artifact, which contains both
>> binary & sources , but such approach
>> having a drawbacks, which we, by the way, trying to overcome nowadays.
>> Practice shows that such approach is credible only
>> for a small group of individuals, but becomes a bottleneck if you
>> adopt such scheme for a wider community.
>>
>> So, i think , that before entering this domain (allowing binary data),
>> first we should solve more basic problems of smalltalk & its design -
>> modularity, name spaces, layering & etc etc.. Only the we could return
>> to original question and solve it.
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Eliot Miranda-2


On Tue, Aug 25, 2009 at 3:57 AM, Igor Stasenko <[hidden email]> wrote:
2009/8/25 Eliot Miranda <[hidden email]>:
>
>
> On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/8/20 Eliot Miranda <[hidden email]>:
>> > Hi Igor,
>> >
>> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <[hidden email]>
>> > wrote:
>> >>
>> >> 2009/8/20 Jecel Assumpcao Jr <[hidden email]>:
>> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
>> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
>> >> >>
>> >> >> > For example, I would far prefer to
>> >> >> > see Squeak move to a binary based development model (I would
>> >> >> > mention
>> >> >> > Projects and Etoys here) than the current source based things we
>> >> >> > are
>> >> >> > doing (trunk, bob or whatever).
>> >> >>
>> >> >> Forgive me for seizing on a throw-away comment like this, but would
>> >> >> you mind expanding on this a bit? Are you saying you prefer
>> >> >> something
>> >> >> spoonish, where CompiledMethods  are passed directly from image to
>> >> >> image? Something else?
>> >> >
>> >> > Heh, I got asked about this on IRC as well. Though I had actually
>> >> > started to explain this a little in the original email, I ended up
>> >> > deleting it to keep on topic. With a new subject line I don't feel I
>> >> > have to worry about that. Some details about this (with a few
>> >> > drawings)
>> >> > can be found in the Chunky Squeak wiki page:
>> >> >
>> >> > http://wiki.squeak.org/squeak/584
>> >> >
>> >> > The idea is to be more like the Etoys users which can load binary
>> >> > projects containing not only the code they need but also hand crafted
>> >> > objects which have no source (like a drawing, some nested Morphs or
>> >> > even
>> >> > some text). This is very simplistic compared to Spoon, and my
>> >> > proposal
>> >> > was even more simplistic. In particular, this doesn't handle the case
>> >> > where any changes to bytecodes or object format are needed.
>> >> >
>> >>
>> >> The central question, which arising immediately is, what is the
>> >> credible way(s) to reproduce such artifacts?
>> >> When we having a source code, we could (re)compile it on a different
>> >> system. But what you propose to do with pure binary data, a soup of
>> >> objects, in respect that it is incredibly hard to understand, what
>> >> bits you need and what's not, in case if you need to do clean-up ,
>> >> refactor, rewrite and simply analyze what is happening.
>> >> This is what making a huge difference, for instance, between
>> >> applications with open source code and applications shipped in binary
>> >> form - you can only report bugs, but can't realy make any suggestions
>> >> about what happening.
>> >> I don't think that developers of Squeak should be victims of such
>> >> situation(s).
>> >
>> >     it is possible to have your cake and eat it too.  One can create a
>> > binary format that includes source and includes the meta-source for its
>> > creation.  But including a binary representation allows much faster
>> > loading,
>> > loading without a
>> > compiler, and source hiding if one choses not to include the source.
>> >
>> > There are other advantages, such as not cluttering up the changes file when one loads a package  In the VW parcel system, to which I added source management, we replaced the SourceFiles with a SourceFileManager whose job was to manage the sources and changes file and an arbitrary number of source files for parcels, the binary format.  In
>> > the parcel file the source pointers of compiled methods are the
>> > positions of
>> > their source in the parcel source file.  When one loads a parcel the
>> > SourceFileManager adds the file to its set of managed files and assigns
>> > an
>> > index for the source file.  The parcle loader then swizzles all the
>> > source
>> > pointers so that they include the source file index along with the
>> > position.
>> >  So accessing the source for a method loaded form a parcel accesses that
>> > parcel's source file.  We used a floating-point like format for source
>> > pointers, where the exponent was the source file index, and the mantissa
>> > was
>> > the position in the file.
>> > We didn't create a single file format, having two separate files for
>> > binary
>> > and source, which is probably a mistake.  A format with a short header,
>> > followed by source, followed by binary, followed by metasource, would be
>> > easier to manage than three separate files.
>> > We didn't include any metasource, but we did include pre-read, load and
>> > unload actions.  I did a very bad job on version numbering and
>> > prerequisite
>> > selection.
>> > That's not the whole story but enough to start answering your question.
>> >  If
>> > there is a well-defined definition of the objects in a package and that
>> > definition is included in the package as metasource, then one can
>> > comprehend
>> > the binary package's contents by examining the metasource and can
>> > reproduce
>> > creating the package, provided that the tools are careful to impose
>> > ordering, etc.
>> > best
>> > Eliot
>>
>> I think you inevitably made wrong decisions, because you went this way
>> by allowing an
>> arbitrary binary data , held by package.
>> In such situations it is much more easier to make a mistakes.
>> But sure, one who's making no mistakes is one who doing nothing :)
>
> We didn't disallow representation of arbitrary data but we also didn't
> support it.  The only thing the Parcel system supports (as in the tool set,
> rather than what one can extend the framework to do in specific
> circumstances) is to represent code, which it does very well.
> What are these mistakes?  Can you be specific?  I think the parcel system
> has been a major success.  VW is now deployed as a system of components, the
> base image and a much larger suite of parcels.  Parcels are not tied to a
> particular version or implementation and yet are still fast to publish and
> load.  What's not to like?

I referred mainly to your own statements about mistake(s).

Ah, ok,  Sorry :)
 

I don't know enough about parcels to say exactly where the flaws are.
I'm still wondering how you can unload a parcel that is no longer needed while
objects used or created by it are still sitting in the image.

Smalltalk has this problem with or without binary loading; they're called obsolete classes :)  However, the problem of knowing what to remove when the user says "unload" means that a loaded parcel requires a data structure that names the classes and methods it loaded.  In addition we maintain overrides, the older versions of methods and class definitions, in a stack, so that these can be restored when unloading a parcel.  I made lots of mistakes here (not allowing the tools to publish a parcel that has code overridden by others, not integrating source management and browsing queries with overridden code, not compressing the changes correctly with overridden code, etc, etc).  Tests would have helped :/

VW did (does?) test for open instances of applications when we unload a parcel, so that if the parcel contains one or more subclasses of ApplicationModel (VW's top-level GUI app class), all open applications are tested to see if they contain instances of those classes, and a warning is issued.
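A minimal sketch of the bookkeeping described above, written in Python purely for illustration (it is not VisualWorks code, and the names `Image`, `load`, `unload`, `shadowed` and `manifests` are all hypothetical). Each loaded parcel gets a manifest of what it installed, and every definition it overrides is pushed on a per-method stack so that unloading can restore it. The sketch assumes parcels are unloaded in reverse load order and simply refuses otherwise.

```python
class Image:
    """Toy stand-in for an image: (class name, selector) -> live definition."""

    def __init__(self):
        self.methods = {}    # key -> (owner, source) currently installed
        self.shadowed = {}   # key -> stack of (owner, source) that were overridden
        self.manifests = {}  # parcel name -> keys that parcel installed

    def load(self, parcel, definitions):
        for key, source in definitions.items():
            if key in self.methods:
                # Keep the overridden definition so unload can put it back.
                self.shadowed.setdefault(key, []).append(self.methods[key])
            self.methods[key] = (parcel, source)
        self.manifests[parcel] = list(definitions)

    def unload(self, parcel):
        keys = self.manifests[parcel]
        # Assumption: refuse if a later parcel has overridden our code.
        blocked = [key for key in keys if self.methods[key][0] != parcel]
        if blocked:
            raise RuntimeError(f"{parcel} is overridden at {blocked}")
        for key in keys:
            stack = self.shadowed.get(key)
            if stack:
                self.methods[key] = stack.pop()  # restore the older version
            else:
                del self.methods[key]            # the parcel introduced it
        del self.manifests[parcel]
```

Loading a parcel that redefines `("Object", "printString")` and then unloading it leaves the original definition in place, while anything the parcel introduced disappears.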

A basic use case: a developer needs some specific tool (like a UI
design tool) while working
on an application. But by the time he ships the application, the tool is
no longer needed.

Right.  I don't know of an automatic solution, but a good convention is to split all packages into a development and deployment pair where the deployment half is a prerequisite of the development half.  Sticking to the convention and using good names makes it easier to remember to remove development components and to guess which parts of someone else's components are development only.
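The convention above can be checked mechanically once prerequisites are recorded. The following is only a sketch in Python, not part of any parcel tool, and the package names are made up: everything the shipped application transitively requires is kept, and whatever else is loaded is a candidate for removal.

```python
def deployment_closure(root, prerequisites):
    """Return `root` plus every package it transitively requires."""
    needed, pending = set(), [root]
    while pending:
        name = pending.pop()
        if name not in needed:
            needed.add(name)
            pending.extend(prerequisites.get(name, ()))
    return needed


def removable_before_shipping(loaded, root, prerequisites):
    """Loaded packages that the deployed application does not depend on."""
    return sorted(set(loaded) - deployment_closure(root, prerequisites))


# Hypothetical packages: each "-Dev" half requires its deployment half.
prerequisites = {
    "MyApp": ["Painter"],
    "MyApp-Dev": ["MyApp", "Painter-Dev"],
    "Painter-Dev": ["Painter"],
}
loaded = ["MyApp", "MyApp-Dev", "Painter", "Painter-Dev"]
print(removable_before_shipping(loaded, "MyApp", prerequisites))
```

Because the deployment half never requires the development half, the development packages always fall outside the closure of the shipped application.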

I added a bulk instancesOf primitive that answered all instances of an Array of classes that my colleague Steve Dahl wanted to use in instance migration on class redefinition.  This could be used to look for all instances of the classes defined by a parcel prior to unload.  Do a GC, collect all instances of classes defined (rather than redefined) by a parcel, and warn if the result is non-empty (if in a dev image).
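As a rough analogue of that check, here is a Python sketch that uses the standard `gc` module in place of the VM primitive. The function names are hypothetical, and `gc.get_objects()` only reports objects tracked by CPython's cyclic collector, which covers instances of ordinary classes but not everything a real instancesOf primitive would see.

```python
import gc


def instances_of(classes):
    """Answer every live object whose exact class is in `classes`."""
    wanted = set(classes)
    gc.collect()  # "do a GC" first so unreachable cycles are not reported
    return [obj for obj in gc.get_objects() if type(obj) in wanted]


def check_unload(parcel_name, defined_classes):
    """Return a warning string if instances survive, otherwise None."""
    survivors = instances_of(defined_classes)
    if not survivors:
        return None
    counts = {}
    for obj in survivors:
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return f"warning: unloading {parcel_name} leaves live instances: {counts}"
```

Only classes the parcel defined would be passed in; classes it merely redefined stay in the image after unload, so their instances are not a problem.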

>> Obviously one of the side of such problem is uniform object memory,
>> where each object could
>> reference any other object and limited only by a imagination of people.
>> There is no layers or any other means which could establish a certain
>> barriers (which we calling a modules)
>> in smalltalk.
>> It means, that once you integrated the parcel into image, and started
>> using it, you may have a hard times trying to unload it.
>> It is possible to develop an image as an artifact, which contains both
>> binary & sources , but such approach
>> having a drawbacks, which we, by the way, trying to overcome nowadays.
>> Practice shows that such approach is credible only
>> for a small group of individuals, but becomes a bottleneck if you
>> adopt such scheme for a wider community.
>>
>> So, i think , that before entering this domain (allowing binary data),
>> first we should solve more basic problems of smalltalk & its design -
>> modularity, name spaces, layering & etc etc.. Only the we could return
>> to original question and solve it.
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.





Re: [squeak-dev] binary development

Jecel Assumpcao Jr
Eliot,

thanks for all your wonderful comments and insights about the Parcel
system in VisualWorks. My experience with it is extremely limited (I
once loaded idass, the chip simulation system, as a parcel into VW 5i
NC) and so I cited V/Win as an existence proof.

You are correct that an object table wouldn't help in general when
comparing two images - I was thinking of the specific case of when one
is known to be directly derived from the other like Squeak 3.8 from 3.7.
This was from the discussion of doing a security audit to allow Squeak
to be included in Debian.
Comparing unordered collections has all the complications you mentioned
and in Smalltalk this is supposed to be solved in #=, which experience
tells us not to trust too much.

For Neo Smalltalk I didn't do pure memory dumps but had a binary format
that was reasonably compressed. And it didn't have small integers but
only variable sized ones and these became SmallIntegers or LargeIntegers
when read in. That made the binary format compatible between the 16 and
36 bit versions of Neo Smalltalk.
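To illustrate the idea of a word-size-independent integer encoding, here is a small Python sketch. It is not the actual Neo Smalltalk format, which is not described here: the zigzag base-128 encoding, the single tag bit and the function names are all assumptions made for the example.

```python
def encode_int(value):
    """Zigzag + base-128 encoding: small magnitudes take fewer bytes."""
    zigzag = value * 2 if value >= 0 else -value * 2 - 1
    out = bytearray()
    while True:
        byte = zigzag & 0x7F
        zigzag >>= 7
        if zigzag:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_int(data):
    """Inverse of encode_int; reads one integer from the start of `data`."""
    zigzag, shift = 0, 0
    for byte in data:
        zigzag |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return zigzag // 2 if zigzag % 2 == 0 else -(zigzag + 1) // 2


def class_on_load(value, word_bits, tag_bits=1):
    """Pick the in-image class for `value`, assuming `tag_bits` tag bits."""
    payload = word_bits - tag_bits
    lowest, highest = -(1 << (payload - 1)), (1 << (payload - 1)) - 1
    return "SmallInteger" if lowest <= value <= highest else "LargeInteger"
```

Under these assumptions the same encoded bytes for 20000 would be read as a LargeInteger by a 16-bit image and as a SmallInteger by a 36-bit one, which is what keeps the file format independent of the word size.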

One idea for Neo modules that Dan thought a bit excessive was to divide
each into four related modules: the actual objects (I'll call this the
"deployment module" to use your term), the sources (just a bunch of
String objects), the documentation (nicely formatted text, possibly with
pictures or even movies) and the tests. In different situations you
might want different subsets of these. For example, while browsing
through SqueakMap you might click on "see more..." and get the full docs
on your machine. Then you might click on an example and create a new
object (in a new module) that would bring in the deployment module to
support it. If you ever try to look at the code for this new object in
the system browser or the debugger then the sources module would get
loaded. There would be links to the tests in the documentation but they
might also get loaded through the SUnit tool. If you close all windows
with the documentation, the doc module will eventually be unloaded.

Of course, I am supposing that objects in a module can point to objects
in a separate module in the above description. And that module
loading/unloading is a kind of crude virtual memory.
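The four-part split with load on demand could be sketched like this, in Python and with entirely hypothetical names (`ModuleGroup`, `acquire`, `release`, and a `fetch` callable standing in for whatever actually reads a module). For simplicity a part is dropped as soon as its last user goes away, whereas the description above only says it is "eventually" unloaded.

```python
class ModuleGroup:
    """One package split into four separately loadable parts."""

    PARTS = ("deployment", "sources", "docs", "tests")

    def __init__(self, name, fetch):
        self.name = name
        self.fetch = fetch  # callable(name, part) -> contents (assumed loader)
        self.loaded = {}    # part -> contents currently in memory
        self.users = {part: set() for part in self.PARTS}

    def acquire(self, part, user):
        """Load `part` on first use and remember who is using it."""
        if part not in self.loaded:
            self.loaded[part] = self.fetch(self.name, part)
        self.users[part].add(user)
        return self.loaded[part]

    def release(self, part, user):
        """Forget `user`; unload the part once nobody uses it."""
        self.users[part].discard(user)
        if not self.users[part]:
            self.loaded.pop(part, None)
```

Opening two documentation windows would fetch the docs part once, and closing both would unload it without touching the deployment part.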

-- Jecel



Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Igor Stasenko
In reply to this post by Eliot Miranda-2
2009/8/25 Eliot Miranda <[hidden email]>:

>
>
> On Tue, Aug 25, 2009 at 3:57 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/8/25 Eliot Miranda <[hidden email]>:
>> >
>> >
>> > On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]>
>> > wrote:
>> >>
>> >> 2009/8/20 Eliot Miranda <[hidden email]>:
>> >> > Hi Igor,
>> >> >
>> >> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> 2009/8/20 Jecel Assumpcao Jr <[hidden email]>:
>> >> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
>> >> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
>> >> >> >>
>> >> >> >> > For example, I would far prefer to
>> >> >> >> > see Squeak move to a binary based development model (I would
>> >> >> >> > mention
>> >> >> >> > Projects and Etoys here) than the current source based things
>> >> >> >> > we
>> >> >> >> > are
>> >> >> >> > doing (trunk, bob or whatever).
>> >> >> >>
>> >> >> >> Forgive me for seizing on a throw-away comment like this, but
>> >> >> >> would
>> >> >> >> you mind expanding on this a bit? Are you saying you prefer
>> >> >> >> something
>> >> >> >> spoonish, where CompiledMethods  are passed directly from image
>> >> >> >> to
>> >> >> >> image? Something else?
>> >> >> >
>> >> >> > Heh, I got asked about this on IRC as well. Though I had actually
>> >> >> > started to explain this a little in the original email, I ended up
>> >> >> > deleting it to keep on topic. With a new subject line I don't feel
>> >> >> > I
>> >> >> > have to worry about that. Some details about this (with a few
>> >> >> > drawings)
>> >> >> > can be found in the Chunky Squeak wiki page:
>> >> >> >
>> >> >> > http://wiki.squeak.org/squeak/584
>> >> >> >
>> >> >> > The idea is to be more like the Etoys users which can load binary
>> >> >> > projects containing not only the code they need but also hand
>> >> >> > crafted
>> >> >> > objects which have no source (like a drawing, some nested Morphs
>> >> >> > or
>> >> >> > even
>> >> >> > some text). This is very simplistic compared to Spoon, and my
>> >> >> > proposal
>> >> >> > was even more simplistic. In particular, this doesn't handle the
>> >> >> > case
>> >> >> > where any changes to bytecodes or object format are needed.
>> >> >> >
>> >> >>
>> >> >> The central question, which arising immediately is, what is the
>> >> >> credible way(s) to reproduce such artifacts?
>> >> >> When we having a source code, we could (re)compile it on a different
>> >> >> system. But what you propose to do with pure binary data, a soup of
>> >> >> objects, in respect that it is incredibly hard to understand, what
>> >> >> bits you need and what's not, in case if you need to do clean-up ,
>> >> >> refactor, rewrite and simply analyze what is happening.
>> >> >> This is what making a huge difference, for instance, between
>> >> >> applications with open source code and applications shipped in
>> >> >> binary
>> >> >> form - you can only report bugs, but can't realy make any
>> >> >> suggestions
>> >> >> about what happening.
>> >> >> I don't think that developers of Squeak should be victims of such
>> >> >> situation(s).
>> >> >
>> >> >     it is possible to have your cake and eat it too.  One can create
>> >> > a
>> >> > binary format that includes source and includes the meta-source for
>> >> > its
>> >> > creation.  But including a binary representation allows much faster
>> >> > loading,
>> >> > loading without a
>> >> > compiler, and source hiding if one choses not to include the source.
>> >> >
>> >> >
>> >> > There are other advantages, such as not cluttering up the changes file when one loads a package  In the VW parcel system, to which I added source management, we replaced the SourceFiles with a SourceFileManager whose job was to manage the sources and changes file and an arbitrary number of source files for parcels, the binary format.  In
>> >> > the parcel file the source pointers of compiled methods are the
>> >> > positions of
>> >> > their source in the parcel source file.  When one loads a parcel the
>> >> > SourceFileManager adds the file to its set of managed files and
>> >> > assigns
>> >> > an
>> >> > index for the source file.  The parcle loader then swizzles all the
>> >> > source
>> >> > pointers so that they include the source file index along with the
>> >> > position.
>> >> >  So accessing the source for a method loaded form a parcel accesses
>> >> > that
>> >> > parcel's source file.  We used a floating-point like format for
>> >> > source
>> >> > pointers, where the exponent was the source file index, and the
>> >> > mantissa
>> >> > was
>> >> > the position in the file.
>> >> > We didn't create a single file format, having two separate files for
>> >> > binary
>> >> > and source, which is probably a mistake.  A format with a short
>> >> > header,
>> >> > followed by source, followed by binary, followed by metasource, would
>> >> > be
>> >> > easier to manage than three separate files.
>> >> > We didn't include any metasource, but we did include pre-read, load
>> >> > and
>> >> > unload actions.  I did a very bad job on version numbering and
>> >> > prerequisite
>> >> > selection.
>> >> > That's not the whole story but enough to start answering your
>> >> > question.
>> >> >  If
>> >> > there is a well-defined definition of the objects in a package and
>> >> > that
>> >> > definition is included in the package as metasource, then one can
>> >> > comprehend
>> >> > the binary package's contents by examining the metasource and can
>> >> > reproduce
>> >> > creating the package, provided that the tools are careful to impose
>> >> > ordering, etc.
>> >> > best
>> >> > Eliot
>> >>
>> >> I think you inevitably made wrong decisions, because you went this way
>> >> by allowing an
>> >> arbitrary binary data , held by package.
>> >> In such situations it is much more easier to make a mistakes.
>> >> But sure, one who's making no mistakes is one who doing nothing :)
>> >
>> > We didn't disallow representation of arbitrary data but we also didn't
>> > support it.  The only thing the Parcel system supports (as in the tool
>> > set,
>> > rather than what one can extend the framework to do in specific
>> > circumstances) is to represent code, which it does very well.
>> > What are these mistakes?  Can you be specific?  I think the parcel
>> > system
>> > has been a major success.  VW is now deployed as a system of components,
>> > the
>> > base image and a much larger suite of parcels.  Parcels are not tied to
>> > a
>> > particular version or implementation and yet are still fast to publish
>> > and
>> > load.  What's not to like?
>>
>> I referred mainly to your own statements about mistake(s).
>
> Ah, ok,  Sorry :)
>
>>
>> I don't know enough about parcels to say exactly where the flaws are.
>> I'm still wondering how you can unload a parcel that is no longer
>> needed while
>> objects used or created by it are still sitting in the image.
>
> Smalltalk has this problem with or without binary loading; they're called
> obsolete classes :)  However, the problem of knowing what to remove when the
> user says "unload" means that a loaded parcel requires a data structure that
> names the classes and methods it loaded.  In addition we maintain overrides,
> the older versions of methods and class definitions, in a stack, so that
> these can be restored when unloading a parcel.  I made lots of mistakes here
> (not allowing the tools to publish a parcel that has code overridden by
> others, not integrating source management and browsing queries with
> overridden code, not compressing the changes correctly with overridden code,
> etc, etc).  Tests would have helped :/
> VW did (does?) test for open instances of applications when we unload a
> parcel, so that if the parcel contains one or more subclasses of ApplicationModel
> (VW's top-level GUI app class), all open applications are tested to see if
> they contain instances of those classes, and a warning is issued.
>>
>> A basic use case: a developer needs some specific tool (like a UI
>> design tool) while working
>> on an application. But by the time he ships the application, the tool is
>> no longer needed.
>
> Right.  I don't know of an automatic solution, but a good convention is to
> split all packages into a development and deployment pair where
> the deployment half is a prerequisite of the development half.  Sticking to
> the convention and using good names makes it easier to remember to remove
> development components and to guess which parts of someone else's
> components are development only.

Yes, and this is what I really miss in Smalltalk-80 based
environments: a distinction between development
and deployment modes and models.
It would be cool to have some basic things behave differently in
deployed mode (like preventing access and data overrides).
The main problem in an open system (such as the Smalltalk object memory) is
that when something goes wrong, you often
have only two choices: reboot the system, or debug and fix the problem in
the living environment.
Often neither choice is acceptable: if we are talking
about an end-user application, we can't expect
the user to debug and fix the issue, and rebooting an image
means losing data and/or interrupting other jobs it is serving.

But if the system were modelled in modular layers, like kernel -> services ->
interfaces -> working set, then things
would be much easier to handle.

> I added a bulk instancesOf primitive that answered all instances of an Array
> of classes that my colleague Steve Dahl wanted to use in instance migration
> on class redefinition.  This could be used to look for all instances of the
> classes defined by a parcel prior to unload.  Do a GC, collect all instances
> of classes defined (rather than redefined) by a parcel and warn if non-empty
> (if in a dev image).

I think that small independent layers (islands/vats) are the future of system
organization in Smalltalk-like VMs.
First, they give a strong answer to the question of what belongs to what.
There is no way to reference a foreign object
other than through a far ref. You can count and enumerate far refs easily, and this
approach also makes it possible to run code in vats concurrently.
The problem is how to handle shared behavior, like Arrays,
Collections, etc., without duplicating it. Since in Smalltalk
everything is an object, methods and classes included, each can belong
to only a single island/vat, and therefore only the owning island can
manipulate it. This creates a major bottleneck for an efficient
implementation of concurrently (and independently) running code.
Trade space for speed? Allow each island to have its own Array class with
its own implementation?
This question remains open for me.
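A toy model of the island and far-ref idea, in Python and with made-up names (`Island`, `FarRef`, `adopt`, `export`, `send`, `run`). Python cannot actually enforce the isolation, so this only shows the shape: foreign objects are reachable solely through far refs, sends are queued rather than executed directly, and the owning island can enumerate what it has exported.

```python
from collections import deque


class Island:
    """Owns a set of objects; outsiders may only hold far refs to them."""

    def __init__(self, name):
        self.name = name
        self.objects = {}     # local id -> object owned by this island
        self.inbox = deque()  # queued (id, selector, args) from other islands
        self.exported = set() # ids that have been handed out as far refs

    def adopt(self, obj):
        oid = len(self.objects)
        self.objects[oid] = obj
        return oid

    def export(self, oid):
        self.exported.add(oid)
        return FarRef(self, oid)

    def run(self):
        """Process queued messages; only the owner touches its objects."""
        results = []
        while self.inbox:
            oid, selector, args = self.inbox.popleft()
            results.append(getattr(self.objects[oid], selector)(*args))
        return results


class FarRef:
    """Handle to an object in another island; sends are asynchronous."""

    def __init__(self, island, oid):
        self._island, self._oid = island, oid

    def send(self, selector, *args):
        self._island.inbox.append((self._oid, selector, args))
```

Because every send goes through the owner's queue, each island could run its queue on its own thread, but the shared-behavior question raised above (who owns the Array class) is exactly what this sketch leaves unanswered.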

>>
>> >> Obviously one of the side of such problem is uniform object memory,
>> >> where each object could
>> >> reference any other object and limited only by a imagination of people.
>> >> There is no layers or any other means which could establish a certain
>> >> barriers (which we calling a modules)
>> >> in smalltalk.
>> >> It means, that once you integrated the parcel into image, and started
>> >> using it, you may have a hard times trying to unload it.
>> >> It is possible to develop an image as an artifact, which contains both
>> >> binary & sources , but such approach
>> >> having a drawbacks, which we, by the way, trying to overcome nowadays.
>> >> Practice shows that such approach is credible only
>> >> for a small group of individuals, but becomes a bottleneck if you
>> >> adopt such scheme for a wider community.
>> >>
>> >> So, i think , that before entering this domain (allowing binary data),
>> >> first we should solve more basic problems of smalltalk & its design -
>> >> modularity, name spaces, layering & etc etc.. Only the we could return
>> >> to original question and solve it.
>> >>
>> >> --
>> >> Best regards,
>> >> Igor Stasenko AKA sig.
>> >>
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [squeak-dev] binary development (was: 3.11 and the trunk)

Eliot Miranda-2


On Tue, Aug 25, 2009 at 12:35 PM, Igor Stasenko <[hidden email]> wrote:
2009/8/25 Eliot Miranda <[hidden email]>:
>
>
> On Tue, Aug 25, 2009 at 3:57 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> 2009/8/25 Eliot Miranda <[hidden email]>:
>> >
>> >
>> > On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <[hidden email]>
>> > wrote:
>> >>
>> >> 2009/8/20 Eliot Miranda <[hidden email]>:
>> >> > Hi Igor,
>> >> >
>> >> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> 2009/8/20 Jecel Assumpcao Jr <[hidden email]>:
>> >> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
>> >> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
>> >> >> >>
>> >> >> >> > For example, I would far prefer to
>> >> >> >> > see Squeak move to a binary based development model (I would
>> >> >> >> > mention
>> >> >> >> > Projects and Etoys here) than the current source based things
>> >> >> >> > we
>> >> >> >> > are
>> >> >> >> > doing (trunk, bob or whatever).
>> >> >> >>
>> >> >> >> Forgive me for seizing on a throw-away comment like this, but
>> >> >> >> would
>> >> >> >> you mind expanding on this a bit? Are you saying you prefer
>> >> >> >> something
>> >> >> >> spoonish, where CompiledMethods  are passed directly from image
>> >> >> >> to
>> >> >> >> image? Something else?
>> >> >> >
>> >> >> > Heh, I got asked about this on IRC as well. Though I had actually
>> >> >> > started to explain this a little in the original email, I ended up
>> >> >> > deleting it to keep on topic. With a new subject line I don't feel
>> >> >> > I
>> >> >> > have to worry about that. Some details about this (with a few
>> >> >> > drawings)
>> >> >> > can be found in the Chunky Squeak wiki page:
>> >> >> >
>> >> >> > http://wiki.squeak.org/squeak/584
>> >> >> >
>> >> >> > The idea is to be more like the Etoys users which can load binary
>> >> >> > projects containing not only the code they need but also hand
>> >> >> > crafted
>> >> >> > objects which have no source (like a drawing, some nested Morphs
>> >> >> > or
>> >> >> > even
>> >> >> > some text). This is very simplistic compared to Spoon, and my
>> >> >> > proposal
>> >> >> > was even more simplistic. In particular, this doesn't handle the
>> >> >> > case
>> >> >> > where any changes to bytecodes or object format are needed.
>> >> >> >
>> >> >>
>> >> >> The central question, which arises immediately, is: what is a
>> >> >> credible way to reproduce such artifacts?
>> >> >> When we have source code, we can (re)compile it on a different
>> >> >> system. But what do you propose to do with pure binary data, a soup
>> >> >> of objects, given that it is incredibly hard to understand which
>> >> >> bits you need and which you don't when you need to clean up,
>> >> >> refactor, rewrite or simply analyze what is happening?
>> >> >> This is what makes a huge difference, for instance, between
>> >> >> applications with open source code and applications shipped in
>> >> >> binary form: you can only report bugs, but can't really make any
>> >> >> suggestions about what is happening.
>> >> >> I don't think that developers of Squeak should be victims of such
>> >> >> a situation.
>> >> >
>> >> > It is possible to have your cake and eat it too.  One can create a
>> >> > binary format that includes source and includes the meta-source for
>> >> > its creation.  But including a binary representation allows much
>> >> > faster loading, loading without a compiler, and source hiding if one
>> >> > chooses not to include the source.
>> >> >
>> >> > There are other advantages, such as not cluttering up the changes
>> >> > file when one loads a package.  In the VW parcel system, to which I
>> >> > added source management, we replaced the SourceFiles with a
>> >> > SourceFileManager whose job was to manage the sources and changes
>> >> > file and an arbitrary number of source files for parcels, the binary
>> >> > format.  In the parcel file the source pointers of compiled methods
>> >> > are the positions of their source in the parcel source file.  When one
>> >> > loads a parcel the SourceFileManager adds the file to its set of
>> >> > managed files and assigns an index for the source file.  The parcel
>> >> > loader then swizzles all the source pointers so that they include the
>> >> > source file index along with the position.  So accessing the source
>> >> > for a method loaded from a parcel accesses that parcel's source file.
>> >> > We used a floating-point like format for source pointers, where the
>> >> > exponent was the source file index, and the mantissa was the position
>> >> > in the file.
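
As a rough illustration of the source-pointer scheme described above, the following Python sketch packs a source file index (the "exponent") and a position in that file (the "mantissa") into one integer, and shows what swizzling at load time might look like. The 24-bit position field and all function names are assumptions made for this example; the thread does not give the actual VisualWorks layout.

```python
# Hypothetical field width; the real VW source pointer layout is not given in the thread.
POSITION_BITS = 24
POSITION_MASK = (1 << POSITION_BITS) - 1


def encode_source_pointer(file_index, position):
    """Pack a source file index (the 'exponent') and a file position (the 'mantissa')."""
    if file_index < 0:
        raise ValueError("file index must be non-negative")
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("position does not fit in the mantissa field")
    return (file_index << POSITION_BITS) | position


def decode_source_pointer(pointer):
    """Split a packed pointer back into (file_index, position)."""
    return pointer >> POSITION_BITS, pointer & POSITION_MASK


def swizzle(parcel_positions, file_index):
    """Turn parcel-relative positions into full pointers once the file has an index."""
    return [encode_source_pointer(file_index, p) for p in parcel_positions]
```

Under these assumptions, a method stored at position 1000 of the parcel source file that was assigned index 3 gets the pointer `encode_source_pointer(3, 1000)`, and looking up its source starts with `decode_source_pointer` to find which managed file to open.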
>> >> > We didn't create a single file format, having two separate files for
>> >> > binary and source, which is probably a mistake.  A format with a short
>> >> > header, followed by source, followed by binary, followed by
>> >> > metasource, would be easier to manage than three separate files.
>> >> > We didn't include any metasource, but we did include pre-read, load
>> >> > and unload actions.  I did a very bad job on version numbering and
>> >> > prerequisite selection.
>> >> > That's not the whole story but enough to start answering your
>> >> > question.  If there is a well-defined definition of the objects in a
>> >> > package and that definition is included in the package as metasource,
>> >> > then one can comprehend the binary package's contents by examining the
>> >> > metasource and can reproduce creating the package, provided that the
>> >> > tools are careful to impose ordering, etc.
>> >> > best
>> >> > Eliot
>> >>
>> >> I think you inevitably made wrong decisions, because you went this way
>> >> by allowing arbitrary binary data to be held by a package.
>> >> In such situations it is much easier to make mistakes.
>> >> But sure, the one who makes no mistakes is the one who does nothing :)
>> >
>> > We didn't disallow representation of arbitrary data but we also didn't
>> > support it.  The only thing the Parcel system supports (as in the tool
>> > set, rather than what one can extend the framework to do in specific
>> > circumstances) is to represent code, which it does very well.
>> > What are these mistakes?  Can you be specific?  I think the parcel
>> > system has been a major success.  VW is now deployed as a system of
>> > components, the base image and a much larger suite of parcels.  Parcels
>> > are not tied to a particular version or implementation and yet are
>> > still fast to publish and load.  What's not to like?
>>
>> I referred mainly to your own statements about mistake(s).
>
> Ah, ok,  Sorry :)
>
>>
>> I don't know enough about parcels to tell exactly where the flaws are.
>> I'm still wondering how you could unload a parcel if it's no longer
>> needed, but there are still objects used/created by the parcel sitting
>> in the image.
>
> Smalltalk has this problem with or without binary loading; they're called
> obsolete classes :)  However, the problem of knowing what to remove when the
> user says "unload" means that a loaded parcel requires a data structure that
> names the classes and methods it loaded.  In addition we maintain overrides,
> the older versions of methods and class definitions, in a stack, so that
> these can be restored when unloading a parcel.  I made lots of mistakes here
> (not allowing the tools to publish a parcel that has code overridden by
> others, not integrating source management and browsing queries with
> overridden code, not compressing the changes correctly with overridden code,
> etc, etc).  Tests would have helped :/
> VW did (does?) test for open instances of applications when we unload a
> parcel so that if the parcel contains a subclass(s) of ApplicationModel
> (VW's top-level GUI app class) all open applications are tested to see if
> they contain instances of the class(es) and a warning is issued.
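
A minimal sketch of the bookkeeping described above: each loaded parcel records the names it installed, and displaced definitions are pushed on a per-name stack so unloading can restore them. The `Image` class and its fields are hypothetical stand-ins, not the VW implementation, and the sketch assumes parcels are unloaded in reverse load order.

```python
class Image:
    """Toy stand-in for an image: maps method names to their current definitions."""

    def __init__(self):
        self.methods = {}    # name -> current definition
        self.overrides = {}  # name -> stack of displaced (older) definitions
        self.loaded = {}     # parcel name -> names that parcel installed

    def load(self, parcel, definitions):
        """Install a parcel, remembering what it installed and what it displaced."""
        for name, definition in definitions.items():
            if name in self.methods:
                # Keep the older version so it can be restored on unload.
                self.overrides.setdefault(name, []).append(self.methods[name])
            self.methods[name] = definition
        self.loaded[parcel] = list(definitions)

    def unload(self, parcel):
        """Remove a parcel; assumes it is the most recently loaded one."""
        for name in self.loaded.pop(parcel):
            stack = self.overrides.get(name)
            if stack:
                self.methods[name] = stack.pop()  # restore the overridden version
            else:
                del self.methods[name]            # the parcel introduced this name
```

Out-of-order unloading, which is where the harder cases mentioned in the message come from, would need each stack entry to record which parcel displaced it; that is left out here.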
>>
>> A basic use case is: a developer needs some specific tool (like a UI
>> design tool) while working on an application. But at the moment the
>> application ships, the tool is no longer needed.
>
> Right.  I don't know of an automatic solution, but a good convention is to
> split all packages into a development and deployment pair where
> the deployment half is a prerequisite of the development half.  Sticking to
> the convention and using good names makes it easier to remember to remove
> development components and to guess which parts of someone else's
> components are development only.

Yes, and this is what I really miss in Smalltalk-80 based
environments: a distinction between development
and deployment modes & models.
It would be cool to have some basic things behave differently when in
deployed mode (like preventing access & data overrides).
The main problem in an open system (such as a Smalltalk object memory) is
that when something goes wrong, you often
have two choices: reboot the system, or debug and fix the problem in
a living environment.
Often neither choice is acceptable, because if we are talking
about an end-user application, we don't expect the
user to be able to debug & fix the issue, and rebooting an image
means loss of data and/or interruption of other jobs being served.

Yes, I agree.  One of the things the headless support in VW allows which is quite nice is taking a snapshot which can then be restarted in a headless mode for debugging.  This can easily be mailed or ftp'ed back for analysis.

Not quite the same, but very neat:  The other day at Qwaq Craig Latta had a VM crash while running in a Parallels Linux VM under gdb.  He was able to give me a copy of the VM snapshot at the point where gdb stopped the process, giving me the opportunity to debug the live app at my leisure.  A cool idea.



But if the system were modelled in modular layers, like kernel -> services ->
interfaces -> working set, then things
would be much easier to handle.

Yes, yes, yes!!  The system should be like an onion where each layer of the onion is a set of interlocking tectonic plates of modules of functionality.



> I added a bulk instancesOf primitive that answered all instances of an Array
> of classes that my colleague Steve Dahl wanted to use in instance migration
> on class redefinition.  This could be used to look for all instances of the
> classes defined by a parcel prior to unload.  Do a GC, collect all instances
> of classes defined (rather than redefined) by a parcel and warn if non-empty
> (if in a dev image).
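
The pre-unload check described above can be approximated in Python as a sketch. This is an analogy using CPython's `gc` module, not the VW primitive; the function names are made up for the example, and `gc.get_objects()` only sees objects tracked by the collector (which includes instances of ordinary user-defined classes).

```python
import gc


def instances_of(classes):
    """Return every tracked live object whose exact class is in `classes`, in one pass."""
    wanted = set(classes)
    gc.collect()  # drop unreachable garbage first, like "Do a GC" in the text
    return [obj for obj in gc.get_objects() if type(obj) in wanted]


def warn_before_unload(parcel_name, defined_classes):
    """Return a warning string if instances of the parcel's own classes survive."""
    survivors = instances_of(defined_classes)
    if survivors:
        return f"{parcel_name}: {len(survivors)} live instance(s) of its classes remain"
    return None
```

Passing all of a parcel's classes at once is the point of the bulk primitive: the heap is walked a single time instead of once per class.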

I think that independent tiny layers (isles/vats) are the future of system
organization in Smalltalk-like VMs.
First, they give a strong answer to the question of what belongs to what.
There is no possibility of referencing a foreign object
other than by far ref. You can count/enumerate them easily, and this
approach also makes it possible to run code in vats concurrently.
The problem here is how to handle shared behavior, like Arrays,
Collections etc., in order to avoid duplication. Since in Smalltalk
everything is an object, methods & classes included, they can belong
only to a single island/vat, and therefore only the owning island can
manipulate them. This creates a major bottleneck for an effective
implementation of concurrently (and independently) running code.
Trade space for speed? Allow each island to have its own Array class with
its own implementation?
This question remains open for me.

Yes, this is a cool radical idea that I haven't got my head around yet.  I need to think about this at length.  The obvious approach to the duplication is copy-on-write where any modifications to the root Array class get propagated to the copies, assuming there is some hierarchical control organization.  I think this approach is taken in Alex's worlds where modifications to a parent world are seen by children.  But then the merge problem rears its head when trying to propagate modifications to a child that has made its own local modifications in the same region.



>>
>> >> Obviously one side of this problem is the uniform object memory,
>> >> where each object can
>> >> reference any other object, limited only by people's imagination.
>> >> There are no layers or any other means in Smalltalk which could
>> >> establish certain barriers (which we call modules).
>> >> It means that once you have integrated a parcel into an image and
>> >> started using it, you may have a hard time trying to unload it.
>> >> It is possible to develop an image as an artifact which contains both
>> >> binary & sources, but such an approach
>> >> has drawbacks which we, by the way, are trying to overcome nowadays.
>> >> Practice shows that such an approach is credible only
>> >> for a small group of individuals, but becomes a bottleneck if you
>> >> adopt such a scheme for a wider community.
>> >>
>> >> So I think that before entering this domain (allowing binary data),
>> >> we should first solve the more basic problems of Smalltalk & its
>> >> design: modularity, name spaces, layering etc. Only then could we
>> >> return to the original question and solve it.
>> >>
>> >> --
>> >> Best regards,
>> >> Igor Stasenko AKA sig.
>> >>



--
Best regards,
Igor Stasenko AKA sig.




Re: [squeak-dev] binary development (was: 3.11 and the trunk)

David Farber

On Aug 25, 2009, at 2:28 PM, Eliot Miranda wrote:

> One of the things the headless support in VW allows which is quite  
> nice is taking a snapshot which can then be restarted in a headless  
> mode for debugging.

Eliot - Was this VM or image-side support?  Can you describe how it  
worked?

> Not quite the same, but very neat:  The other day at Qwaq Craig  
> Latta had a VM crash while running in a Parallels Linux VM under  
> gdb.  He was able to give me a copy of the VM snapshot at the point  
> where gdb stopped the process, giving me the opportunity to debug  
> the live app at my leisure.  A cool idea.

In other words, he was already running the Qwaq VM under gdb, so when  
the Qwaq VM crashed (and left him at a gdb prompt), he simply  
suspended the Parallels Linux VM and sent you a copy of the suspended  
Parallels Linux VM.  Is that right?

David


[squeak-dev] sharing (was: binary development)

Jecel Assumpcao Jr
In reply to this post by Eliot Miranda-2
Eliot Miranda wrote:

> Yes, this is a cool radical idea that I haven't got my head around yet.
>  I need to think about this at length.  The obvious approach to the
> duplication is copy-on-write where any modifications to the root
> Array class get propagated to the copies, assuming there is some
> hierarchical control organization.  I think this approach is taken in
> Alex's worlds where modifications to a parent world are seen by
> children.  But then the merge problem rears its head when trying
> to propagate modifications to a child that has made its own local
> modifications in the same region.

I have some papers about this (1992 and 1993) but since they are in
Portuguese it makes no sense for me to point them out. The basic idea is
the MESI cache coherence protocol from bus based multiprocessors
(network based multiprocessors normally use directory based schemes
which are closer to what we want but harder to explain so I will start
out with MESI).

Any given cache line, or object in our case, can be in one of four states:
Invalid (meaning the local node doesn't have a copy), Exclusive (the local
node has a copy and knows that nobody else does), Shared (there is a local
copy and possibly other nodes also have copies) or Modified (the local
copy has been changed and must be saved to main memory).

- You can go from I to either E or S (which one depends on what the
other caches say) by fetching a copy.
- You can go from E to S if you see anybody else fetch a copy.
- You can go from S to E by asking everybody else go from S to I and
inform you they have done so.
- You can go from E to M by writing to your copy.
- You can go from M to E by saving your copy to main memory.
- You can go from E or S to I if you need to reuse the cache line for
other data.

No other transitions are allowed (perhaps this would be far easier to
understand as a drawing?). This scheme doesn't need to merge since there
is at most one changed copy at any given time. This restricts
parallelism compared to multiple worlds, but is compatible with our
current semantics.
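
The transition list above can be written down as a small table-driven state machine. This is only a sketch of the six rules as stated in this message; the event names are labels invented for the example, and it models one node's view of one object rather than a full coherence protocol.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    EXCLUSIVE = "E"
    SHARED = "S"
    MODIFIED = "M"


# (current state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (State.INVALID, "fetch_alone"): State.EXCLUSIVE,   # nobody else has a copy
    (State.INVALID, "fetch_shared"): State.SHARED,     # someone else has a copy
    (State.EXCLUSIVE, "other_fetch"): State.SHARED,
    (State.SHARED, "invalidate_others"): State.EXCLUSIVE,
    (State.EXCLUSIVE, "write"): State.MODIFIED,
    (State.MODIFIED, "save"): State.EXCLUSIVE,
    (State.EXCLUSIVE, "reuse"): State.INVALID,
    (State.SHARED, "reuse"): State.INVALID,
}


def step(state, event):
    """Apply one event; anything not in the table is rejected."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
```

Because `write` is only accepted in Exclusive, a node holding a Shared copy has to get everyone else to Invalid first, which is why there is never more than one changed copy to merge.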

Note that David Ungar showed something very similar to this actually
running in Squeak on 56 processors at last year's OOPSLA and the movie
of his demo (thanks, Göran!) is available online (at
http://siliconsqueak.org among other places).

-- Jecel


Re: [squeak-dev] sharing

Randal L. Schwartz
In reply to this post by Eliot Miranda-2
>>>>> "Jecel" == Jecel Assumpcao <[hidden email]> writes:

Jecel> - You can go from I to either E or S (which one depends on what the
Jecel> other caches say) by fetching a copy.
Jecel> - You can go from E to S if you see anybody else fetch a copy.
Jecel> - You can go from S to E by asking everybody else go from S to I and
Jecel> inform you they have done so.
Jecel> - You can go from E to M by writing to your copy.
Jecel> - You can go from M to E by saving your copy to main memory.
Jecel> - You can go from E or S to I if you need to reuse the cache line for
Jecel> other data.

    digraph Jecel {
      I -> {E; S} [label = "fetch"];
      E -> S [label = "other fetch"];
      S -> E [label = "force other S->I"];
      E -> M [label = "write"];
      M -> E [label = "save"];
      {E; S} -> I [label = "reuse"];
    }

Name it "jecel.dot", read it into OmniGraffle or any Graphviz Tool.

:)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
