[squeak-dev] second call for feedback on Naiad design

ccrraaiigg

Hi--

      This is another call for feedback on the design of Naiad[1], a
Smalltalk module system I'm writing for Squeak as part of the Spoon
project[2].

      On the theory that I'll get more of a response by including the
whole text rather than a link to it, here it is... :)

***

2008-10-20, 1946 GMT

Copyright (c) 2008 Craig Latta. All rights reserved.


Hi--

      I've been on a quest to make Squeak smaller and more modular, the
Spoon project[2]. Part one was making the object memory small. Part
three is about making the virtual machine small. This message is about
part two, making a module system suitable for adding new behavior to a
minimal system in an organized way, and for transferring behavior
accurately between running systems.

      Spoon's module system is called "Naiad", which is an acronym for
"Name And Identity Are Distinct". It keeps track of the development
history of a system (what the "sources" and "changes" files are for
now), and makes it available for exchange with other systems. I think
keeping classes' names and identities separate is critical for
this. Following are some notes on its design and use, including the
object model[1].

      At this point I'd like to emphasize I am the author of this
design, that I intend to release its implementation under an MIT-style
license, and that I'd like to pursue a graduate degree with it (I'm
open to invitations :).

***

motivation

      A traditional Smalltalk system uses source code to express both
development history and changes exchanged between systems. The precise
meaning of source code depends on the current state of the system
compiling it. Since a Smalltalk system is dynamic, source code is an
inherently ambiguous medium across time.

      The most problematic system artifacts in light of this ambiguity
are classes. All activity in a Smalltalk system is the result of
sending messages to objects. The sending of a message invokes the
execution of a method, a sequence of instructions for a virtual
processor. Some of these instructions manipulate the state of the
object receiving the message. Classes define the structure of that
state. Therefore, when those class definitions change, the source code
for the methods of those classes may become meaningless.

      One may confront this situation when trying to recompile source
code for an old version of a method whose class definition has changed
in the meantime. Similarly, source code from one system may not be
meaningful on another, since corresponding class definitions on each
system may change independently (or be removed entirely).

      This means that the accurate exchange of behavior requires manual
labor, hindering the propagation of useful fixes and new code. It also
means that interpretation and use of historical code is more difficult
than necessary. So we pay twice for this problem: when learning the
system, and when trying to share our work with others. By separating
class name from identity, Naiad makes Smalltalk more approachable for
newcomers, and more productive for developer and user communities.


editions


      Using Naiad, each development system consists of two object
memories: one containing developed code, and another containing
"editions" which describe that code. I'll call the first one the
"subject memory" and the other the "history memory".

      An Edition is a description of some artifact in the subject
memory at some point in time, currently an author, comment, tag,
class, method, module, checkpoint, or edit. Each edition has a
reference to that artifact's next state in the future (the next
edition) and in the past (the previous edition), as well as an author
edition, a collection of licenses, and a timestamp.

      An Edit represents the activation of some edition at a point in
time. For example, there may be a method created in 2005 that is
removed in 2006 and reactivated in 2007. There would be an Edit for
each of those three events, but only two method editions (one
representing the method becoming active, and one representing it being
removed).
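
      As a rough illustration of the example above (written in Python
rather than Smalltalk, and not taken from the Spoon sources), the
sketch below models the 2005/2006/2007 case. The field names and the
use of an empty source to mark a removal are assumptions made for the
sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # editions are compared by identity, not by value
class MethodEdition:
    method_id: str                      # stand-in for a real MethodID
    source: Optional[str]               # None marks a removal (assumed)
    previous: Optional["MethodEdition"] = None
    next: Optional["MethodEdition"] = None


@dataclass(eq=False)
class Edit:
    year: int                           # stand-in for a full timestamp
    edition: MethodEdition              # the edition this edit activated


def example_history() -> List[Edit]:
    """Build the create / remove / reactivate history from the text."""
    added = MethodEdition("Foo>>bar", "bar ^42")
    removed = MethodEdition("Foo>>bar", None, previous=added)
    added.next = removed
    # Reactivating in 2007 points back at the 2005 edition.
    return [Edit(2005, added), Edit(2006, removed), Edit(2007, added)]


edits = example_history()
distinct_editions = {id(edit.edition) for edit in edits}
print(len(edits), "edits,", len(distinct_editions), "editions")
```

      The third edit reuses the first edition rather than creating a
new one, which is why three edits need only two editions.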

      The history memory replaces the current changes and sources
files. It has an instance of EditHistory corresponding to the subject
memory, which records the active (current) editions for the classes,
methods, modules, and authors in the subject memory. It also keeps the
subject memory's ID and the last Edit made to the subject memory.

      Every time the subject memory adds, changes, or removes a class
definition, method, author, comment, tag, or module, or makes a
checkpoint (i.e., makes an edit), it adds the appropriate editions to
the history memory via remote messages. The history memory snapshots
itself after every edit, so as to provide crash recovery support.

      The subject memory keeps a remote reference to the history
memory's instance of EditHistory as a class variable of the local
EditHistory class, and interacts with it using utility messages sent
to the local EditHistory class. The history memory also keeps that
EditHistory instance as a class variable of its local EditHistory
class, but as a local reference.

      An edition typically elides some of its references when it is
transferred out of a history memory. For example, a transferred
edition will usually omit the references to its next and previous
editions. The requesting subject memory can calculate the ID of those
editions and obtain them with a separate request, if necessary.

      A subject memory may elect to keep its EditHistory instance as a
local object, such as in a situation where one wants some limited
immutable history for debugging purposes, and no crash recovery
support. Whether in this scenario or in normal development the same
EditHistory utility messages suffice, since no special code need be
written to support remote objects. If no edits will be made during
deployment, and no history retrieval is required, one may simply
jettison the history memory. One may always reconnect the subject and
history memories at a later time and continue development.

      The subject memory has tools for browsing and activating the
editions, wherever they are located. This means that no special tools
are needed to browse the artifacts of multiple subject systems; one
uses the same tools as for browsing the artifacts of the local subject
memory. Each subject memory may connect to multiple history memories
concurrently (if allowed).

      For that matter, the history memories of multiple systems may
connect to each other directly, to aggregate editions from multiple
people, for example.


class and method IDs


      Each class in the subject memory has a universally-unique
identifier[3], or UUID. The classes in the minimal subject memory are
assigned UUIDs before the initial release, and all subsequent classes
are assigned UUIDs when created. Rather than use the single word
"class" to refer to either a metaclass or to its sole instance, Spoon
introduces the term "protoclass". For example, (Array class) is a
metaclass, and its sole instance, Array, is a protoclass. Each
metaclass and protoclass has its own UUID, called a "base ID". This is
supported by a new instance variable in ClassDescription.

      Each version of each class is identified by a ClassID, a byte
array with segments for the class's base ID, author UUID, and a
sixteen-bit version. This means we can uniquely identify, for each
author, 65,535 versions of each class in the system. Since we identify
authors by UUID, the number of possible authors is very large.

      Each version of each method is identified by a MethodID, a byte
array which contains a ClassID and segments for the method's selector,
author UUID, and a sixteen-bit version. This means we can uniquely
identify, for each author, 65,535 versions of each method in each
version of each class in the system.
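
      To make the layout concrete, here is a minimal Python sketch of
packing and unpacking these IDs. The text above names the segments but
not their order, their byte order, or how the selector is delimited, so
the field order, the big-endian version, and the two-byte selector
length prefix below are all assumptions.

```python
import struct
import uuid
from typing import Tuple

CLASS_ID_LENGTH = 16 + 16 + 2  # base ID, author UUID, 16-bit version


def _check_version(version: int) -> None:
    if not 0 <= version <= 0xFFFF:
        raise ValueError("version must fit in sixteen bits")


def make_class_id(base_id: uuid.UUID, author_id: uuid.UUID,
                  version: int) -> bytes:
    """Concatenate base ID, author UUID, and version (assumed order)."""
    _check_version(version)
    return base_id.bytes + author_id.bytes + struct.pack(">H", version)


def parse_class_id(class_id: bytes) -> Tuple[uuid.UUID, uuid.UUID, int]:
    """Split a ClassID back into its three segments."""
    if len(class_id) != CLASS_ID_LENGTH:
        raise ValueError("unexpected ClassID length")
    base_id = uuid.UUID(bytes=class_id[0:16])
    author_id = uuid.UUID(bytes=class_id[16:32])
    (version,) = struct.unpack(">H", class_id[32:34])
    return base_id, author_id, version


def make_method_id(class_id: bytes, selector: str,
                   author_id: uuid.UUID, version: int) -> bytes:
    """A MethodID embeds a whole ClassID, then its own segments."""
    _check_version(version)
    encoded = selector.encode("utf-8")
    # Length prefix for the selector is an assumption of this sketch.
    return (class_id + struct.pack(">H", len(encoded)) + encoded
            + author_id.bytes + struct.pack(">H", version))
```

      A sixteen-bit field can hold 65,536 distinct values (0 through
65,535). The sketch accepts all of them; the figure of 65,535 above
would be consistent with reserving one value, though that is a guess.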


method editions and method literal markers


      Each MethodEdition holds a reference to the corresponding
ClassEdition, the method source code, and the information needed to
reconstruct the corresponding CompiledMethod directly, without need of
the compiler (the method header, initial and final program-counter
values, method literal markers, and instructions). If one will never
use the history memory to install methods in a subject memory that
lacks a compiler, one could drop the compiled method information to
save space.

      Method literal markers are used to transmit a compiled method's
literal frame values between object memories. There are method literal
marker classes to support references to classes, class variables,
other pool variables, and literal objects, and to support methods
which perform class-side super-sends. Each method literal marker
instance knows how to serialize itself as part of Spoon's remote
messaging system. In particular, when a method literal that refers to
a class transmits itself, it transmits the ClassID of that class, not
the name of the class.
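
      A small Python sketch of that last point follows. It is not
Spoon's wire format: the one-byte tags, the two marker classes shown,
and the resolve function are hypothetical. It only shows that a class
reference can cross between memories as an ID and be resolved on the
far side whatever the class happens to be called there.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClassLiteralMarker:
    class_id: bytes                     # the ClassID, never the name

    def serialize(self) -> bytes:
        return b"C" + self.class_id     # "C" tag is hypothetical


@dataclass(frozen=True)
class ObjectLiteralMarker:
    value: int                          # small non-negative ints only

    def serialize(self) -> bytes:
        return b"O" + self.value.to_bytes(4, "big")


def resolve(payload: bytes, classes_by_id: Dict[bytes, object]) -> object:
    """Rebuild a literal in the receiving memory from its marker bytes."""
    tag, body = payload[:1], payload[1:]
    if tag == b"C":
        return classes_by_id[body]      # lookup is by ID only
    if tag == b"O":
        return int.from_bytes(body, "big")
    raise ValueError("unknown marker tag")


# The sender calls the class "Widget"; the receiver has renamed it.
widget_id = bytes(range(34))
receiver_classes = {widget_id: "RenamedWidget"}
payload = ClassLiteralMarker(widget_id).serialize()
print(resolve(payload, receiver_classes))
```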

      This gets at the namesake concept of Naiad, "Name And Identity
Are Distinct". When referring to a class, we never need to use its
name. Each version of each class is an object with a distinct
identity. By using ClassIDs to refer to each of them, we can avoid
using class names at all when storing history or distributing
code. This means that the name of each class can be anything, as far as
the system is concerned.

      With every class name unconstrained, there is no need for
"namespaces" to distinguish between classes which happen to have the
same name at some point in time. Each class effectively has its own
namespace, since it is uniquely identifiable regardless of its
name.

      Developer tools armed with this information can resolve ambiguity
for humans browsing and changing the system. If a developer writes a
method which uses a name shared by multiple classes, the system can
present more information about each of those classes (such as the
author, time of creation, version, and module association), so that
the developer can choose the intended one. When browsing such a
method, the system can distinguish the aliased class name visually,
indicating that there is disambiguating information available.
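
      One way a tool might keep track of this is sketched below in
Python. The ClassRegistry class and its method names are invented for
the sketch, and only the author is recorded as disambiguating
information; a real tool would presumably show more.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ClassRegistry:
    """Maps class IDs to details, and names to every ID that uses them."""

    def __init__(self) -> None:
        self._info_by_id: Dict[bytes, Dict[str, str]] = {}
        self._ids_by_name: Dict[str, List[bytes]] = defaultdict(list)

    def register(self, class_id: bytes, name: str, author: str) -> None:
        self._info_by_id[class_id] = {"name": name, "author": author}
        self._ids_by_name[name].append(class_id)

    def candidates(self, name: str) -> List[Tuple[bytes, str]]:
        """Every class currently using this name, with its author."""
        return [(class_id, self._info_by_id[class_id]["author"])
                for class_id in self._ids_by_name.get(name, [])]

    def is_ambiguous(self, name: str) -> bool:
        """True when a tool should ask the developer to choose."""
        return len(self._ids_by_name.get(name, [])) > 1
```

      A browser could call is_ambiguous when rendering a class name
and, if it answers true, mark the name visually and offer the list
from candidates.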


class editions and shared variables


      Each ClassEdition holds the editions for all the method versions
currently active in the corresponding class in the subject
memory. Since every edition keeps a reference to its previous and next
editions, one can trace the history of any method by starting at the
active edition. Removed methods are represented by method editions
which have the same MethodID as a normal previous method edition, but
with the rest of the fields set to nil.
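
      The traversal described above can be sketched in a few lines of
Python. The names are illustrative, not Spoon's, and the sketch assumes
a removal edition still keeps its previous and next links so that the
chain stays walkable.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)
class MethodEdition:
    method_id: bytes
    source: Optional[str] = None        # None: this edition is a removal
    previous: Optional["MethodEdition"] = None
    next: Optional["MethodEdition"] = None

    @property
    def is_removal(self) -> bool:
        return self.source is None


def link(older: MethodEdition, newer: MethodEdition) -> MethodEdition:
    """Chain two editions together and answer the newer one."""
    older.next = newer
    newer.previous = older
    return newer


def history(active: MethodEdition) -> Iterator[MethodEdition]:
    """Walk from the active edition back to the oldest one."""
    edition: Optional[MethodEdition] = active
    while edition is not None:
        yield edition
        edition = edition.previous


original = MethodEdition(b"method-1", "size ^count")
removal = link(original, MethodEdition(b"method-1"))
print([edition.is_removal for edition in history(removal)])
```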

      Each ClassEdition also holds the information needed to
reconstruct the corresponding class directly, without need of the
class builder. For all classes, this includes the format, instance
variable names, and superclass ID. For protoclasses, it also includes
the class pool keys, class name, and received pool IDs.

      In Spoon, every shared variable pool is the responsibility of
some class in the system. There is no global variables pool ("system
dictionary"). Each class that defines a pool is said to "publish" that
pool; classes which use that pool "receive" it. Spoon adds an instance
variable to Class to map published pools to their names. Each
ReceivedPoolID that a protoclass edition uses is a byte array which
contains a class ID and a published pool name.


checkpoints and modules


      A Checkpoint edition is simply a named marker of a particular
point in time. A developer may use checkpoints to indicate various
interesting states of development, and use the tools to regress or
replay edits made before or after that time.

      The largest unit of work is represented by module editions. They
are named collections of method IDs, indicating the specific versions
of methods which comprise a module, along with sets of child, parent,
prerequisite, and postrequisite module editions. When a module edition
is transferred out of a history memory, those edition references are
transmitted as ModuleIDs. Each module edition also has an
"antimodule", a module edition calculated at installation time by a
receiving system which, if applied, would undo the changes made by
installing the original module. Finally, each module edition has a URI
by which someone at a remote site may install the module.
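
      How an antimodule gets calculated is not spelled out here, so
the following Python sketch is only one plausible reading. It treats a
module as a mapping from a method key to the method ID to install (or
to nothing, meaning removal), and records what each key held before
installation. The key format and the install function are hypothetical.

```python
from typing import Dict, Optional, Tuple

State = Dict[str, str]                  # method key -> installed method ID
Module = Dict[str, Optional[str]]       # method key -> ID, or None to remove


def install(state: State, module: Module) -> Tuple[State, Module]:
    """Apply a module; answer the new state and the antimodule."""
    new_state = dict(state)
    antimodule: Module = {}
    for key, method_id in module.items():
        # Remember what was there before, None if nothing was.
        antimodule[key] = state.get(key)
        if method_id is None:
            new_state.pop(key, None)
        else:
            new_state[key] = method_id
    return new_state, antimodule


before = {"Foo>>bar": "bar-v1", "Foo>>baz": "baz-v1"}
module = {"Foo>>bar": "bar-v2", "Foo>>qux": "qux-v1", "Foo>>baz": None}
after, antimodule = install(before, module)
restored, _ = install(after, antimodule)
print(restored == before)
```

      Because the antimodule depends on what the receiving system held
beforehand, it has to be computed there at installation time, which
matches the description above.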

      That URI represents a command to a Spoon system running on a
requestor's local machine; it refers to a standard port on
localhost. Its path is a text-encoded action, containing an
instruction (in this case "install a module"), the hostname and port
of a Spoon system providing the module, and the module's ID. The
receiving system uses this information to request the module from a
providing history memory, which then transmits editions as
necessary. Exactly which editions are transmitted depends on the state
of the receiving system; this is a two-way conversation between the
providing and receiving systems. This is often more time and space
efficient than simply providing all of a module's code, which is what
happens with traditional static representations like change sets.
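
      The exact URI syntax is not given above, so the Python sketch
below invents one purely for illustration: the port number, the path
layout, the "install-module" instruction string, and the hex module ID
are all assumptions. It shows only that the four pieces of information
named in the text fit in an ordinary URI path and can be recovered from
it.

```python
from typing import Dict, Union
from urllib.parse import quote, unquote, urlsplit

LOCAL_PORT = 8080  # hypothetical "standard port" on localhost


def build_action_uri(instruction: str, provider_host: str,
                     provider_port: int, module_id_hex: str) -> str:
    """Encode an action as path segments of a localhost URI."""
    parts = [instruction, provider_host, str(provider_port), module_id_hex]
    path = "/".join(quote(part, safe="") for part in parts)
    return f"http://localhost:{LOCAL_PORT}/{path}"


def parse_action_uri(uri: str) -> Dict[str, Union[str, int]]:
    """Recover the action that the local system should carry out."""
    segments = urlsplit(uri).path.lstrip("/").split("/")
    instruction, host, port, module_id_hex = (
        unquote(segment) for segment in segments)
    return {
        "instruction": instruction,
        "provider_host": host,
        "provider_port": int(port),
        "module_id": module_id_hex,
    }


uri = build_action_uri("install-module", "provider.example", 9000, "0a0b0c")
print(uri)
print(parse_action_uri(uri))
```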

      The URIs may be cited on ordinary webpages, which are indexed by
search engines like Google. A person in search of a module for a
particular purpose can search for it with a web browser, using those
search engines. Having found a module's URI, the person can click on
it, establishing a connection to an embedded webserver in their local
Spoon system, which carries out the URI's command.

      This mechanism for code distribution avoids storing code in
static files. It's a departure from Smalltalk's traditional "fileout"
mechanism.

      The encoded URIs can serve other functions as well, such as
listing a system's installed modules, removing an installed module,
making a snapshot, and quitting the system. In this way one can use a
web browser to interact with a Spoon system for several basic tasks;
this is especially useful when the system is headless (e.g., in its
initial minimal state).


comments and tags


      Editions for authors, classes, methods, checkpoints, edits, and
modules each have their own comment and tag editions. This means each
one of those artifacts has a comment and tags, and the changes in both
are recorded over time. Comments are as we've already been using them:
they're explanatory prose about the artifacts. Tags may be familiar to
you from the web; they are short semantic markers used for grouping
similar artifacts.

      I intend for tags to replace class and method
categories. Nominally, we've been using class and method categories to
establish semantic hierarchies, but the hierarchies have turned out to
be quite shallow. Although we can form hierarchies with tags as well,
I think we would do better to apply the sorts of algorithms that
search engines use, and not concern ourselves with memorizing an
artifact's semantic markers. The computational cost this incurs for
the tools might have been high in the early days of Smalltalk, but it
is quite modest now.


      Thanks for reading! Please let me know of any questions or other
feedback, and feel free to discuss this on the Spoon and Squeak-dev
mailing lists.


-C

[1] http://netjam.org/spoon/naiad
[2] http://netjam.org/spoon
[3] http://en.wikipedia.org/wiki/Universally_Unique_Identifier

--
Craig Latta
improvisational musical informaticist
www.netjam.org
Smalltalkers do: [:it | All with: Class, (And love: it)]



Re: [squeak-dev] second call for feedback on Naiad design

Karl Ramberg
I can only say I look forward to testing this.
One question:
When this system works, won't image size be an issue, like an
ever-growing web browser cache that has no size limit?


Karl



Re: [squeak-dev] second call for feedback on Naiad design

Michael van der Gulik-2
In reply to this post by ccrraaiigg


On Thu, Nov 20, 2008 at 9:27 AM, Craig Latta <[hidden email]> wrote:

> Hi--
>
>     This is another call for feedback on the design of Naiad[1], a Smalltalk module system I'm writing for Squeak as part of the Spoon project[2].

Hi Craig.

I think the main reason people aren't commenting is because that's a lot of reading!

My thoughts are:

* Separating name and identity is a very, very good start. But obviously you realise this :-).

* Perhaps "versions" is a better name than "editions"? That's the name we're more familiar with.

* Do we need to run two instances of Squeak to edit code, one for the current version and one for managing the edit history? I assume that's what you mean by needing two object memories. If so, is it intended for the edit history object memory to be a live central repository shared by developers?

* Does the system work if it can't contact the edit history object memory?

* What do your remote references look like? How stable are they? Do they rely on, e.g., an IP address to find a remote object memory? If somebody changes IP, are the remote references still valid?

* I assume a class now contains a ClassID and a collection of MethodIDs?

* Why is ClassID so complex? Why not just assign each class a new UUID for each new version of that class, with authorship and versioning being metadata of that class?

* Limiting to 65,536 versions per author is going to create problems in 10 years' time.

* Isn't having the author and version in the unique IDs going to cause conflict problems? What happens if the author is careless and ends up with two different versions of a method with the same unique identifier?

* Are author UUIDs going to be able to be looked up to get email addresses and names somehow?

* Methods shouldn't have an author. The changes between method versions/editions should have an author.

I think you're taking the "minimal memory usage" idea too far. In my design for distributable packages, I've made a couple of different design decisions:

* Packages (cf: classes in Naiad) are immutable and copy-on-write. When a Package (containing classes and methods) is completed and ready for distribution, it is made read-only and assigned a new UUID. If somebody wants to modify that Package, they need to (deep-)copy it and assign it a new UUID. In an object memory with two versions of the same package loaded, the two packages would exist as near identical copies of each other. This is a waste of memory, sure, but memory is cheap and this scheme is much simpler than fiddling around with sharing different class and method versions in two similar packages.

* I've separated source from bytecodes. A Package object contains only structural information and bytecodes (or it will when I've implemented that). The source code is managed by a completely separate pluggable system over which a compiler is run to produce a Package object. In this way, people have a lot more flexibility to change the way that source code is managed. Authorship and versioning information is all moved out to the source code managing system.
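
The copy-on-write scheme in the first point above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual package implementation: the `Package` class, its `define`, `publish`, and `fork` methods, and the use of plain bytes as stand-ins for bytecodes are all hypothetical names chosen for the example.

```python
import copy
import uuid


class Package:
    """Hypothetical copy-on-write package: mutable until published."""

    def __init__(self, name):
        self.name = name
        self.uuid = None       # assigned when the package is published or forked
        self.methods = {}      # (class name, selector) -> bytecodes (placeholder bytes)
        self._frozen = False

    def define(self, class_name, selector, bytecodes):
        """Add or replace a method; refused once the package is read-only."""
        if self._frozen:
            raise TypeError("package is read-only; fork it first")
        self.methods[(class_name, selector)] = bytecodes

    def publish(self):
        """Make the package read-only and give it a fresh UUID."""
        self._frozen = True
        self.uuid = uuid.uuid4()
        return self.uuid

    def fork(self):
        """Return an editable deep copy with its own UUID."""
        clone = Package(self.name)
        clone.methods = copy.deepcopy(self.methods)
        clone.uuid = uuid.uuid4()
        return clone
```

Both copies stay in memory in full, which is the memory-for-simplicity trade described above.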

My opinion in general is that Naiad is a very interesting system. It certainly solves a lot of problems in Squeak, and adds new functionality. It'll be interesting to see how it pans out.

I'm not sure it's a good idea to propose an unstable system as the next version of Squeak though. I would prefer that the system is first stable, tested and used before it is pushed into the community. I would be wary of causing a "change-shock" in the community.

Gulik.

--
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/



Re: [squeak-dev] second call for feedback on Naiad design

Wolfgang Eder
In reply to this post by ccrraaiigg
Hi Craig et al,
I have read this, and it's a very interesting read!
For me the main issue is the protocol that is used
between the two images (subject and history).
There is little written about it.
Just for thought, what if the history memory were
a web server? What would the protocol look like?
Can the low-level protocol be hacked to support this?

And one thing I am suspicious of is that there is so
much knowledge in the IDs. And limits to the maximum
number of editions etc. I'd rather have proper
objects than those IDs, with LargeIntegers :-).

Just my 2 cents,
thanks,
Wolfgang





[squeak-dev] Re: second call for feedback on Naiad design

ccrraaiigg
In reply to this post by Karl Ramberg

Hi--

      Thanks for the comments! I'm responding to the comments so far in
this single message. I see no reason to restrict Naiad-related
discussion to a single thread; hopefully threads will emerge around
specific issues, rather than particular people. :)  Please
feel free to break issues out into new threads... for this message,
there's such a grab-bag going that I decided to deal with it all in one
place.

      Karl writes:

 > When this system works, won't image size be an issue, like an
 > ever-growing web browser cache that has no size limit?

      I imagine the history memory will have various utilities, like:

-    dumping all the compiled method info, because the subject memory
      will always have a compiler

-    dumping all the method source, because the subject memory will
      never have a compiler :)

-    storing its less-frequently-accessed editions in one or more
      separate history memories, which spend most of their time as
      suspended snapshot files, but which can be activated when
      necessary. Remote message-sending is a fundamental part of
      Spoon; there's no inherent reason why the history memory can't be a
      federation of history memories instead.

      Of course, one might decide to put editions in another object
      database at any point instead (e.g., Magma or GemStone). I just
      want to offer something that provides the bare minimum
      functionality "out of the box".

-    purging certain editions entirely (rather like when we made new
      sources files with the traditional setup)

***

      Wolfgang writes:

 > For me the main issue is the protocol that is used between the two
 > images (subject and history). There is little written about it.

      This is true, I haven't finished that documentation yet. One can
look at the implementation of remote message-sending from the last Spoon
release, but I haven't described it in prose yet, and the Naiad design
document is the most prose I've written about how the subject and
history memories communicate at a higher level. Eventually all this
stuff will be in the Spoon book[1].

 > Just for thought, what if the history memory would be a web server.
 > What would the protocol look like?

      Well, there is already a (tiny) webserver in the subject memory,
to provide the initial user interface when first run. One could load
its conveying module into the history memory and do lots of interesting
things with it, yes.

 > Can the low-level protocol be hacked to support this?

      Yes.

 > And one thing I am suspicious of is that there is so much knowledge in
 > the IDs.

      Since they're going to be flying back and forth over sockets,
sometimes in large numbers, they need to be as small as possible; so
I've thought carefully about minimizing them. At this point I'm simply
open to discussion about what anyone would leave out. :)  I think I have a
good argument for every bit in every ID (likewise for every bit in the
minimal subject memory).

 > And limits to the maximum number of editions etc.

      So far I've decided that it's not worth any extra bits expressing
variable-length sizes, but again I'm open to discussion about that.

 > I'd rather have proper objects than those IDs, with LargeIntegers :-)

      (The size argument applies here, too.)

***

      Michael writes:

 > I think the main reason people aren't commenting is because that's a
 > lot of reading!

      Sure, I'll just keep asserting that the importance justifies the
time. :)

 > Perhaps "versions" is a better name than "editions"? That's the name
 > we're more familiar with.

      In this case I think the familiarity is a disadvantage; "version"
has multiple strong meanings to people. A "version" is sometimes an
artifact which has multiple interesting states over time, and sometimes
it's an identifier used to refer to such an artifact. I think it's
better to use a less-used term here, and I like the resonance between
"edition" and "edit".

 > Do we need to run two instances of Squeak to edit code, one for the
 > current version and one for managing the edit history? I assume that's
 > what you mean by needing two object memories.

      That's right. The typical case is one person using one subject
memory connected to one history memory that is mostly that person's
editions, over a localhost socket connection.

 > If so, is it intended for the edit history object memory to be a live
 > central repository shared by developers?

      That's also an option, yes; it's just not the default.

 > Does the system work if it can't contact the edit history object
 > memory?

      Yes, but the tools would show decompiled method source, and some
history features like regression would be unavailable (similar to what
happens if you don't have the changes/sources files with the current
setup). But the typical case is that you have the history memory
snapshot on the local machine, so losing contact with it seems no more
likely than losing the old changes/sources files, or indeed the subject
memory itself.

 > What do your remote references look like?

      Each one is an object which holds a special hash for a remote
object, and a stream on a socket connected to the remote system. So...

 > How stable are they? Do they rely on, e.g. IP address to find a remote
 > object memory? If somebody changes IP, are the remote references still
 > valid?

      ...currently, they do not survive suspension or termination of the
object memory in which they live. They are *not* like URLs, as your
comment implies. They are not a description of how to reach a remote
object, they are an active connection to a remote object which behaves
in all ways like the remote object. In general, they are created by
sending messages to other remote objects. The first remote objects in a
session are created specially as part of the connection handshake
between object memories.

      If the object memory of the reference is suspended (saved and
quit), the reference is nilled on resumption of the memory.
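
The behavior of remote references described above can be illustrated with a small Python sketch. It is not Spoon's implementation: `Connection`, `RemoteReference`, and `nil_remote_references` are hypothetical names, and an in-process lookup table stands in for the socket stream so that the example runs on its own.

```python
class Connection:
    """Stand-in for a socket stream between two object memories (assumption)."""

    def __init__(self, exported):
        # Maps the special hash of each exported object to the object itself,
        # as it would live in the "remote" memory.
        self._exported = exported

    def send(self, remote_hash, selector, args):
        """Deliver a message to the object identified by remote_hash."""
        target = self._exported[remote_hash]
        return getattr(target, selector)(*args)


class RemoteReference:
    """Holds a remote object's hash plus a live connection; forwards every message."""

    def __init__(self, remote_hash, connection):
        self._remote_hash = remote_hash
        self._connection = connection

    def __getattr__(self, selector):
        # Only reached for names not found on the reference itself,
        # i.e. messages meant for the remote object.
        def forward(*args):
            return self._connection.send(self._remote_hash, selector, args)
        return forward


def nil_remote_references(variables):
    """Model resumption after suspension: live references do not survive."""
    return {name: (None if isinstance(value, RemoteReference) else value)
            for name, value in variables.items()}
```

A caller uses a `RemoteReference` exactly as it would use the remote object, e.g. `history.add_edit(edit)` where `add_edit` is whatever the remote object understands. After `nil_remote_references`, the variable holds `None` and a new connection handshake is needed.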

      I implemented this part of the system in 2003; it's been in all
the Spoon releases so far.

 > I assume a class now contains a ClassID and a collection of MethodIDs?

      No, a class has a "base ID", which is a UUID. The subject memory
as a whole also has a UUID. The history memory knows the UUID of the
subject memory it is tracking, and has "class editions" for each of the
classes that have ever existed in the subject memory. Each class edition
has "method editions" for all of the methods which have ever existed for
that class as defined at a certain point in time.

 > Why is ClassID so complex?

      It's complex? It's just a base UUID, an author UUID, and a version
number. I think if it were any simpler we'd lose something important.
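
As a rough illustration of how small such an ID is, here is a Python sketch that packs the three parts into a byte array. The segment order, the big-endian byte order, and the names `pack_class_id` and `unpack_class_id` are assumptions made for the example; the actual Naiad encoding may differ.

```python
import struct
import uuid

# Assumed layout: 16-byte base ID, 16-byte author UUID, 16-bit version (big-endian).
CLASS_ID_FORMAT = ">16s16sH"
CLASS_ID_SIZE = struct.calcsize(CLASS_ID_FORMAT)  # 34 bytes


def pack_class_id(base_id, author_id, version):
    """Build a ClassID byte string from two UUIDs and a version number."""
    if not 0 <= version <= 0xFFFF:
        raise ValueError("version must fit in sixteen bits")
    return struct.pack(CLASS_ID_FORMAT, base_id.bytes, author_id.bytes, version)


def unpack_class_id(class_id):
    """Split a ClassID back into (base UUID, author UUID, version)."""
    base, author, version = struct.unpack(CLASS_ID_FORMAT, class_id)
    return uuid.UUID(bytes=base), uuid.UUID(bytes=author), version
```

With this layout, every ClassID for a given class starts with the same 16-byte base ID, so the base ID alone can serve as a prefix for all of that class's versions.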

 > Why not just assign each class a new UUID for each new version of that
 > class, with authorship and versioning being metadata of that class?

      It seems to me that it would be useful to have a single unique
identifier that can refer to the definition of a class at all points in
time, as expressed by all authors. When you want to get more specific as
to author and point in time, you can append additional bits to it.

      Also, I explicitly want to keep history information separate from
the artifact objects it describes, so that it may be easily left
behind during production.

 > Limiting to 65,536 versions per author is going to create problems in
 > 10 years time.

      I disagree. Remember, these are editions of a class *definition*
(instance variable format, etc.). If you add a method to a class, you're
not creating a new edition of that class, you're merely creating a new
method edition. From my experience (which encompasses more than ten
years ;), authors tend to create entirely new classes much more often
than they revise class definitions, and they simply use the classes as
they exist a lot more often than that. Frankly, I'd expect 1,024 to
suffice here. Sixteen bits is simply the smallest whole number of
bytes that suffices, so it's convenient as well.

 > Isn't having the author and version in the [class] IDs going to cause
 > conflict problems? What happens if the author is careless and ends up
 > with two different versions of a method with the same unique
 > identifier?

      Another good reason for keeping the history information in a
separate (and headless) object memory is so it can take care of itself
without most developers bothering with it. :)  The typical developer
uses tools in the subject memory. Those tools only make requests to the
history memory for new editions to be added; they have no say in how the
corresponding IDs are made. In particular, the history memory decides
what the next available version number is for a particular combination
of class base ID, author, and selector.
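
A minimal Python sketch of that allocation rule follows, assuming versions start at 1 and are handed out sequentially. `VersionAllocator` and `next_version` are hypothetical names, not part of Naiad.

```python
from collections import defaultdict

MAX_VERSION = 0xFFFF  # largest value a sixteen-bit version field can hold


class VersionAllocator:
    """Lives in the history memory; the only place version numbers are chosen."""

    def __init__(self):
        # (class base ID, author UUID, selector) -> last version handed out
        self._last = defaultdict(int)

    def next_version(self, base_id, author_id, selector):
        """Return the next unused version for this class, author, and selector."""
        key = (base_id, author_id, selector)
        if self._last[key] >= MAX_VERSION:
            raise OverflowError("no versions left for %r" % (key,))
        self._last[key] += 1
        return self._last[key]
```

Because tools in the subject memory only ask for an edition to be added and never supply a version themselves, two editions with the same key cannot receive the same number from a single allocator.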

 > Are author UUIDs going to be able to be looked up to get email
 > addresses and names somehow?

      Each history memory stores author editions; each author edition
associates an author UUID with all that info and more (see the class
tree at [2]). When you receive a module from another author's system,
you get the relevant author editions as well. When you use a new system
for the first time, you can create an author edition for yourself.

 > Methods shouldn't have an author. The changes between methods
 > versions/editions should have an author.

      I disagree. I think it's less work over time to figure out those
changes when necessary.

 > I think you're taking the "minimal memory usage" idea too far.

      I think it's necessary to make the system as easy to learn and
maintain as I want it to be.

 > In my design for distributable packages... packages (cf: classes in
 > Naiad)...

      I would expect them to correspond to Naiad's modules, not classes.

 > ...they need to (deep-)copy it...

      Uh-oh... "deep copy" is one of those phrases that immediately
makes me suspect something is wrong (almost as bad as someone saying
"dude" ;).

 > I've separated source from bytecodes.

      Naiad does that, too.

 > I'm not sure it's a good idea to propose an unstable system as the
 > next version of Squeak though.

      Well, this is two major versions out, not one. I think we can get
plenty of testing in. And I think we're in serious danger of stagnation
as it is. For better or worse, I think this history stuff is the sort of
thing that has to be done with a relatively provocative step. Sometimes
this is good (insert your favorite Alan Kay quote here ;).


      thanks again!

-C

[1] http://netjam.org/spoon/book
[2] http://netjam.org/spoon/naiad



[squeak-dev] Re: second call for feedback on Naiad design

Ken Causey-3
This seems like a very good start for a Spoon/Squeak 5 FAQ and deserves
a semi-permanent location I think.

Ken

On Fri, 2008-11-21 at 17:48 -0800, Craig Latta wrote:

> Hi--
>
>       Thanks for the comments! I'm responding to the comments so far in
> this single message. I see no reason to restrict Naiad-related
> discussion to a single thread; hopefully threads will emerge around
> particular specific issues, rather than particular people. :)  Please
> feel free to break issues out into new threads... for this message,
> there's such a grab-bag going that I decided to deal with it all in one
> place.




Re: [squeak-dev] Re: second call for feedback on Naiad design

Joshua Gargus-2
In reply to this post by ccrraaiigg

On Nov 21, 2008, at 5:48 PM, Craig Latta wrote:

>
> Hi--
>
>     Thanks for the comments! I'm responding to the comments so far  
> in this single message. I see no reason to restrict Naiad-related  
> discussion to a single thread; hopefully threads will emerge around  
> particular specific issues, rather than particular people. :)  
> Please feel free to break issues out into new threads... for this  
> message, there's such a grab-bag going that I decided to deal with  
> it all in one place.
>
>     Karl writes:
>
> > When this system works, won't image size be an issue, like an
> > ever-growing web browser cache that has no size limit?

I had two concerns along these lines.  I'm not so concerned about  
absolute size, since disk is cheap.   But I wonder about the time/CPU  
it will take to snapshot the whole image each time you edit a method  
or evaluate something in the workspace (DoIts are recorded in the  
history image, right?).  Appending to a file is an O(1) operation, but  
snapshotting an image is O(n), where n is the total number of updates.
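
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes that every snapshot rewrites the entire accumulated
history and it ignores any fixed image overhead; the function names are
made up for illustration.

```python
def bytes_written_appending(update_sizes):
    """Total bytes written when each update is appended to a log."""
    return sum(update_sizes)


def bytes_written_snapshotting(update_sizes):
    """Total bytes written when every update rewrites the whole history."""
    total = 0
    history_size = 0
    for size in update_sizes:
        history_size += size   # the history grows by this update
        total += history_size  # and the whole history is written again
    return total
```

Under those assumptions, 1,000 updates of 100 bytes each cost 100,000
bytes when appended, but 50,050,000 bytes when each update triggers a
full snapshot.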

Another concern is data integrity.  What happens if your machine  
crashes while you're snapshotting?  If you're simply appending to  
a .changes file, there's no problem.  Of course, this is surmountable,  
but the solution will be more complicated than a changes file.

>
>
>     I imagine the history memory will have various utilities, like:
>
> -    dumping all the compiled method info, because the subject memory
>     will always have a compiler
>
> -    dumping all the method source, because the subject memory will
>     never have a compiler :)


This level of flexibility worries me a bit.  It's cool that the model  
supports these types of uses, but if it's going to replace  
the .changes file, it needs to be simple and stable.

I realize that this is a bit unfair, because you're talking about the  
interesting characteristics of the model, not focusing on how to make  
the transition from .changes to Naiad.  At some point, though, we'll  
have to have that conversation.


>
>
> -    storing its less-frequently-accessed editions in one or more
>     separate history memories, which spend most of their time as
>     suspended snapshot files, but which can be activated when
>     necessary. Remote message-sending is a fundamental part of
>     Spoon; there's no inherent reason why the history memory can't
>     be a federation of history memories instead.
>
>     Of course, one might decide to put editions in another object
>     database at any point instead (e.g., Magma or Gemstone). I just
>     want to provide something that provides the bare minimum
>     functionality "out of the box".


Good to see the focus on simplicity.  But what is the use-case driving  
your definition of "bare minimum functionality"?  Is it a prototype  
for people to gain exposure to Naiad?  Is it something that can  
replace .changes for Joe Squeaker's day-to-day use?

Thanks for taking the time to document your design,
Josh







[squeak-dev] Re: second call for feedback on Naiad design

ccrraaiigg

Hi Josh--

      Thanks for the feedback!

 > I'm not so concerned about absolute size, since disk is cheap.   But I
 > wonder about the time/CPU it will take to snapshot the whole image
 > each time you edit a method or evaluate something in the workspace
 > (DoIts are recorded in the history image, right?).  Appending to a
 > file is an O(1) operation, but snapshotting an image is O(n), where n
 > is the total number of updates.

      My claim here is that the added expense is worthwhile, given the
greater functionality of objects over change-file chunks. Also, the
developer may continue to work while the snapshots occur; the system
doesn't block.

 > Another concern is data integrity.  What happens if your machine
 > crashes while you're snapshotting?

      The system doesn't delete the previous history snapshot until the
next one is successfully written (some would argue that all snapshots
should work this way). One could also host the subject and history
memories on separate machines. One could also maintain a history memory
*and* a changes file (and perhaps snapshot the history memory less often).
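
      For illustration, here is one common way to get that behavior,
sketched in Python rather than Squeak. It is not Spoon's actual code: the
function name is hypothetical, and the write-to-a-temporary-file-then-rename
approach is my assumption about a reasonable way to do it.

```python
import os
import tempfile


def write_snapshot(path, data):
    """Write data to path, keeping the old file until the new one is complete."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # make sure the bytes reach the disk
        os.replace(temp_path, path)  # swap in the new snapshot in one step
    except BaseException:
        # The previous snapshot at `path` is untouched; discard the partial one.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

      If the write fails partway through, the previous snapshot is
still the one on disk, so at most the most recent change is lost.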

 > If you're simply appending to a .changes file, there's no problem.

      It seems to me that the capacity for data loss is the same in both
cases (the most recent change only, unless the storage devices on the
machine hosting the history memory or changes file are also wiped out).

 > Of course, this is surmountable, but the solution will be more
 > complicated than a changes file.

      Well, of course the history memory scheme is already more
complicated than a changes file, but again I claim it's worthwhile. :)

 > > I imagine the history memory will have various utilities, like:
 > >
 > > - dumping all the compiled method info, because the subject memory
 > >   will always have a compiler
 > >
 > > - dumping all the method source, because the subject memory will
 > >   never have a compiler :)
 >
 > This level of flexibility worries me a bit.  It's cool that the model
 > supports these types of uses, but if it's going to replace the
 > .changes file, it needs to be simple and stable.
 >
 > I realize that this is a bit unfair, because you're talking about the
 > interesting characteristics of the model, not focusing on how to make
 > the transition from .changes to Naiad.  At some point, though, we'll
 > have to have that conversation.

      Sure, I think now is a great time to discuss that. Of course, it's
also fair to discuss the shortcomings of the changes file scheme. It may
be simple and stable, but I also think its utility is unacceptably low.
I strongly suspect the changes file scheme is an anachronism, primarily
driven by the costs and availabilities of processor cycles, disk space,
and network bandwidth in the 1970s. But even if the original designers
thought it was a great idea and would do it the same way now, I still
find it a hindrance. :)

      While I have the same concerns you do, the status quo is so bad
that I want to pursue this design.

 > > - storing its less-frequently-accessed editions in one or more
 > > separate history memories, which spend most of their time as
 > > suspended snapshot files, but which can be activated when
 > > necessary. Remote message-sending is a fundamental part of
 > > Spoon; there's no inherent reason why the history memory can't be a
 > > federation of history memories instead.
 > >
 > > Of course, one might decide to put editions in another object
 > > database at any point instead (e.g., Magma or Gemstone). I just
 > > want to provide something that provides the bare minimum
 > > functionality "out of the box".
 >
 > Good to see the focus on simplicity.  But what is the use-case driving
 > your definition of "bare minimum functionality"? Is it a prototype for
 > people to gain exposure to Naiad?  Is it something that can replace
 > .changes for Joe Squeaker's day-to-day use?

      The latter. Indeed, I've been trying to get people to tell me
their desired use cases.


      thanks again,

-C

--
Craig Latta
www.netjam.org