This is an IRC discussion I am moving to the mailing list
The current DeltaStreams file-out format is monolithic and can only be loaded/saved as one chunk. It is a gzipped DataStream of the Delta model.

An idea is to base it on a logging framework, where composite changes could be rendered as:

open composite change.
add change.
add change.
close composite change.

Much like xml/sexp formats. Not sure if the chunk format could do this. This would enable streaming loading and saving of deltas, rather than the all-at-once load/save as is done now.

Keith Hodges replied:

> please please use the chunk format
> or as I have suggested a form that may not look like the chunk format but can behave like it
> I myself didn't think that DS needed a class model
> just have defined operations on a SystemEditor
> filed out in chunk format
> i.e., change method
> chunk of code which assigns the method change to a SystemEditor A
> and a second chunk of code which assigns the inverse to SystemEditor B
> that's your delta stream

First, I am open to any suggestion about a better change model. Having 34 subclasses of DSChange be the model does seem messy to me. I know there should be something better, but I haven't thought of it yet, except that it would probably be vaguely Pier-like.

Some questions:

What is a "defined operation on a SystemEditor" other than a class model?

What do you mean by "the chunk format" or "like the chunk format"? Do you mean "able to be used as a CompiledMethod sourcePointer"? Or do you mean "some format that is a Smalltalk expression that generates something"?

I don't really understand the chunk format; by having old and new versions, any file-out of Deltas would not look like a file-out or change set, even if it did use something parsable by the chunk reader. The chunk format also has the complication/liberty of custom stream parsers.

Would this be like the chunk format you speak of?

| editor classEditor |
editor := SystemEditor new !
classEditor := editor at: Object !
classEditor compile: 'methodZ ^ self' classified: #'junk methods' !

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
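A rough sketch of what such a streaming rendering could look like. DSChangeStream and its open/add/close protocol are invented here for illustration; DSMethodAdded and DSMethodRemoved are existing DS class names, but their instance-creation messages below are assumptions, not the actual DS API:

| out |
out := DSChangeStream on: (FileStream newFileNamed: 'example.changes').
out openCompositeChange: 'add accessor'.
out addChange: (DSMethodAdded class: #MyClass selector: #bar source: 'bar ^ 42').
out addChange: (DSMethodRemoved class: #MyClass selector: #oldBar).
out closeCompositeChange.
out close.

Each message would append one record to the file as it is sent, so a Delta never has to exist in memory as a single monolithic blob while being written.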
Hello Matthew,
> This is an IRC discussion I am moving to the mailing list
>
> the current DeltaStreams file-out format is monolithic, and can be
> only loaded/saved as one chunk. It is a gzipped DataStream of the
> Delta model.

This is essentially the same idea as Monticello: classes modelling each type of change, albeit at a higher level of granularity than DS, saved/loaded as a DataStream. When loading, Monticello takes the monolithic file-out of the model and does some analysis to establish the safest load order and which parts are not needed. Monticello is quite smart since it only loads what has changed.

DataStream-based formats like this are inherently inflexible. If you want to use this format due to its simplicity and universal availability then you need to either provide for future expansion or have some notion of a format version. Supporting all possible versions could then be difficult if this depends upon class definitions, since it is presently difficult to have more than one definition of a class in the image at the same time.

I think that the easiest way of providing for future expansion is to have a slot reserved for a 'properties' dictionary in the base class DSChange. This is the approach I have taken for releasing Monticello from its fixed class layout, except of course it is somewhat more difficult to add after the fact. Additional instvar requirements can be placed in there rather than changing the class format.

Another possibility would be to simply load all data elements into a Dictionary and file that out instead.

> An idea is to base it on a logging framework. where composite

My logging framework is designed to be the coder's interface between the placing of a debugging statement in their code and the choice of back-end log-to-disk framework, of which the squeak-dev universe offers 3 different ones.

So as it stands I don't think my logging stuff is the right tool for this, but a variant could be. There is no reason why actual useful bits of data/code cannot be sent to logs. E.g. in Perl it is common practice to use Data::Dumper to write out complete data structures to server logs. The form it is written in can be eval-ed to restore the data structure.

> changes could be rendered as:
>
> open composite change.
> add change.
> add change.
> close composite change.
>
> Much like xml/sexp formats. Not sure if the chunk format could
> do this
>
> This would enable streaming loading and saving of deltas, rather
> than the all-at-once load/save as is done now
>
> Keith Hodges replied:
>
>> please please use the chunk format
>>

Some people love it, some people hate it.

I love it, it is ultimately flexible, that's what I like about it. The simplicity-to-power ratio is potentially very good.

If I recall correctly the default behaviour begins such that the first chunk is read and evaluated by the compiler, the result being a reader which (by convention) reads the next chunk and so on. Typically when a reader finds an empty chunk it returns, resetting to the initial reader which restarts the process.

Those who want to make models out of the data records don't like it because it is flexible enough to include anything, so the content cannot necessarily be guaranteed to be readable by anything other than the Compiler.

Given the flexibility of the chunk reading idea, I am surprised that we have not seen much innovation around it in improving fileOuts etc.
One advantage being that chunks can do anything, so you could record and file out an executable representation of any action, even such things as "pasting an image into the environment", since chunks can include encoded binary data if preceded by the appropriate decoding reader.

>> I myself didn't think that DS needed a class model
>> just have defined operations on a SystemEditor
>> filed out in chunk format
>> i.e., change method
>> chunk of code which assigns the method change to a SystemEditor A
>> and a second chunk of code which assigns the inverse to SystemEditor B
>> that's your delta stream
>
> First, I am open to any suggestion about a better change model.
> Having 34 subclasses of DSChange be the model does seem messy to
> me. I know there should be something better, but I haven't

I guess that is an inevitable outcome of modelling in an environment where one models with classes and instances.

> thought of it yet, except that it would probably be vaguely
> pier-like

I am not sure I understand that statement.

> Some questions:
>
> What is a "defined operation on a SystemEditor" other than a class model?

It is simply generated source code which performs an operation; in this case the receiver is a model of the Smalltalk environment.

So what DS models as "Add an instance var 'newVar' to Class MyClass" can be persisted as:

(CurrentSystemEditor value at: #MyClass) addInstVarName: 'newVar'.

One problem with chunks is that remapping Globals is not straightforward. However, I am a fan of ProcessSpecific variables and I think that they could help in this, since "CurrentSystemEditor value" would be determined at runtime, and could have a different value in each process that is using it.

> What do you mean by "the chunk format" or "like the chunk format"?
> Do you mean "able to be used as a CompiledMethod sourcePointer"?

See above.

> Or do you mean "some format that is a Smalltalk expression that generates something"?

I do, but the chunk format is not limited to that.

> I don't really understand the chunk format; by having old and
> new versions, any file-out of Deltas would not look like a
> file-out or change set, even if it did use something parsable by
> the chunk reader.

Indeed, it would look like a DeltaStream.

> The chunk format also has the
> complication/liberty of custom stream parsers
>
> Would this be like the chunk format you speak of?
>
> | editor classEditor |
> editor := SystemEditor new !
> classEditor := editor at: Object !
> classEditor compile: 'methodZ ^ self' classified: #'junk methods' !

I imagine you would need...

A header chunk, to set up the SystemEditors, one for the forward direction, one for the reverse (although I suspect it may be possible to have one do both):

! CurrentSystemEditor value: SystemEditor new. ! !

Action chunks:

! CurrentSystemEditor value addInstVarName: 'a' ! !
"individual statements"
! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !

Although many may not agree with me, I think there is a lot of potential for innovation using the chunk format, and it has the advantage that most people have the tools to read it already.

regards

Keith
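A minimal sketch of the 'properties' escape hatch described above, assuming a new 'properties' instance variable is added to DSChange; the accessor names are placeholders rather than existing DS code:

DSChange >> properties
	"Lazily create the extension dictionary. Future additions store
	 their data here instead of forcing a change to the class format."
	^ properties ifNil: [properties := Dictionary new]

DSChange >> propertyAt: aKey put: aValue
	^ self properties at: aKey put: aValue

DSChange >> propertyAt: aKey ifAbsent: aBlock
	^ self properties at: aKey ifAbsent: aBlock

Older readers that know nothing about a newly added property simply ignore the extra dictionary entries, which is the point of the scheme.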
On Wed, Oct 10, 2007 at 06:15:51AM +0100, Keith Hodges wrote:
> > the current DeltaStreams file-out format is monolithic, and
> > can be only loaded/saved as one chunk. It is a gzipped
> > DataStream of the Delta model.
>
> DataStream-based formats like this are inherently inflexible.
> If you want to use this format due to its simplicity and
> universal availability then you need to either provide for
> future expansion or have some notion of a format version.
> Supporting all possible versions could then be difficult if
> this depends upon class definitions, since it is presently
> difficult to have more than one definition of a class in the
> image at the same time.

Indeed it is quite inflexible. I plan to change it.

> I think that the easiest way of providing for future expansion
> is to have a slot reserved for a 'properties' dictionary in
> the base class DSChange. This is the approach I have taken for
> releasing Monticello from its fixed class layout, except of
> course it is somewhat more difficult to add after the fact.
> Additional instvar requirements can be placed in there rather
> than changing the class format.

DSDelta already has this; however, DSChange does not. DSChange should have it.

> Another possibility would be to simply load all data elements
> into a Dictionary and file that out instead.

Could you elaborate?

> > An idea is to base it on a logging framework. where
> > composite
>
> My logging framework is designed to be the coder's interface
> between the placing of a debugging statement in their code
> and the choice of back-end log-to-disk framework, of which the
> squeak-dev universe offers 3 different ones.
>
> So as it stands I don't think my logging stuff is the right
> tool for this, but a variant could be. There is no reason why
> actual useful bits of data/code cannot be sent to logs. E.g.
> in Perl it is common practice to use Data::Dumper to write out
> complete data structures to server logs. The form it is written
> in can be eval-ed to restore the data structure.
>
> > changes could be rendered as:
> >
> > open composite change. add change. add change. close
> > composite change.
> >
> > Much like xml/sexp formats. Not sure if the chunk format
> > could do this
> >
> > This would enable streaming loading and saving of deltas,
> > rather than the all-at-once load/save as is done now
> >
> > Keith Hodges replied:
> >
> >> please please use the chunk format
> >>
>
> Some people love it, some people hate it.
>
> I love it, it is ultimately flexible, that's what I like about
> it. The simplicity-to-power ratio is potentially very good.
>
> If I recall correctly the default behaviour begins such that
> the first chunk is read and evaluated by the compiler, the
> result being a reader which (by convention) reads the next
> chunk and so on. Typically when a reader finds an empty chunk
> it returns, resetting to the initial reader which restarts
> the process.

My understanding of the chunk format is from
http://wiki.squeak.org/squeak/1105

> Those who want to make models out of the data records don't
> like it because it is flexible enough to include anything, so
> the content cannot necessarily be guaranteed to be readable by
> anything other than the Compiler.
>
> Given the flexibility of the chunk reading idea, I am
> surprised that we have not seen much innovation around it in
> improving fileOuts etc.
>
> One advantage being that chunks can do anything, so you could
> record and file out an executable representation of any action,
> even such things as "pasting an image into the environment",
> since chunks can include encoded binary data if preceded by
> the appropriate decoding reader.

Indeed. I did this in my second attempt at a simple Delta file-out format: I put a static decoder chunk at the beginning of the file, followed by the gzipped DataStream. I dropped it when I noticed that the chunk reader was just noise around the actual content, which was in the gzipped DataStream.

My first attempt was an expression that, when evaluated, yielded the Delta. I found out that the compiler and image die very ungracefully when asked to evaluate a 3000-line-long expression/statement.

Here are some things I don't see how to do with the chunk format:

1. Read chunks in reverse order. This is absolutely essential when reverting a delta.

2. Pass arguments to a chunk file. For example, how could I ensure that the Compiler, while parsing a chunk file, sends all commands through a certain visitor, depending on whether I want to:
   - find all conflicting deltas
   - apply non-conflicting deltas
   - collect all deltas of one package and do something with them

3. Define a chunk hierarchy, such as one chunk containing and being able to manipulate a delimited set of the next several chunks. This would be very useful in storing composite changes, and in delimiting the individual Deltas in a DeltaStream. This may be doable by returning a chunk reader from a reader chunk, but I don't know if there would be a way for a recursed chunk reader to recognize the end of its substream.

> > First, I am open to any suggestion about a better change
> > model. Having 34 subclasses of DSChange be the model does
> > seem messy to me. I know there should be something better,
> > but I haven't
>
> I guess that is an inevitable outcome of modelling in an
> environment where one models with classes and instances.

Let's hope not!

> > thought of it yet, except that it would probably be vaguely
> > pier-like
>
> I am not sure I understand that statement.

DSChange and friends mix together operation (add, remove, change, move), context (class, method, class organization, system organization), and subject (ivar, method source, comment, timestamp, category, etc.) at the class definition level. Examples:

DSMethodAdded (operation: add; context: aClass; subject: aMethod)
DSMethodRemoved (operation: remove; context: aClass; subject: aMethod)
DSMethodSourceChange (operation: change; context: aMethod; subject: source, timestamp)

On the other hand, Pier separates these concepts a bit. Operations are the task of a few very generic PRCommands (PRAddCommand, PREditCommand, PRRemoveCommand, PRMoveCommand). Commands operate, as I understand it, on PRStructures, which are both a context (a PRPath, which looks up a context) and a subject. I am vague on the details, but it is a praiseworthy model, since it receives a lot of praise :).

> > Some questions: What is a "defined operation on a
> > SystemEditor" other than a class model?
>
> It is simply generated source code which performs an
> operation; in this case the receiver is a model of the
> Smalltalk environment.
>
> So what DS models as "Add an instance var 'newVar' to Class
> MyClass" can be persisted as:
>
> (CurrentSystemEditor value at: #MyClass) addInstVarName:
> 'newVar'.
>
> One problem with chunks is that remapping Globals is not
> straightforward.
> However, I am a fan of ProcessSpecific variables and I think
> that they could help in this, since "CurrentSystemEditor value"
> would be determined at runtime, and could have a different
> value in each process that is using it.

I don't know what you mean by "remapping Globals".

> > Would this be like the chunk format you speak of?
> >
> > | editor classEditor |
> > editor := SystemEditor new !
> > classEditor := editor at: Object !
> > classEditor compile: 'methodZ ^ self' classified: #'junk methods' !
>
> I imagine you would need...
>
> A header chunk, to set up the SystemEditors, one for the
> forward direction, one for the reverse (although I suspect it
> may be possible to have one do both):
>
> ! CurrentSystemEditor value: SystemEditor new. ! !
>
> Action chunks:
>
> ! CurrentSystemEditor value addInstVarName: 'a' ! !
> "individual statements"
> ! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !
>
> Although many may not agree with me, I think there is a lot of
> potential for innovation using the chunk format, and it has
> the advantage that most people have the tools to read it
> already.

All applications using the chunk format, so far, have run into the problem that if you need to do something not done by the code in the chunk stream, you need to abandon the chunk format and resort to manually parsing it to get back to the objects you started with, or something more manipulable. For instance, one can apply change sets using the built-in chunk reader, but to open a change list or change browser, one must heuristically parse the file (see the 'scanning' protocol of ChangeList, for instance).

This may be a limit of the file-out format, though, and not of the underlying chunk format. A declarative model and matched visitor is currently the kernel of DeltaStreams, so I don't see the chunk format as working for the current model. If the model were more like the Pier model, I think the chunk format may be better suited, as the Pier model is not so declarative IMHO.

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
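To make the operation/context/subject factoring above concrete, a rough sketch of what a more Pier-like change object might look like. DSCommand, DSContext, DSMethodSubject and all of their messages are invented for illustration; they are not part of DeltaStreams or Pier:

"One generic command instead of 34 concrete subclasses: the operation,
 the context (where to apply it) and the subject (what is changed)
 are plain data, so new kinds of change need no new classes."
| change |
change := DSCommand new
	operation: #add;
	context: (DSContext class: #MyClass);
	subject: (DSMethodSubject selector: #bar source: 'bar ^ 42');
	yourself.

"Reverting becomes generic: swap #add for #remove (a #change command
 would keep the old value in its subject) instead of needing a
 mirror-image class for every forward class."
change inverse applyTo: SystemEditor new.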
Hi guys!
No time to dwell on details, but I wonder two things:

1. I have always envisioned a Delta to be first *read* into the image and then *after* that - either applied or reverted or whatever. So the "reading" part would ONLY instantiate the totally self-contained object graph of the delta. And thus - why would reading chunks in reverse be interesting?

2. Again, since IMHO the reading part should ONLY build the graph - why would the chunk format be a bad choice? I always envisioned the simplest and most flexible chunk format to be first some code to get a reader (making it easy to hook in alternate readers) and then simply "feed" that reader by sending messages to it that build the Delta. In essence it probably would boil down to one message per change object (roughly). Now... that doesn't expose any internal structure (ivars of DSChange classes etc.), so... as long as the readers grok the original set of messages - how would that be problematic when it comes to schema evolution?

Lastly - I like the current balance in the DSChange hierarchy between "abstractness" and "concreteness" - I really don't want to make it more abstract or generic, which is what a step towards the Pier model sounds like to my ears. I like code that I can touch, feel and understand. I don't like code which makes me feel I don't ever get to see "the meat" of the action. :) Which btw is why I sometimes feel slightly dizzy when the Visitor pattern gets overused. Again, in the DSChange hierarchy I wrote a visitor mechanism but intentionally kept it a bit more intention revealing - and yes, thus I did not FULLY exploit the genericness of the Visitor pattern - but that was actually on purpose. Well, Matthew knows what I mean I guess. :)

regards, Göran
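A rough sketch of that "read first, then apply or revert" flow combined with the one-message-per-change chunk style. DSDeltaReader, its change-building messages, and the delta's applyTo:/revertOn: protocol are invented for illustration; the file is assumed to contain one expression chunk per change (e.g. methodAdded: #bar in: #MyClass source: 'bar ^ 42'!), the exact Compiler convenience selector is assumed to be the stock Squeak one, and this bypasses the standard fileIn reader convention on purpose:

"Reading: build the self-contained object graph, one chunk per change,
 with the reader itself as the receiver of each chunk expression."
| stream reader delta |
stream := FileStream readOnlyFileNamed: 'example.delta'.
reader := Compiler evaluate: stream nextChunk.	"first chunk, e.g. 'DSDeltaReader new'"
[stream atEnd] whileFalse:
	[Compiler evaluate: stream nextChunk for: reader logged: false].
stream close.
delta := reader delta.	"the assembled object graph"

"Applying or reverting then happens on the graph, not on the file,
 so there is no need to read chunks in reverse order."
delta applyTo: SystemEditor new.	"forward"
delta revertOn: SystemEditor new.	"reverse, by walking the graph backwards in memory"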
On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
> If I recall correctly the default behaviour begins such that the first
> chunk is read and evaluated by the compiler, the result being a reader

Not quite. A chunk is everything up to the next bang (!). A chunk is simply evaluated. Only if an empty chunk is seen, i.e., it starts with a bang, the next chunk is taken as a reader definition.

> which (by convention) reads the next chunk and so on. Typically when a
> reader finds an empty chunk it returns, resetting to the initial reader
> which restarts the process.

For source code readers, yes.

> chunks can include encoded
> binary data if preceded by the appropriate decoding reader.

It can even be raw binary data, not encoded. It's the reader's responsibility to deal with what follows.

> A header chunk, to set up the SystemEditors, one for the forward
> direction, one for the reverse (although I suspect it may be possible to
> have one do both):
>
> ! CurrentSystemEditor value: SystemEditor new. ! !

No, that would be a reader chunk. The correct way is

CurrentSystemEditor value: SystemEditor new. !

> Action chunks:
>
> ! CurrentSystemEditor value addInstVarName: 'a' ! !
> "individual statements"
> ! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !

Again - you should use plain chunks for that, not reader chunks.

> Although many may not agree with me, I think there is a lot of potential
> for innovation using the chunk format, and it has the advantage that
> most people have the tools to read it already.

It's incredibly flexible indeed, and goes back to the B5000 that inspired Smalltalk.

- Bert -
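Putting Bert's corrections together, a minimal delta file-out in this style would use plain chunks only. CurrentSystemEditor is Keith's hypothetical process-specific variable, the SystemEditor messages are taken from the examples above, and the final line assumes the editor applies its accumulated edits when asked to commit - none of this is existing DeltaStreams code:

"Plain chunks only - each statement ends with a bang and is simply evaluated."
CurrentSystemEditor value: SystemEditor new. !
(CurrentSystemEditor value at: #MyClass) addInstVarName: 'newVar'. !
(CurrentSystemEditor value at: #MyClass)
	compile: 'bar ^ 42' classified: #accessing. !
CurrentSystemEditor value commit. !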
On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
> On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
[...]
>> Although many may not agree with me I think there is a lot of potential
>> for innovation using the chunk format, and it has the advantage that
>> most people have the tools to read it already.
>
> It's incredibly flexible indeed, and goes back to the B5000 that
> inspired Smalltalk.

Interesting. Having worked with almost all B5000 series and successors and still working with them, what's the chunk format on them (besides that it has no naked memory pointers, but descriptors)?

/Klaus

> - Bert -
On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:

> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
>
>> On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
> [...]
>>> Although many may not agree with me I think there is a lot of potential
>>> for innovation using the chunk format, and it has the advantage that
>>> most people have the tools to read it already.
>>
>> It's incredibly flexible indeed, and goes back to the B5000 that
>> inspired Smalltalk.
>
> Interesting. Having worked with almost all B5000 series and
> successors and still working with them, what's the chunk format on
> them (besides that it has no naked memory pointers, but descriptors)?

Maybe I should have written "the idea goes back", not the chunk format itself. The idea to have the data itself specify how to be processed comes from there - the tapes had a loader program in front that reads the rest of the tape. I don't know much about the details.

- Bert -
On Wed, 10 Oct 2007 12:35:20 +0200, Bert Freudenberg wrote:
>
> On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:
>
>> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
>>
>>> On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
>> [...]
>>>> Although many may not agree with me I think there is a lot of potential
>>>> for innovation using the chunk format, and it has the advantage that
>>>> most people have the tools to read it already.
>>>
>>> It's incredibly flexible indeed, and goes back to the B5000 that
>>> inspired Smalltalk.
>>
>> Interesting. Having worked with almost all B5000 series and successors
>> and still working with them, what's the chunk format on them (besides
>> that it has no naked memory pointers, but descriptors)?
>
> Maybe I should have written "the idea goes back", not the chunk format itself.
> The idea to have the data itself specify how to be processed comes from
> there - the tapes had a loader program in front that reads the rest of
> the tape. I don't know much about the details.

Ah, the clear/start & halt/load tapes :) still in use in emergency situations, when disks are down. Will tell my colleagues about clear/start's usage in Smalltalk code file chunks :)

> - Bert -
Klaus D. Witzel wrote:
> On Wed, 10 Oct 2007 12:35:20 +0200, Bert Freudenberg wrote:
> > On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:
> >> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
> >>> It's incredibly flexible indeed, and goes back to the B5000 that
> >>> inspired Smalltalk.
> >>
> >> Interesting. Having worked with almost all B5000 series and successors
> >> and still working with them, what's the chunk format on them (besides
> >> that it has no naked memory pointers, but descriptors)?
> >
> > Maybe I should have written "the idea goes back", not the chunk format itself.
> > The idea to have the data itself specify how to be processed comes from
> > there - the tapes had a loader program in front that reads the rest of
> > the tape. I don't know much about the details.
>
> Ah, the clear/start & halt/load tapes :) still in use in emergency
> situations, when disks are down. Will tell my colleagues about
> clear/start's usage in Smalltalk code file chunks :)

I think Bert might have been thinking of the Burroughs 220 tape format used in an Air Training Command installation, which was mentioned by Alan Kay in his "The Early History Of Smalltalk".

Since most problems in computing can be solved by adding another level of indirection, it might be nice to extend the chunk format so that when you get an error while trying to create a reader due to missing classes, there would be some hint on how to fix the problem automatically (by downloading another file).

-- Jecel
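A sketch of the kind of self-repairing reader chunk Jecel suggests. The reader class, repository project and package names are placeholders, and the Installer incantation is only assumed to match Keith's Installer package of the time:

"Reader chunk: if the reader class is missing, fetch it before
 answering the reader, so the rest of the file can still be processed."
(Smalltalk includesKey: #DSDeltaReader)
	ifFalse: [Installer ss project: 'DeltaStreams'; install: 'DS-Reader'].
(Smalltalk at: #DSDeltaReader) new !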
On Wed, 10 Oct 2007 17:12:06 +0200, Jecel Assumpcao Jr wrote:
> Klaus D. Witzel wrote:
>> On Wed, 10 Oct 2007 12:35:20 +0200, Bert Freudenberg wrote:
>> > On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:
>> >> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
>> >>> It's incredibly flexible indeed, and goes back to the B5000 that
>> >>> inspired Smalltalk.
>> >>
>> >> Interesting. Having worked with almost all B5000 series and successors
>> >> and still working with them, what's the chunk format on them (besides
>> >> that it has no naked memory pointers, but descriptors)?
>> >
>> > Maybe I should have written "the idea goes back", not the chunk format itself.
>> > The idea to have the data itself specify how to be processed comes from
>> > there - the tapes had a loader program in front that reads the rest of
>> > the tape. I don't know much about the details.
>>
>> Ah, the clear/start & halt/load tapes :) still in use in emergency
>> situations, when disks are down. Will tell my colleagues about
>> clear/start's usage in Smalltalk code file chunks :)
>
> I think Bert might have been thinking of the Burroughs 220 tape

ElectroData's Datatron 220? - http://special.lib.umn.edu/findaid/ead/cbi/cbi00090-036.xml

> format used in an Air Training Command installation, which was mentioned
> by Alan Kay in his "The Early History Of Smalltalk".

Thanks for the reference, Jecel. Alas, the clear/start tapes (small+medium systems) and halt/load tapes (large systems) concept survived the whole Burroughs story, and so have the chunks to which Bert referred :)

> Since most problems in computing can be solved by adding another level
> of indirection, it might be nice to extend the chunk format so that when
> you get an error while trying to create a reader due to missing classes,
> there would be some hint on how to fix the problem automatically (by
> downloading another file).

Yeah, that's always fascinating with software: adding another level of complexity is always possible, whereas subtracting one might trouble you with existence problems :)

/Klaus

> -- Jecel