This is an IRC discussion I am moving to the mailing list
The current DeltaStreams file-out format is monolithic and can only be loaded/saved as one chunk. It is a gzipped DataStream of the Delta model.

An idea is to base it on a logging framework, where composite changes could be rendered as:

open composite change.
add change.
add change.
close composite change.

Much like xml/sexp formats. Not sure if the chunk format could do this. This would enable streaming loading and saving of deltas, rather than the all-at-once load/save as is done now.

Keith Hodges replied:

> please please use the chunk format
> or as I have suggested a form that may not look like the chunk format but can behave like it
> I myself didn't think that DS needed a class model
> just have defined operations on a SystemEditor
> filed out in chunk format
> i.e., change method
> chunk of code which assigns the method change to a SystemEditor A
> and a second chunk of code which assigns the inverse to SystemEditor B
> that's your delta stream

First, I am open to any suggestion about a better change model. Having 34 subclasses of DSChange be the model does seem messy to me. I know there should be something better, but I haven't thought of it yet, except that it would probably be vaguely Pier-like.

Some questions:

What is a "defined operation on a SystemEditor" other than a class model?

What do you mean by "the chunk format" or "like the chunk format"? Do you mean "able to be used as a CompiledMethod sourcePointer"? Or do you mean "some format that is a Smalltalk expression that generates something"?

I don't really understand the chunk format; by having old and new versions, any file-out of Deltas would not look like a file-out or change set, even if it did use something parsable by the chunk reader. The chunk format also has the complication/liberty of custom stream parsers.

Would this be like the chunk format you speak of?

| editor classEditor |
editor := SystemEditor new !
classEditor := editor at: Object !
classEditor compile: 'methodZ ^ self' classified: #'junk methods' !

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
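A rough sketch of what such a streaming rendering could look like. DSChangeStream and its open/add/close protocol are invented here for illustration; DSMethodAdded and DSMethodRemoved are existing DS class names, but their instance-creation messages below are assumptions, not the actual DS API:

| out |
out := DSChangeStream on: (FileStream newFileNamed: 'example.changes').
out openCompositeChange: 'add accessor'.
out addChange: (DSMethodAdded class: #MyClass selector: #bar source: 'bar ^ 42').
out addChange: (DSMethodRemoved class: #MyClass selector: #oldBar).
out closeCompositeChange.
out close.

Each message would append one record to the file as it is sent, so a Delta never has to exist in memory as a single monolithic blob while being written.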
Hello Matthew,
> This is an IRC discussion I am moving to the mailing list
>
> the current DeltaStreams file-out format is monolithic, and can be
> only loaded/saved as one chunk. It is a gzipped DataStream of the
> Delta model.

This is essentially the same idea as Monticello: classes modelling each type of change, albeit at a higher level of granularity than DS, saved/loaded as a DataStream. When loading, Monticello takes the monolithic file-out of the model and does some analysis to establish the safest load order and which parts are not needed. Monticello is quite smart since it only loads what has changed.

DataStream-based formats like this are inherently inflexible. If you want to use this format due to its simplicity and universal availability then you need to either provide for future expansion or have some notion of a format version. Supporting all possible versions could then be difficult if this depends upon class definitions, since it is presently difficult to have more than one definition of a class in the image at the same time.

I think that the easiest way of providing for future expansion is to have a slot reserved for a 'properties' dictionary in the base class DSChange. This is the approach I have taken for releasing Monticello from its fixed class layout, except of course it is somewhat more difficult to add after the fact. Additional instvar requirements can be placed in there rather than changing the class format.

Another possibility would be to simply load all data elements into a Dictionary and file that out instead.

> An idea is to base it on a logging framework. where composite

My logging framework is designed to be the coder's interface between the placing of a debugging statement in their code and the choice of back-end log-to-disk framework, of which the squeak-dev universe offers 3 different ones.

So as it stands I don't think my logging stuff is the right tool for this, but a variant could be. There is no reason why actual useful bits of data/code cannot be sent to logs. E.g. in Perl it is common practice to use Data::Dumper to write out complete data structures to server logs. The form it is written in can be eval-ed to restore the data structure.

> changes could be rendered as:
>
> open composite change.
> add change.
> add change.
> close composite change.
>
> Much like xml/sexp formats. Not sure if the chunk format could
> do this
>
> This would enable streaming loading and saving of deltas, rather
> than the all-at-once load/save as is done now
>
> Keith Hodges replied:
>
>> please please use the chunk format
>>

Some people love it, some people hate it.

I love it, it is ultimately flexible, that's what I like about it. The simplicity-to-power ratio is potentially very good.

If I recall correctly the default behaviour begins such that the first chunk is read and evaluated by the compiler, the result being a reader which (by convention) reads the next chunk and so on. Typically when a reader finds an empty chunk it returns, resetting to the initial reader which restarts the process.

Those who want to make models out of the data records don't like it because it is flexible enough to include anything, so the content cannot necessarily be guaranteed to be readable by anything other than the Compiler.

Given the flexibility of the chunk reading idea, I am surprised that we have not seen much innovation around it in improving fileOuts etc.
One advantage being that chunks can do anything, so you could record and file out an executable representation of any action, even such things as "pasting an image into the environment", since chunks can include encoded binary data if preceded by the appropriate decoding reader.

>> I myself didn't think that DS needed a class model
>> just have defined operations on a SystemEditor
>> filed out in chunk format
>> i.e., change method
>> chunk of code which assigns the method change to a SystemEditor A
>> and a second chunk of code which assigns the inverse to SystemEditor B
>> that's your delta stream
>
> First, I am open to any suggestion about a better change model.
> Having 34 subclasses of DSChange be the model does seem messy to
> me. I know there should be something better, but I haven't

I guess that is an inevitable outcome of modelling in an environment where one models with classes and instances.

> thought of it yet, except that it would probably be vaguely
> pier-like

I am not sure I understand that statement.

> Some questions:
>
> What is a "defined operation on a SystemEditor" other than a class model?

It is simply generated source code which performs an operation; in this case the receiver is a model of the Smalltalk environment.

So what DS models as "Add an instance var 'newVar' to Class MyClass" can be persisted as:

(CurrentSystemEditor value at: #MyClass) addInstVarName: 'newVar'.

One problem with chunks is that remapping Globals is not straightforward. However, I am a fan of ProcessSpecific variables and I think that they could help in this, since "CurrentSystemEditor value" would be determined at runtime, and could have a different value in each process that is using it.

> What do you mean by "the chunk format" or "like the chunk format"?
> Do you mean "able to be used as a CompiledMethod sourcePointer"?

See above.

> Or do you mean "some format that is a Smalltalk expression that generates something"?

I do, but the chunk format is not limited to that.

> I don't really understand the chunk format; by having old and
> new versions, any file-out of Deltas would not look like a
> file-out or change set, even if it did use something parsable by
> the chunk reader.

Indeed, it would look like a DeltaStream.

> The chunk format also has the
> complication/liberty of custom stream parsers
>
> Would this be like the chunk format you speak of?
>
> | editor classEditor |
> editor := SystemEditor new !
> classEditor := editor at: Object !
> classEditor compile: 'methodZ ^ self' classified: #'junk methods' !

I imagine you would need...

A header chunk, to set up the SystemEditors, one for the forward direction, one for the reverse (although I suspect it may be possible to have one do both):

! CurrentSystemEditor value: SystemEditor new. ! !

Action chunks:

! CurrentSystemEditor value addInstVarName: 'a' ! !
"individual statements"
! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !

Although many may not agree with me, I think there is a lot of potential for innovation using the chunk format, and it has the advantage that most people have the tools to read it already.

regards

Keith
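A minimal sketch of the 'properties' escape hatch described above, assuming a new 'properties' instance variable is added to DSChange; the accessor names are placeholders rather than existing DS code:

DSChange >> properties
	"Lazily create the extension dictionary. Future additions store
	 their data here instead of forcing a change to the class format."
	^ properties ifNil: [properties := Dictionary new]

DSChange >> propertyAt: aKey put: aValue
	^ self properties at: aKey put: aValue

DSChange >> propertyAt: aKey ifAbsent: aBlock
	^ self properties at: aKey ifAbsent: aBlock

Older readers that know nothing about a newly added property simply ignore the extra dictionary entries, which is the point of the scheme.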
On Wed, Oct 10, 2007 at 06:15:51AM +0100, Keith Hodges wrote:
> > the current DeltaStreams file-out format is monolithic, and
> > can be only loaded/saved as one chunk. It is a gzipped
> > DataStream of the Delta model.
>
> DataStream-based formats like this are inherently inflexible.
> If you want to use this format due to its simplicity and
> universal availability then you need to either provide for
> future expansion or have some notion of a format version.
> Supporting all possible versions could then be difficult if
> this depends upon class definitions, since it is presently
> difficult to have more than one definition of a class in the
> image at the same time.

Indeed it is quite inflexible. I plan to change it.

> I think that the easiest way of providing for future expansion
> is to have a slot reserved for a 'properties' dictionary in
> the base class DSChange. This is the approach I have taken for
> releasing Monticello from its fixed class layout, except of
> course it is somewhat more difficult to add after the fact.
> Additional instvar requirements can be placed in there rather
> than changing the class format.

DSDelta already has this; however, DSChange does not. DSChange should have it.

> Another possibility would be to simply load all data elements
> into a Dictionary and file that out instead.

Could you elaborate?

> > An idea is to base it on a logging framework. where
> > composite
>
> My logging framework is designed to be the coder's interface
> between the placing of a debugging statement in their code
> and the choice of back-end log-to-disk framework, of which the
> squeak-dev universe offers 3 different ones.
>
> So as it stands I don't think my logging stuff is the right
> tool for this, but a variant could be. There is no reason why
> actual useful bits of data/code cannot be sent to logs. E.g.
> in Perl it is common practice to use Data::Dumper to write out
> complete data structures to server logs. The form it is written
> in can be eval-ed to restore the data structure.
>
> > changes could be rendered as:
> >
> > open composite change. add change. add change. close
> > composite change.
> >
> > Much like xml/sexp formats. Not sure if the chunk format
> > could do this
> >
> > This would enable streaming loading and saving of deltas,
> > rather than the all-at-once load/save as is done now
> >
> > Keith Hodges replied:
> >
> >> please please use the chunk format
> >>
>
> Some people love it, some people hate it.
>
> I love it, it is ultimately flexible, that's what I like about
> it. The simplicity-to-power ratio is potentially very good.
>
> If I recall correctly the default behaviour begins such that
> the first chunk is read and evaluated by the compiler, the
> result being a reader which (by convention) reads the next
> chunk and so on. Typically when a reader finds an empty chunk
> it returns, resetting to the initial reader which restarts
> the process.

My understanding of the chunk format is from
http://wiki.squeak.org/squeak/1105

> Those who want to make models out of the data records don't
> like it because it is flexible enough to include anything, so
> the content cannot necessarily be guaranteed to be readable by
> anything other than the Compiler.
>
> Given the flexibility of the chunk reading idea, I am
> surprised that we have not seen much innovation around it in
> improving fileOuts etc.
>
> One advantage being that chunks can do anything, so you could
> record and file out an executable representation of any action,
> even such things as "pasting an image into the environment",
> since chunks can include encoded binary data if preceded by
> the appropriate decoding reader.

Indeed. I did this in my second attempt at a simple Delta file-out format: I put a static decoder chunk at the beginning of the file, followed by the gzipped DataStream. I dropped it when I noticed that the chunk reader was just noise around the actual content, which was in the gzipped DataStream.

My first attempt was an expression that, when evaluated, yielded the Delta. I found out that the compiler and image die very ungracefully when asked to evaluate a 3000-line-long expression/statement.

Here are some things I don't see how to do with the chunk format:

1. Read chunks in reverse order. This is absolutely essential when reverting a delta.

2. Pass arguments to a chunk file. For example, how could I ensure that the Compiler, while parsing a chunk file, sends all commands through a certain visitor, depending on whether I want to:
   - find all conflicting deltas
   - apply non-conflicting deltas
   - collect all deltas of one package and do something with them

3. Define a chunk hierarchy, such as one chunk containing and being able to manipulate a delimited set of the next several chunks. This would be very useful in storing composite changes, and in delimiting the individual Deltas in a DeltaStream. This may be doable by returning a chunk reader from a reader chunk, but I don't know if there would be a way for a recursed chunk reader to recognize the end of its substream.

> > First, I am open to any suggestion about a better change
> > model. Having 34 subclasses of DSChange be the model does
> > seem messy to me. I know there should be something better,
> > but I haven't
>
> I guess that is an inevitable outcome of modelling in an
> environment where one models with classes and instances.

Let's hope not!

> > thought of it yet, except that it would probably be vaguely
> > pier-like
>
> I am not sure I understand that statement.

DSChange and friends mix together operation (add, remove, change, move), context (class, method, class organization, system organization), and subject (ivar, method source, comment, timestamp, category, etc.) at the class definition level. Examples:

DSMethodAdded (operation: add; context: aClass; subject: aMethod)
DSMethodRemoved (operation: remove; context: aClass; subject: aMethod)
DSMethodSourceChange (operation: change; context: aMethod; subject: source, timestamp)

On the other hand, Pier separates these concepts a bit. Operations are the task of a few very generic PRCommands (PRAddCommand, PREditCommand, PRRemoveCommand, PRMoveCommand). Commands operate, as I understand it, on PRStructures, which are both a context (a PRPath, which looks up a context) and a subject. I am vague on the details, but it is a praiseworthy model, since it receives a lot of praise :).

> > Some questions: What is a "defined operation on a
> > SystemEditor" other than a class model?
>
> It is simply generated source code which performs an
> operation; in this case the receiver is a model of the
> Smalltalk environment.
>
> So what DS models as "Add an instance var 'newVar' to Class
> MyClass" can be persisted as:
>
> (CurrentSystemEditor value at: #MyClass) addInstVarName:
> 'newVar'.
>
> One problem with chunks is that remapping Globals is not
> straightforward.
> However, I am a fan of ProcessSpecific variables and I think
> that they could help in this, since "CurrentSystemEditor value"
> would be determined at runtime, and could have a different
> value in each process that is using it.

I don't know what you mean by "remapping Globals".

> > Would this be like the chunk format you speak of?
> >
> > | editor classEditor |
> > editor := SystemEditor new !
> > classEditor := editor at: Object !
> > classEditor compile: 'methodZ ^ self' classified: #'junk methods' !
>
> I imagine you would need...
>
> A header chunk, to set up the SystemEditors, one for the
> forward direction, one for the reverse (although I suspect it
> may be possible to have one do both):
>
> ! CurrentSystemEditor value: SystemEditor new. ! !
>
> Action chunks:
>
> ! CurrentSystemEditor value addInstVarName: 'a' ! !
> "individual statements"
> ! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !
>
> Although many may not agree with me, I think there is a lot of
> potential for innovation using the chunk format, and it has
> the advantage that most people have the tools to read it
> already.

All applications using the chunk format, so far, have run into the problem that if you need to do something not done by the code in the chunk stream, you need to abandon the chunk format and resort to manually parsing it to get back to the objects you started with, or something more manipulable. For instance, one can apply change sets using the built-in chunk reader, but to open a change list or change browser, one must heuristically parse the file (see the 'scanning' protocol of ChangeList, for instance).

This may be a limit of the file-out format, though, and not of the underlying chunk format. A declarative model and matched visitor is currently the kernel of DeltaStreams, so I don't see the chunk format as working for the current model. If the model were more like the Pier model, I think the chunk format may be better suited, as the Pier model is not so declarative IMHO.

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
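To make the operation/context/subject factoring above concrete, a rough sketch of what a more Pier-like change object might look like. DSCommand, DSContext, DSMethodSubject and all of their messages are invented for illustration; they are not part of DeltaStreams or Pier:

"One generic command instead of 34 concrete subclasses: the operation,
 the context (where to apply it) and the subject (what is changed)
 are plain data, so new kinds of change need no new classes."
| change |
change := DSCommand new
	operation: #add;
	context: (DSContext class: #MyClass);
	subject: (DSMethodSubject selector: #bar source: 'bar ^ 42');
	yourself.

"Reverting becomes generic: swap #add for #remove (a #change command
 would keep the old value in its subject) instead of needing a
 mirror-image class for every forward class."
change inverse applyTo: SystemEditor new.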
Hi guys!
No time to dwell on details, but I wonder two things:

1. I have always envisioned a Delta to be first *read* into the image and then *after* that - either applied or reverted or whatever. So the "reading" part would ONLY instantiate the totally self-contained object graph of the delta. And thus - why would reading chunks in reverse be interesting?

2. Again, since IMHO the reading part should ONLY build the graph - why would the chunk format be a bad choice? I always envisioned the simplest and most flexible chunk format to be first some code to get a reader (making it easy to hook in alternate readers) and then simply "feed" that reader by sending messages to it that build the Delta. In essence it probably would boil down to one message per change object (roughly). Now... that doesn't expose any internal structure (ivars of DSChange classes etc.), so... as long as the readers grok the original set of messages - how would that be problematic when it comes to schema evolution?

Lastly - I like the current balance in the DSChange hierarchy between "abstractness" and "concreteness" - I really don't want to make it more abstract or generic, which is what a step towards the Pier model sounds like to my ears. I like code that I can touch, feel and understand. I don't like code which makes me feel I don't ever get to see "the meat" of the action. :) Which btw is why I sometimes feel slightly dizzy when the Visitor pattern gets overused. Again, in the DSChange hierarchy I wrote a visitor mechanism but intentionally kept it a bit more intention revealing - and yes, thus I did not FULLY exploit the genericness of the Visitor pattern - but that was actually on purpose. Well, Matthew knows what I mean I guess. :)

regards, Göran
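A rough sketch of that "read first, then apply or revert" flow combined with the one-message-per-change chunk style. DSDeltaReader, its change-building messages, and the delta's applyTo:/revertOn: protocol are invented for illustration; the file is assumed to contain one expression chunk per change (e.g. methodAdded: #bar in: #MyClass source: 'bar ^ 42'!), the exact Compiler convenience selector is assumed to be the stock Squeak one, and this bypasses the standard fileIn reader convention on purpose:

"Reading: build the self-contained object graph, one chunk per change,
 with the reader itself as the receiver of each chunk expression."
| stream reader delta |
stream := FileStream readOnlyFileNamed: 'example.delta'.
reader := Compiler evaluate: stream nextChunk.	"first chunk, e.g. 'DSDeltaReader new'"
[stream atEnd] whileFalse:
	[Compiler evaluate: stream nextChunk for: reader logged: false].
stream close.
delta := reader delta.	"the assembled object graph"

"Applying or reverting then happens on the graph, not on the file,
 so there is no need to read chunks in reverse order."
delta applyTo: SystemEditor new.	"forward"
delta revertOn: SystemEditor new.	"reverse, by walking the graph backwards in memory"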
On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
> If I recall correctly the default behaviour begins such that the first
> chunk is read and evaluated by the compiler, the result being a reader

Not quite. A chunk is everything up to the next bang (!). A chunk is simply evaluated. Only if an empty chunk is seen, i.e., it starts with a bang, the next chunk is taken as a reader definition.

> which (by convention) reads the next chunk and so on. Typically when a
> reader finds an empty chunk it returns, resetting to the initial reader
> which restarts the process.

For source code readers, yes.

> chunks can include encoded
> binary data if preceded by the appropriate decoding reader.

It can even be raw binary data, not encoded. It's the reader's responsibility to deal with what follows.

> A header chunk, to set up the SystemEditors, one for the forward
> direction, one for the reverse (although I suspect it may be possible to
> have one do both):
>
> ! CurrentSystemEditor value: SystemEditor new. ! !

No, that would be a reader chunk. The correct way is

CurrentSystemEditor value: SystemEditor new. !

> Action chunks:
>
> ! CurrentSystemEditor value addInstVarName: 'a' ! !
> "individual statements"
> ! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !

Again - you should use plain chunks for that, not reader chunks.

> Although many may not agree with me, I think there is a lot of potential
> for innovation using the chunk format, and it has the advantage that
> most people have the tools to read it already.

It's incredibly flexible indeed, and goes back to the B5000 that inspired Smalltalk.

- Bert -
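Putting Bert's corrections together, a minimal delta file-out in this style would use plain chunks only. CurrentSystemEditor is Keith's hypothetical process-specific variable, the SystemEditor messages are taken from the examples above, and the final line assumes the editor applies its accumulated edits when asked to commit - none of this is existing DeltaStreams code:

"Plain chunks only - each statement ends with a bang and is simply evaluated."
CurrentSystemEditor value: SystemEditor new. !
(CurrentSystemEditor value at: #MyClass) addInstVarName: 'newVar'. !
(CurrentSystemEditor value at: #MyClass)
	compile: 'bar ^ 42' classified: #accessing. !
CurrentSystemEditor value commit. !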
On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
> On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
[...]
>> Although many may not agree with me I think there is a lot of potential
>> for innovation using the chunk format, and it has the advantage that
>> most people have the tools to read it already.
>
> It's incredibly flexible indeed, and goes back to the B5000 that
> inspired Smalltalk.

Interesting. Having worked with almost all B5000 series and successors and still working with them, what's the chunk format on them (besides that it has no naked memory pointers, but descriptors)?

/Klaus

> - Bert -
On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:

> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
>
>> On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
> [...]
>>> Although many may not agree with me I think there is a lot of potential
>>> for innovation using the chunk format, and it has the advantage that
>>> most people have the tools to read it already.
>>
>> It's incredibly flexible indeed, and goes back to the B5000 that
>> inspired Smalltalk.
>
> Interesting. Having worked with almost all B5000 series and
> successors and still working with them, what's the chunk format on
> them (besides that it has no naked memory pointers, but descriptors)?

Maybe I should have written "the idea goes back", not the chunk format itself. The idea to have the data itself specify how to be processed comes from there - the tapes had a loader program in front that reads the rest of the tape. I don't know much about the details.

- Bert -
On Wed, 10 Oct 2007 12:35:20 +0200, Bert Freudenberg wrote:
>
> On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:
>
>> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
>>
>>> On Oct 10, 2007, at 7:15 , Keith Hodges wrote:
>> [...]
>>>> Although many may not agree with me I think there is a lot of potential
>>>> for innovation using the chunk format, and it has the advantage that
>>>> most people have the tools to read it already.
>>>
>>> It's incredibly flexible indeed, and goes back to the B5000 that
>>> inspired Smalltalk.
>>
>> Interesting. Having worked with almost all B5000 series and successors
>> and still working with them, what's the chunk format on them (besides
>> that it has no naked memory pointers, but descriptors)?
>
> Maybe I should have written "the idea goes back", not the chunk format itself.
> The idea to have the data itself specify how to be processed comes from
> there - the tapes had a loader program in front that reads the rest of
> the tape. I don't know much about the details.

Ah, the clear/start & halt/load tapes :) still in use in emergency situations, when disks are down. Will tell my colleagues about clear/start's usage in Smalltalk code file chunks :)

> - Bert -
Klaus D. Witzel wrote:
> On Wed, 10 Oct 2007 12:35:20 +0200, Bert Freudenberg wrote:
> > On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:
> >> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
> >>> It's incredibly flexible indeed, and goes back to the B5000 that
> >>> inspired Smalltalk.
> >>
> >> Interesting. Having worked with almost all B5000 series and successors
> >> and still working with them, what's the chunk format on them (besides
> >> that it has no naked memory pointers, but descriptors)?
> >
> > Maybe I should have written "the idea goes back", not the chunk format itself.
> > The idea to have the data itself specify how to be processed comes from
> > there - the tapes had a loader program in front that reads the rest of
> > the tape. I don't know much about the details.
>
> Ah, the clear/start & halt/load tapes :) still in use in emergency
> situations, when disks are down. Will tell my colleagues about
> clear/start's usage in Smalltalk code file chunks :)

I think Bert might have been thinking of the Burroughs 220 tape format used in an Air Training Command installation, which was mentioned by Alan Kay in his "The Early History Of Smalltalk".

Since most problems in computing can be solved by adding another level of indirection, it might be nice to extend the chunk format so that when you get an error while trying to create a reader due to missing classes, there would be some hint on how to fix the problem automatically (by downloading another file).

-- Jecel
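A sketch of the kind of self-repairing reader chunk Jecel suggests. The reader class, repository project and package names are placeholders, and the Installer incantation is only assumed to match Keith's Installer package of the time:

"Reader chunk: if the reader class is missing, fetch it before
 answering the reader, so the rest of the file can still be processed."
(Smalltalk includesKey: #DSDeltaReader)
	ifFalse: [Installer ss project: 'DeltaStreams'; install: 'DS-Reader'].
(Smalltalk at: #DSDeltaReader) new !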
On Wed, 10 Oct 2007 17:12:06 +0200, Jecel Assumpcao Jr wrote:
> Klaus D. Witzel wrote:
>> On Wed, 10 Oct 2007 12:35:20 +0200, Bert Freudenberg wrote:
>> > On Oct 10, 2007, at 11:54 , Klaus D. Witzel wrote:
>> >> On Wed, 10 Oct 2007 10:37:44 +0200, Bert Freudenberg wrote:
>> >>> It's incredibly flexible indeed, and goes back to the B5000 that
>> >>> inspired Smalltalk.
>> >>
>> >> Interesting. Having worked with almost all B5000 series and successors
>> >> and still working with them, what's the chunk format on them (besides
>> >> that it has no naked memory pointers, but descriptors)?
>> >
>> > Maybe I should have written "the idea goes back", not the chunk format itself.
>> > The idea to have the data itself specify how to be processed comes from
>> > there - the tapes had a loader program in front that reads the rest of
>> > the tape. I don't know much about the details.
>>
>> Ah, the clear/start & halt/load tapes :) still in use in emergency
>> situations, when disks are down. Will tell my colleagues about
>> clear/start's usage in Smalltalk code file chunks :)
>
> I think Bert might have been thinking of the Burroughs 220 tape

ElectroData's Datatron 220? - http://special.lib.umn.edu/findaid/ead/cbi/cbi00090-036.xml

> format used in an Air Training Command installation, which was mentioned
> by Alan Kay in his "The Early History Of Smalltalk".

Thanks for the reference, Jecel. Alas, the clear/start tapes (small+medium systems) and halt/load tapes (large systems) concept survived the whole Burroughs story, and so have the chunks to which Bert referred :)

> Since most problems in computing can be solved by adding another level
> of indirection, it might be nice to extend the chunk format so that when
> you get an error while trying to create a reader due to missing classes,
> there would be some hint on how to fix the problem automatically (by
> downloading another file).

Yeah, that's always fascinating with software: adding another level of complexity is always possible, whereas subtracting one might trouble you with existence problems :)

/Klaus

> -- Jecel