Hi folks!
Wanted to give interested people (if there are any!) a heads up on Deltastreams development.

History
=======
I started the Deltastreams "project" a few years back; there is quite a bit of info on the wiki and even a movie from OOPSLA where I present it and demo it.

The idea of Deltas and streams of them is an evolution of the old changeset and update stream. The concept also borrows from experience in using MC (which is not a direct competitor) and other distributed SCMs outside of Squeak. Think of a Delta as a "super changeset". The streams part has not been coded yet.

Work
====
After a while Matthew Fulmer started helping me with the code and he has done a LOT on the code base, including lots and lots more tests and lots of fixes in SystemEditor (from Colin Putney, used in MC2 I think), which we depend on. In fact, the Deltastreams codebase was probably the first to really stress SystemEditor. Matthew also created ICS - an advanced file format for Deltas.

Matthew has lately been working in MC a lot, which gives Matthew a unique perspective that I don't have. Matthew is also involved a lot in Croquet - which is one primary potential fork to use DS with.

I have started working on it again the last few days and the "itch" is back for real. :) I am focusing on the "replace changesets" part and the next step for me is probably to make a "dual change sorter"-like UI and a new file format (see below). And make tests green. And also make the ICS format work. :)

Code today
==========
Deltastreams is hosted on SS. I currently develop it in 3.10.2; dependencies are SystemEditor and InterleavedChangeset (ICS). Both of them could be replaced with other packages taking those roles ("file format for Deltas" and "tool to atomically apply code changes to a live image"). We want the code to have very few dependencies and to work in "all" Squeaks.

Status
======
We have lots of broken tests right now, and I intend to make it all GREEN and keep it that way. We have been sloppy and have added lots of tests without implementing them - this tactic works for a while but when the code base gets complex they really need to be GREEN. Otherwise you lose the ability to see if you actually broke something :). The good part is that there are about 420 tests, and lots of aspects of Deltas are thoroughly covered.

Logging, applying and reverting Deltas (code mechanisms) are 99% working. Currently I think the only bit missing is category reorganization.

The ICS file format is partially working. I haven't gotten into the code base fully yet - the format is very "clever", which may be its main problem. It tries to do a really cool trick - being compatible with Changesets! Or in other words, the same file contains both a binary representation of a Delta that Deltastream code uses AND a changeset representation that old images can use. This means that an ICS file can be filed into an old image without ANY modification to that image. It then simply looks like a changeset.

There is a UI built by Matthew that works on SystemEditor "models"; I know too little of its status right now. I intend to build another complementary UI working much more like the "dual change sorter".

A new format
============
ICS is cool. :) But... sorry Matthew, I think I will spend some time on another format for Deltas too. One that is NOT backwards compatible in that way. This is an area I really want some feedback on! Both on making another format available and what that format would be. :)

I would like this "native" Delta format to be:

- Human readable, just like a cs. We just gzip them and make up some nice extension like .dz or something. :)
- Editable in a text editor. This means it can not be too complex.
- Easy to extend. This means the base syntax should leave room for new elements and "relaxed parsing" that can ignore unknown elements
- Very easy to parse. This means it needs to be simple, simple, simple. I don't want to depend on YAXO or similarly large package for parsing.
- Not "compiler driven". I want the format to be safe and fast to load. This means the regular Smalltalk Compiler is out of the picture.

My current idea of a format that I think covers the above is:

    JSON

...possibly using netstrings for source code (thus not strictly JSON).

JSON offers a very readable "XML-ish" generic format that is very easy to parse and produce. It can be easily edited in a text editor if needed. It is compact. If used correctly it should be easy to extend.

One substantial part of the file will be Smalltalk source code. I am not keen on having to do character-by-character escaping to comply with JSON Strings though... thus - netstrings. A netstring is a trivial construct:

    <length-in-ascii> ":" <binary-data> ","

For example:

    30:Sentence of thirty characters.,

This would then be used for the source code. The advantage would be not having to do character-by-character escaping. Is this worth "breaking" JSON?

Hmmm, thinking more about it I think we need to "break it" anyway, because a JSON String can't contain a raw (unescaped) CR. :)

Ok, sorry for the long post.

regards, Göran
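To make the netstring framing concrete, here is a minimal sketch in Python (used purely for illustration; Deltastreams itself is Squeak code). The function names encode_netstring and decode_netstring are assumptions made up for this example and are not part of Deltastreams or any library.

```python
from typing import Tuple


def encode_netstring(data: bytes) -> bytes:
    """Frame raw bytes as <length-in-ascii> ":" <data> ","."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes) -> Tuple[bytes, bytes]:
    """Read one netstring from the front of buf; return (payload, rest)."""
    head, sep, tail = buf.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(head)
    # The payload must be followed by exactly one trailing comma.
    if len(tail) < length + 1 or tail[length:length + 1] != b",":
        raise ValueError("payload shorter than declared or missing ','")
    return tail[:length], tail[length + 1:]


# Method source with CRs, tabs and double quotes goes in unescaped.
source = b'add: a to: b\r\t"Method comment"\r\t^a + b'
framed = encode_netstring(source)
payload, rest = decode_netstring(framed)
```

The payload needs no escaping at all, which is the attraction; the price is that the length prefix has to be recomputed whenever the payload is edited by hand.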
On Thu, Mar 12, 2009 at 5:28 AM, Göran Krampe <[hidden email]> wrote:
> - Human readable, just like a cs. We just gzip them and make up some nice
> extension like .dz or something. :)
> - Editable in a text editor. This means it can not be too complex.
> - Easy to extend. This means the base syntax should leave room for new
> elements and "relaxed parsing" that can ignore unknown elements
> - Very easy to parse. This means it needs to be simple, simple, simple. I
> don't want to depend on YAXO or similarly large package for parsing.
> - Not "compiler driven". I want the format to be safe and fast to load. This
> means the regular Smalltalk Compiler is out of the picture.

I don't understand the combination of "human readable" and "not compiler driven". Does this mean that you are going to include bytecodes as well as source code? If the file is editable in a text editor, then it will be easy to make the bytecodes be incompatible with the source code.

Maybe your definition of "human readable" and "editable in a text editor" are different from mine. Perhaps you only intend human readability as a last resort, as a way of debugging the system, for example. Netstrings aren't very editable, because you have to keep counting characters every time you change a line. But if editability is only for extreme emergencies, this is OK.

-Ralph Johnson
Hi!
Ralph Johnson wrote:
> On Thu, Mar 12, 2009 at 5:28 AM, Göran Krampe <[hidden email]> wrote:
>> - Human readable, just like a cs. We just gzip them and make up some nice
>> extension like .dz or something. :)
>> - Editable in a text editor. This means it can not be too complex.
>> - Easy to extend. This means the base syntax should leave room for new
>> elements and "relaxed parsing" that can ignore unknown elements
>> - Very easy to parse. This means it needs to be simple, simple, simple. I
>> don't want to depend on YAXO or similarly large package for parsing.
>> - Not "compiler driven". I want the format to be safe and fast to load. This
>> means the regular Smalltalk Compiler is out of the picture.
>
> I don't understand the combination of "human readable" and "not
> compiler driven". Does this mean that you are going to include
> bytecodes as well as source code?

No. :)

> If the file is editable in a text
> editor, then it will be easy to make the bytecodes be incompatible
> with the source code.

No bytecodes, see below.

> Maybe your definition of "human readable" and "editable in a text
> editor" are different from mine. Perhaps you only intend human
> readability as a last resort, as a way of debugging the system, for
> example.

No, I would like them to be "fairly readable". And I do want them to be quite easily "generated" and fiddled with (although I don't really expect them to be generated or edited by many other tools than the DS codebase itself).

I think your mentioning of bytecodes comes from a misconception - a Delta carries source code, but it is *not* compiled when loaded into an image. The compilation is done in the "next step" - when you "apply" the Delta to the image (using SystemEditor to do it atomically).

So the file is actually "just a serialization of a DSDelta instance". Since a DSDelta instance is basically a sequence of DSChange instances, and the most common one is a DSMethodModifiedChange - it kinda will look like a changeset, although much more "detailed" in nature.

But from a usage perspective we could even dump them using SmartRefStream - since we first load them into the image - but I do think a "real" format has its advantages.

Btw, what I meant with "Compiler driven" was how the chunk format works:

http://wiki.squeak.org/squeak/1105

...but I don't think I want an "executable" format like this.

> Netstrings aren't very editable, because you have to keep
> counting characters every time you change a line. But if editability
> is only for extreme emergencies, this is OK.

Good point, hadn't thought about that weakness of netstrings. Hmmmm. Perhaps Yaml is worth looking at? Need to figure out its advantages/disadvantages.

regards, Göran
In reply to this post by Ralph Johnson
Ralph Johnson <johnson <at> cs.uiuc.edu> writes:
> On Thu, Mar 12, 2009 at 5:28 AM, Göran Krampe <goran <at> krampe.se> wrote:
>
> > - Human readable, just like a cs. We just gzip them and make up some nice
> > extension like .dz or something. :)
> > - Editable in a text editor. This means it can not be too complex.
> > - Easy to extend. This means the base syntax should leave room for new
> > elements and "relaxed parsing" that can ignore unknown elements
> > - Very easy to parse. This means it needs to be simple, simple, simple. I
> > don't want to depend on YAXO or similarly large package for parsing.
> > - Not "compiler driven". I want the format to be safe and fast to load. This
> > means the regular Smalltalk Compiler is out of the picture.
>
> I don't understand the combination of "human readable" and "not
> compiler driven". Does this mean that you are going to include
> bytecodes as well as source code? If the file is editable in a text
> editor, then it will be easy to make the bytecodes be incompatible
> with the source code.
>
> Maybe your definition of "human readable" and "editable in a text
> editor" are different from mine. Perhaps you only intend human
> readability as a last resort, as a way of debugging the system, for
> example. Netstrings aren't very editable, because you have to keep
> counting characters every time you change a line. But if editability
> is only for extreme emergencies, this is OK.
>
> -Ralph Johnson
>

Agreed, a double representation seems dangerous to me...

Is it about transporting bytecodes from one image to another? Source might be portable. Bytecodes a little less... Not to mention Eliot's changes, a single addition of an instance variable in an upper class in your image would make importing bytecodes from another image very hard.

Or I misunderstood something... Is it about storing upper level object abstractions in the stream? In this case, I don't feel comfortable with the static view of an object tree: my experience is that it is a nightmare to maintain upward compatibility when classes have to evolve, and for sure they will.

For my major application in VW, 20 years ago, I started with BOSS and this kind of thing. A textual representation of objects (like JSON, XML, whatever...), though readable, would not give me much better results.

Guess what, the best syntax I ended up with was:

    variable := receiver selector arguments.

Yes, my objects were fully saved as a script in Smalltalk syntax that could be filed in later. I chose a list of messages (an API) that construct complex object graphs corresponding to my (evolving) model, rather than low level inst var setters. I prefer to store the recipe rather than storing the cake. Far fewer upward compatibility problems, I just had to maintain the API...

Then there is the problem of inter-mixing source code and other abstractions in the same Stream. A well known problem in shell scripts, perl scripts etc...

    cat >$FILE <<END
    ...
    END

For my problem, I ended up with a Compiler/Parser subclass for dealing with a small reflective addition: thisFile (like there is a thisContext). This could give some sentence like:

    Compiler compile: thisFile nextChunk in: MyClass.
    myMessage
        ^myInstVar doSomething!

Of course, you could as well write thisFile nextCodeBetweenBracket....

    [myMessage
        ^myInstVar doSomething]

The reader is as simple as:

    [thisFile atEnd] whileFalse: [FileParser parseNextSentenceFrom: thisFile]...

This is a very open, reflexive, Smalltalkish way of doing things. For example, I did not impose any ASCII, UTF8, 16, 32 encoding on my files. Instead I kept the format open because I would just have written in the first sentences:

    thisFile encoding: #ISO8859L1.

Do you think such a format would match your needs?

Nicolas
In reply to this post by Göran Krampe
> I think your mentioning of bytecodes comes from a misconception - a Delta
> carries source code, but it is *not* compiled when loaded into an image. The
> compilation is done in the "next step" - when you "apply" the Delta to the
> image (using SystemEditor to do it atomically).

Yes, I misunderstood what you meant by "compiler driven" and that you thought it was OK to use the compiler to apply the delta, but wanted to read the delta into the image without using a compiler.

Given that, I agree that JSON is a good match for your needs. I don't think you should use netstrings.

-Ralph
Ralph Johnson wrote:
>> I think your mentioning of bytecodes comes from a misconception - a Delta
>> carries source code, but it is *not* compiled when loaded into an image. The
>> compilation is done in the "next step" - when you "apply" the Delta to the
>> image (using SystemEditor to do it atomically).
>
> Yes, I misunderstood what you meant by "compiler driven" and that you
> thought it was OK to use the compiler to apply the delta, but wanted
> to read the delta into the image without using a compiler.
>
> Given that, I agree that JSON is a good match for your needs. I don't
> think you should use netstrings.

Right, now I started looking into Yaml, which solves a few "issues" that JSON has, but on the other hand - the spec is huge and it is not as widely known/understood.

Problems with JSON:

- No "typing". For this use we probably don't care, we can just "type" our objects ourselves using some kind of String key. Yaml has some type system, not sure if it is nice.
- Not so good handling of Strings. This was why I was thinking about netstrings. In fact, this one is a killer in some ways, if we can't have CRs in the file then we surely can't get "readable code". Yaml has "literal blocks" which means we can easily have readable code.

Problems with Yaml:

- Spec is HUGE. Even though we could use a handwritten parser/producer and only use exactly the subset we need (approximately the JSON subset), it is still hard for people to understand etc. And there are not nearly as many implementations in other languages around.

The code for Json in Squeak (available on SS) is a single class, and the code is trivial. We could probably extend it to deal with "verbatim source" - but then it would not be standard Json anymore. Hmmm. :)

I am leaning towards a handwritten Yaml parser/producer that only uses the tiny, tiny subset of Yaml that we need. It would then be "true Yaml", although a subset, and the code can probably be made just as small.

Found a very interesting blog article comparing the two:

http://blog.ingy.net/2007/05/yaml-and-json.html

regards, Göran
On Thu, Mar 12, 2009 at 9:35 AM, Göran Krampe <[hidden email]> wrote:
> - Not so good handling of Strings. This was why I was thinking about
> netstrings. In fact, this one is a killer in some ways, if we can't have CRs
> in the file then we surely can't get "readable code". Yaml has "literal
> blocks" which means we can easily have readable code.

You are thinking of a method as being the smallest element. Instead, make a method be a sequence of lines. A line is basically everything between a CR. This will make it a little less readable, but not too much.

-Ralph
Ralph Johnson <johnson <at> cs.uiuc.edu> writes:
> On Thu, Mar 12, 2009 at 9:35 AM, Göran Krampe <goran <at> krampe.se> wrote:
>
> > - Not so good handling of Strings. This was why I was thinking about
> > netstrings. In fact, this one is a killer in some ways, if we can't have CRs
> > in the file then we surely can't get "readable code". Yaml has "literal
> > blocks" which means we can easily have readable code.
>
> You are thinking of a method as being the smallest element. Instead,
> make a method be a sequence of lines. A line is basically everything
> between a CR. This will make it a little less readable, but not too
> much.
>
> -Ralph
>

I don't understand why we should bother with YAML, XML, JSON... And why limit ourselves to a static description of things?

Why don't we consider using plain Smalltalk syntax, and plain Smalltalk power? The MESSAGES.

What about letting DeltaStream be a list of executable Smalltalk sentences, not a list of lines delimited with CR, nor a list of chunks...

The Parser is already equipped to parse a Smalltalk sentence, so let's define a subclass FileParser (or StreamParser) that would:

1) parse and execute sequentially, sentence by sentence.
2) declare every Undeclared variable as a global in the file scope (like a Workspace).
3) predefine a variable thisFile in file scope pointing to the stream.

This reflexivity enables plenty of things like inlining source with a: (thisFile nextBracketedString).

Example of naive DeltaStream API:

    "------------------"
    deltas := OrderedCollection new.
    deltas add: (MethodChange className: 'MyClass' sourceCode: thisFile nextBracketedString).
    [myMethod
        ^nil]
    deltas add: (ClassComment className: 'MyClass' comment: thisFile nextChunk).
    MyClass is aimed at representing my objects
    inst vars:
        name <String> the name of my object instance
    !
    ^deltas
    "------------------"

We could of course store a lot of Method attributes (category, author, license...). Note the parentheses: without them the cascade would go to the class MethodChange rather than to the new instance.

    method := (MethodChange className: 'MyClass' sourceCode: thisFile nextBracketedString)
        author: 'nice';
        license: 'MIT';
        category: 'accessing';
        yourself.
    [myMethod
        ^nil]
    deltas add: method.

We don't even have to use classes names like MethodChange:
Predefine a file scope variable theBuilder := DeltaStreamBuilder new.

    theBuilder license: 'MIT'.
    theBuilder addMethodFor: 'MyClass' sourceCode: etc...

We have replaced class names with an API.
As a result, theBuilder can be set to another class just enumerating non-MIT licence or whatever task you want it to.

Isn't that simple enough?
Is there a good reason to restrict our power?

Nicolas
In reply to this post by Ralph Johnson
Ralph Johnson wrote:
> On Thu, Mar 12, 2009 at 9:35 AM, Göran Krampe <[hidden email]> wrote:
>
>> - Not so good handling of Strings. This was why I was thinking about
>> netstrings. In fact, this one is a killer in some ways, if we can't have CRs
>> in the file then we surely can't get "readable code". Yaml has "literal
>> blocks" which means we can easily have readable code.
>
> You are thinking of a method as being the smallest element. Instead,
> make a method be a sequence of lines. A line is basically everything
> between a CR. This will make it a little less readable, but not too
> much.

Mmmmmm, you mean for example in an array like:

    {"delta": {
        "method-source-change": {
            "class"   : "SomeClass",
            "protocol": "some-method-category",
            "stamp"   : "elder 3/21/1996 12:34",
            "source": [
                "method line one with a \"comment\" in it.",
                "method line two with two tabs: \t\t",
                "method line three",
                "method line four",
                "method line five"
            ]
        }
    }}

...well, sure. Not too shabby, I agree. Good idea!

regards, Göran
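The "method as a sequence of lines" idea can be sketched in a few lines of Python (for illustration only; the real code would live in Squeak). The key names follow the JSON example in the post above, while the function names and the choice to split on CR are assumptions of this sketch.

```python
import json


def method_change_to_json(class_name, protocol, stamp, source):
    """Serialize one method change, storing the source as a list of lines."""
    record = {
        "delta": {
            "method-source-change": {
                "class": class_name,
                "protocol": protocol,
                "stamp": stamp,
                # Squeak sources use CR as line end, so split on "\r".
                "source": source.split("\r"),
            }
        }
    }
    return json.dumps(record, indent=2)


def source_from_json(text):
    """Rebuild the method source by joining the stored lines with CR."""
    change = json.loads(text)["delta"]["method-source-change"]
    return "\r".join(change["source"])


original = 'add: a to: b\r\t"Method comment"\r\t^a + b'
text = method_change_to_json(
    "SomeClass", "some-method-category", "elder 3/21/1996 12:34", original)
restored = source_from_json(text)
```

Splitting and joining on the same separator round-trips exactly, so no CR ever has to appear inside a JSON string; quotes and tabs within a line are still escaped by the JSON writer.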
In reply to this post by Nicolas Cellier-3
Hi!
nicolas cellier wrote:
> We don't even have to use classes names like MethodChange:
> Predefine a file scope variable theBuilder := DeltaStreamBuilder new.
>
> theBuilder license: 'MIT'.
> theBuilder addMethodFor: 'MyClass' sourceCode: etc...
>
> We have replaced class names with an API.
> As a result, theBuilder can be set to another class just enumerating non-MIT
> licence or whatever task you want it to.
>
> Isn't that simple enough?
> Is there a good reason to restrict our power?

I agree that this is a tempting route; for example, SM used something like this in its first incremental update protocol. My personal reasons against it are:

- Safety. When we expose "full Smalltalk" in the file format you can do naughty stuff in it, both for malicious intent OR more likely "smart things" that will cause unwanted effects.
- Speed. I think it will be much slower. Slow enough to be a problem? Not sure, but I want to gobble tons of Deltas so the more speed the better.
- Compiler limitations. There is a fair number of Compiler limitations to deal with regarding number of literals, yadda yadda. A problem? Not sure.

...BUT... :)

On the other hand, a handwritten parser that can "parse a sequence of Smalltalk messages to self using Strings, Integers, true, false, nil and Symbols with comments in it" might be more than enough and would avoid ALL THREE of the problems above. Example:

    "This is a cool delta!"
    protocol: #delta1. "<- tell builder which protocol we use"
    delta: 'MyDelta' author: 'Göran Krampe'. "<- start a delta"
    "Here comes a single method source change in a single message"
    methodSourceChange: #SomeClass
        protocol: #some-method-category
        stamp: 'elder 3/21/1996 12:34'
        source: 'add: a to: b
        "Method comment"
        | result |
        ^result := a + b'.

...so while not exactly Smalltalk code that can be fed to the regular Compiler it would still be parsed and executed like a series of #perform: to a builder object. It would be fast (modulo speed of #perform:), secure (you can't run arbitrary code) and avoids limits of Compiler.

...ok, but what are the advantages compared to JSON? I presume these at least:

- Easy to plug in a different builder, like you mention.
- Syntax is familiar to Smalltalkers. And Smalltalkers grok the builder object "fed by messages" idea.
- Same escaping logic for Strings as in Smalltalk.
- A builder could easily implement DNU to handle/ignore messages it does not understand/want to deal with
- Lends itself more naturally to a "streaming" implementation like SAX.

I can't see any real disadvantages compared to JSON/YAML. Cool! :) Comments?

regards, Göran
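The "series of #perform: to a builder" idea, including a DNU-style fallback for unknown messages, could look roughly like this. It is a Python sketch for illustration only; DeltaBuilder, the msg_ prefix, unknown and feed are all names invented here, not Deltastreams API.

```python
class DeltaBuilder:
    """Hypothetical builder: one msg_* method per message it understands."""

    def __init__(self):
        self.changes = []
        self.ignored = []

    def msg_delta_author(self, name, author):
        self.changes.append(("delta", name, author))

    def msg_method_source_change(self, class_name, protocol, stamp, source):
        self.changes.append(("method", class_name, protocol, stamp, source))

    def unknown(self, selector, args):
        # Rough analogue of doesNotUnderstand: here we just skip the message.
        self.ignored.append(selector)


def feed(builder, messages):
    """Send each (selector, args) pair to the builder, like #perform:."""
    for selector, args in messages:
        # Only msg_* methods are reachable, so input cannot call anything else.
        handler = getattr(builder, "msg_" + selector, None)
        if handler is None:
            builder.unknown(selector, args)
        else:
            handler(*args)


messages = [
    ("delta_author", ("MyDelta", "Göran Krampe")),
    ("method_source_change",
     ("SomeClass", "some-method-category", "elder 3/21/1996 12:34",
      "add: a to: b\r\t^a + b")),
    ("class_rename", ("OldName", "NewName")),  # not understood by this builder
]
builder = DeltaBuilder()
feed(builder, messages)
```

Because the input can only name messages and supply literal arguments, nothing is evaluated; plugging in a different builder changes what a file does without changing the file.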
On Thu, Mar 12, 2009 at 4:44 PM, Göran Krampe <[hidden email]> wrote:
> ...so while not exactly Smalltalk code that can be fed to the regular
> Compiler it would still be parsed and executed like a series of #perform: to
> a builder object. It would be fast (modulo speed of #perform:), secure (you
> can't run arbitrary code) and avoids limits of Compiler.

With a little care, a format could be both normal Smalltalk code AND something that would be easy to parse. For example, it could have only keyword messages and strings, perhaps boolean literals and integers, but no binary messages, assignment, or array literals. Thus, you could first implement a parser by just reading in the string and evaluating it, and then you could build a real parser. This would make it easy to develop test-first, since you can focus first on writing out the objects, and use the trivial way of reading them in to test it.

-Ralph
Hi!
Ralph Johnson wrote:
> On Thu, Mar 12, 2009 at 4:44 PM, Göran Krampe <[hidden email]> wrote:
>> ...so while not exactly Smalltalk code that can be fed to the regular
>> Compiler it would still be parsed and executed like a series of #perform: to
>> a builder object. It would be fast (modulo speed of #perform:), secure (you
>> can't run arbitrary code) and avoids limits of Compiler.
>
> With a little care, a format could be both normal Smalltalk code AND
> something that would be easy to parse. For example, it could have
> only keyword messages and strings, perhaps boolean literals and
> integers, but no binary messages, assignment, or array literals.
> Thus, you could first implement a parser by just reading in the string
> and evaluating it, and then you could build a real parser. This would
> make it easy to develop test-first, since you can focus first on
> writing out the objects, and use the trivial way of reading them in to
> test it.

I have just implemented this little parser calling it "Tirade" and it is a small subset of Smalltalk that only differs in not having a receiver to the left, could be easily added though. I also did a quick and dirty benchmark, it is about 3-4 times faster than Compiler, no real profiling done yet.

I did implement brace arrays, but of course with no expressions allowed, and also associations. And yes, there are tests. :)

Let me include the current class comment here which describes it:

Tirade - a long angry speech or scolding. Synonyms: diatribe, harangue, rant

Tirade is a fast parser for a "bastard subset of Smalltalk" that is intended for file formats. The concept is that Tirade parses the input stream, which consists of a sequence of Smalltalk messages with literals as arguments - expressions are not allowed. These messages are simply sent to a builder object supplied by you. Tirade uses the return value from the builder as the receiver of the next message, which means you can partition your protocol over multiple builders if needed.

Tirade is almost a strict subset of Smalltalk BUT there is no receiver to the left. The receiver is the builder according to the above logic. The following example shows all allowed constructs, which include:

- Unary and keyword messages (no binary) without receiver. No cascades. Period mandatory.
- nil/true/false pseudo variables. No thisContext, self or super.
- String and Integer literals. No scientific notation. Single quotes are doubled.
- Brace arrays of above, including nesting.
- Associations between the above.
- Smalltalk comments, but only between messages.
- Whitespace just like in Smalltalk.

This more or less matches the capability of JSON I think.

Example input:

    "You can use Smalltalk comments in the input, but only on its own line!"

    "#start will be sent to builder, receiver is not written out, note period at end."
    start.

    "Keyword message using String and Integer."
    protocol: 'alpha' version: 23.

    "Strings follow normal Smalltalk escape rules, whitespace before, after, in between is ok."
    author: 'Joe ''the tiger'' Schmoe'.

    "true, false, nil are fine to use."
    humpty: true dumpty: false sat: nil on: 'a wall'.

    bracearray: { 'asdasd'->123. {12. 34}->'asdasd'. 123. true. false. nil. {'123123'.-123}}.
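To give a feel for how small such a parser can be, here is a sketch in Python of a reader for a Tirade-like subset. It is not Göran's implementation (that is Squeak code, not shown in this thread). It deliberately leaves out brace arrays and associations, and it accepts comments anywhere between tokens, which is more lenient than the class comment describes. All names (TOKEN, tokenize, literal, parse) are assumptions of this sketch.

```python
import re

TOKEN = re.compile(r"""
    \s+                          # whitespace, skipped
  | "[^"]*"                      # Smalltalk comment, skipped
  | (?P<string>'(?:[^']|'')*')   # string literal with doubled quotes
  | (?P<int>-?\d+)               # integer literal
  | (?P<keyword>[A-Za-z]\w*:)    # keyword part such as protocol:
  | (?P<word>[A-Za-z]\w*)        # unary selector or true/false/nil
  | (?P<period>\.)               # message terminator
""", re.VERBOSE)

PSEUDO = {"true": True, "false": False, "nil": None}


def tokenize(text):
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup:  # unnamed matches are whitespace or comments
            yield m.lastgroup, m.group()


def literal(kind, text):
    if kind == "string":
        return text[1:-1].replace("''", "'")
    if kind == "int":
        return int(text)
    if kind == "word" and text in PSEUDO:
        return PSEUDO[text]
    raise SyntaxError(f"expected a literal, got {text!r}")


def parse(text):
    """Yield (selector, args) for each period-terminated message."""
    tokens = tokenize(text)
    for kind, tok in tokens:
        if kind == "word":  # unary message, e.g. start.
            selector, args = tok, []
            kind, tok = next(tokens, ("eof", ""))
        elif kind == "keyword":  # keyword message, e.g. a: 1 b: 2.
            selector, args = "", []
            while kind == "keyword":
                selector += tok
                args.append(literal(*next(tokens, ("eof", ""))))
                kind, tok = next(tokens, ("eof", ""))
        else:
            raise SyntaxError(f"message cannot start with {tok!r}")
        if kind != "period":
            raise SyntaxError(f"expected '.' after {selector}")
        yield selector, args


sample = """
"Comments are allowed between messages."
start.
protocol: 'alpha' version: 23.
author: 'Joe ''the tiger'' Schmoe'.
humpty: true dumpty: false sat: nil on: 'a wall'.
"""
messages = list(parse(sample))
```

Each (selector, args) pair would then be sent to the builder; since arguments can only be literals, nothing in the input is ever evaluated as code.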
Hi!
Göran Krampe wrote:
> I have just implemented this little parser calling it "Tirade" and it is
> a small subset of Smalltalk that only differs in not having a receiver
> to the left, could be easily added though. I also did a quick and dirty
> benchmark, it is about 3-4 times faster than Compiler, no real profiling
> done yet.

After some profiling Tirade is now about 7 times faster than doing evals using Compiler. Tirade can also parse the full file in one sweep without having to do it in chunks, at the same speed.

I think this parser will be quite useful as a Delta parser - and for other similar formats. And yes, it currently does not have a receiver before the messages; otherwise it would be fully legal Smalltalk code. Not sure why that would be interesting though.

I will now proceed to test using this as the serialization format for Deltas.

regards, Göran