http://blog.datomic.com/2012/10/codeq.html
Executive summary:
* Git gives version control over files
* Clojure code typically has lots of functions or other chunks of code in one file
* This means you can't ask for the version of a single unit of code
* Static analyses over the files as they vary through time, dumped into a database, yield interesting stuff

What they're calling "codeqs" ("code quantum") filetree folks would call a file, because filetree already splits everything (I think?) into bits, and versions everything at the "codeq" level by virtue of storing each bit in its own file: class definition, comment, method definition, etc.

So we already have most of this stuff - I couldn't live without my in-image method versions - but I'm wondering if anyone else can spot anything worth copying?

frank
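To make the codeq point concrete: git versions whole files, so per-function history has to be recovered by splitting every revision into units and tracking each unit on its own. The Python below is a minimal sketch under loud assumptions: `split_units` and `unit_histories` are hypothetical helpers, every top-level form is assumed to start at column 0 with `(def`, and a short content hash stands in for a "version". It is not how codeq itself is implemented; a real tool would use a proper reader.

```python
import hashlib
from collections import defaultdict


def split_units(source: str) -> dict[str, str]:
    """Naively split Clojure-like source into top-level forms keyed by name.

    Assumption: each top-level form starts at column 0 with "(def..." and its
    second token is the name. A real tool would parse the code properly.
    """
    units: dict[str, str] = {}
    name = None
    lines: list[str] = []
    for line in source.splitlines():
        if line.startswith("(def"):
            if name is not None:
                units[name] = "\n".join(lines)
            parts = line.split()
            name = parts[1] if len(parts) > 1 else "<anonymous>"
            lines = [line]
        elif name is not None:
            lines.append(line)
    if name is not None:
        units[name] = "\n".join(lines)
    return units


def unit_histories(revisions: list[str]) -> dict[str, list[str]]:
    """Map each unit name to its distinct content hashes, oldest revision first."""
    history: dict[str, list[str]] = defaultdict(list)
    for source in revisions:
        for name, body in split_units(source).items():
            digest = hashlib.sha1(body.encode()).hexdigest()[:8]
            if not history[name] or history[name][-1] != digest:
                history[name].append(digest)
    return dict(history)
```

With two revisions that only touch `neg`, `add` keeps a single version while `neg` gets two, which is exactly the per-unit question git alone cannot answer.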
On Wed, Jan 2, 2013 at 4:18 PM, Frank Shearar <[hidden email]> wrote:
> http://blog.datomic.com/2012/10/codeq.html

Nah. They're basically figuring out how to extract the semantic changes from git, since git just treats the source code as opaque text. That gets them to what Monticello has now. I guess there's a bit of "imagine what you could do then!" that's unspecified.
Which is not to say that it's a bad idea. I'd love to create a huge database of, say, the update stream going back to the beginning, or the entire contents of squeaksource. But... then what?

Things that spring to mind immediately:

- universal senders and implementors
- metrics like message sends per method or methods per class
- detection of package dependencies
- analysis of how long-lived packages change over time
- analysis of contribution and collaboration between coders
and so on. But, what good is it? Might be interesting, maybe there's some research papers to be written, but would it do us any good as a community? Would there be useful tools that came out of it? Would it be worth the effort? Hard to say.
Colin
On 2 January 2013 22:17, Colin Putney <[hidden email]> wrote:
> On Wed, Jan 2, 2013 at 4:18 PM, Frank Shearar <[hidden email]> wrote:
>> http://blog.datomic.com/2012/10/codeq.html
>
> Nah. They're basically figuring out how to extract the semantic changes from
> git, since git just treats the source code as opaque text. That gets them to
> what Monticello has now. I guess there's a bit of "imagine what you could do
> then!" that's unspecified.

That was pretty much what I was thinking. And filetree preserves this fine-grained "code quantum"-sized version control.

The only advantage I still see of lots-of-stuff-inna-file is that you can very quickly hop around a bunch of code. Our tools just don't work that way. They _could_. No one's just ever hurt enough to display code in this fashion. It's easy enough: what's not so easy is to make that big blob of text efficiently editable such that you still keep track of, for example, the individual methods. (I'll leave aside the lack of syntax around method definition. That's not a big problem.) For instance: parse the entire file, find the method definitions, update the image by compiling them. (Handwave around the imperative hacks one could do.)
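The "parse the entire file, find the method definitions" step could look roughly like this for Squeak's chunk (fileout) format, where `!` terminates a chunk, `!!` is an escaped `!`, and an empty chunk closes a `methodsFor:` section. This is a simplified Python sketch, not anything from Squeak itself: `read_chunks` and `method_definitions` are hypothetical names, only `methodsFor:` sections are recognized (other doIts are skipped), and compiling the methods back into an image is out of scope.

```python
import re

# A methodsFor: preamble, e.g. "Foo methodsFor: 'accessing'" or
# "Foo class methodsFor: 'instance creation' stamp: '...'"
PREAMBLE = re.compile(r"^\s*(\S+(?: class)?) methodsFor: '([^']*)'")


def read_chunks(text: str) -> list[str]:
    """Split chunk-format text on single '!' terminators; '!!' is an escaped '!'."""
    chunks, buf, i = [], [], 0
    while i < len(text):
        if text[i] == "!":
            if text[i + 1:i + 2] == "!":
                buf.append("!")
                i += 2
                continue
            chunks.append("".join(buf))
            buf = []
        else:
            buf.append(text[i])
        i += 1
    if "".join(buf).strip():
        chunks.append("".join(buf))
    return chunks


def method_definitions(text: str) -> list[tuple[str, str, str]]:
    """Return (class name, category, method source) for each method chunk."""
    methods = []
    expect_preamble = False
    section = None  # (class name, category) while inside a methodsFor: section
    for chunk in read_chunks(text):
        if not chunk.strip():
            # An empty chunk closes any open section; the next chunk may be a preamble.
            section, expect_preamble = None, True
            continue
        if expect_preamble:
            expect_preamble = False
            match = PREAMBLE.match(chunk)
            section = (match.group(1), match.group(2)) if match else None
            continue
        if section is not None:
            methods.append((section[0], section[1], chunk.strip()))
    return methods
```

Each returned triple is what a tool would then hand to the compiler for the named class; keeping the triples around is also what lets an editor over one big blob of text still track individual methods.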
> Which is not to say that it's a bad idea. I'd love to create a huge database
> of, say, the update stream going back to the beginning, or the entire
> contents of squeaksource. But... then what?
>
> Things that spring to mind immediately:
>
> - universal senders and implementors
> - metrics like message sends per method or methods per class
> - detection of package dependencies

This would be a massive win. I took a bash a while ago at extending DependencyBrowser to work over one's package-cache to do this. I didn't get terribly far, probably largely due to my being pretty ignorant about just about everything I needed to know. I have the half-completed work lying around. Maybe I should publish it somewhere!

frank
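One crude way to approximate package dependencies across a package cache: record which classes each package defines, scan each package's method sources for capitalized identifiers (global references), and map those back to their defining packages. This is a hypothetical Python sketch, not how DependencyBrowser works; the regex will also pick up capitalized words inside strings and comments, and the package and class names in the example are illustrative data only.

```python
import re

# Crude heuristic: any capitalized identifier is treated as a global (class) reference.
GLOBAL_REF = re.compile(r"\b[A-Z][A-Za-z0-9_]*")


def referenced_globals(method_sources: list[str]) -> set[str]:
    """Collect capitalized identifiers from a package's method sources."""
    refs: set[str] = set()
    for source in method_sources:
        refs.update(GLOBAL_REF.findall(source))
    return refs


def package_dependencies(defines: dict[str, set[str]],
                         sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each package to the other packages whose classes it references."""
    owner = {cls: pkg for pkg, classes in defines.items() for cls in classes}
    deps: dict[str, set[str]] = {}
    for pkg, method_sources in sources.items():
        deps[pkg] = {owner[name] for name in referenced_globals(method_sources)
                     if name in owner and owner[name] != pkg}
    return deps
```

References to classes no package in the cache defines (such as `String` from the base image) are simply dropped, so the result only describes edges between the packages being analysed.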
On 02-01-2013, at 2:17 PM, Colin Putney <[hidden email]> wrote:
> Which is not to say that it's a bad idea. I'd love to create a huge database of, say, the update stream going back to the beginning, or the entire contents of squeaksource. But... then what?

Well, if only the sensible compiled method format had been adopted so that source references could be proper objects rather than hacked-up numbers, then you could have source kept in a proper database. Like, say, dabble. Or a dabble-ish thing that kept recent-ish stuff local and could refer back to a server for ancient history. Find all versions of a method back to the beginning of time. Find out about classes being renamed or deleted.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Do files get embarrassed when they get unzipped?
IIRC, that was one of the features of the Whisker Browser back in the day.

http://wiki.squeak.org/squeak/1993

"The goal of the Whisker Browser (a.k.a. Stacking Browser) is to provide a simple and intuitive way to view the contents of multiple classes and multiple methods simultaneously, while using screen real estate efficiently and not requiring a lot of window moving/resizing. It does this by introducing the concept of subpane stacking. ..."

Cheers,
Bob

On 1/2/13 5:32 PM, Frank Shearar wrote:
On Wed, Jan 02, 2013 at 02:42:25PM -0800, tim Rowledge wrote:
> Well, if only the sensible compiled method format had been adopted so that source references could be proper objects rather than hacked-up numbers, then you could have source kept in a proper database. Like, say, dabble. Or a dabble-ish thing that kept recent-ish stuff local and could refer back to a server for ancient history. Find all versions of a method back to the beginning of time. Find out about classes being renamed or deleted.

Do you have a reference to the sensible compiled method format? I think I recall some discussions on that topic, but I don't recall when or by whom.

But really, what are we missing? We have CompiledMethodTrailer that appears to provide an infinitely extensible mechanism for inventing new kinds of source pointers. And we have an abstract SourceFileArray which, if its class comment is to be believed, is intended to encourage someone to actually go out and do exactly what you describe:

"This class is an abstract superclass for source code access mechanisms. It defines the messages that need to be understood by those subclasses that store and retrieve source chunks on files, over the network or in databases. The first concrete subclass, StandardSourceFileArray, supports access to the traditional sources and changes files. Other subclasses might implement multiple source files for different applications, or access to a network source server."

We already have one new subclass (ExpandedSourceFileArray) that was used to eliminate the old size limit on changes files. There is nothing stopping someone from coming up with other implementations that delegate to databases or to something on the internet.
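The design that class comment describes, an abstract source-access superclass with interchangeable storage backends, can be sketched in Python as follows. `SourceStore`, `ChangesFileStore` and `KeyValueStore` are hypothetical names, not Squeak's actual SourceFileArray protocol. The point of the sketch is that callers only ever hold an opaque pointer, so a database- or network-backed store could slot in without touching the code that asks for source.

```python
from abc import ABC, abstractmethod


class SourceStore(ABC):
    """Hypothetical abstract source-access superclass: store source, get an opaque pointer."""

    @abstractmethod
    def store(self, source: str) -> object: ...

    @abstractmethod
    def fetch(self, pointer: object) -> str: ...


class ChangesFileStore(SourceStore):
    """Appends '!'-terminated chunks to one growing text; the pointer is a character offset."""

    def __init__(self) -> None:
        self.text = ""

    def store(self, source: str) -> object:
        pointer = len(self.text)
        self.text += source.replace("!", "!!") + "!\n"
        return pointer

    def fetch(self, pointer: object) -> str:
        out, i = [], pointer
        while i < len(self.text):
            if self.text[i] == "!":
                if self.text[i + 1:i + 2] == "!":  # escaped '!'
                    out.append("!")
                    i += 2
                    continue
                break  # chunk terminator
            out.append(self.text[i])
            i += 1
        return "".join(out)


class KeyValueStore(SourceStore):
    """Stands in for a database or network server; the pointer is an opaque key."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def store(self, source: str) -> object:
        key = f"src-{len(self.rows)}"
        self.rows[key] = source
        return key

    def fetch(self, pointer: object) -> str:
        return self.rows[pointer]
```

Both stores answer the same two messages, and only the file-backed one needs to know about chunk escaping.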
As far as I can see, the only thing that is missing is for somebody to actually go do it.

Dave
On 02-01-2013, at 4:15 PM, "David T. Lewis" <[hidden email]> wrote:
> Do you have a reference to the sensible compiled method format? I think
> I recall some discussions on that topic, but I don't recall when or by
> whom.

I was thinking of the now-ancient 'NewCompiledMethod', going back to about 1997. The last I heard on the subject was about 5 years ago. But...

> But really, what are we missing? We have CompiledMethodTrailer that appears to
> provide an infinitely extensible mechanism for inventing new kinds of source
> pointers.

… it reads as if that might provide the same result. Namely having the source pointer for each method be a proper oop, with all the obvious advantages over a weirdly encrypted 24-bit number hidden within some bytes at the end of a byte array.

tim
On Wed, Jan 02, 2013 at 05:26:25PM -0800, tim Rowledge wrote:
> I was thinking of the now-ancient 'NewCompiledMethod', going back to about 1997. The last I heard on the subject was about 5 years ago.

Ah, right, it's all coming back to me now. Thanks.

> But...
> > But really, what are we missing? We have CompiledMethodTrailer that appears to
> > provide an infinitely extensible mechanism for inventing new kinds of source
> > pointers.
>
> … it reads as if that might provide the same result. Namely having the source pointer for each method be a proper oop, with all the obvious advantages over a weirdly encrypted 24bit number hidden within some bytes at the end of a byte array

In principle, I think yes. Igor Stasenko created the CompiledMethodTrailer, which has provided a really nice way to keep the existing formats working while allowing all sorts of extensions. I'm not sure if he had in mind to implement source pointers as first-class objects, but it seems like it would be a straightforward extension.

Cross posting to pharo in order to lure Igor back into the discussion ;-)

Dave
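To illustrate the contrast being drawn, here is a Python sketch of a source pointer packed into three trailing bytes, next to a first-class reference object. The 2-bit file index / 22-bit offset split is purely hypothetical and is not Squeak's actual trailer encoding; it only shows why a packed integer caps how large a source file can grow and can say nothing beyond "which file, what offset", whereas an object can name any store and carry extra fields such as a link to the prior version.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout: 2 bits of file index + 22 bits of offset = 3 bytes.
POSITION_BITS = 22
MAX_POSITION = (1 << POSITION_BITS) - 1


def pack_pointer(file_index: int, position: int) -> bytes:
    """Squeeze (file, offset) into 3 bytes, as a trailer-style pointer might."""
    if not (0 <= file_index < 4 and 0 <= position <= MAX_POSITION):
        raise ValueError("pointer does not fit in 24 bits")
    return ((file_index << POSITION_BITS) | position).to_bytes(3, "big")


def unpack_pointer(trailer: bytes) -> tuple[int, int]:
    """Decode the last 3 bytes of a method's byte array back into (file, offset)."""
    value = int.from_bytes(trailer[-3:], "big")
    return value >> POSITION_BITS, value & MAX_POSITION


@dataclass(frozen=True)
class SourceRef:
    """First-class alternative: names any store, any key, and the prior version."""
    store: str                            # e.g. "changes", "sources", or a database name
    key: object                           # an offset, an object id, a URL...
    prior: Optional["SourceRef"] = None   # link to the previous version, if any
```

Under this made-up layout, any offset past about 4 MB raises `ValueError`, while a `SourceRef` has no such ceiling and can chain back through earlier versions held in a different store.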
> numbers, then you could have source kept in a proper database. Like, say, dabble. Or a dabble-ish thing that kept recent-ish stuff local and
> could refer back to a server for ancient history. Find all versions of a method back to the beginning of time...

The all-method-history thing has been available for a while now via Magma.

http://wiki.squeak.org/squeak/5603
On Thu, Jan 03, 2013 at 10:27:00AM -0600, Chris Muller wrote:
> The all-method-history thing has been available for a while now via Magma.
>
> http://wiki.squeak.org/squeak/5603

Now that sounds like the *right* way to do it :)

Dave
Dave,
I like this idea too, of having earlier versions of methods in a separate database and not in the image. You recently reminded us how to calculate the space that certain types of objects use in the image. I forgot the command again and I can't easily find it in the mail history. How do you calculate the space used by earlier method versions in the image?

--Hannes
On Thu, Jan 3, 2013 at 2:49 PM, H. Hirzel <[hidden email]> wrote:
> How do you calculate the space used by earlier method versions in the image?

None, actually. All source code is stored in the .sources or .changes file.

Colin
On 03-01-2013, at 11:52 AM, Colin Putney <[hidden email]> wrote:
> On Thu, Jan 3, 2013 at 2:49 PM, H. Hirzel <[hidden email]> wrote:
> > How do you calculate the space used by earlier method versions in the image?
>
> None, actually. All source code is stored in the .sources or .changes file.

I know source gets stored in the files but long ago it was the case that method version objects were kept in the image and they held on to a *lot* of crap. Did that get changed? IIRC it was part of a never-completed attempt to have some sort of namespacey effect using projects.

tim
Yes, I had in mind what Tim mentions.
--Hannes
On 3 January 2013 02:26, tim Rowledge <[hidden email]> wrote:
> … it reads as if that might provide the same result. Namely having the source pointer for each method be a proper oop, with all the obvious advantages over a weirdly encrypted 24bit number hidden within some bytes at the end of a byte array

+1. Likewise, the bytecode could be held in one oop, so a compiled method would not need a separate object format, just a contract that its first ivar is the bytecode.

--
Best regards,
Igor Stasenko.
On 3 January 2013 02:55, David T. Lewis <[hidden email]> wrote:
> In principle, I think yes. Igor Stasenko created the CompiledMethodTrailer,
> which has provided a really nice way to keep the existing formats working while
> allowing all sorts of extensions. I'm not sure if he had in mind to implement
> source pointers as first class objects, but it seems like it would be a
> straightforward extension.

I wrote about it multiple times. But we need someone who will put an idea into flesh :)

> Cross posting to pharo in order to lure Igor back into the discussion ;-)
>
> Dave
It's a good way to have local improvement and it's pure objects, but it doesn't really help us as a community because 1) it doesn't allow external tools to interface to it and 2) Magma does not support authentication or authorization.

On Thu, Jan 3, 2013 at 11:46 AM, David T. Lewis <[hidden email]> wrote:
> Now that sounds like the *right* way to do it :)
On 02.01.2013, at 23:17, Colin Putney <[hidden email]> wrote:
Wasn't this one of the goals Dale had for SqueakSource3?

- Bert -
On 2 January 2013 22:17, Colin Putney <[hidden email]> wrote:
> But, what good is it? Might be interesting, maybe there's some research
> papers to be written, but would it do us any good as a community? Would
> there be useful tools that came out of it? Would it be worth the effort?
> Hard to say.
I eventually remembered the paper I'd recently read:

http://scg.unibe.ch/archive/papers/Rob12aAPIDeprecations.pdf

"How Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystem" looks at how APIs change and how developers react to those changes, and it mines SqueakSource for its data.

Hopefully, research papers _would_ benefit the community. (In other words, they'd hopefully be research papers into things that were useful, or that enabled useful things.)

frank
On Thu, Jan 3, 2013 at 2:58 PM, tim Rowledge <[hidden email]> wrote:
That must have been before my time. These days, all versions are stored on disk. Each chunk has the source pointer for the previous version in it, and the tools walk back through the changes/source files collecting all the versions in the chain.
Here's an example:

!MCVersionInfo methodsFor: 'converting' stamp: 'bf 4/18/2010 23:25' prior: 23175569!
asDictionary
	^ Dictionary new
		at: #name put: name;
		at: #id put: id asString;
		at: #message put: message;
		at: #date put: date;
		at: #time put: time;
		at: #author put: author;
		at: #ancestors put: (self ancestors collect: [:a | a asDictionary]);
		yourself! !
That "prior" parameter points to the previous version.

Colin
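Walking the `prior:` chain the way Colin describes can be sketched in Python. The assumptions are loud: `append_version`, `read_chunk` and `method_versions` are hypothetical helpers, and `prior:` values are treated as plain character offsets of the previous version's preamble in a single text, whereas real Squeak source pointers also identify which file (.sources or .changes) they refer to.

```python
import re

PRIOR = re.compile(r"prior: (\d+)")


def read_chunk(text: str, pos: int) -> tuple[str, int]:
    """Read one chunk from pos, unescaping '!!'; return (chunk, index after its '!')."""
    out = []
    while pos < len(text):
        if text[pos] == "!":
            if text[pos + 1:pos + 2] == "!":
                out.append("!")
                pos += 2
                continue
            return "".join(out), pos + 1
        out.append(text[pos])
        pos += 1
    return "".join(out), pos


def append_version(text, class_name, stamp, source, prior=None):
    """Hypothetical helper: append a methodsFor: chunk, returning (new text, its offset)."""
    offset = len(text)
    header = f"!{class_name} methodsFor: 'demo' stamp: '{stamp}'"
    if prior is not None:
        header += f" prior: {prior}"
    text += header + "!\n" + source.replace("!", "!!") + "! !\n\n"
    return text, offset


def method_versions(text, offset):
    """Follow prior: links back from the preamble at offset; newest version first."""
    versions, seen = [], set()
    current = offset
    while current is not None and current not in seen:  # 'seen' guards against cycles
        seen.add(current)
        preamble, after = read_chunk(text, current + 1)  # skip the opening '!'
        source, _ = read_chunk(text, after)
        versions.append((preamble.strip(), source.strip()))
        match = PRIOR.search(preamble)
        current = int(match.group(1)) if match else None
    return versions
```

Usage: build a tiny log with two versions of `Foo>>bar`, then walk back from the newer one.

```python
log, v1 = append_version("", "Foo", "ab 1/1/2013", "bar\n\t^ 1")
log, v2 = append_version(log, "Foo", "ab 1/2/2013", "bar\n\t^ 2", prior=v1)
print([src for _, src in method_versions(log, v2)])  # ['bar\n\t^ 2', 'bar\n\t^ 1']
```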