Monticello Version Info

Camillo Bruni-3
I am still having a look at the Monticello implementation.
Now coming from the git world, this seems very weird:

Why does each Monticello version store the complete ancestor history?

------------------------------------------------------------------------

Wouldn't it simply be enough to keep pointers to the immediate ancestors, and then lazily load and cache them?

Where is the complete ancestry needed, besides diffing/merging?

------------------------------------------------------------------------

The current setup implies that something like

MCCacheRepository default loadVersionFromFileNamed: 'SLICE-Issue-5416--Improve-MC-version-loading-CamilloBruni.1.mcz'

takes around 1.5 seconds to complete, whereas this could be done in a fraction of a second in most cases...
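
The "immediate ancestors only, loaded lazily and cached" idea can be sketched roughly as follows. This is an illustration in Python rather than Smalltalk, so that it stays independent of Monticello's real classes: `VersionRecord`, `LazyAncestry` and the `fetch` callback are hypothetical names, and the sketch assumes each stored version records only the names of its direct parents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Tuple


@dataclass(frozen=True)
class VersionRecord:
    """Hypothetical stripped-down version info: a name plus parent names only."""
    name: str
    parents: Tuple[str, ...] = ()


class LazyAncestry:
    """Loads ancestor records on demand and caches them (illustrative only)."""

    def __init__(self, fetch: Callable[[str], VersionRecord]):
        self._fetch = fetch  # assumed to read one small metadata record
        self._cache: Dict[str, VersionRecord] = {}
        self.fetch_count = 0

    def record(self, name: str) -> VersionRecord:
        """Return the record for `name`, fetching it only on first use."""
        if name not in self._cache:
            self.fetch_count += 1
            self._cache[name] = self._fetch(name)
        return self._cache[name]

    def ancestors(self, name: str) -> Iterator[str]:
        """Yield ancestor names breadth-first, fetching only as far as consumed."""
        seen = set()
        queue = list(self.record(name).parents)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.add(current)
            yield current
            queue.extend(self.record(current).parents)


# A made-up linear history: Foo-cb.3 -> Foo-cb.2 -> Foo-cb.1
STORE = {
    "Foo-cb.1": VersionRecord("Foo-cb.1"),
    "Foo-cb.2": VersionRecord("Foo-cb.2", ("Foo-cb.1",)),
    "Foo-cb.3": VersionRecord("Foo-cb.3", ("Foo-cb.2",)),
}

ancestry = LazyAncestry(STORE.__getitem__)
first_parent = next(ancestry.ancestors("Foo-cb.3"))
```

Asking only for the first ancestor touches a single record; walking further back fetches older records one at a time, and repeated walks reuse the cache.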

Re: Monticello Version Info

Dale Henrichs
For the majority of use cases I think that the immediate ancestor is the only one needed ... so you might be onto something here...

Dale


Re: Monticello Version Info

Nicolas Cellier
I think the main reason is that you cannot load just the metadata, but
the whole mcz when you need to dig into the history...
That ain't cheap, and that happens when you merge more or less distant branches.

Also, it's not unusual to upload directly version N+5 without the
whole N+1 to: N+4 ancestry...
In this case MC can still find a common ancestor.

Nicolas


Re: Monticello Version Info

Camillo Bruni-3
well, you usually cache the mcz in your local directory cache (at least that's the default).
and extracting the meta-data out of a local mcz is rather cheap since zip allows you to directly extract certain files.

maybe we could add another entry to the mcz with just the stripped down version info in it. this way older monticello versions would still be able to load it the slow way, whereas an updated version could benefit from incremental loading?
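
A rough sketch of that proposal, using Python's standard `zipfile` module instead of Monticello code. The member names (`snapshot/source.st`, `version`, `version-info-short`) and the key/value layout of the short entry are assumptions made for this illustration, not a description of the actual mcz layout.

```python
import io
import zipfile

SHORT_ENTRY = "version-info-short"  # hypothetical name for the new small entry

# Build a small archive in memory that stands in for a locally cached .mcz file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("snapshot/source.st", "x" * 100_000)    # bulky payload
    archive.writestr("version", "(complete ancestry tree)")  # existing metadata
    archive.writestr(SHORT_ENTRY, "name: Foo-cb.3\nparents: Foo-cb.2\n")


def read_short_info(data: bytes) -> dict:
    """Read only the small entry; the other members are never decompressed."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if SHORT_ENTRY not in archive.namelist():
            # An archive written by an older tool: the caller has to fall
            # back to parsing the complete version info (the slow way).
            raise KeyError(SHORT_ENTRY)
        text = archive.read(SHORT_ENTRY).decode("utf-8")
    return dict(line.split(": ", 1) for line in text.splitlines())


info = read_short_info(buffer.getvalue())
```

Because a zip archive has a central directory, a reader can jump straight to one member. A tool that does not know about the extra entry would simply ignore it, which is what keeps the idea backward compatible.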



Re: Monticello Version Info

cbc
The issue is that Monticello is set up for distributed processing and
allows for multiple repositories, some of which may not be available
to all of the users of a project.  For instance, a project might be
developed internally (or on the developer's hard drive) until they feel
comfortable distributing the code later.  So, publicly, you get
versions 12, 17, 34, and 37.  There is no access to the intermediate
ones (unless you happen to be the one that created them and didn't
release them).  The 'whole ancestry' lets you do diffs of a
version derived from 37 against one derived from 34 - the ancestry can
determine that version 34 is 'common', and work from there.  [Note
that just numbers aren't enough - the original developer, say, cbc
could have version cbc.34, while you could have, say, CamilloBruni.34,
but yours is based off of 17 (since you picked up that version and
started working there).  So, merging cbc.37 with CamilloBruni.34 would
need to pull down cbc.17 for a good merge to work.]

At least, that's my understanding from long ago discussions.

-Chris
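
The cbc.37 / CamilloBruni.34 case can be worked through with a small sketch. It is written in Python and is not Monticello's actual merge code: the `Info` class is hypothetical, versions are compared by name only for brevity, and the package name `Foo` is made up. The point it illustrates is that when every version embeds its complete ancestry, the common ancestor can be found without contacting any repository.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Info:
    """Hypothetical version info that embeds its complete ancestry."""
    name: str
    ancestors: Tuple["Info", ...] = ()


def all_names(info: Info) -> set:
    """Collect the name of `info` and of every embedded ancestor."""
    names, stack = set(), [info]
    while stack:
        node = stack.pop()
        if node.name not in names:
            names.add(node.name)
            stack.extend(node.ancestors)
    return names


def common_ancestor(a: Info, b: Info) -> Optional[str]:
    """Walk back from `a` breadth-first; return the first version b also knows."""
    known_to_b = all_names(b)
    queue = deque([a])
    visited = set()
    while queue:
        node = queue.popleft()
        if node.name in visited:
            continue
        visited.add(node.name)
        if node.name in known_to_b:
            return node.name
        queue.extend(node.ancestors)
    return None


# cbc.12 -> cbc.17 -> cbc.34 -> cbc.37 on one side,
# CamilloBruni.34 branching off cbc.17 on the other.
cbc12 = Info("Foo-cbc.12")
cbc17 = Info("Foo-cbc.17", (cbc12,))
cbc34 = Info("Foo-cbc.34", (cbc17,))
cbc37 = Info("Foo-cbc.37", (cbc34,))
cami34 = Info("Foo-CamilloBruni.34", (cbc17,))

base = common_ancestor(cbc37, cami34)  # the version to fetch as the merge base
```

Both sides carry number 34, yet the walk ends at `Foo-cbc.17`, which is why the version number alone is not enough to pick the merge base.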


Re: Monticello Version Info

Stéphane Ducasse
In reply to this post by Nicolas Cellier
+1

I would like us to respect MC a bit, because so far it is the only thing we have that works well.
Ranting about it does not make it any better (Camillo, this is not about you in particular but in general).

Stef


Re: Monticello Version Info

Stéphane Ducasse
In reply to this post by Camillo Bruni-3

> well, you usually cache the mcz in your local directory cache (at least that's the default).
> and extracting the meta-data out of a local mcz is rather cheap since zip allows you to directly extract certain files.
>
> maybe we could add another entry to the mcz with just the stripped down version info in it. this way older monticello versions would still be able to load it the slow way, whereas an updated version could benefit from incremental loading?

Maybe.
I do not know enough about it.

Stef


Re: Monticello Version Info

Sven Van Caekenberghe
In reply to this post by cbc

This makes sense, but how is this handled with git?

Sven


Re: Monticello Version Info

Camillo Bruni-3

On 2012-03-02, at 09:31, Sven Van Caekenberghe wrote:

> This makes sense, but how is this handled with git ?


Git always publishes all versions (unless you squash multiple versions on purpose):
- generally you work and commit locally for quite some time
- then you pull the latest changes from a common / remote repository
- you merge with that repository

The main difference from MC is that git will publish all intermediate versions as well,
you can see an example of that here:

        https://github.com/dh83/trex/network

- two people worked on the project [ black / blue ]
- the blue development happened in a different / local repository
- yet all the intermediate versions between merge points are in the main [black] repository

So, Chris, you're absolutely right that if you want to be able to diff against
the local history you need the complete history.

I am not completely sure how the internals of MC work, but from my research
so far I am almost convinced that versions are found merely by their name, since
the VersionInfo does not track the repository it relates to.

Given that, I would say it would suffice to publish all the intermediate versions
to a remote repository to get complete and nice diff behavior with MC, no?

Most probably I am missing a piece here ;)

best
cami

Re: Monticello Version Info

cbc
On Fri, Mar 2, 2012 at 5:30 AM, Camillo Bruni <[hidden email]> wrote:
>
> But I am not completely sure how the internals work with MC, but from my research
> so far I am almost convinced that versions are found merely by their name. Since
> the VersionInfo does not track the repository it relates to.
>
It does appear to find versions by name, but there is an internal id
number that it uses to validate the file is the right one.  I've been
'bitten' by this a few times - it is a really good idea to NOT rename
MCZ files.
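
A small sketch of "find by name, then validate by id", in Python with hypothetical names (`StoredVersion`, `find_version`, `IdMismatch`). It only assumes what is described above: the file name is the lookup key, and an id stored inside the file is checked afterwards. It is not Monticello's actual lookup code.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StoredVersion:
    """Hypothetical stored version: the id lives inside the file, not in its name."""
    internal_name: str
    internal_id: uuid.UUID


class IdMismatch(Exception):
    """Raised when a file with the right name carries the wrong id."""


def find_version(files: Dict[str, StoredVersion], name: str,
                 expected_id: uuid.UUID) -> StoredVersion:
    """Look the file up by name, then check that its internal id matches."""
    stored = files[name + ".mcz"]
    if stored.internal_id != expected_id:
        raise IdMismatch(f"{name}: file found by name, but its id differs")
    return stored


wanted_id = uuid.UUID(int=1)
repository = {
    "Foo-cbc.34.mcz": StoredVersion("Foo-cbc.34", wanted_id),
    # A different version whose file was renamed to look like Foo-cbc.35:
    "Foo-cbc.35.mcz": StoredVersion("Foo-xyz.9", uuid.UUID(int=2)),
}

found = find_version(repository, "Foo-cbc.34", wanted_id)
```

Looking up `Foo-cbc.35` with the id of the version that was actually expected raises `IdMismatch`, which is the kind of surprise a renamed file produces.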


Re: Monticello Version Info

Dale Henrichs
In reply to this post by Sven Van Caekenberghe
Sven,

A Monticello mcz file is a version data base for a single package .... Git is a version data base for a directory structure ...

Monticello has branching by convention (change the name of a file to create the branch), although the mcz ancestry handles branches just fine. In Git branches are first class objects ... it is difficult to do things in git if you are not on one branch or another ...

You can merge with Monticello and you can merge with Git ...

The big difference is that Git allows you to version a bunch of files together and with Monticello you are versioning a single file.

Part of what Metacello was invented to do was to create a "data base" of versioned collections of mcz files ... Git was designed to manage collections of files...

Is this what you were asking?

Dale


Re: Monticello Version Info

Dale Henrichs
In reply to this post by cbc
Chris,

The last time I delved into this area there was one line of code in Monticello (I was looking at Pharo) that compared the file UUID, and it might have been involved in hashing. At one point the UUID calculation was broken in Pharo and no one noticed until they tried to load the package into GemStone (which is using an older version of Monticello) and things went haywire :)

So yes the name is the primary lookup mechanism...

You might have gotten messed up by changing the name of the mcz file to NOT match the internal package name ... there are definitely expectations in the Monticello ecosystem that the package name and file name match at some level...

Dale


Re: Monticello Version Info

Frank Shearar-3
In reply to this post by Dale Henrichs
On 2 March 2012 17:02, Dale Henrichs <[hidden email]> wrote:
> Sven,
>
> A Monticello mcz file is a version data base for a single package .... Git is a version data base for a directory structure ...
>
> Monticello has branching by convention (change the name of a file to create the branch), although the mcz ancestry handles branches just fine. In Git branches are first class objects ... it is difficult to do things in git if you are not on one branch or another ...

Bearing in mind that a branch is just a pointer to a commit: look in
your blah/.git/refs/heads/ and each file is a branch containing the
SHA1 id of the head of that branch. (And each commit knows its
ancestor/s, just like an mcz file, except that the hash means the
relationship's based on the commit's _contents_, not its _name_.)

frank
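
To make "based on contents, not name" concrete, here is a short Python sketch of how git derives the id of a blob in a SHA-1 repository: it hashes a small header plus the raw bytes. The function name `git_blob_id` is made up for this example. Commit objects are hashed in the same way over their own text, which includes the ids of their parents.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Id git assigns to a blob: SHA-1 over 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The id depends only on the bytes, never on the name they are stored under.
id_a = git_blob_id(b"hello world\n")
id_b = git_blob_id(b"hello world\n")   # same content, same id
id_c = git_blob_id(b"hello world!\n")  # one byte more, different id
```

Renaming a file therefore cannot change which object it refers to, whereas a name-based scheme has to trust that the name still matches the contents.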


Re: Monticello Version Info

Dale Henrichs
Frank,

That's right ... the major difference between the two is that git manages multiple files ...

Dale


Re: Monticello Version Info

Frank Shearar-3
On 2 March 2012 17:32, Dale Henrichs <[hidden email]> wrote:
> Frank,
>
> That's right ... the major difference between the two is that git manages multiple files ...

Well, kind of. A blob usually does contain the contents of a file.
(http://book.git-scm.com/1_the_git_object_model.html) That is, when
you checkout some branch you end up with a directory structure
containing files, which is what you're talking about. It's probably
better not to think of them as files: the blobs might look like files
in your working copy, but they're just, well, blobs. Chunks of binary
stuff that, for versioning software, happens to be UTF-8 encoded plain
text (or whatever).

(This is exactly why, from standing inside an image, mapping a method
to a file makes sense: a method's a single unit of stuff.)

Which is probably why you express yourself this way: Monticello turns
a whole bunch of methods + comments + class definitions into a big
snapshot.st - a single file in the zip - and a corresponding list of
changes to an image - "add this, remove that".

Both are still versioning a collection of things together, though: I
see no problem with saying "Monticello package Foo-fbs.2 means that
you have this class with this definition and that method with that
definition". That's pretty much the same thing as saying "git commit
id deadbeef means that you have this file with these contents and that
file with those contents".

frank
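
The "snapshot plus a list of changes" view can be sketched in a few lines of Python. A snapshot is modelled here as a plain mapping from a definition name to its source; that representation, the `changes` function and the `Foo>>...` entries are all made up for illustration and are not Monticello's data model. The same function works unchanged if the keys are file paths and the values are file contents.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # definition name (or file path) -> source text


def changes(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """List the operations that turn `old` into `new`."""
    ops = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            ops.append(("add", key))
        elif key not in new:
            ops.append(("remove", key))
        elif old[key] != new[key]:
            ops.append(("modify", key))
    return ops


# Two made-up versions of a package Foo
foo_1 = {"Foo>>bar": "bar ^ 1", "Foo>>baz": "baz ^ 2"}
foo_2 = {"Foo>>bar": "bar ^ 42", "Foo>>qux": "qux ^ 3"}

ops = changes(foo_1, foo_2)
```

Seen this way, a package version and a commit both name a complete set of things; only the granularity of the "thing" differs.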

> Dale
>
> ----- Original Message -----
> | From: "Frank Shearar" <[hidden email]>
> | To: [hidden email]
> | Sent: Friday, March 2, 2012 9:27:39 AM
> | Subject: Re: [Pharo-project] Monticello Version Info
> |
> | On 2 March 2012 17:02, Dale Henrichs <[hidden email]> wrote:
> | > Sven,
> | >
> | > A Monticello mcz file is a version data base for a single package
> | > .... Git is a version data base for a directory structure ...
> | >
> | > Monticello has branching by convention (change the name of a file
> | > to create the branch), although the mcz ancestry handles branches
> | > just fine. In Git branches are first class objects ... it is
> | > difficult to do things in git if you are not on one branch or
> | > another ...
> |
> | Bearing in mind that a branch is just a pointer to a commit: look in
> | your blah/.git/refs/heads/ and each file is a branch containing the
> | SHA1 id of the head of that branch. (And each commit knows its
> | ancestor/s, just like an mcz file, except that the hash means the
> | relationship's based on the commit's _contents_, not its _name_.)
> |
> | frank
> |
> | > You can merge with Monticello and you can merge with Git ...
> | >
> | > The big difference is that Git allows you to version a bunch of
> | > files together and with Monticello you are versioning a single
> | > file.
> | >
> | > Part of what Metacello was invented to do was to create a "data
> | > base" of versioned collections of mcz files ... Git was designed
> | > to manage collections of files...
> | >
> | > Is this what you were asking?
> | >
> | > Dale
> | >
> | > ----- Original Message -----
> | > | From: "Sven Van Caekenberghe" <[hidden email]>
> | > | To: [hidden email]
> | > | Sent: Friday, March 2, 2012 12:31:42 AM
> | > | Subject: Re: [Pharo-project] Monticello Version Info
> | > |
> | > |
> | > | On 02 Mar 2012, at 01:52, Chris Cunningham wrote:
> | > |
> | > | > The issue is that Monticello is setup for distributed
> | > | > processing,
> | > | > and
> | > | > allowing for multiple repositories, some of which may not be
> | > | > available
> | > | > to all of the users for a project.  For instance, a project
> | > | > might
> | > | > be
> | > | > developed internally (or on the developers hard-drive) until
> | > | > they
> | > | > feel
> | > | > comfortable distributing the code later.  So, publicly, you get
> | > | > version 12, 17, 34, and 37.  There is no access to the
> | > | > intermediate
> | > | > ones (unless you happen to be the one that created them and
> | > | > didn't
> | > | > release them).  The 'whole ancestry' lets you do diffs off of
> | > | > a
> | > | > version derived from 37 against one derived from 34 - the
> | > | > ancestry
> | > | > can
> | > | > determine that version 34 is 'common', and work from there.
> | > | >  [Note
> | > | > that just numbers aren't enough - the original developer, say,
> | > | > cbc
> | > | > could have version cbc.34, while you could have, say,
> | > | > CamilloBruni.34,
> | > | > but yours is based off of 17 (since you picked up that version
> | > | > and
> | > | > started working there).  So, merging cbc.37 with
> | > | > CamilloBruni.34
> | > | > would
> | > | > need to pull down cbc.17 for a good merge to work.]
> | > | >
> | > | > At least, that's my understanding from long ago discussions.
> | > |
> | > | This makes sense, but how is this handled with git ?
> | > |
> | > | Sven
> | >
> |
> |
>


Re: Monticello Version Info

Camillo Bruni-3
In reply to this post by Frank Shearar-3

On 2012-03-02, at 18:27, Frank Shearar wrote:

> On 2 March 2012 17:02, Dale Henrichs <[hidden email]> wrote:
>> Sven,
>>
>> A Monticello mcz file is a version data base for a single package .... Git is a version data base for a directory structure ...
>>
>> Monticello has branching by convention (change the name of a file to create the branch), although the mcz ancestry handles branches just fine. In Git branches are first class objects ... it is difficult to do things in git if you are not on one branch or another ...
>
> Bearing in mind that a branch is just a pointer to a commit: look in
> your blah/.git/refs/heads/ and each file is a branch containing the
> SHA1 id of the head of that branch. (And each commit knows its
> ancestor/s, just like an mcz file, except that the hash means the
> relationship's based on the commit's _contents_, not its _name_.)

That's right. However, an mcz stores a lot of redundant data by carrying the complete ancestor history. In git you only store the pointers / hashes to the immediate ancestors, which is enough in most cases.

Right now we spend quite some time just parsing the complete ancestor history, whereas this could be done lazily or with a more efficient format.
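
To make the "pointers plus lazy loading" idea concrete, here is a minimal sketch in Python, not Monticello code. The names `VersionInfo`, `LazyAncestry` and the toy `REPO` dictionary are assumptions made up for illustration; the loader stands in for whatever would read a single version's metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Tuple


@dataclass(frozen=True)
class VersionInfo:
    """Metadata for one version: its own name plus parent names only."""
    name: str
    parents: Tuple[str, ...] = ()


class LazyAncestry:
    """Resolves ancestors on demand and caches every lookup."""

    def __init__(self, load: Callable[[str], VersionInfo]) -> None:
        self._load = load      # hypothetical loader, e.g. reads one file
        self._cache: Dict[str, VersionInfo] = {}
        self.loads = 0         # counts real loads, for illustration only

    def info(self, name: str) -> VersionInfo:
        if name not in self._cache:
            self.loads += 1
            self._cache[name] = self._load(name)
        return self._cache[name]

    def ancestors(self, name: str) -> Iterator[str]:
        """Yield ancestor names breadth-first, loading each at most once."""
        seen = set()
        queue = list(self.info(name).parents)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.add(current)
            yield current
            queue.extend(self.info(current).parents)


# Toy repository (made-up names): Foo-cb.3 -> Foo-cb.2 -> Foo-cb.1
REPO = {
    "Foo-cb.1": VersionInfo("Foo-cb.1"),
    "Foo-cb.2": VersionInfo("Foo-cb.2", ("Foo-cb.1",)),
    "Foo-cb.3": VersionInfo("Foo-cb.3", ("Foo-cb.2",)),
}

ancestry = LazyAncestry(REPO.__getitem__)
head = ancestry.info("Foo-cb.3")               # one load, no history touched
full = list(ancestry.ancestors("Foo-cb.3"))    # history only when asked for
```

Loading the head costs a single lookup; the rest of the chain is only visited when something like a merge actually asks for it.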

cami

>
> frank
>
>> You can merge with Monticello and you can merge with Git ...
>>
>> The big difference is that Git allows you to version a bunch of files together and with Monticello you are versioning a single file.
>>
>> Part of what Metacello was invented to do was to create a "data base" of versioned collections of mcz files ... Git was designed to manage collections of files...
>>
>> Is this what you were asking?
>>
>> Dale
>>
>> ----- Original Message -----
>> | From: "Sven Van Caekenberghe" <[hidden email]>
>> | To: [hidden email]
>> | Sent: Friday, March 2, 2012 12:31:42 AM
>> | Subject: Re: [Pharo-project] Monticello Version Info
>> |
>> |
>> | On 02 Mar 2012, at 01:52, Chris Cunningham wrote:
>> |
>> | > The issue is that Monticello is setup for distributed processing,
>> | > and
>> | > allowing for multiple repositories, some of which may not be
>> | > available
>> | > to all of the users for a project.  For instance, a project might
>> | > be
>> | > developed internally (or on the developers hard-drive) until they
>> | > feel
>> | > comfortable distributing the code later.  So, publicly, you get
>> | > version 12, 17, 34, and 37.  There is no access to the intermediate
>> | > ones (unless you happen to be the one that created them and didn't
>> | > release them).  The 'whole ancestry' lets you do diffs off of a
>> | > version derived from 37 against one derived from 34 - the ancestry
>> | > can
>> | > determine that version 34 is 'common', and work from there.  [Note
>> | > that just numbers aren't enough - the original developer, say, cbc
>> | > could have version cbc.34, while you could have, say,
>> | > CamilloBruni.34,
>> | > but yours is based off of 17 (since you picked up that version and
>> | > started working there).  So, merging cbc.37 with CamilloBruni.34
>> | > would
>> | > need to pull down cbc.17 for a good merge to work.]
>> | >
>> | > At least, that's my understanding from long ago discussions.
>> |
>> | This makes sense, but how is this handled with git ?
>> |
>> | Sven
>>
>



Re: Monticello Version Info

Frank Shearar-3
On 2 March 2012 18:52, Camillo Bruni <[hidden email]> wrote:

>
> On 2012-03-02, at 18:27, Frank Shearar wrote:
>
>> On 2 March 2012 17:02, Dale Henrichs <[hidden email]> wrote:
>>> Sven,
>>>
>>> A Monticello mcz file is a version data base for a single package .... Git is a version data base for a directory structure ...
>>>
>>> Monticello has branching by convention (change the name of a file to create the branch), although the mcz ancestry handles branches just fine. In Git branches are first class objects ... it is difficult to do things in git if you are not on one branch or another ...
>>
>> Bearing in mind that a branch is just a pointer to a commit: look in
>> your blah/.git/refs/heads/ and each file is a branch containing the
>> SHA1 id of the head of that branch. (And each commit knows its
>> ancestor/s, just like an mcz file, except that the hash means the
>> relationship's based on the commit's _contents_, not its _name_.)
>
> that's right. However mcz store a bunch of redundant data in there by having the complete ancestor history in there. in git you only store the pointers / hashes to the immediate ancestors which is enough in most cases.
>
> right now we spend quite some time in just parsing the complete ancestor history, whereas this could be done lazily or in a more efficient format.

But since there's nothing stopping you from losing an mcz (by accident
or on purpose: it's happened to me by accident), storing the entire
history lets you know _something_ about the history. Otherwise, with
the git-style pointer, you'd be screwed if you wanted history. "Oh,
where's this mcz? I have no idea, nor any idea where to look!"

Of course it's not necessary to parse the whole lot, which I suspect
is what you mean by "lazy".
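
A small Python sketch of the trade-off described above, under made-up assumptions: `POINTERS` models a git-style store where each version only names its parent, `EMBEDDED` models a version that carries the names of all its ancestors, and `Foo-fbs.2` is the version that got lost. None of this is real Monticello or git code.

```python
from typing import Dict, List, Optional, Tuple

# Pointer-only store: name -> parent name (None for the root).
# "Foo-fbs.2" has been lost, so it has no entry of its own.
POINTERS: Dict[str, Optional[str]] = {
    "Foo-fbs.4": "Foo-fbs.3",
    "Foo-fbs.3": "Foo-fbs.2",
    "Foo-fbs.1": None,
}

# Embedded history: the version itself lists every ancestor name.
EMBEDDED: Dict[str, Tuple[str, ...]] = {
    "Foo-fbs.4": ("Foo-fbs.3", "Foo-fbs.2", "Foo-fbs.1"),
}


def walk_pointers(name: str) -> Tuple[List[str], bool]:
    """Follow parent pointers; return (ancestors found, chain complete?)."""
    found: List[str] = []
    parent = POINTERS[name]
    while parent is not None:
        found.append(parent)
        if parent not in POINTERS:
            return found, False   # the chain breaks at the lost version
        parent = POINTERS[parent]
    return found, True


known, complete = walk_pointers("Foo-fbs.4")
embedded = list(EMBEDDED["Foo-fbs.4"])
```

With pointers alone the walk learns the name of the lost version but nothing older than it; the embedded list still names `Foo-fbs.1`.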

frank

> cami
>
>>
>> frank
>>
>>> You can merge with Monticello and you can merge with Git ...
>>>
>>> The big difference is that Git allows you to version a bunch of files together and with Monticello you are versioning a single file.
>>>
>>> Part of what Metacello was invented to do was to create a "data base" of versioned collections of mcz files ... Git was designed to manage collections of files...
>>>
>>> Is this what you were asking?
>>>
>>> Dale
>>>
>>> ----- Original Message -----
>>> | From: "Sven Van Caekenberghe" <[hidden email]>
>>> | To: [hidden email]
>>> | Sent: Friday, March 2, 2012 12:31:42 AM
>>> | Subject: Re: [Pharo-project] Monticello Version Info
>>> |
>>> |
>>> | On 02 Mar 2012, at 01:52, Chris Cunningham wrote:
>>> |
>>> | > The issue is that Monticello is setup for distributed processing,
>>> | > and
>>> | > allowing for multiple repositories, some of which may not be
>>> | > available
>>> | > to all of the users for a project.  For instance, a project might
>>> | > be
>>> | > developed internally (or on the developers hard-drive) until they
>>> | > feel
>>> | > comfortable distributing the code later.  So, publicly, you get
>>> | > version 12, 17, 34, and 37.  There is no access to the intermediate
>>> | > ones (unless you happen to be the one that created them and didn't
>>> | > release them).  The 'whole ancestry' lets you do diffs off of a
>>> | > version derived from 37 against one derived from 34 - the ancestry
>>> | > can
>>> | > determine that version 34 is 'common', and work from there.  [Note
>>> | > that just numbers aren't enough - the original developer, say, cbc
>>> | > could have version cbc.34, while you could have, say,
>>> | > CamilloBruni.34,
>>> | > but yours is based off of 17 (since you picked up that version and
>>> | > started working there).  So, merging cbc.37 with CamilloBruni.34
>>> | > would
>>> | > need to pull down cbc.17 for a good merge to work.]
>>> | >
>>> | > At least, that's my understanding from long ago discussions.
>>> |
>>> | This makes sense, but how is this handled with git ?
>>> |
>>> | Sven
>>>
>>
>
>


Re: Monticello Version Info

Igor Stasenko
I'd rather ask why you need to parse the ancestry info at all,
while the other stuff is simply serialized and takes much less time to load.
I don't think the size of the history matters in itself. It only
matters when you are using slow method(s) to load it.


--
Best regards,
Igor Stasenko.


Re: Monticello Version Info

Camillo Bruni-3
In reply to this post by Frank Shearar-3

On 2012-03-02, at 20:09, Frank Shearar wrote:

> On 2 March 2012 18:52, Camillo Bruni <[hidden email]> wrote:
>>
>> On 2012-03-02, at 18:27, Frank Shearar wrote:
>>
>>> On 2 March 2012 17:02, Dale Henrichs <[hidden email]> wrote:
>>>> Sven,
>>>>
>>>> A Monticello mcz file is a version data base for a single package .... Git is a version data base for a directory structure ...
>>>>
>>>> Monticello has branching by convention (change the name of a file to create the branch), although the mcz ancestry handles branches just fine. In Git branches are first class objects ... it is difficult to do things in git if you are not on one branch or another ...
>>>
>>> Bearing in mind that a branch is just a pointer to a commit: look in
>>> your blah/.git/refs/heads/ and each file is a branch containing the
>>> SHA1 id of the head of that branch. (And each commit knows its
>>> ancestor/s, just like an mcz file, except that the hash means the
>>> relationship's based on the commit's _contents_, not its _name_.)
>>
>> that's right. However mcz store a bunch of redundant data in there by having the complete ancestor history in there. in git you only store the pointers / hashes to the immediate ancestors which is enough in most cases.
>>
>> right now we spend quite some time in just parsing the complete ancestor history, whereas this could be done lazily or in a more efficient format.
>
> But since there's nothing stopping you from losing an mcz (by accident
> or on purpose: it's happened to me by accident), storing the entire
> history lets you know _something_ about the history. Otherwise, with
> the git-style pointer, you'd be screwed if you wanted history. "Oh,
> where's this mcz? I have no idea, nor any idea where to look!"
>
> Of course it's not necessary to parse the whole lot, which I suspect
> is what you mean by "lazy".

Right. I honestly don't mind the data being around ;) That is indeed a nice
integrity property of an mcz version. But as Igor pointed out, we should consider
a slightly more efficient way of storing / retrieving the data. The current approach
is just too slow, I think.

I don't know what happens on the server side of squeaksource, but is it easy to just
add another entry to an mcz file? To me this seems like an easy solution to speed
up MC quite a bit on Pharo while still keeping it backwards compatible on other platforms.
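
Since an mcz is a zip archive, the "extra entry" idea can be sketched with Python's standard `zipfile` module. This is only an illustration: the member names and contents below are placeholders, `ancestors.json` is a hypothetical entry, and whether squeaksource or other platforms tolerate an unknown member is an open assumption.

```python
import io
import json
import zipfile

# Placeholder members; a real mcz has its own layout and contents.
ORIGINAL_MEMBERS = {
    "version": b"placeholder metadata",
    "snapshot/source.st": b"placeholder source",
}
INDEX_MEMBER = "ancestors.json"   # hypothetical extra entry


def build_archive(members):
    """Create an in-memory zip holding the given members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf


def add_index(buf, parents):
    """Append a small index member without touching the existing ones."""
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr(INDEX_MEMBER, json.dumps({"parents": parents}))


def read_parents(buf):
    """Newer reader: use the index if present, else signal a fallback."""
    with zipfile.ZipFile(buf) as zf:
        if INDEX_MEMBER in zf.namelist():
            return json.loads(zf.read(INDEX_MEMBER))["parents"]
    return None   # caller would fall back to parsing the full ancestry


archive = build_archive(ORIGINAL_MEMBERS)
before = read_parents(archive)        # no index yet
add_index(archive, ["Foo-cb.1"])
after = read_parents(archive)
with zipfile.ZipFile(archive) as zf:
    old_view = zf.read("version")     # an older reader still finds its member
```

A reader that knows about the index gets the immediate parents cheaply; a reader that does not simply never asks for that member.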

best
cami



Re: Monticello Version Info

Dale Henrichs
In reply to this post by Frank Shearar-3
Frank,

  I see no problem with saying "Monticello package Foo-fbs.2 means
  that you have this class with this definition and that method with
  that definition".

Okay I don't have a problem with that sentence.

  That's pretty much the same thing as saying "git commit
  id deadbeef means that you have this file with these contents and that
  file with those contents".

If you are implying that a single package version can be versioned in git with the same semantics as Monticello, then I will agree.

Again, where git and Monticello differ is that one can version multiple packages with git (same version/same commit) and one cannot with Monticello (different files for each package and no shared commit history)...
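
The difference can be sketched in a few lines of Python. This is a simplified illustration and not git's actual object format: `snapshot_id` is a made-up helper that derives one id from a whole set of files, and the package and version names are hypothetical.

```python
import hashlib
from typing import Dict


def snapshot_id(files: Dict[str, bytes]) -> str:
    """One id for a whole set of files, derived from paths and contents."""
    digest = hashlib.sha1()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(hashlib.sha1(files[path]).digest())
    return digest.hexdigest()


# git-like: both packages are covered by a single content-based id.
tree_v1 = {"Foo/source.st": b"foo v1", "Bar/source.st": b"bar v1"}
tree_v2 = {"Foo/source.st": b"foo v2", "Bar/source.st": b"bar v1"}

# Monticello-like: each package has its own independently named version,
# and something else has to record which versions belong together.
package_versions_v1 = {"Foo": "Foo-dkh.1", "Bar": "Bar-dkh.1"}
package_versions_v2 = {"Foo": "Foo-dkh.2", "Bar": "Bar-dkh.1"}

id_v1 = snapshot_id(tree_v1)
id_v2 = snapshot_id(tree_v2)
```

Changing one package changes the single id for the whole set, whereas in the per-package scheme only `Foo` gets a new name and nothing ties it to the unchanged `Bar` version.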

Dale

----- Original Message -----
| From: "Frank Shearar" <[hidden email]>
| To: [hidden email]
| Sent: Friday, March 2, 2012 9:46:39 AM
| Subject: Re: [Pharo-project] Monticello Version Info
|
| On 2 March 2012 17:32, Dale Henrichs <[hidden email]> wrote:
| > Frank,
| >
| > That's right ... the major difference between the two is that git
| > manages multiple files ...
|
| Well, kind of. A blob usually does contain the contents of a file.
| (http://book.git-scm.com/1_the_git_object_model.html) That is, when
| you checkout some branch you end up with a directory structure
| containing files, which is what you're talking about. It's probably
| better not to think of them as files: the blobs might look like files
| in your working copy, but they're just, well, blobs. Chunks of binary
| stuff that, for versioning software, happens to be UTF-8 encoded
| plain
| text (or whatever).
|
| (This is exactly why, from standing inside an image, mapping a method
| to a file makes sense: a method's a single unit of stuff.)
|
| Which is probably why you express yourself this way: Monticello turns
| a whole bunch of methods + comments + class definitions into a big
| snapshot.st - a single file in the zip - and a corresponding list of
| changes to an image - "add this, remove that".
|
| Both are still versioning a collection of things together, though: I
| see no problem with saying "Monticello package Foo-fbs.2 means that
| you have this class with this definition and that method with that
| definition". That's pretty much the same thing as saying "git commit
| id deadbeef means that you have this file with these contents and
| that
| file with those contents".
|
| frank
|
| > Dale
| >
| > ----- Original Message -----
| > | From: "Frank Shearar" <[hidden email]>
| > | To: [hidden email]
| > | Sent: Friday, March 2, 2012 9:27:39 AM
| > | Subject: Re: [Pharo-project] Monticello Version Info
| > |
| > | On 2 March 2012 17:02, Dale Henrichs <[hidden email]> wrote:
| > | > Sven,
| > | >
| > | > A Monticello mcz file is a version data base for a single
| > | > package
| > | > .... Git is a version data base for a directory structure ...
| > | >
| > | > Monticello has branching by convention (change the name of a
| > | > file
| > | > to create the branch), although the mcz ancestry handles
| > | > branches
| > | > just fine. In Git branches are first class objects ... it is
| > | > difficult to do things in git if you are not on one branch or
| > | > another ...
| > |
| > | Bearing in mind that a branch is just a pointer to a commit: look
| > | in
| > | your blah/.git/refs/heads/ and each file is a branch containing
| > | the
| > | SHA1 id of the head of that branch. (And each commit knows its
| > | ancestor/s, just like an mcz file, except that the hash means the
| > | relationship's based on the commit's _contents_, not its _name_.)
| > |
| > | frank
| > |
| > | > You can merge with Monticello and you can merge with Git ...
| > | >
| > | > The big difference is that Git allows you to version a bunch of
| > | > files together and with Monticello you are versioning a single
| > | > file.
| > | >
| > | > Part of what Metacello was invented to do was to create a "data
| > | > base" of versioned collections of mcz files ... Git was
| > | > designed
| > | > to manage collections of files...
| > | >
| > | > Is this what you were asking?
| > | >
| > | > Dale
| > | >
| > | > ----- Original Message -----
| > | > | From: "Sven Van Caekenberghe" <[hidden email]>
| > | > | To: [hidden email]
| > | > | Sent: Friday, March 2, 2012 12:31:42 AM
| > | > | Subject: Re: [Pharo-project] Monticello Version Info
| > | > |
| > | > |
| > | > | On 02 Mar 2012, at 01:52, Chris Cunningham wrote:
| > | > |
| > | > | > The issue is that Monticello is setup for distributed
| > | > | > processing,
| > | > | > and
| > | > | > allowing for multiple repositories, some of which may not
| > | > | > be
| > | > | > available
| > | > | > to all of the users for a project.  For instance, a project
| > | > | > might
| > | > | > be
| > | > | > developed internally (or on the developers hard-drive)
| > | > | > until
| > | > | > they
| > | > | > feel
| > | > | > comfortable distributing the code later.  So, publicly, you
| > | > | > get
| > | > | > version 12, 17, 34, and 37.  There is no access to the
| > | > | > intermediate
| > | > | > ones (unless you happen to be the one that created them and
| > | > | > didn't
| > | > | > release them).  The 'whole ancestry' lets you do diffs off
| > | > | > of
| > | > | > a
| > | > | > version derived from 37 against one derived from 34 - the
| > | > | > ancestry
| > | > | > can
| > | > | > determine that version 34 is 'common', and work from there.
| > | > | >  [Note
| > | > | > that just numbers aren't enough - the original developer,
| > | > | > say,
| > | > | > cbc
| > | > | > could have version cbc.34, while you could have, say,
| > | > | > CamilloBruni.34,
| > | > | > but yours is based off of 17 (since you picked up that
| > | > | > version
| > | > | > and
| > | > | > started working there).  So, merging cbc.37 with
| > | > | > CamilloBruni.34
| > | > | > would
| > | > | > need to pull down cbc.17 for a good merge to work.]
| > | > | >
| > | > | > At least, that's my understanding from long ago
| > | > | > discussions.
| > | > |
| > | > | This makes sense, but how is this handled with git ?
| > | > |
| > | > | Sven
| > | >
| > |
| > |
| >
|
|
