git and author/timestamps, 10000's of files, etc.

Dale Henrichs-3
I've been skimming the ironically named "blame" thread and just want to
clear up some apparent misconceptions.

Git/GitHub is not the reason that the author/timestamp information was
"lost". When Tonel was introduced, the author/timestamp info was not
included in the format as a separately serialized file: FileTree's
implementation of author/timestamp support (around for almost 6 years
now) was an annoying source of commit conflicts and often prevented
automatic merges.

Git/GitHub has perfectly functional blame support[1], so the decision to
rely on git to supply the author/timestamp for Tonel was a sound one.
Thierry Goubier's GitFileTree implementation does a very good job of
converting git author/timestamp information into Monticello metadata,
and is proof that author/timestamp information can be extracted from a
git repository.

AFAICT, the only issue here is that the code linking git's
author/timestamp information to the in-image author/timestamp
information has not been written YET ...

Git/GitHub is not the reason that there are "1000's of files on disk".
Tonel uses a file-per-class format, which significantly reduces the file
count for Smalltalk repositories on disk. FileTree uses a file-per-method
format, which for large projects leads to a large number of files. That
in and of itself is not a problem (at least for git), but it does lead
to excessive disk space consumption.
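A rough back-of-the-envelope sketch of why many small files cost disk space: on most filesystems each file occupies a whole number of allocation blocks, so a 300-byte method file still takes a full block. The numbers below (4 KiB blocks, 300-byte methods, 50 methods per class) are illustrative assumptions, not measurements, and the model ignores directories, metadata files and git's own object store.

```python
import math
from typing import Iterable


def allocated_bytes(file_sizes: Iterable[int], block_size: int = 4096) -> int:
    """Disk space used when every file is rounded up to whole allocation blocks."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)


# Hypothetical project: 10,000 methods of ~300 bytes each, 50 methods per class.
METHODS, METHOD_SIZE, PER_CLASS = 10_000, 300, 50
CLASSES = METHODS // PER_CLASS

per_method = allocated_bytes([METHOD_SIZE] * METHODS)             # file per method
per_class = allocated_bytes([METHOD_SIZE * PER_CLASS] * CLASSES)  # file per class

print(f"file per method: {per_method:,} bytes in {METHODS:,} files")
print(f"file per class:  {per_class:,} bytes in {CLASSES:,} files")
```

Under these assumptions the file-per-method layout allocates 40,960,000 bytes and the file-per-class layout 3,276,800 bytes, roughly 12 times less, even though both hold the same 3,000,000 bytes of source.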

SmalltalkCI[2], which provides support for Smalltalk on Travis CI[3] and
AppVeyor[4], was created by Fabio Niephaus, an active member of the
Squeak community. Travis CI makes it possible to run cross-dialect tests
to validate GitHub pull requests and check-ins[5] for cross-dialect
projects.

It seems that there are at least some (less vocal) members of the Squeak
community who are interested in using git.

Dale

[1] https://www.git-scm.com/docs/git-blame
[2] https://github.com/hpi-swa/smalltalkCI
[3] https://travis-ci.org/
[4] https://www.appveyor.com/
[5] https://travis-ci.org/Metacello/metacello


Re: git and author/timestamps, 10000's of files, etc.

Nicolas Cellier
Hi Dale,
What is the atomic change being considered: method level or line level? In the first case, author and timestamp won't conflict unless the method itself conflicts. Sure, it is redundant with git data if you use git exclusively and take care to import history (as with git-svn)... but in a context of exchange with others, it's not.
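Nicolas's point can be illustrated with a toy three-way merge in Python that treats each method (its source plus its author/timestamp stamp) as the atomic unit. This is a simplified model, not how git or Monticello actually merge: git merges text line by line, which is one plausible reason stamps stored together in a separate file can collide even when the methods themselves do not.

```python
from typing import Dict, List, Optional, Tuple

# A method is modelled as (source, stamp); the stamp holds author and timestamp.
Method = Tuple[str, str]
Package = Dict[str, Method]  # selector -> method


def merge_methods(base: Package, ours: Package,
                  theirs: Package) -> Tuple[Package, List[str]]:
    """Three-way merge where each method, stamp included, is the atomic unit."""
    merged: Package = {}
    conflicts: List[str] = []
    for selector in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[Method] = base.get(selector)
        o, t = ours.get(selector), theirs.get(selector)
        if o == t:          # both sides agree (or neither touched it)
            result = o
        elif o == b:        # only "theirs" changed this method
            result = t
        elif t == b:        # only "ours" changed this method
            result = o
        else:               # both changed the same method differently
            conflicts.append(selector)
            result = o      # keep ours and report the conflict
        if result is not None:  # None means the method was deleted
            merged[selector] = result
    return merged, conflicts


base = {"foo": ("foo ^ 1", "dh 1/1/2018"), "bar": ("bar ^ 2", "dh 1/1/2018")}
ours = {**base, "foo": ("foo ^ 10", "nice 1/14/2018")}
theirs = {**base, "bar": ("bar ^ 20", "gp 1/15/2018")}
merged, conflicts = merge_methods(base, ours, theirs)
print(conflicts)  # [] : different methods, so their stamps never collide
```

Only when both sides edit the same method does the stamp conflict, and then the method conflicts anyway.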

On 14 Jan 2018 7:12 PM, "Dale Henrichs" <[hidden email]> wrote:

Re: git and author/timestamps, 10000's of files, etc.

Guillermo Polito
In reply to this post by Dale Henrichs-3


On Sun, Jan 14, 2018 at 7:11 PM, Dale Henrichs <[hidden email]> wrote:
> AFAICT, the single issue here is that the code to link between git's author/timestamp information and the in-image author/timestamp information has not been written YET ...

Actually, I remember Esteban doing a prototype some months ago. The idea was to query author/timestamp information from git just after bootstrapping, and to embed this information in the resulting image.
The problem is that this approach was SUPER slow, because it had to run blame on each method. And increasing build times is not very nice, because it slows down the frequency of integrations, especially during sprints.

Now, each commit usually changes only a couple of methods, so the author/timestamp will remain the same for 99.9% of the methods most of the time. It would be possible to cache it and recalculate only the changed ones by following the commit diff. At least this would restore the timestamps of the methods found in the image.
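A minimal sketch of that caching idea, in Python. `blame_method` is a hypothetical stand-in for the slow per-method blame, and the sets of changed and removed methods are assumed to come from the commit diff (mapping changed files back to individual methods is not shown). Only the methods touched by the commit are re-blamed; every other cached stamp is reused.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Stamp = Tuple[str, str]  # (author, timestamp)


def refresh_stamps(cache: Dict[str, Stamp], changed: Iterable[str],
                   removed: Iterable[str],
                   blame_method: Callable[[str], Stamp]) -> Dict[str, Stamp]:
    """Return a new cache in which only the methods touched by a commit are re-blamed."""
    updated = dict(cache)            # untouched methods keep their cached stamp
    for method_id in removed:
        updated.pop(method_id, None)
    for method_id in changed:        # added or modified methods
        updated[method_id] = blame_method(method_id)  # the expensive step
    return updated


# Illustrative run: 1,000 cached methods, a commit touching three of them.
calls: List[str] = []


def fake_blame(method_id: str) -> Stamp:
    calls.append(method_id)
    return ("EstebanLM", "2018-01-15T11:13:00")


cache = {f"Object>>m{i}": ("dh", "2017-12-01T00:00:00") for i in range(1000)}
new_cache = refresh_stamps(cache, ["Object>>m1", "Object>>m2"], ["Object>>m3"], fake_blame)
print(len(calls), len(new_cache))  # 2 999
```

The cost of a build then scales with the size of the commit rather than with the number of methods in the image.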

A second point, orthogonal to the previous one, would be to also handle "browse versions". Today the version browser queries only the changes file. I think it would make sense to query the Epicea files first (for local modifications) and then fall back to the git repository (if any) to give a full history. Technically, it should not be difficult to do, except for the detail that the repository was converted at some point from FileTree to Tonel, so blame will not quite work out of the box :).
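The lookup order described here (local Epicea history first, then git) could be sketched roughly as below. Both sources are hypothetical callables returning versions newest first; nothing here is the actual Epicea or Iceberg API, and following history across the FileTree-to-Tonel conversion is not addressed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Version:
    source: str
    author: str
    timestamp: str  # illustrative ISO 8601 date


HistorySource = Callable[[str], List[Version]]


def browse_versions(selector: str, local_history: HistorySource,
                    git_history: Optional[HistorySource] = None) -> List[Version]:
    """Local (Epicea-like) versions first, then older versions from git if available."""
    versions = list(local_history(selector))
    if git_history is None:  # no git repository attached to this code
        return versions
    seen = {v.source for v in versions}
    # Skip git versions whose source already appears in the local history.
    versions.extend(v for v in git_history(selector) if v.source not in seen)
    return versions


def epicea_stub(selector: str) -> List[Version]:  # hypothetical local history
    return [Version("foo ^ 3", "me", "2018-01-15")]


def git_stub(selector: str) -> List[Version]:  # hypothetical git history
    return [Version("foo ^ 3", "me", "2018-01-15"), Version("foo ^ 2", "gp", "2017-06-01")]


print([v.source for v in browse_versions("foo", epicea_stub, git_stub)])  # ['foo ^ 3', 'foo ^ 2']
```

Deduplicating by source is a simplification: a method reverted to an earlier source would lose that older git entry.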


--

   

Guille Polito
Research Engineer
Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io
Phone: +33 06 52 70 66 13


Re: git and author/timestamps, 10000's of files, etc.

EstebanLM
Hi,

On 15 Jan 2018, at 10:44, Guillermo Polito <[hidden email]> wrote:

> A second point, orthogonal to the previous one, would be to also handle "browse versions". Today the version browser queries only the changes file. I think it would make sense to query the Epicea files first (for local modifications) and then fall back to the git repository (if any) to give a full history. Technically, it should not be difficult to do, except for the detail that the repository was converted at some point from FileTree to Tonel, so blame will not quite work out of the box :).

Yes. We explored this, and there is already a “log” functionality in Iceberg that works for FileTree.
I’m working on making it work for Tonel.

As I said, we made the experiment, but after analysis we realised that it is very slow, and even if we could cache, etc., we also realised that we do not need to reproduce the same behaviour to get the desired result.

So, we are going to change the way author/timestamps work:

You will see your in-image versions as now, but if you want to see the whole history of a method, there will be a button in the versions browser that fills in ALL of the method's history, taken from the git repository. This full history will provide correct authorship and an extra, very valuable bonus: you will see the method's history from the moment it was born (you can’t have this with plain Monticello today).
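A rough sketch of how such a full history could be assembled: walk the snapshots of the file that contains the method, oldest commit first, and record a new version each time the method's source changes. The first recorded version is where the method was born. `toy_extract` and the snapshot tuples are hypothetical placeholders (in practice they might come from `git log` and `git show <commit>:<path>`, plus a real Tonel or FileTree parser); this is not Iceberg's implementation.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (commit id, author, ISO date, full file text at that commit), oldest first
Snapshot = Tuple[str, str, str, str]
Extractor = Callable[[str, str], Optional[str]]


def method_history(selector: str, snapshots: Iterable[Snapshot],
                   extract_method: Extractor) -> List[Tuple[str, str, str, str]]:
    """Return (commit, author, date, source) for every change to the method."""
    history: List[Tuple[str, str, str, str]] = []
    previous: Optional[str] = None
    for commit, author, date, text in snapshots:
        source = extract_method(text, selector)
        if source is not None and source != previous:
            history.append((commit, author, date, source))
        previous = source  # None while the method does not exist
    return history


def toy_extract(text: str, selector: str) -> Optional[str]:
    """Toy parser: methods are separated by blank lines and start with their selector."""
    for chunk in text.split("\n\n"):
        if chunk.split(maxsplit=1)[:1] == [selector]:
            return chunk.strip()
    return None


snapshots = [
    ("c1", "dh", "2016-01-01", "bar\n  ^ 0"),
    ("c2", "dh", "2016-02-01", "bar\n  ^ 0\n\nfoo\n  ^ 1"),
    ("c3", "gp", "2017-06-01", "bar\n  ^ 9\n\nfoo\n  ^ 1"),
    ("c4", "EstebanLM", "2018-01-15", "bar\n  ^ 9\n\nfoo\n  ^ 2"),
]
print([c for c, *_ in method_history("foo", snapshots, toy_extract)])  # ['c2', 'c4']
```

Reversing the result gives the newest-first order a versions browser would typically show.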

cheers, 
Esteban

Re: git and author/timestamps, 10000's of files, etc.

Sven Van Caekenberghe-2


> On 15 Jan 2018, at 11:13, Esteban Lorenzano <[hidden email]> wrote:
> So, we are going to change the way author/timestamps work:
>
> You will see your in-image versions as now, but if you want to see the whole history of a method, there will be a button in the versions browser that fills in ALL of the method's history, taken from the git repository. This full history will provide correct authorship and an extra, very valuable bonus: you will see the method's history from the moment it was born (you can’t have this with plain Monticello today).

That is what we want/need. Esteban, our hero!
