
Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations


Dale Henrichs-3

Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

This is the first post in what I expect to be a series of posts describing how I work with git and Smalltalk.

The content of this post is based on work that I have done over the last 4 years working with FileTree[4], Metacello[3], tODE[1], GsDevKit_home[2], and git.

It's important to note that while I will talk exclusively about git in this series of posts, 99% of what I talk about can be directly mapped to other disk-based SCM like svn, mercurial, etc.

I will describe the structure used by GsDevKit_home with the intent to provide a reference point for discussion. The structure I am describing is basically my third attempt at disk structure, so it benefits from "mistakes" that I have made in previous attempts. I've been using the current structure on a daily basis for the last 9 months, so what I'm describing is based on real-life structures and code.
----

Currently, the fundamental Pharo structure can be described as consisting of: a vm, a sources file, an image file, a changes file, and a package-cache. The vm and sources file are shared resources and can be used with a collection of images. The image and changes files are tied together, as the changes file has the source for methods that are in the image. The image and changes files are also tied to specific versions of the sources file and vm... The package-cache can be shared amongst a wide variety of images, including images created by different vms ...

When you plug git into the mix, you'll have a local git repository for each project that you have loaded in your image or intend to load into your image. The git repository should be considered as a resource that lies in between the package-cache and a changes file.

The git repository is not tightly coupled to the image like the changes file (you can delete a git repository without impacting the functionality of an image), but if you are making changes in the image, then you _are_ coupled to a particular git commit identified by its SHA.

Metacello will record the SHA of the git repository at the time that packages are loaded from a git repository (to enable this feature in Pharo, some methods need to be implemented in the MetacelloPlatform for Pharo) and this information is stored along with the Metacello registration for the project.

Since the git repository can be independently manipulated from the shell or an image can be saved that references a SHA that is no longer "current" .... it is very important that the SHA be recorded and made visible to users via the tools.
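As a rough illustration of why the recorded SHA matters, here is a minimal Python sketch of a skew check. It is not how Metacello or tODE implement it: `LoadedProject`, `current_head_sha` and `has_version_skew` are hypothetical names, and the only real tool assumed is the `git rev-parse HEAD` command.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadedProject:
    """Hypothetical stand-in for a project registration that remembers its load-time SHA."""
    name: str
    repo_dir: str
    loaded_sha: str


def current_head_sha(repo_dir: str) -> str:
    """Ask git which commit is checked out in repo_dir right now."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def has_version_skew(project: LoadedProject, head_sha: Optional[str] = None) -> bool:
    """True when the checkout has moved away from the commit the image loaded.

    head_sha can be passed in directly (handy for tests); otherwise git is asked.
    """
    if head_sha is None:
        head_sha = current_head_sha(project.repo_dir)
    return head_sha != project.loaded_sha
```

A tool could run a check like this for every registered project and flag the ones where it returns true.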

In the early days of my work with git, I didn't have this feature and it was very difficult to recover work from old images - I also made mistakes by committing an older version of a package over a newer commit ... nasty stuff ... Currently in tODE, I provide two menu items to work with "version skew": `skew diff` and `skew save`.

The `skew diff` menu item just shows the differences between the two commits (the commit when the code was loaded and the current commit) - if your project isn't dirty and the diff doesn't contain unexpected changes, then you may simply load the new commit into your image and move forward.

The `skew save` menu item basically does a merge between the code in the image and the current commit. There are a number of git operations that go into this process: check out the original SHA; save packages and commit; check out a custom branch from the "current commit"; merge the new commit into the branch.
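One possible reading of those four steps, written out as a plan of git commands in Python. This is only a sketch under assumptions: the function name, the branch name and the `NEW_COMMIT` placeholder are made up for illustration, and the actual tODE implementation may order or perform these operations differently.

```python
from typing import List

# Placeholder for the SHA of the commit created in step 2; it is only known
# once that commit exists (for example via `git rev-parse HEAD`).
NEW_COMMIT = "<sha-of-commit-made-in-step-2>"


def skew_save_plan(original_sha: str, current_sha: str,
                   branch: str, message: str) -> List[List[str]]:
    """Return the git commands for a skew save, in order, without running them."""
    return [
        # 1. go back to the commit the image was loaded from
        ["git", "checkout", original_sha],
        # 2. (the image writes its packages to disk here) then commit them
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        # 3. start a custom branch at what was the "current commit"
        ["git", "checkout", "-b", branch, current_sha],
        # 4. merge the commit holding the image's code into that branch
        ["git", "merge", NEW_COMMIT],
    ]
```

Returning the plan instead of executing it keeps the sketch side-effect free; a real tool would run each step and stop at the first failure or merge conflict.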

I plan on adding a `skew clone` menu item that does a git clone into a repository directory that is reserved for use by the stone (image) and then checks out the original SHA... this option would be more useful for folks who "know what they are doing" and allow for different work flows to be followed...

Soooo --- managing skew is an important function for the "project browser"... Here's what the tODE "Project Browser" looks like when version skew is detected:

[Screenshot: project list with version skew]

Well I think that's enough for now...Here are some topics that I plan to cover in future posts:
- what has to be done to get the initial clone and hook it into an image?
- sharing git repos between separate "active images"
- cloning github:// repos?
- ssh vs https clones
- working with remotes

Dale

[1] https://github.com/dalehenrich/tode
[2] https://github.com/GsDevKit/GsDevKit_home
[3] https://github.com/dalehenrich/metacello-work
[4] https://github.com/dalehenrich/filetree

Ben Coman

Re: Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

On Thu, Jan 28, 2016 at 5:03 AM, Dale Henrichs <[hidden email]> wrote:

The git repository is not tightly coupled to the image like the changes file (you can delete a git repository without impacting the functionality of an image), but if you are making changes in the image, then you _are_ coupled to a particular git commit identified by its SHA.

Metacello will record the SHA of the git repository at the time that packages are loaded from a git repository (to enable this feature in Pharo, some methods need to be implemented in the MetacelloPlatform for Pharo) and this information is stored along with the Metacello registration for the project.

Since the git repository can be independently manipulated from the shell or an image can be saved that references a SHA that is no longer "current" .... it is very important that the SHA be recorded and made visible to users via the tools.

By "current", do you mean the currently checked out directory as seen from the shell command line?

Slightly off topic...

I wonder if we could avoid needing a currently checked out directory as seen from the command line and have the Image work directly with the git repository (within the .git folder - is this what libgit integration may give us?) thus making the Image the equivalent of the current checked out directory? Then let's dump the .changes file by... rather than a method's source reference being a (fragile) hardcoded index into the .changes file, make each method a SHA key into the git (or mercurial) repository.

* Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. You’ll learn more about what this means in a bit. [2]

* Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time. [3]

* [In Mercurial...] each revision of a file is identified by a 'NodeID', which is a SHA hash of its contents (combined with the position of that node in the history). [4]

In short, should our source code indexing be "content-addressable" with the possibility to use *any* key-value repository? Each method SHA would be calculated from its own source code and parent class SHA. Each class would calculate its SHA from its definition and superclass' SHA. A package/project SHA would be calculated from all its component class and method SHAs.
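A minimal Python sketch of that hashing scheme, just to show how a change would propagate. The function names, the length-prefixed encoding and the sorting of component SHAs are assumptions made for this illustration; nothing like this exists in Pharo today.

```python
import hashlib
from typing import Iterable


def _sha(*parts: str) -> str:
    """Hash several strings; the length prefix keeps part boundaries unambiguous."""
    h = hashlib.sha1()
    for part in parts:
        data = part.encode("utf-8")
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return h.hexdigest()


def class_sha(definition: str, superclass_sha: str) -> str:
    """A class SHA depends on its definition and its superclass' SHA ('' for a root)."""
    return _sha("class", definition, superclass_sha)


def method_sha(source: str, owner_class_sha: str) -> str:
    """A method SHA depends on its source and the SHA of the class it belongs to."""
    return _sha("method", source, owner_class_sha)


def package_sha(component_shas: Iterable[str]) -> str:
    """A package SHA depends on all class and method SHAs, independent of order."""
    return _sha("package", *sorted(component_shas))
```

One consequence worth noting: with this scheme, editing a superclass definition changes the SHA of every subclass and every method below it, even when their own source is untouched.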

[1] https://git-scm.com/book/en/v1/Git-Internals-Plumbing-and-Porcelain

[2] https://git-scm.com/book/en/v1/Git-Internals

[3] https://git-scm.com/book/en/v1/Git-Internals-Git-Objects

[4] http://ericsink.com/vcbe/html/repository_structure.html

cheers -ben

Soooo --- managing skew is an important function for the "project browser"... Here's what the tODE "Project Browser" looks like when version skew is detected:

In this I assume the red ones are skewed and the X ^ Y represent commits, but could you explain which ones?


Ben Coman

Re: Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

On Thu, Jan 28, 2016 at 9:41 AM, Ben Coman <[hidden email]> wrote:


Slightly off topic...
I wonder if we could avoid needing a currently checked out directory as seen from the command line and have the Image work directly with the git repository (within the .git folder - is this what libgit integration may give us?) thus making the Image the equivalent of the current checked out directory? Then let's dump the .changes file by... rather than a method's source reference being a (fragile) hardcoded index into the .changes file, make each method a SHA key into the git (or mercurial) repository.
* Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. You’ll learn more about what this means in a bit. [2]
* Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time. [3]
* [In Mercurial...] each revision of a file is identified by a 'NodeID', which is a SHA hash of its contents (combined with the position of that node in the history). [4]

* The baseline data structures for Fossil and Git are the same (modulo formatting details). Both systems store check-ins as immutable objects referencing their immediate ancestors and named by their SHA1 hash. [5]

So to firm up this idea, while we might not want to tie ourselves too tightly to git, rather than try working with *all* types of source code management, would it be reasonable to concentrate on an abstraction layer focused on *content-addressable* source code? That is, would this provide flexibility for sufficient backend options, but a better experience with those than a totally generic source code management. (btw, Could EPICEA be made to work with SHAs?)

cheers -ben

In short, should our source code indexing be "content-addressable" with the possibility to use *any* key-value repository? Each method SHA would be calculated from its own source code and parent class SHA. Each class would calculate its SHA from its definition and superclass' SHA. A package/project SHA would be calculated from all its component class and method SHAs.

[1] https://git-scm.com/book/en/v1/Git-Internals-Plumbing-and-Porcelain
[2] https://git-scm.com/book/en/v1/Git-Internals
[3] https://git-scm.com/book/en/v1/Git-Internals-Git-Objects
[4] http://ericsink.com/vcbe/html/repository_structure.html

[5] http://fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.wiki


Dale Henrichs-3

Re: Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

In reply to this post by Ben Coman

On 01/27/2016 05:41 PM, Ben Coman wrote:

On Thu, Jan 28, 2016 at 5:03 AM, Dale Henrichs <[hidden email]> wrote:

The git repository is not tightly coupled to the image like the changes file (you can delete a git repository without impacting the functionality of an image), but if you are making changes in the image, then you _are_ coupled to a particular git commit identified by its SHA.

Metacello will record the SHA of the git repository at the time that packages are loaded from a git repository (to enable this feature in Pharo, some methods need to be implemented in the MetacelloPlatform for Pharo) and this information is stored along with the Metacello registration for the project.

Since the git repository can be independently manipulated from the shell or an image can be saved that references a SHA that is no longer "current" .... it is very important that the SHA be recorded and made visible to users via the tools.

By "current", do you mean the currently checked out directory as seen from the shell command line?

Yes, the shell command line, and the directory that the filetree:// repo in the image references ...

Slightly off topic...

I wonder if we could avoid needing a currently checked out directory as seen from the command line and have the Image work directly with the git repository (within the .git folder - is this what libgit integration may give us?) thus making the Image the equivalent of the current checked out directory? Then let's dump the .changes file by... rather than a method's source reference being a (fragile) hardcoded index into the .changes file, make each method a SHA key into the git (or mercurial) repository.

* Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. You’ll learn more about what this means in a bit. [2]

* Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time. [3]

* [In Mercurial...] each revision of a file is identified by a 'NodeID', which is a SHA hash of its contents (combined with the position of that node in the history). [4]

In short, should our source code indexing be "content-addressable" with the possibility to use *any* key-value repository? Each method SHA would be calculated from its own source code and parent class SHA. Each class would calculate its SHA from its definition and superclass' SHA. A package/project SHA would be calculated from all its component class and method SHAs.

Well, this is a direction that could be taken, but there are some advantages to sharing the actual checkouts ... in my workflow, I have a collection of git repos that represent my current base system and I have a number of stones (images) that share this same base system. When I make a bugfix in one, I am interested in being able to easily load the bugfix into the other stones before starting work in that particular stone, so having a shared "current directory" amongst a set of stones is actually convenient ... this "convenience" also extends to building a new stone (image), since the build scripts only need to load from the currently checked-out versions without having to know the specific SHA of interest ...

Dale

Dale Henrichs-3

Re: Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

In reply to this post by Ben Coman

On 01/27/2016 06:01 PM, Ben Coman wrote:


So to firm up this idea, while we might not want to tie ourselves too tightly to git, rather than try working with *all* types of source code management, would it be reasonable to concentrate on an abstraction layer focused on *content-addressable* source code? That is, would this provide flexibility for sufficient backend options, but a better experience with those than a totally generic source code management. (btw, Could EPICEA be made to work with SHAs?)

Yes I read the paper on "An Abstraction for Version Control Systems"[1] and if I recall correctly SVN is lacking in a few of the features that had to be "simulated" ... so an abstraction layer that did not have to "fake features out" makes a lot of sense ... with that said, it also makes a lot of sense to not exclude SVN, so an abstraction layer that had provision for unsupported features would be useful ...

Dale

[1] https://publishup.uni-potsdam.de/files/5708/tbhpi54.pdf

Stephan Eggermont-3

Re: Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

In reply to this post by Dale Henrichs-3

On 28-01-16 19:26, Dale Henrichs wrote:

> Well, this is a direction that could be taken, but there are some
> advantages to sharing the actual checkouts ... in my workflow, I have a
> collection of git repos that represent my current base system and I have
> a number of stones (images) that share this same base system and when I
> make a bugfix in one, I am interested in being able to easily load the
> bugfix into the other stones before starting work in that particular
> stone so having a shared "current directory" amongst a set of stones is
> actually convenient .... this "convenience" also extends to building a
> new stone (image) since the build scripts only need to load from the
> currently checked-out versions without having to know the specific SHA
> of interest ...

So we'd basically have a directory of vms, one of image templates and
one of git repos, and a project directory where I'd make a particular
combination of the three? Or when multi-platform a platform directory
above the vms and images one?

Stephan

Dale Henrichs-3

Re: Experiences with git, tODE and GsDevKit_home: Part 1 - basic structure considerations

On 01/28/2016 02:28 PM, Stephan Eggermont wrote:

> So we'd basically have a directory of vms, one of image templates and
> one of git repos, and a project directory where I'd make a particular
> combination of the three? Or when multi-platform a platform directory
> above the vms and images one?

Stephan,

It does make sense to begin thinking about a structure that makes sense
for Pharo.

In GsDevKit_home, I have a $GS_HOME env var that points to the root of
the directory structure for a collection of stones and the top-level
directory structure looks like the following:

▸ $GS_HOME
  ▸ bin/
  ▸ dev/
  ▸ docs/
  ▸ etc/
  ▸ server/
  ▸ shared/
  ▸ sys/
  ▸ tests/
  ▸ travis/

and for just the stones and git structure, I have the following:

▸ $GS_HOME
  ▾ server/
    ▸ stones/
      ▸ <stone-name-1>/
        ▸ git/
        ▸ product/
      ▸ <stone-name-2>/
        ▸ git/
        ▸ product/
      ▸ <stone-name-3>/
        ▸ git/
        ▸ product/
  ▾ shared/
    ▸ downloads/
      ▸ products/
    ▸ repos/

where:
- $GS_HOME/shared/repos is the location for the shared git repositories
- $GS_HOME/shared/downloads/products is the location of the GemStone
  distribution (basically vms and templates)
- $GS_HOME/server/stones/<stone-name-1> is the location of the
  stone-specific files and directories
- $GS_HOME/server/stones/<stone-name-1>/product is a symbolic link from
  the stone to the $GS_HOME/shared/downloads/products/<gemstone-product>
  directory
- $GS_HOME/server/stones/<stone-name-1>/git is the standard location for
  stone-specific git clones
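The locations listed above can be written down as a few path helpers. This is a Python sketch only: the function names are hypothetical and GsDevKit_home itself is driven by its own scripts, not by this code.

```python
from pathlib import PurePosixPath


def shared_repos_dir(gs_home: str) -> PurePosixPath:
    """Location of the git repositories shared by all stones."""
    return PurePosixPath(gs_home) / "shared" / "repos"


def products_dir(gs_home: str) -> PurePosixPath:
    """Location of the downloaded GemStone products."""
    return PurePosixPath(gs_home) / "shared" / "downloads" / "products"


def stone_dir(gs_home: str, stone: str) -> PurePosixPath:
    """Location of one stone's own files and directories."""
    return PurePosixPath(gs_home) / "server" / "stones" / stone


def stone_product_link(gs_home: str, stone: str) -> PurePosixPath:
    """Symbolic link from the stone to the product it runs on."""
    return stone_dir(gs_home, stone) / "product"


def stone_git_dir(gs_home: str, stone: str) -> PurePosixPath:
    """Standard location for stone-specific git clones."""
    return stone_dir(gs_home, stone) / "git"
```

For example, `stone_git_dir("/opt/gs", "devkit")` gives `/opt/gs/server/stones/devkit/git`, where `/opt/gs` and `devkit` are made-up values.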

You can see that I make provision for "shared repos" and
"stone-specific" repos --- the stone-specific repo comes into play only
occasionally but making a provision for this possibility is useful on
the odd occasion when you are going to do protracted work on one of the
shared repositories and that work is likely to break something important
... I think of the stone-specific directories as transient and once the
work is done and integrated into the shared/repos git repo, the local
clone is destroyed...
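The shared versus stone-specific split could be expressed as a simple lookup rule. The Python sketch below assumes that a stone-specific clone, when one exists, takes precedence over the shared repository of the same name; that precedence rule and the name `resolve_repo` are assumptions made for illustration, not a description of what GsDevKit_home does.

```python
from pathlib import Path


def resolve_repo(gs_home: Path, stone: str, repo_name: str) -> Path:
    """Pick the clone a stone should load from.

    Prefers a transient stone-specific clone and falls back to the shared one.
    """
    private = gs_home / "server" / "stones" / stone / "git" / repo_name
    if private.is_dir():
        return private
    return gs_home / "shared" / "repos" / repo_name
```

Deleting the stone-specific clone once the work is merged makes the stone fall back to the shared repository again, which fits the idea that those clones are transient.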

So in your case you might have something like this:

▾ projects/
  ▾ <project-1>/
    ▸ git/
    ▸ project.image
    ▸ project.changes
  ▾ <project-2>/
    ▸ git/
    ▸ project.image
    ▸ project.changes
▸ repos/
▸ templates/
▸ vms/

The idea with the projects directory is that you might have multiple
images in play at once and the <project-n> directory would give you
isolation for independent git repos as well as other artifacts that
should be kept separate ...
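A minimal Python sketch of setting up the per-project layout described above. The function name `create_project` is hypothetical, and copying an image and changes file from `templates/` is deliberately left out.

```python
from pathlib import Path


def create_project(root: Path, name: str) -> Path:
    """Create the shared directories plus an isolated directory for one project."""
    # shared resources, siblings of projects/
    for shared in ("repos", "templates", "vms"):
        (root / shared).mkdir(parents=True, exist_ok=True)
    # per-project directory with its own place for private git clones
    project = root / "projects" / name
    (project / "git").mkdir(parents=True, exist_ok=True)
    return project
```

Calling it twice for different names gives two project directories that share `repos/`, `templates/` and `vms/` but keep their own `git/` clones apart.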

If it makes more sense to keep all of the image and changes files in a
common directory, then perhaps the following would be more appropriate:

▾ projects/
  ▸ <project-1>.image
  ▸ <project-1>.changes
  ▸ <project-2>.image
  ▸ <project-2>.changes
  ▾ git/
    ▸ <project-1>/
    ▸ <project-2>/
▸ repos/
▸ templates/
▸ vms/

or something different :) For GemStone it is necessary to dedicate a
directory per stone, because of the volume of files/directories involved:

▸ backups/
▸ bin/
▸ extents/
▸ git/
▸ logs/
▸ product/
▸ snapshots/
▸ stats/
▸ tranlogs/

For Pharo, there might not need to be such separation on a per
image/changes combo, although I can imagine that applications may indeed
need additional files or directories associated, in which case a
directory per image would be useful ...

Well I think this just about covers most of the things I was going to
talk about on the topic of "sharing git repos between separate active
images".

Dale