sustainable Monticello


sustainable Monticello

Chris Muller-3
This will probably be a long post, but I would like to tell you about
the Monticello upgrades I'm about to move to the trunk.

Monticello has several repository types:

                MCRepository #('creationTemplate' 'storeDiffs')
                        MCDictionaryRepository #('description' 'dict')
                        MCFileBasedRepository #('cache' 'allFileNames')
                                MCDirectoryRepository #('directory')
                                        MCCacheRepository #('packageCaches' 'seenFiles')
                                        MCSubDirectoryRepository #()
                                MCFtpRepository #('host' 'directory' 'user' 'password' 'connection')
                                MCHttpRepository #('location' 'user' 'password' 'readerCache')
                                MCSMCacheRepository #('smCache')
                        MCGOODSRepository #('hostname' 'port' 'connection')
                        MCWriteOnlyRepository #()
                                MCSMReleaseRepository #('packageName' 'user' 'password')
                                MCSmtpRepository #('email')

but MCFileBasedRepository is the one that has been given all of the
focus; the other repository types have been ignored over the years.
MCHttpRepository, which is the one that interfaces with SqueakSource,
and MCDirectoryRepository are pretty much the only types being used.

I know this because external users of the MCRepository API, like the
Repository-browser tools, MC-Configurations and Installer, are all
using APIs that are specific to MCFileBasedRepository and are not
generally understood by the other repository types or the abstract API
in MCRepository.

This is worthy of concern because of the access-limitations of a
MCFileBasedRepository.  Unlike a MCGOODSRepository, for example, a
file-system-based repository cannot efficiently meet the demands of
being a MCRepository without, at some points, needing to enumerate
ALL version names (files) in its file-system location.

As the number of versions in a repository reaches 1 million and
beyond, performance will grind to a halt due to the number of files
that must be constantly downloaded into RAM (another area of
unscalability and unsustainability related to file-based
repositories).  A purging of old versions could be done, but a
philosophy of Monticello, from the outset, has been that repositories
are intended to contain "all" of the version history.

I have therefore reworked the MCRepository APIs and external tools to
talk using only an API that is understood by any repository that
implements the methods identified as #subclassResponsibility in
MCRepository.  This minimally-required API is now:

  #allPackageNames - answer a list of package names in this repository.
  #basicStoreVersion: - add a Version to this repository.
  #includesVersionNamed: - does a version with this name exist in this
      repository?
  #versionNamed: - answer the first Version object with the given name.
  #versionNamesForPackageNamed: - answer the version names for the
      given package name.
  #versionWithInfo:ifAbsent: - answer the Version object with the
      given unique VersionInfo.
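
To make the shape of this minimal API concrete, here is a rough sketch in Python rather than Smalltalk. It is only an illustration under stated assumptions, not Monticello code: the method names are snake_case renderings of the six selectors above, and the `Version` record, its `info_id` field and the `DictionaryRepository` class are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for an MC Version object."""
    name: str       # version name, e.g. "Foo-abc.1" (made-up example)
    package: str    # package the version belongs to
    info_id: str    # stands in for the unique VersionInfo


class Repository(ABC):
    """The six required operations, one per selector listed above."""

    @abstractmethod
    def all_package_names(self): ...

    @abstractmethod
    def basic_store_version(self, version): ...

    @abstractmethod
    def includes_version_named(self, name): ...

    @abstractmethod
    def version_named(self, name): ...

    @abstractmethod
    def version_names_for_package_named(self, package): ...

    @abstractmethod
    def version_with_info(self, info_id, if_absent): ...


class DictionaryRepository(Repository):
    """In-memory implementation (assumption: keyed by the unique info id)."""

    def __init__(self):
        self._by_info = {}  # info_id -> Version, kept in insertion order

    def all_package_names(self):
        return sorted({v.package for v in self._by_info.values()})

    def basic_store_version(self, version):
        self._by_info[version.info_id] = version

    def includes_version_named(self, name):
        return any(v.name == name for v in self._by_info.values())

    def version_named(self, name):
        # Answer the first stored Version with this name, or None.
        return next((v for v in self._by_info.values() if v.name == name), None)

    def version_names_for_package_named(self, package):
        return [v.name for v in self._by_info.values() if v.package == package]

    def version_with_info(self, info_id, if_absent):
        found = self._by_info.get(info_id)
        return found if found is not None else if_absent()
```

A tool written only against `Repository` would work with any backend that fills in these six methods, which is the point of the refactoring described here.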

In deference to the limitations of file-based repositories, we only
ask for the _names_ of things rather than the whole object, because
the names are all that is needed to satisfy tool requirements, except
in cases where we need a single Version object (like loading).  A
file-based repository cannot access the Version objects quickly, just
the (file)names (incl. author & version-number).
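
As a rough illustration of working from names alone, the Python sketch below pulls package, author and version number out of a file name. It assumes the conventional "Package-author.number.mcz" layout; the function names and the sample file names are hypothetical, and the parsing does not attempt to handle authors containing "-" or branch-style names.

```python
def parse_version_name(file_name):
    """Split 'Package-author.number.mcz' into (package, author, number).

    Assumes the conventional layout; the package part may itself contain '-'.
    """
    stem = file_name[:-4] if file_name.endswith(".mcz") else file_name
    package, _, rest = stem.rpartition("-")
    author, _, number = rest.rpartition(".")
    if not package or not author or not number.isdigit():
        raise ValueError(f"unrecognized version name: {file_name!r}")
    return package, author, int(number)


def version_names_for_package(file_names, package):
    """Filter a directory listing down to one package, using names only."""
    names = []
    for file_name in file_names:
        try:
            parsed_package, _, _ = parse_version_name(file_name)
        except ValueError:
            continue  # not a version file, skip it
        if parsed_package == package:
            stem = file_name[:-4] if file_name.endswith(".mcz") else file_name
            names.append(stem)
    return names
```

Note that even this names-only query has to walk the entire listing, which is the enumeration cost that grows with the number of versions in a file-based repository.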

During the process of this refactoring, I was able to significantly
improve the coherence of the code.  It was really, really bad in some
areas.

I've also verified the viability of this API by updating
MCMagmaRepository and demonstrating that Magma can serve as a
totally-sustainable and scalable MC repository.  Employing a
Magma-based Repository also affords some additional benefits, which I
will describe in a separate follow-up mail.

I think SqueakSource will eventually have to change to something more
scalable.  At least now we have a viable alternative, and with
much cleaner MC code in the process.

Please load my latest versions of Monticello,
MonticelloConfigurations, Installer and Tests from the Inbox and let
me know if you experience any issues.  You should not see any
difference in day-to-day operations.

 - Chris


Re: sustainable Monticello

Colin Putney-3
On Tue, Mar 8, 2011 at 1:33 PM, Chris Muller <[hidden email]> wrote:
> This will probably be a long post, but I would like to tell you about
> the Monticello upgrades I'm about to move to the trunk.

[snip details]

> I think SqueakSource will eventually have to change to something more
> scalable.  At least now we have a viable alternative, and with
> much cleaner MC code in the process.
>
> Please load my latest versions of Monticello,
> MonticelloConfigurations, Installer and Tests from the Inbox and let
> me know if you experience any issues.  You should not see any
> difference in day-to-day operations.

Hi Chris,

Thanks so much for doing this work. You've fixed a pretty painful
design flaw that we've been working around for years. Originally, the
repository protocol was very, very simple. It was based entirely on
VersionInfo and we expected that repositories would use the UUIDs they
contain to store versions. (You'd basically just need #storeVersion
and #versionWithInfo:ifAbsent:). We then found that naming mcz files
with the version name instead of the UUID made it very easy to manage
repositories using the OS, and so all the cruft involving version
names and custom RepositoryInspectors grew up to make that viable.

I've loaded your work from the Inbox and find that it works well - all
tests are green and poking around with repository inspectors, loading,
saving, etc. didn't cause any problems. Nice work!

Colin