Monticello's Package Loading Strategy (Bug)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Monticello's Package Loading Strategy (Bug)

Eliot Miranda-2
Hi All, Hi Bert, Hi Colin,

    today I was working with Ron Teitelbaum on porting the Terf client to Squeak 6.x.  The bootstrap builder creates some base MCConfigurations and then updates them from the various repositories, hence downloading the latest versions of certain packages into the package cache.

Some of the downloaded packages are for Tweak, and Tweak classes include field definitions which require special Monticello support to manage.  The special handling is provided by the TweakMC package.  Here are some of the relevant definitions:

MCStWriter>>visitTweakFieldDefinition: definition
self writeTweakFieldDefinition: definition


MCStWriter>>writeTweakFieldDefinition: definition
self chunkContents: [:s | definition printDefinitionOn: stream]

which produces output like the following in the snapshot.st file:

CCostumeGrid defineFields: '<?xml  version="1.0" ?>
<fields>
        <field toGet="origin" toSet="origin:" changeEvent="originChanged">origin</field>
        <field toGet="extent" toSet="extent:" changeEvent="extentChanged">extent</field>
        <field toGet="enabled" toSet="enabled:" changeEvent="enabledChanged">enabled</field>
        <field toGet="visible" toSet="visible:" changeEvent="visibleChanged">visible</field>
        <field toGet="color" toSet="color:" changeEvent="colorChanged">color</field>
</fields>'!

Note that no special support is needed for producing the snapshot.bin because that can save arbitrary objects.

So, (I think) because we were loading packages saved by Squeak V3 into a Squeak Spur image the binary isn't loaded and instead the source is compiled. (Am I right?)  If the update happens *before* TweakMC has been loaded then the packages that get installed into package-cache are missing all the field definitions because the support to produce them is missing.

This was exactly our situation; we're freshly bootstrapping a Terf image, which includes adding support for Tweak, and TweakMC is loaded by the bootstrap /after/ the configurations have been updated, and hence the Tweak packages in the package-cache are corrupted.

We fixed our issue by modifying the bootstrap to load TweakMC up front.  But IMO *this shouldn't be necessary*.

Why, when Monticello downloads a package from a repository, does it rewrite that package to install it in the package-cache directory?  Surely it has *no business* rewriting the contents of the package and could simply copy the file itself.  If it does the rewrite, as it is doing now, it opens up the transcription to all sorts of errors, the above being one.  Not only is the approach slow, and error prone, it is unnecessary.  The file could be copied quite simply.

Is this easy to fix?  I think it's a really bad design flaw.

_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Monticello's Package Loading Strategy (Bug)

Bert Freudenberg
On Wed, Jun 28, 2017 at 12:19 AM, Eliot Miranda <[hidden email]> wrote:
Hi All, Hi Bert, Hi Colin,

    today I was working with Ron Teitelbaum on porting the Terf client to Squeak 6.x.  The bootstrap builder creates some base MCConfigurations and then updates them from the various repositories, hence downloading the latest versions of certain packages into the package cache.

Some of the downloaded packages are for Tweak, and Tweak classes include field definitions which require special Monticello support to manage.  The special handling is provided by the TweakMC package.  Here are some of the relevant definitions:

MCStWriter>>visitTweakFieldDefinition: definition
self writeTweakFieldDefinition: definition


MCStWriter>>writeTweakFieldDefinition: definition
self chunkContents: [:s | definition printDefinitionOn: stream]

which produces output like the following in the snapshot.st file:

CCostumeGrid defineFields: '<?xml  version="1.0" ?>
<fields>
        <field toGet="origin" toSet="origin:" changeEvent="originChanged">origin</field>
        <field toGet="extent" toSet="extent:" changeEvent="extentChanged">extent</field>
        <field toGet="enabled" toSet="enabled:" changeEvent="enabledChanged">enabled</field>
        <field toGet="visible" toSet="visible:" changeEvent="visibleChanged">visible</field>
        <field toGet="color" toSet="color:" changeEvent="colorChanged">color</field>
</fields>'!

Note that no special support is needed for producing the snapshot.bin because that can save arbitrary objects.

So, (I think) because we were loading packages saved by Squeak V3 into a Squeak Spur image the binary isn't loaded and instead the source is compiled. (Am I right?)

​It's not an ImageSegmen​t, so that should not be a problem. Missing class definitions, however, would be a problem. This might be the case with MCFieldDefs (or whatever they are called, don't
​remember).

 
 If the update happens *before* TweakMC has been loaded then the packages that get installed into package-cache are missing all the field definitions because the support to produce them is missing.

This was exactly our situation; we're freshly bootstrapping a Terf image, which includes adding support for Tweak, and TweakMC is loaded by the bootstrap /after/ the configurations have been updated, and hence the Tweak packages in the package-cache are corrupted.

We fixed our issue by modifying the bootstrap to load TweakMC up front.  But IMO *this shouldn't be necessary*.

Why, when Monticello downloads a package from a repository, does it rewrite that package to install it in the package-cache directory? 

​Because MC has no concept of "files". Each repository can have a different representation for how it stores a version (e.g. MCDictionaryRepository is literally just a dictionary of MCVersions). That's why an MCVersion is created by an MCReader when loading from a repo, and writing to a repo invokes that repo's MCWriter.

Surely it has *no business* rewriting the contents of the package and could simply copy the file itself.  If it does the rewrite, as it is doing now, it opens up the transcription to all sorts of errors, the above being one.  Not only is the approach slow, and error prone, it is unnecessary.  The file could be copied quite simply.

​Only if both repos are MCFileBasedRepositories. ​

Is this easy to fix?  I think it's a really bad design flaw.

​I don't think it's simple to fix. A version is not a file conceptually, so anything we hack to make it look like it is would be just that, a hack. The Right Way is to load the extension for MC itself first, and then use the mechanics that are there.

That hack could be added to MCVersion>>addToCache along the lines of

addToCache
(thisContext findContextSuchThat: [:c | c receiver isKindOf: MCReader]) ifNotNil:
[:ctxt | | contents |
contents := (ctxt receiver instVarNamed: 'stream') contentsOfEntireFile.
self halt].
MCCacheRepository default storeVersion: self

... and then store 'contents' in the cache directly. The hack could be cleaned up by using a notification to get at the stream by more official means. However, I don't think that's a good idea, as it goes against the design as explained above. Other opinions/ideas very welcome :)

- Bert -


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Monticello's Package Loading Strategy (Bug)

Colin Putney-3


On Wed, Jun 28, 2017 at 4:15 AM, Bert Freudenberg <[hidden email]> wrote:
 
​Because MC has no concept of "files". Each repository can have a different representation for how it stores a version (e.g. MCDictionaryRepository is literally just a dictionary of MCVersions). That's why an MCVersion is created by an MCReader when loading from a repo, and writing to a repo invokes that repo's MCWriter.

Agreed. How (and whether) MCVersions are serialized is an implementation detail of the repository implementation. Early on we expected more diversity of repository implementations - Magma, Gemstone, peer images etc. There was even a transactional store that got junked shortly before Avi started showing people what we were working on. In practice we seem to have settled on http and file repositories, so there does seem to be an opportunity for optimization. However, even if we make that optimization, extensions like TweakMC shouldn't rely on it to be able to function correctly.

Colin

 


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Monticello's Package Loading Strategy (Bug)

Chris Muller-3
It's a good abstraction.  MCMagmaRepository took advantage of it since
early on, and currently used by source.squeak.org for the "browse
revisions" and "browse origin" functions of the IDE.  This wiki page
describes how to extend it to include one's own proprietary code, too.

  http://wiki.squeak.org/squeak/6365

On Wed, Jun 28, 2017 at 11:53 AM, Colin Putney <[hidden email]> wrote:

>
>
> On Wed, Jun 28, 2017 at 4:15 AM, Bert Freudenberg <[hidden email]>
> wrote:
>
>>
>> Because MC has no concept of "files". Each repository can have a different
>> representation for how it stores a version (e.g. MCDictionaryRepository is
>> literally just a dictionary of MCVersions). That's why an MCVersion is
>> created by an MCReader when loading from a repo, and writing to a repo
>> invokes that repo's MCWriter.
>
>
> Agreed. How (and whether) MCVersions are serialized is an implementation
> detail of the repository implementation. Early on we expected more diversity
> of repository implementations - Magma, Gemstone, peer images etc. There was
> even a transactional store that got junked shortly before Avi started
> showing people what we were working on. In practice we seem to have settled
> on http and file repositories, so there does seem to be an opportunity for
> optimization. However, even if we make that optimization, extensions like
> TweakMC shouldn't rely on it to be able to function correctly.
>
> Colin
>
>
>
>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Monticello's Package Loading Strategy (Bug)

Bert Freudenberg
In reply to this post by Colin Putney-3


On Wed, Jun 28, 2017 at 6:53 PM, Colin Putney <[hidden email]> wrote:


On Wed, Jun 28, 2017 at 4:15 AM, Bert Freudenberg <[hidden email]> wrote:
 
​Because MC has no concept of "files". Each repository can have a different representation for how it stores a version (e.g. MCDictionaryRepository is literally just a dictionary of MCVersions). That's why an MCVersion is created by an MCReader when loading from a repo, and writing to a repo invokes that repo's MCWriter.

Agreed. How (and whether) MCVersions are serialized is an implementation detail of the repository implementation. Early on we expected more diversity of repository implementations - Magma, Gemstone, peer images etc. There was even a transactional store that got junked shortly before Avi started showing people what we were working on. In practice we seem to have settled on http and file repositories, so there does seem to be an opportunity for optimization. However, even if we make that optimization, extensions like TweakMC shouldn't rely on it to be able to function correctly.

​Colin,

can you imagine how to fit "unknown" definitions into the design? That is, definitions that are not "understood" by this version of MC, but would still be preserved somehow, and stored with a new version. 

- Bert -​




Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Monticello's Package Loading Strategy (Bug)

Colin Putney-3


On Wed, Jun 28, 2017 at 2:35 PM, Bert Freudenberg <[hidden email]> wrote:

can you imagine how to fit "unknown" definitions into the design? That is, definitions that are not "understood" by this version of MC, but would still be preserved somehow, and stored with a new version. 

Good idea. 

One thing we could do is make serialization into a more first-class concept, orthogonal to repositories. Then each serialization scheme would be responsible for creating "unknown" definitions if it reads something it doesn't recognize. Loading etc would ignore them, but the serializer would handle them correctly when reserializing. Some serializations might even be able to embed "foreign" unknown definitions - a binary serializer like Fuel or S&M could handle an MCUnknownChunkDefinition just fine, for example. But if not, we'd just drop the unknown definition just as we do now.

That would make the optimization you talked about before more legit—we can just copy bytes iff we know we're using the same serialization. 

Another nice side effect of that would be to make repositories more flexible. We could use Content-Type headers, file extensions etc to figure out the right serialization scheme, instead of coupling file format to transport/storage mechanisms like we do now. 

Does that make sense?

Colin


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Monticello's Package Loading Strategy (Bug)

Eliot Miranda-2
Hi Colin,

On Jun 28, 2017, at 4:46 PM, Colin Putney <[hidden email]> wrote:



On Wed, Jun 28, 2017 at 2:35 PM, Bert Freudenberg <[hidden email]> wrote:

can you imagine how to fit "unknown" definitions into the design? That is, definitions that are not "understood" by this version of MC, but would still be preserved somehow, and stored with a new version. 

Good idea. 

One thing we could do is make serialization into a more first-class concept, orthogonal to repositories. Then each serialization scheme would be responsible for creating "unknown" definitions if it reads something it doesn't recognize. Loading etc would ignore them, but the serializer would handle them correctly when reserializing. Some serializations might even be able to embed "foreign" unknown definitions - a binary serializer like Fuel or S&M could handle an MCUnknownChunkDefinition just fine, for example. But if not, we'd just drop the unknown definition just as we do now.

That would make the optimization you talked about before more legit—we can just copy bytes iff we know we're using the same serialization. 

Another nice side effect of that would be to make repositories more flexible. We could use Content-Type headers, file extensions etc to figure out the right serialization scheme, instead of coupling file format to transport/storage mechanisms like we do now. 

Does that make sense?

Yes and it's very important.  This fixes a very nasty bug.  Anyone using Monticello to view Tweak packages will end up silently corrupted by those packages unless they happen to have the TweakMC package loaded.


Colin



Loading...