Unwanted (?) new instances of StoreMethod through merge or load

Thomas Brodt-2

Unwanted (?) new instances of StoreMethod through merge or load

I noticed recently that I had "duplicate" method versions in Store,
which had the same content but different timestamps. Today I looked up
why this happens:

You publish a new definition (version2) of an existing method (version1)
in an existing class. Then you have two versions of the package, two
versions of the method and two instances of MethodInPackage, one
relation for each package with the corresponding version of the method.
package version 1 <=> method version 1
package version 2 <=> method version 2

Now you take the older version 1 of the package, which still includes
method version 1, and create a fork of that package: you load version 2 of
the method into version 1 of the package and publish this as version
1.1. (The same also happens with a merge in the merge tool.)

I had assumed that this would result in one new package version (1.1)
and one new MethodInPackage between version 1.1 of the package and
version2 of the method.
package version 1 <=> method version 1
package version 2 <=> method version 2
package version 1.1 <=> method version2

Instead I get a new version2dup of the method and a relation between
package 1.1 and the version2dup of the method
package version 1 <=> method version 1
package version 2 <=> method version 2
package version 1.1 <=> method version2dup
and the additional version2dup in StoreMethod
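
To make the difference between the two outcomes concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Store's schema or API: MethodRecord, PackageVersion and distinct_method_records are hypothetical names, and the dictionary inside each package version merely stands in for the MethodInPackage links described above.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # stand-in for a database primary key sequence


@dataclass(frozen=True)
class MethodRecord:
    """One stored method version (hypothetical stand-in for a StoreMethod row)."""
    source: str
    record_id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class PackageVersion:
    """A published package version; `methods` plays the role of the
    MethodInPackage links (selector -> method record)."""
    label: str
    methods: dict


def distinct_method_records(*packages):
    """Collect the ids of all method records the given packages point at."""
    return {rec.record_id for pkg in packages for rec in pkg.methods.values()}


method_v1 = MethodRecord("foo\n\t^1")
method_v2 = MethodRecord("foo\n\t^2")
package_1 = PackageVersion("1", {"foo": method_v1})
package_2 = PackageVersion("2", {"foo": method_v2})

# Expected outcome: 1.1 gets a new link to the existing version 2 record.
expected_1_1 = PackageVersion("1.1", {"foo": method_v2})

# Observed outcome: 1.1 links to a fresh record with identical source.
method_v2dup = MethodRecord(method_v2.source)
observed_1_1 = PackageVersion("1.1", {"foo": method_v2dup})

print(len(distinct_method_records(package_1, package_2, expected_1_1)))  # 2
print(len(distinct_method_records(package_1, package_2, observed_1_1)))  # 3
```

In the observed case the two records have equal source but different identities, so nothing in the data says that package versions 2 and 1.1 carry the same method version.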

There are three points:

1) In this scenario, I lose the information that version 2 of the
method is the same in package version 2 as in package version 1.1 and
their descendants.
2) The linear traceability of method evolution is interrupted by an
artificial spawn.
3) I get additional "versions" of methods that are just copies of
merged ones.

The third one is disturbing because it suggests many changes that are
not real changes, especially if several versions of an application are in
development (as in our case, with several deployed versions that still
require changes).

Is this by design? If I remember correctly, this wasn't the case in
earlier versions (before Store was put on top of Glorp?).
Are my expectations wrong? Am I overlooking any disadvantages?

Thanks for any answer.

Thomas

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight-2

Re: Unwanted (?) new instances of StoreMethod through merge or load

Store does not keep information for each individual method around in the image. All it keeps is the single pointer for each package to the database version of it. So when you load an individual method it does not know what the database version that it originally came from was. When publishing, it would need to look through the whole database, or at least through the other versions of that same package, to find any matches, which would be very expensive.

It would definitely have more capabilities along the lines you describe if it kept that information for all the sub-entities. But it's never worked that way, and there's non-trivial overhead of both space and bookkeeping to do that. I'd say that the reverse error (you made a change to a method which the system erroneously thought was the same as some other version, so it threw away your code and just saved a reference to the other) would be much worse.
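
The point about the image holding only one pointer per package can be sketched as follows. This is a toy Python model under stated assumptions, not Store code: LoadedPackage, MethodRecord, load_method and find_matches are hypothetical names, and the sample database rows are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MethodRecord:
    """A stored method version in the (toy) database."""
    record_id: int
    package_version: str
    selector: str
    source: str


@dataclass
class LoadedPackage:
    """What the image knows about a loaded package in this model."""
    parent_version: str                          # the single pointer kept in the image
    sources: dict = field(default_factory=dict)  # selector -> source text only

    def load_method(self, record):
        # Only the text arrives in the image; the record id is not retained.
        self.sources[record.selector] = record.source


def find_matches(database, selector, source):
    """The expensive alternative: scan every stored record for equal source."""
    return [r for r in database if r.selector == selector and r.source == source]


database = [
    MethodRecord(1, "1", "foo", "foo\n\t^1"),
    MethodRecord(2, "2", "foo", "foo\n\t^2"),
    MethodRecord(3, "2.1", "foo", "foo\n\t^2"),  # same text, separate record
]

image = LoadedPackage(parent_version="1", sources={"foo": "foo\n\t^1"})
image.load_method(database[1])

candidates = find_matches(database, "foo", image.sources["foo"])
print([r.record_id for r in candidates])  # [2, 3]
```

Two things show up even in this small model: the lookup touches every record, and matching on source text alone can return more than one candidate, so the publisher would still have to guess which record was meant.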

In reply to Thomas Brodt's post of 6 December 2013, 07:15.

Thomas Brodt-2

Re: Unwanted (?) new instances of StoreMethod through merge or load

Hi Alan

Thank you for your explanation. Nice to hear from you, and you still seem to know something about Smalltalk ;-).

On 10.12.2013 at 23:17, Alan Knight wrote:

> Store does not keep information for each individual method around in the image. All it keeps is the single pointer for each package to the database version of it. So when you load an individual method it does not know what the database version that it originally came from was. When publishing, it would need to look through the whole database, or at least through the other versions of that same package, to find any matches, which would be very expensive.

Agreed. It would be expensive to look for a "matching" method for the current source when publishing, and guesses might also be wrong. My thought (wish?) was that the merge tool could handle two cases differently: choosing an existing method edition as the resolution, versus entering new source code manually. When I choose an already published edition, the merge tool could know that it only has to create another link to an existing method record, instead of also creating a new method with the same content. Regular changes to a method via source code edits would of course need to cut the relation to the existing method in Store and create a new one.

In the meantime I dug a bit into the publishing code and saw that the XChangesets only keep information about #change, but not how the change was made, nor any link to Store records. So there is no means of detecting such finer distinctions, and a #change unfortunately results in a completely new method.

> It would definitely have more capabilities along the lines you describe if it kept that information for all the sub-entities. But it's never worked that way, and there's non-trivial overhead of both space and bookkeeping to do that. I'd say that the reverse error (you made a change to a method which the system erroneously thought was the same as some other version, so it threw away your code and just saved a reference to the other) would be much worse.

There is more overhead in bookkeeping, that's true. That's why I was thinking only of the merge process, where you clearly specify what you want when you choose a resolution.

The sad result is that keeping several versions open for development simultaneously leads to more editions of code than would be necessary.
So you lose relations and oversight of the "real" changes.
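
The distinction proposed above (an existing edition chosen as the resolution versus source typed in by hand) could look roughly like this. It is a hypothetical Python sketch of the idea only: Resolution, chosen_record and record_for are invented names, and neither the merge tool nor the XChangesets are claimed to work this way today.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)  # stand-in for a database primary key sequence


@dataclass(frozen=True)
class MethodRecord:
    """A published method edition in the (toy) database."""
    source: str
    record_id: int = field(default_factory=lambda: next(_next_id))


@dataclass(frozen=True)
class Resolution:
    """What a merge resolution would have to carry for the proposal to work."""
    source: str
    # Set only when the user picked an already published edition.
    chosen_record: Optional[MethodRecord] = None


def record_for(resolution):
    """Return the record the new package version should link to."""
    chosen = resolution.chosen_record
    # Reuse only when an edition was explicitly chosen AND its text is untouched.
    if chosen is not None and chosen.source == resolution.source:
        return chosen
    # Typed-in or subsequently edited source always gets a new record.
    return MethodRecord(resolution.source)


branch_edition = MethodRecord("initialize\n\tself doSaneThing")

picked = record_for(Resolution(branch_edition.source, branch_edition))
typed = record_for(Resolution(branch_edition.source))
edited = record_for(Resolution(branch_edition.source + " ", branch_edition))

print(picked is branch_edition)                      # True
print(typed.record_id == branch_edition.record_id)   # False
print(edited.record_id == branch_edition.record_id)  # False
```

The source comparison inside record_for is there to avoid the reverse error mentioned earlier in the thread: if the chosen edition was edited afterwards, the edited text is kept and a new record is created rather than a reference to the old one.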

Thomas


Niall Ross

Re: Unwanted (?) new instances of StoreMethod through merge or load

In reply to this post by Thomas Brodt-2

Dear Thomas,
Alan is correct that the image does not store method version
information. During a merge, the merge tool does have (the ability to
deduce) this information, so could functionally do the kind of thing you
describe. The reasons why it does not currently do so relate to (1)
history, (2) the need to make a synchronous change to the replicator, (3)
the need to assess performance impact, and (4) having more urgent tasks.

Functionally, if the user merges branch package version
    MyPackage(1.0 + importantFix)
onto trunk version
    MyPackage(1.0)
and a method clash is observed, e.g. in the trunk
    MyClass>>initialize
        self makeGhastlyError
versus, in the branch
    MyClass>>initialize
        self doSaneThing
then, at the point of merge, the merge tool knows which method versions
clashed, and also knows which version the user chose to publish to merged
    MyPackage(1.1)
If the user chooses the trunk version
    MyClass>>initialize
        self makeGhastlyError
then no new clone is created: Store recognises nothing has changed and
simply has MyPackage(1.1) and MyPackage(1.0) both point at the record of
that method. If, however, the user chooses the branch version
    MyClass>>initialize
        self doSaneThing
then Store notices only that the trunk has changed and so clones a fresh
method for MyPackage(1.1) with the same source as the branch method in
MyPackage(1.0 + importantFix).
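
The two outcomes in this example can be reproduced with a small Python model. It is a sketch under a loudly stated assumption: that publishing only considers records reachable along the merged version's own ancestor chain, and matches them by source text. PackageVersion, MethodRecord, ancestor_records and publish_merge are hypothetical names, not Store classes or methods.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)  # stand-in for a database primary key sequence


@dataclass(frozen=True)
class MethodRecord:
    source: str
    record_id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class PackageVersion:
    label: str
    methods: dict                              # selector -> MethodRecord
    parent: Optional["PackageVersion"] = None  # package lineage


def ancestor_records(version, selector):
    """Records for `selector` held by `version` and its ancestors, nearest first."""
    found = []
    while version is not None:
        record = version.methods.get(selector)
        if record is not None and record not in found:
            found.append(record)
        version = version.parent
    return found


def publish_merge(label, parent, selector, chosen_source):
    """Publish a merged version on top of `parent` (assumed rule: share only
    with ancestors, otherwise clone)."""
    for record in ancestor_records(parent, selector):
        if record.source == chosen_source:
            return PackageVersion(label, {selector: record}, parent)
    return PackageVersion(label, {selector: MethodRecord(chosen_source)}, parent)


trunk = PackageVersion(
    "1.0", {"initialize": MethodRecord("initialize\n\tself makeGhastlyError")})
branch = PackageVersion(
    "1.0 + importantFix",
    {"initialize": MethodRecord("initialize\n\tself doSaneThing")},
    parent=trunk)

trunk_rec = trunk.methods["initialize"]
branch_rec = branch.methods["initialize"]

keep_trunk = publish_merge("1.1", trunk, "initialize", trunk_rec.source)
take_branch = publish_merge("1.1", trunk, "initialize", branch_rec.source)

print(keep_trunk.methods["initialize"] is trunk_rec)    # True: shared with 1.0
print(take_branch.methods["initialize"] is branch_rec)  # False: cloned
```

The branch is a sibling of the merged version rather than an ancestor, so its record is never a candidate for sharing in this model, even though the user picked exactly that method.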

As Alan points out, it could be unsafe in general for Store to assume
that code-identical methods were in fact the same. However, when using
the merge tool, the user supplies all the information in the act of
merging. Store's schema is as well able to have MyPackage(1.1) and
MyPackage(1.0 + importantFix) both point at the record of the method
    MyClass>>initialize
        self doSaneThing
as to have MyPackage(1.1) and MyPackage(1.0) point at the record of method
    MyClass>>initialize
        self makeGhastlyError
when the trunk version is chosen. When the user tells the merge tool to
resolve the conflict in favour of branch method
    MyClass>>initialize
        self doSaneThing
it is as clear this is what is meant as when the user tells the merge
tool to resolve the conflict in favour of
    MyClass>>initialize
        self makeGhastlyError
So why is this not done, I hear you ask.

1) History: the tools that drive the Store schema have never used the
schema in this way. They are coded to confine sharing of method
versions (and class definition versions and other objects pointed at by
package versions) strictly within the package inheritance tree: the
tools will share an object between a package version and its ancestors
but not between a package version and its siblings; whenever presented
with the latter, they clone.

I have a decades-long familiarity with this class of problems:
2-dimensional containment and connection graphs, and the issue of
whether and when to constrain them for ease of modelling. Were I in the
place of the original Store tool creators in the early-mid-90s, I would
not have made their choice, but I'm familiar with that choice being made.

2) Synchronous replicator change: the replicator is a more recent Store
tool. Because of (1), it does not expect to encounter method versions
shared between siblings. When I first experimented with this, I
promptly discovered that the base replicator simply recreates the clones
in a target database, so it would need changing too. Thus anyone who
maintains mirror Store databases (as we do) would need to upgrade their
replicator as well as their merge tool before seeing any benefit (and we
might need to review whether a temporary intermediate state could risk
any hiccoughs).

3) Performance: confining sharing to a package's ancestor tree loses
information supplied at merge time, but it also permits simple coding of
some queries. Could the complexity of a method's lineage no longer lying
within its package's lineage cause performance problems in the revised
versions of these queries? My very slight and partial experiments a while
back suggested not, but I must repeat "my very _slight_ and _partial_
experiments". This would need review.

4) Store has other, important tasks for 8.0. This is not on any
plan, and should not be presumed to be scheduled for any given
future time.

In summary, I understand where you are coming from on this. I've had
similar thoughts and done experiments. The above explains why I'm not
suggesting this idea to the Store team save as a long term 'maybe think
about'.

HTH
Niall Ross
