Another bug: ZipArchive does not understand #'readFrom:'


Another bug: ZipArchive does not understand #'readFrom:'

Pieter Nagel-3
While working on a script to reproduce the bug I reported earlier, I came
across a different variant that apparently reproduces another bug we've
often stumbled over:

Just run the following against an extent0.seaside.dbf; it needs a
directory called /tmp/somewhere to exist.

While loading Core31x-dk5.mcz, you will get an exception that ZipArchive
does not understand #'readFrom:'

        Gofer new
                package: 'GsUpgrader-Core';
                url: 'http://ss3.gemtalksystems.com/ss/gsUpgrader';
                load.
        (Smalltalk at: #GsUpgrader) upgradeGrease.
        (Smalltalk at: #GsDeployer)
                deploy: [
                        Metacello new
                                configuration: 'Magritte3';
                                version: #'release3.1';
                                repository: 'http://www.smalltalkhub.com/mc/Magritte/Magritte3/main';
                                onConflict: [ :ex :existingRegistration :newRegistration |
                                        Transcript
                                                show: 'Conflict between existing: ' , existingRegistration className ,
                                                        ' ' , existingRegistration printString , Character cr asString ,
                                                        '  and new: ' , newRegistration className , ' ' ,
                                                        newRegistration printString , Character cr asString ,
                                                        '  resolved in favor of existing'.
                                        ex disallow ];
                                cacheRepository: '/tmp/somewhere';
                                load: 'Magritte-Seaside' ]


At the time of the exception, the class ZipArchive has been deleted (and,
based on the DNU, had all its methods removed). ZipArchive is part of
Core-Squeak, so it seems the problem is that loading a new Core package
removes the very machinery needed to load it.


--
You received this message because you are subscribed to the Google Groups "Metacello" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Another bug: ZipArchive does not understand #'readFrom:'

Dale Henrichs-3
Pieter,

Interesting ... this is occurring because I moved the class ZipArchive
into the package GemStone-Compression from the package Core, which means
that when the new version of Core is loaded, the ZipArchive class is
removed from the system, and anything that uses ZipArchive (like an mcz
package load) will fail until GemStone-Compression is loaded (after the
Core package) ... so this is basically a GLASS/GsDevKit packaging error.

The reason that I hadn't seen it before and why `(Smalltalk at:
#GsUpgrader) upgradeGLASS1` works is that FileTree repos don't use mcz
packages and all my tests do loads from FileTree repos ...

The mcz loading gets introduced when you use the `cacheRepository:`.

This bug[1] will be a bit difficult to fix, because I'm not sure how
much of the Core package the ZipArchive code is dependent upon (the
GemStone-Compression package will have to be loaded independently before
the Core package).

The repackaging was done because I implemented gzip compression in
support of Zinc and it was not necessary to backport gzip to 2.4.x...

Is doing `(Smalltalk at: #GsUpgrader) upgradeGLASS1` an acceptable
workaround until I can get around to fixing the bug?

Dale

[1] https://github.com/GsDevKit/GsDevKit/issues/56

On 01/28/2015 05:01 AM, Pieter Nagel wrote:

> While working on a script to reproduce the bug I reported earlier, I came
> across a different variant that apparently reproduces another bug we've
> often stumbled over:
>
> Just run the following against an extent0.seaside.dbf, it needs a
> directory called /tmp/somewhere to exist.
>
> While loading Core31x-dk5.mcz, you will get an exception that ZipArchive
> does not understand #'readFrom:'
>
> Gofer new
> package: 'GsUpgrader-Core';
> url: 'http://ss3.gemtalksystems.com/ss/gsUpgrader';
> load.
> (Smalltalk at: #GsUpgrader) upgradeGrease.
> (Smalltalk at: #GsDeployer)
> deploy: [
> Metacello new
> configuration: 'Magritte3';
> version: #'release3.1';
> repository: 'http://www.smalltalkhub.com/mc/Magritte/Magritte3/main';
> onConflict: [ :ex :existingRegistration :newRegistration |
> Transcript
> show:
> 'Conflict between existing: ' , existingRegistration className ,
> ' ' , existingRegistration printString
> , Character cr asString , '  and new: ' , newRegistration
> className , ' ' , newRegistration printString
> , Character cr asString , '  resolved in favor of existing'.
> ex disallow ];
> cacheRepository: '/tmp/somewhere';
> load: 'Magritte-Seaside' ]
>
>
> At the time of the exception, the class ZipArchive has been deleted (and,
> based on the DNU, had all its methods removed). ZipArchive is part of
> Core-Squeak, so it seems that the problem is that as part of loading a new
> Core package the machinery needed to load itself is removed.
>
>


Re: Another bug: ZipArchive does not understand #'readFrom:'

Pieter Nagel-3
Hi Dale,

> Is doing `(Smalltalk at: #GsUpgrader) upgradeGLASS1` an acceptable
> workaround until I can get around to fixing the bug?

and (from the other thread)

> To be honest, the only thing that I can safely do in this case, is throw
> an error that there are multi-byte characters in the source package and
> refuse to write out the .mcz file.

Both your suggestions boil down to not having cached mcz's for some subset
of the load.

We could live with and hack around your suggestions for a while. But in
this day and age of Continuous Integration, a package manager/dependency
management tool that cannot in some way or other work from local, always-on
mirror repositories is a severely limited tool. That would negatively
impact our workflow; let me explain why.

Firstly, our load scripts operate in two modes. In the first, we want
Metacello to go out over the network and collect new mczs to cache as a
side effect of doing the load; we keep and look after these mczs. This we
do only at controlled times, when we deliberately choose to change or
update our dependencies. In the second mode, at all other times, we want
it to NOT go out to real repositories at all, and just stick to loading
from our cached mczs. If we could get round to firewalling Metacello from
the network for extra confidence on this, we would.

The reason we do this is that we've had lots of bad experience with, say,
new packages being published in between us deploying our test servers and
deploying to production. We're really anal about wanting to deploy the
exact same tested code and dependencies as what we tested, and not getting
surprises.

Another reason is that repositories go down at random times; we've seen
Smalltalkhub go dead many times, and with the proliferation of alternative
repositories these days those risks are even worse (the Metacello
community seems not yet to have rallied around a single repository with
mirrors like, say, the Debian/Ubuntu or Python PyPI world has).

Our deployments are automated. We don't want to have deployments fail at
midnight because some random repository happened to be unreachable at that
moment.

Yet another reason we do this is that we're heavily invested in
Continuous Integration. Every time a team member pushes to the central Git
repository, that launches a complete re-build of our system from
extent0.seaside.dbf, against which our tests are run. We don't want those tests to


Re: Another bug: ZipArchive does not understand #'readFrom:'

Dale Henrichs-3
Pieter,

The first bug is a consequence of a package dependency error ... not a
"failure of the package management system" :)

The second bug is a consequence of a fundamental limitation of the .mcz
package format; again, not a "failure of the package management
system" :) .mcz files were not designed to handle multi-byte characters.

The proper way to mirror repositories in Metacello is to clone the git
repository for each of the projects that you use. A Metacello `lock`
tells Metacello to use your local repository copies. All builds can then
be done while disconnected from the network, and you use git to control
exactly what code is included in your builds.
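
For example, something along these lines (a sketch: the project name and
local clone path are placeholders for your own setup):

        "clone the project's git repo to local disk, then lock Metacello onto it"
        Metacello new
                baseline: 'Seaside3';
                repository: 'filetree:///opt/git/Seaside/repository';
                lock.

Once locked, any project load that would otherwise fetch Seaside3 from the
network is redirected to that local filetree repository instead.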

I know that not all projects are managed under git, but my long term
plan is to move all GLASS/GsDevKit projects to git to avoid the problems
and issues that are a consequence of using mcz-based repositories. You
can create local git repos for those projects that aren't using git -
it's not that difficult to do ...

If you prefer to, or "have to", continue to use the overrideRepositories:
feature of Metacello, then you need to make sure that none of the mcz
files you are using contain multi-byte characters, and you'll be fine.

Dale

On 1/30/15 3:15 AM, Pieter Nagel wrote:

> Hi Dale,
>
>> Is doing `(Smalltalk at: #GsUpgrader) upgradeGLASS1` an acceptable
>> workaround until I can get around to fixing the bug?
> and (from the other thread)
>
>> To be honest, the only thing that I can safely do in this case, is throw
>> an error that there are multi-byte characters in the source package and
>> refuse to write out the .mcz file.
> Both your suggestions boil down to not having cached mcz's for some subset
> of the load.
>
> We could live with and hack around your suggestions for a while. But in
> this day and age of Continuous Integration, a package manager/dependency
> management tool that cannot in some or other way work from local always-on
> mirror repositories is a severely limited tool. That would negatively
> impact our workflow, and let me explain why.
>
> Firstly, our load scripts operate in two modes: the first where we want
> Metacello to go out over the network and collect new mczs to cache as a
> side effect of doing the load. We keep and look after these mczs. This we
> do only at controlled times, when we deliberately choose to change or
> update our dependencies. At all other times we want it to NOT go out to
> real repositories at all, and just stick to loading from our cached mczs.
> If we could get round to firewalling Metacello from the network for extra
> confidence on this, we would.
>
> The reason we do this is because we've lots of bad experience with, say,
> new packages being published inbetween us deploying our test servers, and
> when we deploy production. We're really anal about wanting to deploy the
> exact same tested code and dependencies as what we tested, and not getting
> surprises.
>
> Another reason is that repositories go down at random times, we've seen
> Smalltalkhub go dead many times, and with the proliferation of alternative
> repositories these days those risks are even worse (the Metacello
> community seems not yet to have rallied around a single repository with
> mirrors like, say, the debian/ubuntu or Python PyPI world has).
>
> Our deployments are automated. We don't want to have deployments fail at
> midnight because some random repository happened to be unreachable at that
> moment.
>
> Yet another reason we do this is because we're heavily invested in
> Continuous Integration. Every time a team member pushed to the central Git
> repository, that launches a complete re-build from extent0.seaside.dbf of
> our system, against which our tests are run. We don't want those tests to
>


Re: Another bug: ZipArchive does not understand #'readFrom:'

Pieter Nagel-3
> The proper way to mirror repositories in Metacello is to clone the git
> repository for each of the projects that you use.

Ok, but what is the correct way of assembling a coherent set of such
cloned git repositories, in a way that ensures that each is at the exact
correct commit that satisfies the others' dependencies?

Do I need to manually read the code of each BaselineOf... that you depend
on, scan it for git repositories, clone that repository, and then manually
recurse through the second-order dependencies of whatever BaselineOfs that
refers to?

Isn't that kind of algorithmic recursion precisely what the tool should do
for you?

Or is there some sort of "automatically make local clones of repositories"
directive to Metacello that I've missed?

And suppose I've now got a working set of locally cloned git projects
together, and suppose I've cloned commit X of Seaside3.1 into my local
repository. Now suppose that in future I want to upgrade to some newer
Magritte, so I go pull its commit from GitHub, but I don't realise that
Magritte only works with commit X+100 of Seaside 3.1.

But since I've locked my local Metacello to the Seaside repository that is
stuck at X, won't Metacello then happily go on loading my (wrong) commit X,
without giving any feedback - because that is what "lock" means? Lock
means "no matter what that project over there says, use mine instead". But
how am I supposed to know when mine is wrong?

So do I now need to manually go read all the repositories again just to
figure out which commits I should update each of my locally cloned
repositories to?

I've moved to the "just let Metacello fetch whatever versions it wants,
and cache whatever it fetches" model precisely because I thought that
would solve maintenance nightmares like the ones in the previous
paragraphs. I thought letting Metacello cache mczs would absolve us from
any complexities other than just declaring my own direct dependencies
correctly, and then Metacello would recurse through them for me.

Is there something glaring I'm missing about how all this is supposed to
work?

This would all work just a heck of a lot better if each project just
declared its dependencies using semantic versioning. And if, instead of
all kinds of locks and overrides that give you the ability to say "use
this Seaside even if other guys ask for another one", Metacello would just
error out and say "I'm sorry, I can't load the set of packages you asked
for, because X asked for Seaside >=3.1 < 3.2, but Y over there depends on
Seaside >=3.0 < 3.1, and that is mutually inconsistent". Which is,
frankly, how everything I've ever used, from apt to rubygems to pip to
RPM, has worked.


Re: Another bug: ZipArchive does not understand #'readFrom:'

Dale Henrichs-3

On 01/30/2015 10:39 AM, Pieter Nagel wrote:
>> The proper way to mirror repositories in Metacello is to clone the git
>> repository for each of the projects that you use.
> Ok, but what is the correct way of assembling a coherent set of such
> cloned git repositories, in a way that ensures that each is at the exact
> correct commit that satisfies the others' dependencies?
This is a good question and I am glad that you asked it.  The simplest
answer is to do the load without any local repositories or locks. Then
clone each of the git repositories to your local disk and lock each of
the repositories.

Do your testing and if you are lucky, all of your tests will pass and
you will be set.

If you are unlucky, you will have to make local changes to one or more of
the packages in one or more of the projects. But when you are done
making your mods, you have a local copy of exactly what you need to
build and deploy your project. You can choose to feed any changes you
make back to the original project.

Moving forward, you can pull down and merge new versions of the projects
as necessary.
>
> Do I need to manually read the code of each BaselineOf... that you depend
> on, scan it for git repositories, clone that repository, and then manually
> recurse through the second-order dependencies of whatever BaselineOfs that
> refers to?
No.
>
> Isn't that kind of algorithmic recursion precisely what the tool should do
> for you?
Yes.
>
> Or is there some sort of "automatically make local clones of repositories"
> directive to Metacello that I've missed?
In tODE I am building up a fairly rich set of git-aware tools. At the
moment there is a menu item on the tODE project list that you can use to
clone the currently loaded "version" of a git repository, which comes
pretty close to automatic, given that you've followed the approach I've
outlined above ... Metacello records the details of the commit (SHA
and branch) that is loaded into the image, making this sort of thing
possible.
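
If it helps, those project registrations can also be inspected from the
scripting API; a sketch, assuming the stock Metacello querying protocol:

        "list every project registration currently loaded in this image,
         including the repository each project was loaded from"
        Metacello image
                baseline: [ :spec | true ];
                list.

The same query with `configuration:` covers ConfigurationOf-based projects.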

>
> And suppose I've now got a working set of locally cloned git projects
> together, and suppose I've cloned commit X of Seaside3.1 in my local
> repository. And now in future I want to upgrade to some newer Magritte, so
> I go pull its commit from github, but I don't realise that Magritt only
> works with commit X+100 of Seaside 3.1.
Yes, this is a classic problem. If you upgrade without thinking and
paying attention, you can certainly expect trouble: what if your
application no longer works with commit X+100 of Seaside3.1?

>
> But since I've locked my local Metacello to the Seaside repository that is
> stuck at X, won't Metacello then happily go loading my (wrong) commit X,
> without giving any feedback - because that is what "lock" means? Lock
> means "no matter what that project over there says, use mine instead". But
> how am I supposed to know when mine is wrong?
During a load, Metacello allows you to selectively 'break' locks, and
this feature, combined with an appropriately coded onConflict: block,
should give you a pretty clear picture of just what is happening
(I think Johan Brichau does something along these lines) ... If there
are particular scenarios that you run into that cannot be analysed with
this technique, I would be very interested in accommodating your needs.
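
A sketch of what I have in mind (project and repository names are
placeholders; `ex honor` keeps the locked copy, `ex break` takes the
incoming version instead):

        Metacello new
                baseline: 'SomeProject';
                repository: 'github://SomeOrg/SomeProject:master/repository';
                onLock: [ :ex :loaded :incoming |
                        "log every lock encounter so nothing is resolved silently"
                        Transcript show: 'lock hit for ' , loaded printString; cr.
                        ex honor ];
                load.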
>
> So do I now need to manually go read all the repositories again just to
> figure out which commits I should update each of my locally cloned
> repositories to?
No.
>
> I've moved to the "just let Metacello fetch whatever versions it wants,
> and cache whatever it fetches" model precisely because I thought that
> would solve maintenance nightnmares like the previous paragraphs. I
> thought letting Metacello cache mcz's would absolve us from any
> complexities other than just declaring my own direct dependencies
> correctly, and then Metacello would recurse through them for me.
As I said in my original reply, if you want to use mcz's and
overrideRepository:, then continue to do so. There are disadvantages to
this approach as well, but if it works for you, then I don't see any
reason that you should change.

As I also said, the two bugs that you reported have nothing to do with
Metacello. One was a mistake of mispackaging some classes that was not
uncovered by testing. The other is due to the fundamental problem
that mcz files cannot handle multi-byte characters correctly.
FileTree-based repositories use UTF-8 to encode the source and commits,
so multi-byte characters are handled in a consistent way. The bug that
you encountered came about because I did not anticipate the consequences
of copying a package with multi-byte characters from FileTree into an
mcz file, and the only proper response to that is to signal an error ...
the source of an mcz file cannot be automatically altered ... a specific
version (UUID) of an mcz file should be treated as immutable ... if
characters are to be altered in the file, then a new version of the mcz
file must be created.
>
> Is there something glaring I'm missing about how all this is supposed to
> work?
Well, yes. Hopefully I've clarified some of them.
>
> This all would all work just a heck of a lot better if each project just
> declared its dependencies using semantic versioning. And if, instead of
> all kinds of locks and overrides that give you the ability to say "use
> this Seaside even if other guys ask for another one", Metacello would just
> error out and say "I'm sorry, I can't load the set of packages as you
> asked for, because X asked for Seaside >=3.1 < 3.2, but Y over there
> depends on Seaside >=3.0 < 3.1, and that is mutually inconsistent".
If you want to do this, Metacello has onUpgrade: and onDowngrade:
blocks that should allow you to do just what you are asking for.
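
For example, to fail loudly instead of resolving silently, something like
this (a sketch based on the Magritte script from earlier in the thread):

        Metacello new
                configuration: 'Magritte3';
                version: #'release3.1';
                repository: 'http://www.smalltalkhub.com/mc/Magritte/Magritte3/main';
                onUpgrade: [ :ex :existing :new |
                        "refuse the silent upgrade and surface the conflict"
                        Error signal: 'upgrade conflict: ' , existing printString ,
                                ' vs ' , new printString ];
                load: 'Magritte-Seaside'.

An onDowngrade: block with the same shape catches the opposite direction.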

But haven't we wandered away from the "hey I don't want to hit the web
every time I load" into "what I want to have happen when I hit the web"?


Frankly, I think that Metacello can actually be used to do everything
that you want ... AND I am willing to work with you to a) educate you
on what Metacello's capabilities are; and b) try to meet any needs
you have that cannot be addressed by the current feature set.

I know that your current build environment is based on the
overrideRepositories: approach and, as I have said multiple times, if you
need to stick with overrideRepositories: because it's too much work to
change your build process, I understand. BUT if you are willing to
consider changing your build process to leverage some of the features of
Metacello, then I'm willing to work with you to make sure that Metacello
is serving your needs ....

Dale


Re: Another bug: ZipArchive does not understand #'readFrom:'

Pieter Nagel-3
Apologies for my exasperated tone in the previous email.

To be fair, a lot of my frustration, which might have sounded like
frustration with Metacello the tool, is actually frustration with the
culture and style of dependency management that has grown in the
community around Metacello.

Specifically, the fact that projects declare dependencies on single
versions instead of on ranges of versions, a la Semantic Versioning or
something similar. But that is a topic for another day.

> This is a good question and I am glad that you asked it.  The simplest
> answer is to do the load without any local repositories or locks. Then
> clone each of the git repositories to your local disk and lock each of
> the repositories.

So if I want to programmatically collect all the repositories that were
used in the first phase so I can go clone them, will onUpgrade: and
onDowngrade: be notified of all loads - does the very first load count as
an "upgrade"?

> Yes this is a classic problem. If you upgrade without thinking and
> paying attention you can certainly expect trouble: what if your
> application will no longer work with commit X+100 of Seaside3.1.

You miss my point. The scenario I asked about is where my application
works with any commit since X. It is the new version of, say, Magritte,
that depends on commit X+100. How would I find out?

By analogy, if I were to create, say, a .deb package for my application,
and I were to declare contradictory dependencies which, via indirect
dependencies, attempted to depend on two different versions of some
lower-level library, APT would give me an error when I tried to install
the result.

What is the equivalent mechanism in the Metacello world that tells me
there is an error in my dependencies?

> As I said in my original reply, if you want to use mcz's and
> overrideRepository:, then continue to do so. There are disadvantages to
> this approach as well, but if it works for you, then I don't see any
> reason that you should change.

It's not that I *want* to use mcz's. I cannot use github repositories for
everything, since not all of our dependencies are on github. A lot of them
are still old-style mcz-based ConfigurationOfs. So I need something that
works for both.

All I want is just for all my dependencies to be cached in some format, I
don't care which, as appropriate per dependency, whatever is needed to
load them again later without going over the network again.

But it just struck me: if I could cache to a filetree repository instead
of a naked directory, the filetree format should be sufficiently rich to
cache both the "old-school" projects and the newer git-style projects,
right?




Re: Another bug: ZipArchive does not understand #'readFrom:'

Dale Henrichs-3

On 1/30/15 11:45 PM, Pieter Nagel wrote:
> Apologies for my exasperated tone in the previous email.
>
> To be fair, at lot of my frustration which might have sounded like
> frustration with Metacello, the tool, is actually frustration with the
> culture and style of dependency management that has grown in the
> communnity around Metacello.
Understood.
>
> Specifically, the fact that dependencies on single versions instead of
> dependencies on ranges of versions, a la Semantic Versioning or something
> similar.  But that is a topic for another day.
FWIW, I believe that I _could_ add range-based dependencies to
Metacello, but so few projects actually use Semantic Versioning that it
would take a campaign to educate (and police) folks on the use of
Semantic Versioning ... something that I don't have time for.

For the projects that care about Semantic Versioning, folks have started
using symbolic versions following a Semantic Versioning naming convention:

   - #release1
   - #release1.0
   - #release1.1

This approach works very well with Metacello and, I think, captures most
of the capabilities that range-based dependencies would provide.
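
In other words, a consumer asks for the release line rather than a pinned
version; reusing the Magritte example from earlier in this thread:

        "follows whatever version #'release3.1' currently resolves to,
         rather than pinning a specific numbered version"
        Metacello new
                configuration: 'Magritte3';
                version: #'release3.1';
                repository: 'http://www.smalltalkhub.com/mc/Magritte/Magritte3/main';
                load.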

For git-based projects, where a BaselineOf is used instead of a
ConfigurationOf and symbolic versions are not available (in a
BaselineOf), I've added tag pattern matching, where one can specify a git
tag pattern that matches a range of versions, similar to the capabilities
available from symbolic versions:

   - v1.?
   - v1.0.?
   - v1.1.?
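
In a load script that looks something like this (the owner and project
names are placeholders):

        "v1.? resolves to the latest tag matching the pattern in the v1 series"
        Metacello new
                baseline: 'Example';
                repository: 'github://SomeOrg/Example:v1.?/repository';
                load.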

I see that you've sent an additional email and it looks like your new
mail covers the same topics as this one, so I will respond in that email.

Dale
