Subject: Re: CVS commit: pkgsrc/databases/db4
To: Jeremy C. Reed <reed@reedmedia.net>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-pkg
Date: 12/29/2004 12:48:39
    Date:        Tue, 28 Dec 2004 19:42:56 -0800 (PST)
    From:        "Jeremy C. Reed" <reed@reedmedia.net>
    Message-ID:  <Pine.LNX.4.43.0412281935140.2696-100000@pilchuck.reedmedia.net>

  | A better fix would be to have pkg_add use the REQUIRES (but get rid of
  | superfluous libraries) and PROVIDES information.

This would be an improvement for sure - the current dependency setup
for binary packages is totally broken (it is making all kinds of
unjustifiable assumptions).

  | Another fix would be to
  | have no open-ended dependencies -- but that would be bad.

I'm guessing a bit at what I think you mean here, but if you mean
that binary packages shouldn't attempt to predict the future, and
claim compatibility with any new version that happens to appear,
then no, I don't think it would be bad, I think it would be a huge
step forward.

There is an issue here that is being looked at backwards.   That is,
we have a package X, which depends upon Y.   It is known to work with
Y version n (and for now, assume no versions of Y before n).   So,
we're encoding a dependency on Y>=n - essentially, we're claiming that
X will work with every version of Y from n onwards.   That's clearly
absurd.

We kind of keep this mostly working by changing the name of Y whenever
a new version of Y appears which isn't backwards compatible with earlier
versions (or we change its name for a while, then sometime later, we
forget why the name was changed, the new name looks "wrong" and someone
goes and changes it back again...)

That's clearly not the right way.

When we create (binary pkg) X, we know which versions of Y X will work
with, and we should be saying this explicitly.   For a library package Y,
that will really only be the particular Y that X was linked against (no-one
expects older libraries to work here, and we have no idea what will
happen with newer ones).

The problem people have here is that when some trivial change is made
to Y (perhaps even as trivial as fixing a spelling mistake in a manual
page, which clearly affects absolutely nothing) Y gets a version bump
(and needs one), and the old binary package X would need to be updated
for no good reason at all.   That's clearly not good.

However, that doesn't mean that our solution (allow X to work against
any new version of Y) is the right one - it works in this case, but doesn't
if the new Y changes some interface.   That includes fixing a bug that X
knew about and was working around.   Perhaps a function is documented as
freeing memory, but doesn't (it leaks instead), so X does the free for it.
When the library is fixed, and the free added, which should be a backwards
compatible change, X now fails, because of the double free.

So, what else is possible?   Clearly, start looking elsewhere for the
solution - we're currently trying to work around unknown changes in Y
in the pkg for X - no wonder it is difficult...    Instead, handle
changes in Y in Y itself.   That way we can do the right thing.

That is, have X require Y version n  (a Y=n type dependency).

Then, when Y version n+1 appears, if Y(n+1) is binary upwards compatible
with Y(n), record in (or with) the package that Y(n+1) can replace Y(n)
in all dependencies - either version is OK.   The new one might have new
stuff that the old X cannot possibly be using, or might be faster,
smaller (yeah, sure!) or have fixes for bugs that can't be worked around
(buffer overflow bugs, ...) in it - but it is guaranteed to work correctly,
the same as the old one, for anything the old one could do.

Now when X is installed, it can look, see Y(n+1) is there (either
installed, or available to be installed), note that Y(n+1) isn't the
Y(n) that X really wants, but go ask Y whether the Y(n) compatibility
is promised for Y(n+1).   If so, go with it.   If not, either look for
a Y(n) and install that, or simply fail, and require source rebuilding.
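
As a rough sketch of that lookup (nothing here exists in pkg_add; the
Pkg record, its "replaces" field and the resolve() function are all
made up purely to illustrate the idea):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pkg:
    """Hypothetical package record; not a real pkg_add structure."""
    name: str
    version: str
    # Older versions of this same package that this version promises
    # to be binary upwards compatible with (the "equivalence info").
    replaces: frozenset = frozenset()


def satisfies(candidate, name, wanted):
    """True if candidate is exactly name=wanted, or promises to replace it."""
    if candidate.name != name:
        return False
    return candidate.version == wanted or wanted in candidate.replaces


def resolve(name, wanted, installed, available=()):
    """Pick a package satisfying the exact dependency name=wanted.

    Anything already installed is preferred, then anything available
    to be installed.  Returns None when nothing fits, which is the
    "simply fail, and require source rebuilding" case.
    """
    for pool in (installed, available):
        for candidate in pool:
            if satisfies(candidate, name, wanted):
                return candidate
    return None
```

Here X would call resolve("Y", "n", ...) and never needs to know
anything about versions of Y that did not exist when X was built; the
promise lives entirely in Y's own "replaces" set.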

This is why I said "(or with)" a couple of paragraphs ago - currently
we encode all of the version info in the file name, so it is possible to
decide whether downloading a binary package file is the right thing to
do, just by looking at its file name.   With this scheme, that won't
work any more - there's simply going to be too much information to record
to include it in a file name (even a long one).    If the version
equivalence info is "in" the binary package, it would have to be downloaded,
just to find out if we can actually use it or not.   That's not desirable
for large packages (or for any really, though people would probably tolerate
it for small ones).   So, we may need to keep the equivalence info in
a separate file, that will certainly be very small (a couple of KB max
should handle anything we ever need - truly old equiv info can just
get dropped without significant harm).   That is, "with" the package
rather than "in" it.
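
To give an idea of how small such a file could be, here is a sketch
that assumes a completely invented format: one line per new version,
of the form "Y-1.2 replaces Y-1.1 Y-1.0", with "#" starting a comment.
Neither the format nor the function names below come from any existing
pkgsrc tool.

```python
def parse_equiv(text):
    """Parse the (hypothetical) equivalence file into a dict.

    Maps each new package version to the set of older versions it
    promises to be able to replace.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        new, keyword, *old = line.split()
        if keyword != "replaces":
            raise ValueError("unrecognised line: " + raw)
        table.setdefault(new, set()).update(old)
    return table


def can_substitute(table, have, want):
    """True if 'have' is 'want' itself, or is recorded as replacing it."""
    return have == want or want in table.get(have, set())
```

A client would fetch only this file, call can_substitute() on the
version the binary package was built against, and then decide whether
the (possibly large) package itself is worth downloading.   Note that
the promises are not followed transitively here; each line has to list
every old version it covers, which keeps the check trivial.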

If we can do something like this for binary packages (and I really think
we have to, if binary packages are actually to get promoted as something
that's really safe to use, rather than just a risky way to save
compilation time, which they now are) then I think that removes the
only even halfway sane argument that I've ever seen for keeping the
BUILDLINK_DEPENDS info inside buildlink files (that is, anywhere in the
pkgsrc package that is being depended upon).   From a pkgsrc point of
view that's totally insane, and completely unjustifiable, but I keep
hearing arguments about how it is essential for binary packages.   That
argument has never been made clearly enough for me to actually believe
it, but I do at least accept that others genuinely believe it, and they
may be correct - since I don't really understand it, I currently don't know.

kre