Re: Reply to core statement on version control systems

To: tech-repository%netbsd.org@localhost
Subject: Re: Reply to core statement on version control systems
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Tue, 6 Jan 2015 08:24:20 +0000

Some comments and notes:

On Mon, Jan 05, 2015 at 05:10:57PM -0500, Eric S. Raymond wrote:
 > * I recommend in favor of a big-bang cutover with CVS gateway
 >   enabled immediately, and *against* trying to incrementally propagate CVS 
 >   commits to a target DVCS during an extended fade-over. CVS has some
 >   misfeatures that would make such a fade-over risky, and there are
 >   better ways to mitigate the problems of the big-bang approach.

While in general I agree, you do realize we already have one family of
incremental conversions running, right?

 > > Some of the things that we would like to be addressed in a 
 > > transition plan are:
 > > 
 > >  * How well the proposed system satisfies the requirements and
 > >    desires of the community, in terms of features, ease of use,
 > >    performance, and other considerations that have been mentioned
 > >    in the tech-repository mailing list.  It would be useful to
 > >    have a matrix similar to the one produced by FreeBSD, available
 > >    at <https://wiki.freebsd.org/VersionControl>.
 > 
 > I have examined the FreeBSD matrix.  I can tell you some things
 > that I believe will simplify your decision tree.
 > 
 > First: It is a matter of fact that the bzr and SVK systems are
 > moribund.  Cogito has not been maintained since 2006; today's
 > option is "bare" git.

We're more or less aware of that; the possible choices are git, hg,
maybe Fossil, and "write something", and that last option isn't very
realistic.

 > I can do a full conversion to either target with relative ease.  My
 > tools would go through a git-fast-export stream as an intermediate
 > representation.  Because this is the case, you need not commit to a
 > final choice between the two until a late stage of the cutover.

So, because git doesn't have real branches (only git-branches), the
current conversion loses branch information. Is this limitation also
present in the git-fast-export format? If so, is there a way to avoid
throwing away branch information when converting to hg?
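
For concreteness, here is a minimal Python sketch of how one commit looks in a git fast-import stream. The function and its parameters are hypothetical illustrations, not part of any existing converter. As far as I understand the format, the branch name appears only as the ref named on the `commit` line. There is no per-commit branch field, so once the stream is imported, nothing in the commit itself records which branch it was made on.

```python
def fast_import_commit(ref, mark, name, email, when, message, files):
    """Return one `commit` command in git fast-import stream syntax.

    ref         -- ref the commit is written to, e.g. "refs/heads/netbsd-7"
    mark        -- integer mark so later commands can refer to this commit
    name, email -- committer identity
    when        -- Unix timestamp in seconds (timezone fixed at +0000 here)
    message     -- commit message
    files       -- dict mapping path -> file contents, emitted inline
    """

    def data(text):
        # A 'data' command gives the exact byte count, then the raw bytes;
        # the stream is assumed to be written out as UTF-8.
        return "data %d\n%s\n" % (len(text.encode("utf-8")), text)

    out = [
        "commit %s\n" % ref,  # the only place the branch name appears
        "mark :%d\n" % mark,
        "committer %s <%s> %d +0000\n" % (name, email, when),
        data(message),
    ]
    for path, contents in files.items():
        # M <mode> inline <path>, followed by the file contents
        out.append("M 100644 inline %s\n" % path)
        out.append(data(contents))
    return "".join(out)
```

Calling `fast_import_commit("refs/heads/netbsd-7", 1, ...)` writes the commit to that ref. In the output, `netbsd-7` appears only on the `commit` line, assuming the message and file contents don't mention it.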

 > >  * Performance implications of the desired VCS system, especially
 > >    for hosts with low or moderate amounts of memory.
 > 
 > With certainty, neither git nor hg will pose any noticeable
 > performance issues on recent desktop or server hardware
 > with RAM of 4GB or up.  I know this through having experimented with
 > the src repository myself and by reports from cvs-fast-export devs.

That's not what "low or moderate" means in these parts. Will it work
adequately on a Raspberry Pi? That's "moderate" to even "moderately
high". Meanwhile, "low" means more like a tenth of *that*. Regardless
of what one may think of the merits of ancient junkyard machines, some
people around here like to use them and there are enough such people
to have political pull.

Nobody expects anything to run *fast* on such machines, but we'd like
the VCS to run on them at all, without spending a week swapping itself
to death to do a checkout. (There's quite a bit of room to swap and
still be faster than CVS, so I don't think this is a ridiculous
expectation.)

 > >  * New standards for log messages that refer to earlier commits,
 > >    to avoid tying us to any particular VCS in the future.
 > >    (Roughly, what to say in log messages instead of "revision
 > >    <number>" or "commit <hash>" or "the previous commit".)
 > 
 > I use a format I call an "action stamp" that looks like this:
 > 
 > 2011-10-25T15:11:09Z!fred@foonly.com
 > 
 > While this is not absolutely guaranteed unique it is close enough in
 > practice.
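
As an illustration of the action-stamp format quoted above (a UTC timestamp, `!`, then the committer address, as in `2011-10-25T15:11:09Z!fred@foonly.com`), here is a minimal Python sketch. The function names and the `commits` mapping are hypothetical, not taken from any existing tool. The lookup deliberately returns a list, because a stamp is not guaranteed unique.

```python
from datetime import datetime, timezone

STAMP_TIME = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601, always UTC ("Z")


def action_stamp(when, email):
    """Build an action stamp: UTC commit time, '!', committer address."""
    if when.tzinfo is None:
        # A naive datetime would be silently treated as local time.
        raise ValueError("timestamp must be timezone-aware")
    return when.astimezone(timezone.utc).strftime(STAMP_TIME) + "!" + email


def parse_action_stamp(stamp):
    """Split an action stamp back into (aware UTC datetime, email)."""
    ts, sep, email = stamp.partition("!")
    if not sep or not email:
        raise ValueError("not an action stamp: %r" % stamp)
    when = datetime.strptime(ts, STAMP_TIME).replace(tzinfo=timezone.utc)
    return when, email


def resolve(stamp, commits):
    """Return every commit id whose (time, committer) matches the stamp.

    commits -- hypothetical dict: commit id -> (aware datetime, email).
    A list is returned because stamps are not guaranteed unique.
    """
    when, email = parse_action_stamp(stamp)
    return [cid for cid, (t, e) in commits.items() if t == when and e == email]
```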

There are enough references to CVS version numbers outside the
repository (mail archives, published and signed security advisories,
bug reports) that we need to preserve the CVS version numbers either
as searchable metadata in the VCS or in some external searchable table
of equivalences.

It might be possible to find and update all references in text within
the repository, but converting the rest of that material is not
practical.
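
One constraint any such table has to respect is that CVS numbers revisions per file, so a number like 1.68 identifies something only together with its path. Below is a minimal Python sketch of the "external searchable table" option. It assumes a hypothetical CSV file of `path,revision,commit` rows produced during conversion. The file format, function names, and sample data are made up for illustration.

```python
import csv
import io


def load_equivalences(text):
    """Parse 'path,cvs_revision,commit_id' rows into a dict keyed by
    (path, cvs_revision). Blank rows and rows starting with '#' are skipped."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue
        path, rev, commit = (field.strip() for field in row)
        table[(path, rev)] = commit
    return table


def lookup(table, path, rev):
    """Map a CVS reference (file path plus revision number) to the
    converted commit id, or None if the table has no such entry."""
    return table.get((path, rev))
```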

 > >  * How the existing repository will be converted.  The following
 > >    items would be nice to have (in decreasing order of importance):
 > > 
 > >      - how CVS vendor branches will be handled, including
 > >        cases where the same vendor tag has been used for
 > >        logically-distinct branches (as is common in pkgsrc).
 > 
 > I'm going to need more technical detail on this, and a pointer to
 > examples; it sounds ugly.  In the worst case I may need to write 
 > a custom pre-conversion step to hack the CVS branch labels.

For a long time we added new packages to pkgsrc by running cvs import
on the contents of the package directory. There was a good reason for
this originally (having to do with borrowing material from FreeBSD
ports) and some additional real reasons (having to do with cvs add of
directories, IIRC), but for many years it was done that way purely
out of habit.

The upshot, though, is that many packages were imported independently
of one another, at widely varying times, using the same vendor branch
name.

I don't think you need to look very hard in pkgsrc to find examples,
but if you can't find any readily I'm sure someone can rake some up
for you.

My opinion on this is that all or nearly all of these more or less
bogus branches should just be eliminated and the import turned into a
regular add and commit. It might take hand review to identify which
branches need this treatment, but a good approximation (once one has
changesets) is any vendor branch import changeset in pkgsrc where the
same files have never had another version imported on that or any
other vendor branch.
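
The approximation just described can be sketched in a few lines of Python, assuming changesets have already been reconstructed and each vendor-branch import is available as a record of its branch tag and the files it touched. The `ImportChangeset` type, its fields, and the sample branch name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportChangeset:
    ident: str        # hypothetical changeset identifier
    branch: str       # vendor branch tag given to `cvs import`
    files: frozenset  # paths touched by this import


def one_shot_imports(imports):
    """Return the imports none of whose files were ever imported again,
    on this or any other vendor branch: candidates for rewriting as an
    ordinary add and commit. Hand review is still advisable."""
    seen = Counter()
    for ch in imports:
        seen.update(ch.files)  # count each file once per import
    return [ch for ch in imports if all(seen[f] == 1 for f in ch.files)]
```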

This does not address the other (real) vendor branches in src; I think
it's clear what the proper semantics are there though.

 > >      - how (if at all) historical repository moves and copies
 > >        will be identified and fixed up during the conversion;
 > 
 > This is a non-issue under git, which does no container tracking and
 > does not represent moves and copies internally.  (Instead it deduces this
 > information when needed.)

I... had thought it deduced and stored the information at commit time.
This may be slightly off topic in this thread, but: how does this
work, and how can it possibly both scale and work reliably? Does it
check every other file in the repository for similarity (and in every
previous version) every time you do git log?
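
To my understanding, git does not scan the whole repository. When rename detection is enabled (for example `git diff -M` or `git log -M`), it only tries to pair paths deleted with paths added between the two trees being compared. It uses a similarity score against a threshold, 50% by default. The Python sketch below is a toy version of that idea only. It substitutes `difflib.SequenceMatcher` for git's own similarity measure and uses simple greedy matching, so its scores and pairings will not match git's.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose contents are similar.

    deleted, added -- dicts mapping path -> contents, for ONE diff only
    threshold      -- minimum similarity ratio (0..1) to call it a rename
    Returns a list of (old_path, new_path, score) tuples.
    """
    pairs = []
    unused = dict(added)  # each added path may be claimed only once
    for old_path, old_text in deleted.items():
        best, best_score = None, 0.0
        for new_path, new_text in unused.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best, best_score = new_path, score
        if best is not None and best_score >= threshold:
            pairs.append((old_path, best, round(best_score, 2)))
            del unused[best]
    return pairs
```

Because only the deletions and additions of a single diff are compared, the cost grows with the size of that diff rather than with the size of the repository.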

 > hg *does* container-track.  Should you elect it as a final target,
 > rename and move operations can be automatically deduced at the
 > last stage.

To what extent do your tools allow importing external annotations
about renames?
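
One conceivable route is sketched below in Python under stated assumptions. The fast-import format has an `R` (filerename) command, so a hypothetical table of rename annotations, keyed here by action stamp, could be turned into `R` lines emitted with the corresponding commit. The table layout and function names are made up. This does not describe what ESR's tools actually do, and whether a given hg importer records such lines as hg renames is a separate question.

```python
def cquote(path):
    """C-style quote a path for a fast-import stream when it contains
    characters that need it (quoting is accepted in all cases)."""
    if any(c in path for c in ' "\n\\'):
        escaped = (path.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        return '"%s"' % escaped
    return path


def rename_commands(annotations, commit_key):
    """Turn external rename annotations for one commit into R lines.

    annotations -- hypothetical dict: commit_key -> [(old_path, new_path), ...]
    commit_key  -- whatever identifies the commit, e.g. an action stamp
    """
    return "".join("R %s %s\n" % (cquote(old), cquote(new))
                   for old, new in annotations.get(commit_key, []))
```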

 > >  * Considerations to avoid lock-in to a particular version
 > >    control system, but to allow for a future change to yet
 > >    another system.  (For example, we could choose a VCS system
 > >    with a widely supported import and export format, and restrict
 > >    our workflow to features that are supported by many VCS
 > >    systems, and avoid the use of features that are unique to the
 > >    chosen system; however, the set of widely-supported features
 > >    should be identified.)
 > 
 > The set of widely-supported features is easy: it's what will fit in
 > a git-fast-import stream.  The only real compromise this may entail is
 > giving up on container-tracking if future importers are not capable of 
 > deducing these operations.

...as above, what about branch metadata?

 > In a conversion on a CVS repository this large and this old, the
 > probability that we will trip over some nasty and heretofore unknown
 > repository malformation(s) approaches unity.

Given that we've had conversions running for some time, which required
doing a lot of cleanup and turned up some fascinatingly broken things,
it seems likely to me that we've already stepped on most of these
problems.

-- 
David A. Holland
dholland%netbsd.org@localhost