tech-repository archive


Re: Proposed conversion strategy

On 10/26/14 5:59 AM, Eric S. Raymond wrote:
I have been reminded of something interesting. The git project ships
a CVS emulation server.  This brings the front-end set that it can
support to at least three: git, hg, *and* CVS.

I think this implies a workable conversion strategy. That is: upgrade
the repos to git so you get full changesets in the whole history, then
make a point of supporting all three front ends.  This ought to make
everyone reasonably happy, or at least not so unhappy that they'll

The role I would like to have in this is ensuring that your upgraded
repo is a really *good* conversion.  I consider high conversion quality
important; it is why I have devoted so much effort
to writing reposurgeon and maintaining cvs-fast-export.

I would love to see a better picture of the state of the tools that present different "views" of a repo, whether that means using hg tools to access a git repo, git to access hg, or CVS, or whatever. My experience with this sort of thing is generally that they work OK for a limited set of operations, but not for real-world usage. I'd love to be proved wrong.

Likewise, some of the assertions about the "git fast-export" format being a lingua franca don't seem to be panning out either (at least in my googling). I've been having a heckuva time finding a useful way to, for example, go from the output of cvs-fast-export directly into Mercurial. There are several different repos kicking around that purport to provide "hg fastimport", but none of them appears to work out of the box, at least not with the version of hg I have (which looks like 3.1.2).
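
The "lingua franca" argument rests on the stream being simple and documented: line-oriented commands with byte-counted data blocks, which git fast-import reads directly. As a minimal sketch, assuming a single commit that adds one file, a hypothetical helper `fast_import_commit` could emit such a stream as follows. It assumes simple paths (names with spaces or special characters would need quoting) and a fixed +0000 UTC offset.

```python
def fast_import_commit(ref: str, mark: int, committer: str, when: int,
                       message: str, path: str, content: bytes) -> bytes:
    """Emit one commit adding a single file, in git fast-import format (sketch).

    'data' counts are byte lengths, so text is encoded before measuring.
    The timestamp uses the raw date format: epoch seconds plus a UTC offset.
    """
    msg = message.encode("utf-8")
    out = bytearray()
    out += f"commit {ref}\n".encode("utf-8")
    out += f"mark :{mark}\n".encode("utf-8")
    # committer line: "Full Name <email>" followed by "<seconds> <offset>"
    out += f"committer {committer} {when} +0000\n".encode("utf-8")
    out += f"data {len(msg)}\n".encode("utf-8") + msg + b"\n"
    # Inline file content; mode 644 is a regular, non-executable file.
    out += f"M 644 inline {path}\n".encode("utf-8")
    out += f"data {len(content)}\n".encode("utf-8") + content + b"\n"
    out += b"\n"  # optional blank line ending the commit command
    return bytes(out)
```

Piping the returned bytes into `git fast-import` inside a freshly initialized repository should yield a one-commit branch at the given ref; whether a particular Mercurial importer accepts the same stream is exactly the open question raised above.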

Thus far, the fastest and most reliable tool we have is the CVS-to-fossil converter written by Jörg Sonnenberger, who maintains the NetBSD mirror on GitHub. Unfortunately, fossil itself has trouble working with a dataset the size of the NetBSD repo and is not currently a good end choice. The conversion from fossil to git (which the most-used git mirror of NetBSD relies on) takes nearly twice as long as the CVS->fossil step. cvs-fast-export on its own (which I have so far only run through the export stage; I have not yet actually imported its output into git or hg) takes slightly longer than the whole CVS->fossil->git dance we're currently doing, at least on my hardware.*

We also have a Mercurial conversion from that git repo (a third step!), but branches do not work properly in it.

Here are some traits I think a good conversion should have, beyond the
obvious one of "get the changesets and tags right":

0. Unix IDs in pre-DVCS systems like CVS and Subversion should be
completely mapped to DVCS-style developer IDs with full name and email.

1. Ignore files and patterns should be fully migrated, not only in the
head version but through the entire history.

2. Commit references in commit comments should be fixed up so they
make sense in the new system.

These are fine and I mostly agree.
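
For the first trait (mapping Unix IDs to full DVCS-style IDs), one way to make the mapping genuinely "complete" is to drive it from an author-map file and refuse to convert when a login is missing. The sketch below assumes a hypothetical line format of `login = Full Name <email>`, optionally followed by a timezone that it ignores; `load_author_map` and `map_author` are illustrative names, not any tool's API.

```python
import re
from typing import Dict

# Assumed author-map format, one entry per line:
#   login = Full Name <email> [optional timezone, ignored here]
MAP_LINE = re.compile(r"(\S+)\s*=\s*(.+?)\s*<([^>]+)>")


def load_author_map(text: str) -> Dict[str, str]:
    """Parse author-map text into {unix_login: "Full Name <email>"}."""
    authors: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = MAP_LINE.match(line)
        if m is None:
            raise ValueError(f"malformed author-map line: {line!r}")
        login, name, email = m.groups()
        authors[login] = f"{name} <{email}>"
    return authors


def map_author(login: str, authors: Dict[str, str]) -> str:
    """Return the DVCS-style ID for a Unix login; fail loudly if unmapped."""
    if login not in authors:
        raise KeyError(f"no author-map entry for Unix login {login!r}")
    return authors[login]
```

Raising on an unmapped login, rather than silently passing the bare login through, makes gaps in the map visible before a conversion is published.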

3. When practical, commit comments should have whitespace inserted so
they conform to the summary/blank-line/details format of DVCSes.  This
helps tools like gitk and hg view work better.

4. In general, the objective should be for the repository to look
as though the new system had been in use since the beginning of
time, minimizing developer friction when browsing far back in the
history.  (Keeping the old repo still available meets objections
about rewriting history.)

These will require more testing to see how they work in practice.
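
Item 3 is easy to prototype for that sort of testing. The sketch below, a hypothetical `reflow_comment` helper, inserts a blank line after the first line of a multi-line commit comment so that the first line becomes the summary. The heuristic is an assumption: it leaves a comment alone when its first line ends in a comma (probably a sentence wrapped across lines), and anything subtler still needs the kind of human judgment ESR argues such conversions require.

```python
def reflow_comment(comment: str) -> str:
    """Return the comment in summary/blank-line/details form (sketch).

    Assumed heuristic: the first line is the summary. If a second line
    follows it directly, a blank line is inserted between them. Comments
    that are single-line, already separated, or whose first line ends in
    a comma (probably a wrapped sentence) are left as they are, apart
    from stripping trailing whitespace.
    """
    lines = [ln.rstrip() for ln in comment.strip("\n").splitlines()]
    if len(lines) < 2 or lines[1] == "" or lines[0].endswith(","):
        return "\n".join(lines) + "\n"
    return "\n".join([lines[0], ""] + lines[1:]) + "\n"
```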

These are the premises I am executing right now in converting Emacs.
You cannot get a conversion this good from a fully automated tool;
it takes human judgment and taste.  The problems are similar to
(though more tractable than) idiomatic translation between languages.
Fortunately, I enjoy solving them.

I am of the opinion (which is not as widely shared within NetBSD as I'd like) that it does make sense to begin actively working on a CVS replacement within NetBSD. However, any replacement is going to require months, if not years, of active developers working with both the existing setup and the proposed replacement to find the gotchas. This, in turn, is going to require the conversion tools to keep improving, because we won't be able to evaluate things properly until we either get incremental conversions working reliably or bring the full-run time (from CVS to system X) down to a few hours. We're much, much closer than we used to be, but we're not there yet. Any help getting there is appreciated, but this is going to take a long time.


* - My conversion box (which is not the one producing the conversion currently hosted on GitHub; it's just what I've been using for testing) is an HP ProLiant G160 with 12 cores (24 threads) of Xeon 5639 @ 2.13 GHz and 72 GB of RAM, running NetBSD 7.0_BETA.
