tech-repository archive


Why I'm working on a NetBSD conversion



(As just posted to my blog)

Some people on the NetBSD tech-repository list have <a
href="http://mail-index.netbsd.org/tech-repository/tindex.html">wondered</a>
why I've been working on a full NetBSD repository conversion without a
formal request from NetBSD's maintainers that I do so.

It's a fair question. An answer to it involves both historical
contingency and some general issues about moving and mirroring large
repositories.  Because of the accident that a lot of people have
recently dropped money on me in part to support an attack on this
problem, I'm going to explain both in public.


First, the historically contingent part:

1. Alan Barrett tried to run a full conversion of NetBSD using
cvs-fast-export last December and failed (OOM).  He then engaged me
and we spent significant effort trying to reduce the program's working
set, but could not prevent OOM on either of the machines we were
using.  Because Alan was willing to work on this at some length, I
formed the idea that there was real demand for a full NetBSD
conversion.

2. The NetBSD repo is large and old.  I wanted a worst-possible-case
(or near worst-possible-case) to test the correctness of the tool
on. I knew there might be larger repositories out there (and now it
appears that Gentoo's is one such) but for obvious historical reasons
I thought NetBSD would be an exemplary near-worst case.  Thus, it
would be a worthy test even if the politics to get the result deployed
didn't pan out.

I have since been told that NetBSD actually has a git mirror of its
CVS repository produced with a two-step conversion: CVS -> Fossil ->
git.

This makes me nervous about the quality of the result. Repo
conversions produce artifacts due to ontological mismatches between
the source and target systems; a two-stage process will compound the
problems.  That in turn gives rise to exactly the kinds of landmines
one least wants - not obvious on first inspection but chronically
friction-causing down the road.

I'm not speaking theoretically about this; I'm currently dealing with
a major case of landmine-itis in the Emacs repository, which has
(coincidentally) just been scheduled for a full switch to git on Nov
11.  I've been working on that conversion for most of a year.

For a really high-quality conversion, even a clean single-stage move
needs human attention and polishing.  This is why reposurgeon is
designed to amplify the judgment of a human operator rather than
attempt to fully mechanize the conversion.

I understand there is internal controversy within NetBSD over a full
switch to git.  I don't really want to get entangled in the political
part of the discussion.  However, as a technical expert on repository
conversions and their problems, I urge the NetBSD team to <em>move the
base repository to something with real changesets as soon as
possible.</em>

It doesn't have to be git.  Mercurial would do; even Subversion would
do, though I don't recommend it. I'm not grinding an axe for git here,
I'm telling you that the most serious, crazy-making traps for the
unwary lie in the move from a version-control system without full
coherent changesets to a VCS with them.  Once you have that conversion
done and clean, moving the repository content to any other such system
is relatively easy.
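
To make the changeset problem concrete: CVS records history one file at
a time, so a converter has to guess which per-file revisions belonged
to the same commit.  The sketch below shows one common heuristic, which
groups revisions that share an author and log message and fall within a
short time window.  It is an illustration only, not the algorithm
cvs-fast-export actually uses; the FileRev record, the coalesce()
function and the 300-second window are all assumptions made up for
this example.

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision (not a real tool's type).
FileRev = namedtuple("FileRev", "path rev author log timestamp")


def coalesce(revs, window=300):
    """Group per-file revisions into guessed changesets.

    A revision joins the most recent changeset with the same author and
    log message if it is within `window` seconds of that changeset's
    last member and the changeset does not already touch the same path.
    Otherwise it starts a new changeset.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset for that key
    for r in sorted(revs, key=lambda r: r.timestamp):
        key = (r.author, r.log)
        cs = latest.get(key)
        if (cs is None
                or r.timestamp - cs[-1].timestamp > window
                or any(m.path == r.path for m in cs)):
            cs = []  # start a new changeset
            changesets.append(cs)
            latest[key] = cs
        cs.append(r)
    return changesets
```

Every branch of that condition is a guess.  Clock skew, reused log
messages and commits that took a long time to complete can each make
the heuristic split or merge commits wrongly, which is one source of
the artifacts described above.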

(Again, I'm not speaking theoretically - reposurgeon is the exact tool
you want for such cross-conversions.)

This is my offer: I have the tools and the experience to get you to
the changeset-oriented VCS of your choice. I can do a really good job,
better than you'll ever get from mechanical mirroring or a batch
converter, because I know all about common conversion artifacts and
how to do things like lifting old version references and
ignore-pattern files.

It looks like my tools are git-oriented because they rely on git
fast-import streams as an interchange format, but I'm not advocating
git per se - I'm urging you to <em>move somewhere with
changesets</em>.  It's a messy job and it wants an expert like me on
it, but it only has to be done once.  Afterwards, the quality of your
developer experience and your future technical options with regard to
what VCS you actually want to use will both greatly improve.

Related technical point: the architectural insight behind my tools is
that the git folks created something more generally useful than they
understood when they defined import streams.  Having an editable
transfer format that can be used to move content and metadata
relatively seamlessly between VCSes is as important in the long term
as the invention of the DVCS - possibly more so.
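
For readers who have not seen one, a fast-import stream is a plain
byte stream of commands such as blob, commit, mark, data and M (file
modify).  The sketch below writes a very small one from Python.  The
fast_import_stream() function and its input dictionary layout are
assumptions invented for this example; only the stream commands
themselves come from the git fast-import format, and a real exporter
emits much more (tags, branches, deletes, authors distinct from
committers).

```python
def data_block(payload: bytes) -> bytes:
    """Format a byte-counted 'data' command, with the optional trailing LF."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def fast_import_stream(commits, branch="refs/heads/master"):
    """Render commits as a minimal fast-import stream.

    `commits` is a list of dicts with keys name, email, when (epoch
    seconds), message, and files (a mapping of path -> bytes).  This
    layout is hypothetical, chosen only for the illustration.
    """
    out = []
    mark = 0
    for c in commits:
        file_marks = []
        # Emit each file's content as a blob and remember its mark.
        for path, content in sorted(c["files"].items()):
            mark += 1
            out.append(b"blob\nmark :%d\n" % mark + data_block(content))
            file_marks.append((path, mark))
        mark += 1
        out.append(b"commit %s\n" % branch.encode())
        out.append(b"mark :%d\n" % mark)
        out.append(b"committer %s <%s> %d +0000\n"
                   % (c["name"].encode(), c["email"].encode(), c["when"]))
        out.append(data_block(c["message"].encode()))
        # Attach each blob to a path with an ordinary file mode.
        for path, m in file_marks:
            out.append(b"M 100644 :%d %s\n" % (m, path.encode()))
        out.append(b"\n")
    return b"".join(out)
```

Because the result is ordinary text plus byte-counted payloads, it can
be inspected and edited before being piped into an importer such as
git fast-import, which is what makes it useful as an interchange
format.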

cvs-fast-export emits a fast-import stream not because I'm a git
partisan (I actually rather wish hg had won the mindshare war) but
because that's how you get to a sufficiently expressive interchange
format.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

