tech-repository archive


Re: Moving NetBSD from CVS to svn|git|hg|fossil



On Tue, Apr 26, 2016 at 08:09:39PM -0400, Greg Troxel wrote:
 > > Yes, except we already have existing practices for all these things,
 > > none of which are inherently affected by a change of infrastructure.
 > > Tying changes of long-established standard practice to an
 > > infrastructure change is not a good idea: it will cause problems from
 > > the procedure changes to look like problems with the new
 > > infrastructure, and also create unnecessary resistance to the
 > > infrastructure change.
 > 
 > I didn't propose doing that, and I basically agree with you.  NetBSD has
 > established practices and they are for the most part the right choices.
 > I definitely agree that changing process because of a tool move is best
 > avoided, and I see no reason to do that in NetBSD.

Right. Other people have posted at times indicating that they seem to
think it's open season, though.

 > >  > The fair part of your comment is that git culture has spawned a lot of
 > >  > "workflow" blog posts, some of which amount to arguing about which name
 > >  > should be used for which branch (in our context, whether 'master' is a
 > >  > release branch or -current).   These arguments are distractions; the
 > >  > real issues (for which NetBSD has established practices) are:
 > >
 > > It's not just that; git has a lot of gratuitous moving parts and so
 > > when using git one has to pay a lot of gratuitous attention to how to
 > > manipulate the working parts. That is, IME, what 95%+ of git
 > > "workflow" talk is about.
 > 
 > I've read a lot of this stuff, and I mostly disagree.  I find it to be
 > mostly about branch naming and what to do on branches vs not.  But maybe
 > I filter out a lot of posts by people who don't understand the basics.

Much of that stuff is about branch manipulation because branch
manipulation is where the gratuitous moving parts are. The fascination
with rebase and Pravda-style history is different, although I expect
it has its origins in the same underlying technical issues and the
resulting need for branch manipulation.

 > git (and hg) also has the possibility of rebase, which cvs/svn do not,
 > and thus one can add a notion of expecting branches to be merged to be
 > fixed before merging (fixup commits, rebase to clean changeset without
 > wrong/fix, etc.).  I don't see that as gratuitous moving parts, but as
 > a feature that a group can choose to use or not; choosing to use it
 > makes things harder but results in a far more pleasing and readable
 > history.

...pleasing and readable and lying, making the history unreliable
(thus useless) for debugging or analysis. This seems to be an issue on
which agreement is impossible. I don't approve, I'm not going to
approve, and it seems to me that if this is going to become the norm
we might as well import the tree into a fresh repo and save all the
trouble of repository conversion.

Yes, I know when I dump a 40-part patchbomb into CVS it's effectively
been rebased. I don't approve of that either but it's the only thing
that can be done given CVS; also in that case the change series has at
least been tested in place rather than blindly merged.
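For concreteness, the fixup-then-rebase cycle Greg describes might look
like this in git (a throwaway sketch; all branch names, file names, and
contents are invented):

```shell
# Sketch of the fixup/rebase workflow: a later "fixup!" commit is folded
# into the commit it repairs before the branch is published, so the
# resulting history contains no wrong/fix pairs.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.org
git config user.name Demo

git commit -q --allow-empty -m "base"
git checkout -q -b feature
echo one > a.c && git add a.c && git commit -qm "add a.c"
echo two > b.c && git add b.c && git commit -qm "add b.c"

# A bug is later found in "add a.c"; record the repair as a fixup commit...
echo one-fixed > a.c && git commit -qam "fixup! add a.c"

# ...and fold it in before proposing the branch for merge. Accepting the
# generated todo list unmodified makes this non-interactive:
GIT_SEQUENCE_EDITOR=: git rebase -qi --autosquash HEAD~3
git log --format=%s    # "add b.c", "add a.c" (now with the fix), "base"
```

The end result is exactly the property being argued about: the published
history shows only the cleaned-up changesets, not what actually happened.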

Btw, what's your view of this: http://wiki.monotone.ca/DaggyFixes/?
I'm not sure what I think in general, but I'm pretty sure that the
orthodox git response would be "anathema".
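For anyone unfamiliar with that page, the idea translates to git roughly
as follows (a minimal sketch; the repository contents are made up): the
fix is committed as a child of the revision that introduced the bug, then
merged forward, so the fix's position in the history graph records what
it fixes.

```shell
# Sketch of a "daggy fix": branch from the buggy revision itself, commit
# the fix there, then merge it into any head containing that revision.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.org
git config user.name Demo

printf 'alpha\nbeta\ngamma\ndelta\n' > file
git add file && git commit -qm "good"
printf 'alpha\nbug\ngamma\ndelta\n' > file
git commit -qam "introduce bug"
echo epsilon >> file && git commit -qam "later work"

# Branch from the buggy revision itself and fix it there...
git checkout -q -b fix HEAD~1
printf 'alpha\nfixed\ngamma\ndelta\n' > file
git commit -qam "daggy fix"

# ...then merge the fix forward into the current head.
git checkout -q -
git merge -q -m "merge daggy fix" fix
sed -n 2p file    # "fixed": the fix and the later work are both present
```

Note that this is the opposite of the rebase style: the graph gets an
extra merge, but nothing is rewritten.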

 > > [releng]
 > > (However, none of the existing options store enough metadata that
 > > releng doesn't still have to handle some of its own. If someone gets
 > > around to writing bs(1) that might change.)
 > 
 > I think what remains is the doc/CHANGES addition.

Right, and I don't see any way in which that's different from the
current environment.

 > > > how do we control whether things that appear on HEAD/master/trunk have
 > > > to build and work (we don't, except for in-arrears pressure)
 > >
 > > There are reasons (that have nothing to do with limitations of CVS)
 > > that we don't do any more along these lines than we currently do.
 > > Nothing in a SCM migration changes those reasons.
 > 
 > If the reason isn't about CVS, I agree nothing has to change.

It isn't.

 > Right now
 > we have a lot of brokenness caused by partial changes appearing as
 > separate commits, and also by changes that have been put on HEAD without
 > being tested.  CVS branches are so awkward that they are very rarely
 > used.  One could expect a branch proposed for merge to build and pass
 > atf, for example, and even for some sort of CI server to do this
 > automatically.  On my project at work, I require branches that are going
 > to be merged to build and pass tests, for example.  But that's a
 > separable culture change, not a tool issue, once we have a tool that
 > supports it, and I know it's controversial.

Steps along these lines are proposed fairly often, and the arguments
against are repeated frequently and boil down to (at least) the
following:

   - This is a volunteer project; if you make it hard to commit,
     people will stop bothering. If the net result of more process is
     that work that would otherwise have broken the world doesn't get
     done (or doesn't get merged) instead of getting fixed ahead of
     time, it's not a win. Now, this is NetBSD and people will and do
     accept/embrace things that help with quality control, but the
     general perception is that this kind of process does not.

   - If you hide away commits that have been made but not yet
     approved, you put a big roadblock in development. Nobody wants
     this; it also isn't safe if e.g. there's a critical security
     problem. People will develop workarounds, eventually leading to
     the next point:

   - If you don't hide away commits that have been made but not yet
     approved, gradually people in the know will shift towards using
     the unapproved head, in order to get at fixes that are needed but
     haven't been blessed yet. The net result of this is that the
     unapproved head becomes the de facto real head, and the approved
     head becomes a poorly maintained not-very-stable stable branch
     that nobody pays attention to, except maybe for some users who
     aren't in the loop... and the intent of the CI process is
     entirely subverted but everyone still has to pay for the
     resulting administrative complications.

   - Someone has to maintain the thing. This means not just keeping
     the CI goo running, but also curating the branches involved and
     keeping track of the changesets. Some of this can be automated
     but a lot of it can't; in particular every time one changeset is
     released you have to merge (or worse, rebase) the others that are
     pending and retest them. Someone has to do those merges, since
     they aren't in general going to succeed automatically. (And even
     if they do, I don't like the idea of robomerges happening
     routinely without someone crosschecking the results.)

   - We don't have enough test coverage for "passes an ATF run" to be
     all that useful as a merge criterion. Things will and do still
     break. CI process isn't going to change that. With vastly more
     test coverage it might, but (a) who's going to do the work?, and
     (b) who's going to develop a CI scheme that tests on all ports?,
     and (c) it's not feasible anyway if the full test run takes
     substantially longer than it already does.

   - For many of the organizations where this kind of process is
     beneficial, it's because they don't really understand stable
     branches: everyone is using head and breakage on head generates
     operational concerns. We don't have this problem. Also, we do
     have a stable branch scheme that works and I am strongly opposed
     to anything administratively that might result in weakening it.

   - It ain't broke. While having head not build is annoying, and a
     problem if you want to bisect, it doesn't happen all that often
     or for very long. (And a lot less now that there's a buildbot
     reporting regularly.) Meanwhile in general NetBSD head is very
     stable, especially when compared to other similar projects; my
     experience over the years has been that our head is noticeably
     more stable than FreeBSD's *stable* branches are.

There are some issues caused by CVS (e.g. between an import of a new
upstream version in external/ and the following merge commit,
generally the tree will neither build nor run) but these will
disappear automatically along with CVS and nobody will miss them.


 > > > The other workflow aspect is handling contributions from non-committers.
 > >
 > > Given that none of the currently available options has a workable
 > > scheme for handling changeset provenance, to begin with we cannot
 > > safely do anything other than only allow developers to push their own
 > > changesets. That way we know that if the changeset says "christos" it
 > > came from Christos and not from some random bozo who thought it would
 > > be fun to commit a backdoor and sneak it in by hiding it among a pile
 > > of other changesets.
 > 
 > I don't quite follow this.

The problem is this: changesets contain information identifying the
committer, but this information isn't attested in any way. Once in the
master repository, there's no useful way to check that the committer
shown in a changeset was actually the person who generated it.
Therefore, we have to check and enforce authorship at push time. (This
is what CVS already effectively does; you have to log in to get access
to the tree to commit.)

Without an enforcement mechanism if I have push access I can create a
changeset that claims it was committed by Christos and push it.
Presumably if I'm caught doing this I'll be expelled, maybe even
prosecuted; but without an enforcement mechanism I won't be --
forensic examination of system logs and connect times might identify
who did the push but that information is likely to have been discarded
by the time a problem comes to light.

However, in an environment where changesets can be flung around like
monkey poo the risk is even greater: I don't have to be malicious,
just inattentive. If some random third party sends me a large pile of
prefab commits and I push them to the master repo without carefully
examining the header on each one, I might accidentally let by one
whose committer metadata is falsified.

The risk of something mislabeled being allowed to sneak in increases
greatly with the number of external changesets being handled and the
number of people handling them. If handling external changesets is
routine there's even a risk that someone will accept one labeled
"christos" *because* they think it came from Christos.

The danger of course is that this is a way to slip in intentional
backdoors and confuse the trail to help avoid detection. Once the
backdoor is found in the repository, without a clear record of
changeset provenance it is no longer possible to determine where it
came from, and if it's labeled "christos" there will be people who
falsely blame him, and so on, and this is itself a substantial
secondary social risk.

And while you're at it, also consider the risk that since all these
DVCSes are based on obsolete and compromised hash functions, accepting
external changesets also carries the risk that one of them has been
maliciously prepared using a hash preimage attack. I am not sure what
the potential consequences of this are but I think they're pretty
broad.

 > I think I understand what you are getting at, is that if M's commit is
 > rebased to master and pushed, then it shows up in the repo with M as
 > author but there is no record that D pushed it.  Really we want some
 > record that M was the author and D approved pushing it to the official
 > repo.

Yes, and that if the chain is longer it still points back reasonably
reliably to the original source. And ideally that if the chain
contains falsified information it can be detected before it damages
someone's reputation.

There are ways to do this manually, but it's a security issue and we
need enforcement.

 > > We can think about relaxing this once we have more experience with the
 > > tool and the circumstances.
 > >
 > > I agree it would be highly desirable to be able to merge changesets
 > > posted by random passersby, provided they've been reviewed. (One
 > > further problem is that if you make it too easy to pull things in,
 > > they won't always get adequately reviewed.)
 > 
 > True, but I don't think the friction of CVS is really the right way to
 > solve this.  I agree that we do need to keep an eye out for the
 > unreviewed code problem.

It is not, but we also can't afford to open an attack vector, so in
the absence of a viable solution we have to be restrictive.

 > > However, the mechanism for doing so has to create a spoofing-resistant
 > > reference chain back to the person whom the changesets originally came
 > > from. This is not a trivial problem. While git does have some limited
 > > scheme for this, as of the last time I looked at it it pretty much
 > > didn't really work. (I forget the details why.) Also, just saying
 > > "signing" is begging the question -- that's how to get a scheme that
 > > doesn't work.
 > 
 > I'm not sure what you mean by spoofing-resistant, in terms of who is
 > prevented from spoofing what.  Right now we take code from mailinglist
 > posts by people with pseudonyms.  Are you trying to tie contributions
 > back to True Names as defined by some Government?  Or some other form of
 > identity?

No. I'm trying to prevent pseudonymous mailinglist folks from
impersonating committers.

 > Ot is your point that in CVS, the tool at the server records the
 > committer, whereas in git the pusher is checked to be on the ACL but
 > then what is pushed is not verified to have authorship from the same
 > entity?  So that if you and I had push access to a git repo, I could
 > make a commit with committer/author "dholland%netbsd.org@localhost" and push it?
 > Is the concern that one authorized pusher can spoof another?

The concern is that without crosschecking and enforcement *anyone* can
potentially spoof an authorized pusher.

 > If the middle paragraph describes your concern, I wonder about a server
 > hook that verifies that all new commits in the first-parent ancestry
 > chain to the previous ref have a committer that matches the pusher.  (We
 > already have a hook in the pkgsrc-wip git repo that denies pushes of
 > merge commits with wrong first-parent ancestry, so the one I describe
 > shouldn't be that hard to write, just perhaps annoying to configure for
 > alternate emails.)

That doesn't prevent a merged changeset from having spoofed
authorship.

-- 
David A. Holland
dholland%netbsd.org@localhost

