Re: Moving NetBSD from CVS to svn|git|hg|fossil

To: David Holland <dholland-tech%netbsd.org@localhost>
Subject: Re: Moving NetBSD from CVS to svn|git|hg|fossil
From: Greg Troxel <gdt%ir.bbn.com@localhost>
Date: Tue, 26 Apr 2016 20:09:39 -0400

David Holland <dholland-tech%netbsd.org@localhost> writes:

> (raking this up, it's not *that* old yet)

No problem; me too.

>  > That's untrue.  "Workflow" is a word that describes things like whether
>  > changes are tested before being applied to the definitive copy, when we
>  > create release branches and tags, and essentially how pullups are done.
>  > We have a way of doing that now which works with CVS, and I see no
>  > reason that can't work straightforwardly with most other tools.
>
> Yes, except we already have existing practices for all these things,
> none of which are inherently affected by a change of infrastructure.
> Tying changes of long-established standard practice to an
> infrastructure change is not a good idea: it will cause problems from
> the procedure changes to look like problems with the new
> infrastructure, and also create unnecessary resistance to the
> infrastructure change.

I didn't propose doing that, and I basically agree with you.  NetBSD has
established practices and they are for the most part the right choices.
I definitely agree that changing process because of a tool move is best
avoided, and I see no reason to do that in NetBSD.

>  > The fair part of your comment is that git culture has spawned a lot of
>  > "workflow" blog posts, some of which amount to arguing about which name
>  > should be used for which branch (in our context, whether 'master' is a
>  > release branch or -current).   These arguments are distractions; the
>  > real issues (for which NetBSD has established practices) are:
>
> It's not just that; git has a lot of gratuitous moving parts and so
> when using git one has to pay a lot of gratuitous attention to how to
> manipulate the working parts. That is, IME, what 95%+ of git
> "workflow" talk is about.

I've read a lot of this stuff, and I mostly disagree: I find it to be
largely about branch naming and what to do on branches vs. not.  But maybe
I filter out a lot of posts by people who don't understand the basics.
There is also the issue of first-parent ancestry for merges, and of getting
that right (avoiding git pull, which is a bug, and using --no-ff on merges
back to parents).  This is definitely an area where the default behavior should
be different in git.  But it seems hg doesn't really do any better; it
just has a social convention that the merge commits some git people find
displeasing are ok, as I understand it.
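To make the first-parent point concrete: a server can refuse a push unless the branch's old tip is still on the first-parent chain of the new tip, which is exactly the property that a `git pull` merge made on top of local work breaks. Below is a minimal Python sketch over an in-memory commit graph; the function name and the dict-of-parents representation are illustrative assumptions, not the actual pkgsrc-wip hook.

```python
# Sketch: does a push keep the old branch tip on the new tip's first-parent chain?
# Assumption: the commit graph is supplied as a dict mapping each commit id to
# its parent ids in git's order (first parent first).  A real hook would build
# this from `git rev-list --parents`; here it is passed in directly.
from typing import Dict, List, Optional


def old_tip_on_first_parent_chain(parents: Dict[str, List[str]],
                                  old: str, new: str) -> bool:
    """True if `old` is reached from `new` by following only first parents."""
    commit: Optional[str] = new
    while commit is not None:
        if commit == old:
            return True
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None  # root commit: chain ends
    return False


# Trunk was A-B.  A developer committed C on top of A, then ran `git pull`,
# which made merge M with the local commit C as its first parent.  Doing the
# merge on trunk instead (B checked out, `git merge --no-ff C`) gives M2,
# whose first parent is trunk's B.
graph = {"A": [], "B": ["A"], "C": ["A"], "M": ["C", "B"], "M2": ["B", "C"]}
print(old_tip_on_first_parent_chain(graph, "B", "M"))   # False: refuse the push
print(old_tip_on_first_parent_chain(graph, "B", "M2"))  # True
```

A plain fast-forward also passes, since the old tip is then an ancestor along first parents.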

git (and hg) also have the possibility of rebase, which cvs/svn do not,
and thus one can add the expectation that branches are cleaned up before
they are merged (fixup commits, rebasing to a clean changeset without the
wrong/fix pairs, etc.).  I don't see that as gratuitous moving parts, but
as a feature that a group can choose to use or not; choosing to use it
makes things harder but results in a far more pleasing and readable
history.
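For readers who haven't used the fixup style, here is roughly what the cleanup rebase does with "fixup!" commits. This is a simplified Python model of the todo list that `git rebase -i --autosquash` builds, not git's actual algorithm: it matches only exact subjects, whereas real git also matches subject prefixes and hashes and handles "squash!" commits. The commit subjects are made up.

```python
# Simplified model of the todo list `git rebase -i --autosquash` builds.
# Assumption/simplification: a commit whose subject is "fixup! <subject>" is
# matched only against an earlier commit with exactly that subject.
from typing import Dict, List, Tuple

FIXUP = "fixup! "


def autosquash_todo(subjects: List[str]) -> List[Tuple[str, str]]:
    """Given commit subjects oldest-first, return (action, subject) pairs."""
    groups: List[Tuple[str, List[str]]] = []  # (picked commit, its fixups)
    index: Dict[str, int] = {}                # subject -> position in groups
    for subject in subjects:
        target = subject[len(FIXUP):] if subject.startswith(FIXUP) else None
        if target is not None and target in index:
            # Move the fixup right after the commit it repairs.
            groups[index[target]][1].append(subject)
        else:
            # Ordinary commit, or a fixup with no match (left as a pick).
            index.setdefault(subject, len(groups))
            groups.append((subject, []))
    todo: List[Tuple[str, str]] = []
    for pick, fixups in groups:
        todo.append(("pick", pick))
        todo.extend(("fixup", f) for f in fixups)
    return todo


branch = ["Add foo(4) driver", "Hook foo into GENERIC",
          "fixup! Add foo(4) driver"]
for action, subject in autosquash_todo(branch):
    print(action, subject)
# pick Add foo(4) driver
# fixup fixup! Add foo(4) driver
# pick Hook foo into GENERIC
```

After the rebase the branch has two commits, and the "wrong, then fix" pair never appears in the history that gets merged.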

My projects at work have been using rebase to clean up branches before
merging, and I think that's been a good thing.  But that's an environment
where I can expect everyone to learn how to do that, even to the point of
sending them to a class, if I think it's a net win over a year.
FWIW, people have been entirely ok with this sort of rebase/cleanup
approach.  (I'm not suggesting making it mandatory in NetBSD.)

The other thing that gets discussed in git circles is the notion of clean
changesets, but I think that's a good thing.

> Anyway, there's only one reasonable way to approach the issues you
> cite:
>
>  >   how do we cherry-pick changes from -current to release branches, and
>  >   how do we control when it happens
>
> Use the native cherry-pick functionality, done by member of releng
> according to standard procedures. Some of the details of the process
> can probably be improved because the native cherry-pick will be
> storing/handling more metadata than CVS does.

Agreed, probably with "git cherry-pick -x", which appends a reference to
the original commit to the commit message.
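Because -x leaves a machine-readable line of the form "(cherry picked from commit <hash>)" in the pullup's message, a releng script could map each commit on a release branch back to its -current original. A minimal Python sketch of such a helper; the helper itself is hypothetical, and it accepts both 40-digit (SHA-1) and 64-digit (SHA-256) hashes.

```python
# Recover the -current commit(s) a pullup came from, assuming the pullup was
# made with `git cherry-pick -x`, which appends a line of the form
# "(cherry picked from commit <full hash>)" to the message.
import re
from typing import List

CHERRY_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40}|[0-9a-f]{64})\)$",
    re.MULTILINE)


def cherry_pick_sources(message: str) -> List[str]:
    """Return the original commit hashes recorded by `cherry-pick -x`."""
    return CHERRY_RE.findall(message)


msg = ("Fix a panic in the foo(4) attach path.\n"
       "\n"
       "(cherry picked from commit "
       + "0123456789abcdef" * 2 + "01234567)\n")
print(cherry_pick_sources(msg))
```

A pullup that combines several -current commits gets one such line per pick, so the function returns a list.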

> (However, none of the existing options store enough metadata that
> releng doesn't still have to handle some of its own. If someone gets
> around to writing bs(1) that might change.)

I think what remains is the doc/CHANGES addition.

>  >   how do we control whether things that appear on HEAD/master/trunk have
>  >   to build and work (we don't, except for in-arrears pressure)
>
> There are reasons (that have nothing to do with limitations of CVS)
> that we don't do any more along these lines than we currently do.
> Nothing in a SCM migration changes those reasons.

If the reason isn't about CVS, I agree nothing has to change.  Right now
we have a lot of brokenness caused by partial changes appearing as
separate commits, and also by changes that have been put on HEAD without
being tested.  CVS branches are so awkward that they are very rarely
used.  One could expect a branch proposed for merge to build and pass
atf, for example, and even for some sort of CI server to do this
automatically.  On my project at work, I require branches that are going
to be merged to build and pass tests, for example.  But once we have a tool
that supports it, that's a separable culture change, not a tool issue, and
I know it's controversial.

>  > The other workflow aspect is handling contributions from non-committers.
>
> Given that none of the currently available options has a workable
> scheme for handling changeset provenance, to begin with we cannot
> safely do anything other than only allow developers to push their own
> changesets. That way we know that if the changeset says "christos" it
> came from Christos and not from some random bozo who thought it would
> be fun to commit a backdoor and sneak it in by hiding it among a pile
> of other changesets.

I don't quite follow this.  Certainly we should not allow non-developers
to push anything to the official repository.  But right now we have

   tech-foo member M sends in a patch to fix something

   developer D reads it and thinks it is ok

   D applies it to their local tree

   (being careful, they do a release build and ATF run, or they don't)

   D writes a commit message, crediting M, and does cvs commit

With a DVCS, M can prepare a commit and the same thing can happen in
terms of review, except that D has to do less mechanical work in
applying it, and the work of writing a good commit message is pushed onto
M (which I think is a good thing).

I think what you are getting at is that if M's commit is rebased onto
master and pushed, then it shows up in the repo with M as author but
with no record that D pushed it.  Really we want some
record that M was the author and D approved pushing it to the official
repo.

One semi-obvious way is to amend the commit message to add "Reviewed by
D".  Another is to have D change the committer to their own name and put
"Contributed by M" in the message.  Another way is to have M's commit
(with M as author) on a branch, and use an explicit merge commit by D.
That records M's name/email and D's, which gets us to at least where we
are now.  Or maybe we add some logging of pushes to long-term branches
and publish that log.

> We can think about relaxing this once we have more experience with the
> tool and the circumstances.
>
> I agree it would be highly desirable to be able to merge changesets
> posted by random passersby, provided they've been reviewed. (One
> further problem is that if you make it too easy to pull things in,
> they won't always get adequately reviewed.)

True, but I don't think the friction of CVS is really the right way to
solve this.  I agree that we do need to keep an eye out for the
unreviewed code problem.

> However, the mechanism for doing so has to create a spoofing-resistant
> reference chain back to the person whom the changesets originally came
> from. This is not a trivial problem. While git does have some limited
> scheme for this, as of the last time I looked at it it pretty much
> didn't really work. (I forget the details why.) Also, just saying
> "signing" is begging the question -- that's how to get a scheme that
> doesn't work.

I'm not sure what you mean by spoofing-resistant, in terms of who is
prevented from spoofing what.  Right now we take code from mailing-list
posts by people with pseudonyms.  Are you trying to tie contributions
back to True Names as defined by some Government?  Or some other form of
identity?

Or is your point that in CVS, the tool at the server records the
committer, whereas in git the pusher is checked against the ACL but
what is pushed is not verified to have been authored by the same
entity?  So that if you and I both had push access to a git repo, I could
make a commit with committer/author "dholland%netbsd.org@localhost" and push it?
Is the concern that one authorized pusher can spoof another?

Or is it that someone who compromises the official repo can add
changesets with any author?

If the middle paragraph describes your concern, I wonder about a server
hook that verifies that all new commits in the first-parent ancestry
chain back to the ref's previous value have a committer that matches the
pusher.  (We
already have a hook in the pkgsrc-wip git repo that denies pushes of
merge commits with wrong first-parent ancestry, so the one I describe
shouldn't be that hard to write, just perhaps annoying to configure for
alternate emails.)
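A minimal Python sketch of such a pre-receive hook, under loudly stated assumptions: the server has already authenticated the pusher and exposes the login in an environment variable (gitolite, for example, sets GL_USER; other setups differ), and ALLOWED_EMAILS is a hypothetical per-login table where the alternate addresses would be configured. The logins and addresses below are placeholders.

```python
# Sketch of a pre-receive hook: new commits on the first-parent chain of each
# pushed ref must have a committer email belonging to the authenticated pusher.
# Assumptions (hypothetical, not an existing NetBSD setup):
#   * PUSHER_ENV names the variable holding the pusher's login;
#   * ALLOWED_EMAILS maps each login to its committer addresses.
import os
import subprocess
import sys
from typing import Dict, Iterable, List, Set, Tuple

PUSHER_ENV = "GL_USER"
ALLOWED_EMAILS: Dict[str, Set[str]] = {
    "alice": {"alice@example.org", "alice@home.example.net"},  # placeholders
    "bob": {"bob@example.org"},
}


def is_zero(sha: str) -> bool:
    """git uses an all-zero id for 'ref did not exist' / 'ref deleted'."""
    return set(sha) == {"0"}


def parse_updates(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse pre-receive stdin: one '<old> <new> <refname>' per line."""
    updates = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            updates.append((fields[0], fields[1], fields[2]))
    return updates


def new_first_parent_commits(old: str, new: str) -> List[Tuple[str, str]]:
    """(hash, committer email) for first-parent commits the push introduces."""
    if is_zero(old):
        # New ref: everything not already reachable from an existing ref.
        revs = [new, "--not", "--all"]
    else:
        revs = [f"{old}..{new}"]
    out = subprocess.run(
        ["git", "log", "--first-parent", "--format=%H %ce", *revs],
        check=True, capture_output=True, text=True).stdout
    pairs = []
    for row in out.splitlines():
        sha, _, email = row.partition(" ")
        pairs.append((sha, email))
    return pairs


def offending(commits: List[Tuple[str, str]], allowed: Set[str]) -> List[str]:
    """Hashes whose committer email is not one of the pusher's addresses."""
    return [sha for sha, email in commits if email not in allowed]


def main() -> int:
    pusher = os.environ.get(PUSHER_ENV, "")
    allowed = ALLOWED_EMAILS.get(pusher, set())  # unknown pusher: reject all
    bad: List[str] = []
    for old, new, _ref in parse_updates(sys.stdin):
        if is_zero(new):
            continue  # branch deletion: no new commits to check
        bad.extend(offending(new_first_parent_commits(old, new), allowed))
    if bad:
        print(f"push rejected: committer is not {pusher!r} on:",
              file=sys.stderr)
        for sha in bad:
            print("  " + sha, file=sys.stderr)
        return 1
    return 0

# Installed as hooks/pre-receive, the script would finish with:
#     sys.exit(main())
```

Because only the first-parent chain is checked, the explicit-merge approach still works under this rule: D's merge commit sits on the first-parent chain and is checked against D, while M's commits arrive via the second parent and keep M's authorship.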

Follow-Ups:
- Re: Moving NetBSD from CVS to svn|git|hg|fossil
  - From: David Holland

References:
- Moving NetBSD from CVS to svn|git|hg|fossil
  - From: John Klos
- Re: Moving NetBSD from CVS to svn|git|hg|fossil
  - From: David Holland
- Re: Moving NetBSD from CVS to svn|git|hg|fossil
  - From: Greg Troxel
- Re: Moving NetBSD from CVS to svn|git|hg|fossil
  - From: David Holland
