Re: git branches (was: Re: Reply to David Holland's notes and comments)

To: David Holland <dholland-tech%netbsd.org@localhost>
Subject: Re: git branches (was: Re: Reply to David Holland's notes and comments)
From: "Eric S. Raymond" <esr%thyrsus.com@localhost>
Date: Tue, 13 Jan 2015 00:51:26 -0500

David Holland <dholland-tech%netbsd.org@localhost>:
> (this part is important enough to warrant its own reply, I think)
> 
> On Wed, Jan 07, 2015 at 05:50:50AM -0500, Eric S. Raymond wrote:
>  > Apologies for the slightly belated reply; I'm not subscribed to this
>  > list yet and found David Holland's comments when checking the list
>  > archives to make sure my technical proposal had come through.
> 
> Oops; I assumed you would have by then. My fault.

I'm now subscribed.

>  > >So, because git doesn't have real branches (only git-branches) the
>  > >current conversion loses branch information. Is this limitation also
>  > >present in the git-fast-export format? If so, is there a way to avoid
>  > >throwing away branch information when converting to hg?
>  > 
>  > I don't understand what is "real" about CVS branches that isn't "real"
>  > about git branches.  Both are simply labels pointing to tip revisions
>  > in a tree.  Can you clarify what "branch information" you believe is
>  > being lost?
> 
> In git, a branch is a self-moving tag that names the head of the
> branch, that is, a branch is a version.
> 
> In CVS, Mercurial, Subversion, and virtually every other SCM I know
> of, branches are subgraphs, that is, sets of commits. E.g. in CVS when
> you create a branch, it gets a number, and then every commit on the
> branch is numbered relative to the branch number. And in Mercurial,
> the branch a commit is on is part of the commit metadata and thus a
> permanent property.  (I am told that Monotone handles branches more or
> less the same way as Mercurial, but have no direct experience.)
> 
> These are not isomorphic models; while a git branch induces a subgraph
> (all commits in the ancestry of the branch head) this subgraph is not
> the same subgraph people usually intend when talking about the commits
> on a branch. In particular, for a diverging branch (such as a release)
> it doesn't stop at the branch point; and for a converging branch (such
> as for feature development) once the branch is merged back into the
> trunk the induced subgraph contains the whole trunk in addition to the
> commits originally made on the branch.
> 
> And, as best I can tell if you create a feature branch, hack for a
> while, and then merge, there's no automated way to tell afterwards
> which of the various chains of commits were your feature commits on
> your feature branch, and which were something else. (Reading the
> commit messages by hand doesn't scale.)
> 
> This loses information. Losing information is bad.
> 
> Since I would like to avoid losing this information, particularly if
> converting the repository to something other than git, my questions
> are:
> 
>    (1) Am I missing something in git's branch model? I don't think so
>        but it's always possible.
> 
>    (2) Does this same limitation/problem affect git-fast-import
>        streams?
> 
>    (3) If so, is there a way to augment the data to avoid losing the
>        information in the case where the target is not git?
> 
> and implicitly also:
> 
>    (4) Do your tools avoid this problem?

Answering the last question first:

No, they don't.  They can't, because git and its fast-import streams
don't store the right kind of metadata.

You are correct that "if you create a feature branch, hack for a
while, and then merge, there's no automated way to tell afterwards
which of the various chains of commits were your feature commits on
your feature branch, and which were something else". 

For this not to be true, git would have to treat branch names as
Mercurial does - in effect, as a per-node attribute that is normally
immutable once set.

However, I note that CVS doesn't handle branches this way either.  In
CVS, as in git, "a branch is a self-moving tag that names the head of the
branch, that is, a branch is a version."  There is nothing else there.
The same is in effect true of SVN, though the details are odd and
slightly different.

You have an illusion of what you call "true" branches in CVS because
of two facts about CVS's representation of revisions: it doesn't have
real merges (every revision except 1.1 has one unique parent; the
revision graph is a tree), and revisions are naturally ordered in such
a way that at any fork in the tree you always know which is the senior
branch.

These two properties together guarantee that it is always possible
to assign a unique correct branch name to every changeset at the time
a CVS repo is imported to git, using a trivial graph-coloring algorithm 
starting at the branch tips. There is *no loss of information*; nothing
needs to be done to preserve this property.
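
A minimal Python sketch of such a coloring, under stated assumptions:
the history is a tree given as a child-to-parent map, and the branch
tips are listed most senior first.  The function name, the node names,
and the branch names are all made up for illustration; this is not the
code of any actual conversion tool.

```python
def color_tree(parent, tips_by_seniority):
    """Assign one branch name to every node of a tree-shaped history.

    parent: dict mapping each node to its single parent (None for the root).
    tips_by_seniority: list of (branch_name, tip_node), most senior first.
    """
    color = {}
    for branch, tip in tips_by_seniority:
        node = tip
        # Walk toward the root, stopping at the first node that a more
        # senior branch has already claimed (the branch point).
        while node is not None and node not in color:
            color[node] = branch
            node = parent[node]
    return color


# Hypothetical history: a trunk with one release branch forked at 1.2.
parent = {
    "1.1": None,
    "1.2": "1.1",
    "1.3": "1.2",
    "1.2.2.1": "1.2",
    "1.2.2.2": "1.2.2.1",
}
tips = [("trunk", "1.3"), ("release-1", "1.2.2.2")]
colors = color_tree(parent, tips)
```

Because every node has exactly one parent and the seniority order
settles who owns the shared ancestors, each node gets exactly one
color.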

The "loss of information" occurs not at repository translation time
but the first time you perform a merge that is not a fast-forward.  At
that point the DAG has a backward-facing join and becomes
non-treelike.  There is no rule to color the resulting graph that is
guaranteed to preserve the branch attributions on nodes in the merge
bubble as they were before the join.
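
A small Python sketch of the effect, using a made-up six-commit DAG
(the commit names and the helper function are hypothetical).  It
computes the subgraph induced by a branch head before and after a
non-fast-forward merge:

```python
def ancestors(parents, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


# Hypothetical history: trunk A-B-C, feature F1-F2 forked at B,
# and M merging the two lines.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "F1": ["B"],
    "F2": ["F1"],
    "M": ["C", "F2"],
}

before_merge = ancestors(parents, "C")  # trunk head before the merge
after_merge = ancestors(parents, "M")   # trunk head after the merge

# The two sides of the merge bubble, as the graph alone presents them.
side_one = ancestors(parents, "C") - ancestors(parents, "F2")
side_two = ancestors(parents, "F2") - ancestors(parents, "C")
```

After the merge, the set reachable from the trunk head contains F1 and
F2 as well.  The two sides of the bubble can still be separated, but
nothing stored in the graph records which side was called "feature".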

This is the problem you are observing.  Better translation tools cannot
fix it; it is intrinsic to any VCS with true merges.  There are only two
options here: the VCS can either implement branches as stored and 
persistent node colors a la Mercurial, or accept that line-of-development 
information will be lost at merges a la git.
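
To illustrate the first option, here is a Python sketch in which each
commit carries its branch name as a stored, immutable attribute.  The
class, field, and branch names are assumptions for illustration only;
this is not Mercurial's actual storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    ident: str
    parents: tuple
    branch: str  # stored color, fixed when the commit is created


# Hypothetical history, the same shape as a feature branch merged to trunk.
history = [
    Commit("A", (), "default"),
    Commit("B", ("A",), "default"),
    Commit("F1", ("B",), "feature"),
    Commit("C", ("B",), "default"),
    Commit("F2", ("F1",), "feature"),
    Commit("M", ("C", "F2"), "default"),
]


def commits_on(history, branch):
    """List the commits whose stored color matches the given branch."""
    return [c.ident for c in history if c.branch == branch]


feature_commits = commits_on(history, "feature")
```

With the color stored on the node, the question "which commits were
made on the feature branch" has an answer even after the merge M.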

Therefore, the practical solution to your problem is that you should advocate 
for the conversion target to be Mercurial rather than git.  (Supposing you
succeed, we'll need to find an importer that maps import-stream branches
to Mercurial node colors.)
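
As a rough sketch of what the first step of such a mapping might look
like, the Python below reads a fast-import stream and records the ref
named on each "commit" command against that commit's mark.  It assumes
the exact-byte-count form of "data" and that every commit has a mark;
the function name and the sample stream are hypothetical, and this is
not the code of any existing importer.

```python
import io


def commit_branches(stream):
    """Map each commit mark in a fast-import stream to its branch name."""
    prefix = "refs/heads/"
    result = {}
    current_ref = None
    while True:
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # Skip the payload so its content is never read as a command.
            # Assumes the "data <count>" form, not the delimited form.
            stream.read(int(line[5:].strip()))
        elif line.startswith(b"commit "):
            current_ref = line[7:].strip().decode()
        elif line.startswith(b"mark :") and current_ref is not None:
            name = current_ref
            if name.startswith(prefix):
                name = name[len(prefix):]
            result[line[5:].strip().decode()] = name
            current_ref = None
    return result


# Hypothetical stream: one blob and two commits on different branches.
sample = (
    b"blob\nmark :1\ndata 3\nhi\n\n"
    b"commit refs/heads/trunk\nmark :2\n"
    b"committer A <a@example.com> 0 +0000\n"
    b"data 4\none\n"
    b"M 100644 :1 f\n\n"
    b"commit refs/heads/release-1\nmark :3\n"
    b"committer A <a@example.com> 0 +0000\n"
    b"data 4\ntwo\n"
    b"from :2\n\n"
)
branches = commit_branches(io.BytesIO(sample))
```

An importer could then write each recorded name into the target
system's per-commit branch field when it creates the commit.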

I am not a partisan in that dispute.  99% of the conversion process, up to
the point where we feed finished fast-import streams to an importer, will
be the same either way.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>
