tech-repository archive


Re: irt: Re: Core statement on version control systems



If renames in Git are so problematic, how would the Git to Mercurial bridge be able to handle the situation?

I haven't heard of any upcoming restrictions on using Git as the front end to the final and true master in Mercurial, yet your message seems to imply that only Mercurial itself (that is, `hg rename` in the Mercurial CLI) could be used for file renames?

Or do we plan to set it up so that a rename made through Git ends up as a non-rename in Mercurial, while the same rename made through Mercurial ends up as a true rename?  Couldn't we simply disable the rename logic in Git to achieve a similar result, without having to use Mercurial as the true master?

Do we have a survey of how many NetBSD developers prefer Mercurial over Git?

If very few plan to use Mercurial, wouldn't we basically get the same deal out of Git, by simply disabling the rename logic, using one of the options suggested by Greg A. Woods in another message in this thread?
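
For concreteness, here is a minimal sketch of what "disabling the rename logic" could look like. It uses Git settings I know to exist (`diff.renames`, `merge.renames`, and the `-X no-renames` merge option); whether these are the same options Greg suggested is an assumption on my part. The repository and file names are made up for illustration.

```shell
set -eu
# Throwaway repository; path and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

seq 1 20 > old.txt
git add old.txt
git commit -q -m "add old.txt"
git mv old.txt new.txt
git commit -q -m "rename old.txt to new.txt"

# With detection requested explicitly (-M), the commit shows as a rename.
with_detection=$(git diff -M --name-status HEAD^ HEAD)

# Turn rename detection off for this clone.
git config diff.renames false    # affects diff, log, show
git config merge.renames false   # affects merges
# A single merge can also opt out: git merge -X no-renames <branch>

# Without detection, the same commit shows as a delete plus an add.
without_detection=$(git diff --name-status HEAD^ HEAD)

printf 'with detection:\n%s\n' "$with_detection"
printf 'without detection:\n%s\n' "$without_detection"
```

Note that these are per-clone settings, so presumably every developer would have to set them for the behaviour to be uniform.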

C.

On Sat, 29 Nov 2025 at 16:04, David Holland <dholland-tech%netbsd.org@localhost> wrote:
At the risk of bringing this thread back to life:

On Wed, Oct 08, 2025 at 11:52:00PM +0000, David Holland wrote:
 > That list is missing at least two critical points where git falls
 > short:
 >    - storing file rename data
 >    - storing the identity of long-running branches in the history

To elaborate a bit based on things other people posted:

 : Is there a publicly available rationale document?

Rationale for moving away from CVS? The rationale for that is obvious.

 : Is [handling renames correctly] a real requirement?

Yes, yes it is. The last time I got burned by git's "rename" support
was, in fact, around the time this thread was happening. I moved some
test reference outputs around at $WORK; then git got confused and
created a confusing merge state that threw away some of the changes.

The reason it's such a ticking bomb is not even so much that the
rename detection isn't itself reliable. (Although it's not: as Joerg
pointed out, it's easily confused by small files that all have the
same license header.) The problem is that it applies the rename
detection fresh every time it looks at the files, and not on a
per-change basis but from whatever perspective it happens to be asked
from. If you
rename a file, commit, then hack on it for a while and commit a few
times, and then someone tries to merge or rebase across your work in
one chunk (which is the normal way of doing things) ... git will
compare the state of the file before the rename to the state after you
did all the hacking, conclude they're different files, and lose track
of the rename. Then at best they get a merge conflict in the "deleted"
pre-rename file. Then _if_ they are on top of things _and_ they
recognize what happened, they can go redo their changes in the renamed
version. (But they can't just merge them, even if they'd otherwise
merge readily or even automatically.) Otherwise they're fairly likely
to just lose their changes. (And then if they're rebasing, the old
version is gone and they can't get it back.) However, that's the best
case scenario. Once things go off the rails like this, sometimes git
will apply the change to the wrong file instead. And while in theory
it shouldn't be possible, I'm fairly sure there are cases where the
changes just get thrown away entirely without any conflict being
triggered.
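
The endpoint dependence described above can be sketched in a throwaway
repository. This is only an illustration under stated assumptions: the
file names and contents are invented, and it uses git diff rather than
a merge, on the understanding that a merge likewise compares the merge
base with each tip rather than walking the commits in between.

```shell
set -eu
# Throwaway repository; path and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

seq 1 20 > old.txt
git add old.txt
git commit -q -m "add old.txt"
base=$(git rev-parse HEAD)

# Rename-only commit.
git mv old.txt new.txt
git commit -q -m "rename old.txt to new.txt"
rename_commit=$(git rev-parse HEAD)

# Then "hack on it for a while": here, replace the whole contents.
seq 101 130 > new.txt
git commit -q -a -m "rework new.txt"

# Compared one step at a time, the rename is recognized.
step=$(git diff -M --name-status "$base" "$rename_commit")
# Compared across the whole span, the two files no longer look alike.
span=$(git diff -M --name-status "$base" HEAD)

printf 'rename commit alone:\n%s\n' "$step"
printf 'whole span:\n%s\n' "$span"
```

The first comparison reports a rename; the second reports old.txt as
deleted and new.txt as added, even though the history between the two
points contains an explicit git mv.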

If you rename more than one file at a time there are many additional
and worse failure modes.

To be safe with git and rename you need to:

   - commit ONLY renames in rename changesets; all other changes go in
     the next commit (this is good practice anyway for other reasons
     but for git it's vital);

   - be very cautious about committing mass renames all at once;
     anytime you rename multiple files git can confuse them, but the
     fewer you commit at once the smaller the scope of the possible
     downstream chaos;

   - when merging or rebasing across a rename commit, always merge up
     to but not including the rename commit, merge with it as a
     separate step, then continue;

   - if you get an unexpected merge mess do git merge --abort or git
     rebase --abort first, then look through the history to see why
     and take appropriate steps, rather than pushing through;

   - be aware of when anyone else working on the same tree has made
     rename commits so you don't trip on them by accident.
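
A minimal sketch of the first and third precautions, in a throwaway
repository. The branch names (trunk, topic), file names, and contents
are hypothetical; the point is only the shape of the three-step merge
around a rename-only commit.

```shell
set -eu
# Throwaway repository; all names here are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

seq 1 20 > old.txt
git add old.txt
git commit -q -m "add old.txt"
git branch -M trunk

# Topic branch edits the file under its old name.
git checkout -q -b topic
echo "topic change" >> old.txt
git commit -q -a -m "topic: edit old.txt"

# Trunk: unrelated work, then a rename-only commit, then edits.
git checkout -q trunk
echo "notes" > other.txt
git add other.txt
git commit -q -m "trunk: add other.txt"
git mv old.txt new.txt
git commit -q -m "trunk: rename old.txt to new.txt (rename only)"
rename_commit=$(git rev-parse HEAD)
{ echo "trunk change"; cat new.txt; } > new.txt.tmp
mv new.txt.tmp new.txt
git commit -q -a -m "trunk: edit new.txt"

# Merge trunk into topic in three steps.
git checkout -q topic
git merge -q --no-edit "${rename_commit}^"  # 1. up to, not including, the rename
git merge -q --no-edit "$rename_commit"     # 2. the rename commit by itself
git merge -q --no-edit trunk                # 3. everything after it
# On an unexpected conflict: git merge --abort, then inspect the history.
```

In step 2 the only difference on the trunk side is the rename itself,
so the old and new files are byte-identical and the match does not
depend on how far the contents have drifted since.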

If you don't follow these precautions you will get away with it often
(especially if you aren't doing large reorgs, or there isn't much
commit traffic in the tree) but sometimes you won't, and when you
don't it can be quite expensive. Especially in the cases that lead to
losing changes without realizing it.

 :: - storing the identity of long-running branches in the history
 :
 : Can you expand on this?

In Mercurial, Subversion, CVS, and nearly everything else, when you
make a branch and commit to it, the identity of the branch you commit
on is part of the commit. With git branches, this is not true. Git
branches are _only_ points in the commit graph, not subgraphs.

Now imagine that you've been working on a long-running devel branch
for a good while (as an extreme example, consider tls-maxphys, which
has been going for years) and you merge it.

In tools that track branches, you can look at the merge, and the
commits on the trunk before the merge, and the commits on the branch
before the merge, and you can tell which is which. (Except for CVS:
CVS doesn't understand merges, so you can only find the merge point by
interpreting commit messages. That's its own problem.)

In git, you can find the merge commit, and it has two ancestors, and
there's a diamond in the commit graph. But you can't tell from the
repository metadata which side of the diamond was the trunk and which
was the development branch. You have to read and interpret the commit
messages. You also can't tell which development branch it was.
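
What a git commit actually records can be inspected directly. The
following is a sketch in a throwaway repository with invented branch
and file names; it shows that the merge commit lists two parent hashes
and that the commit object for the branch-side work carries no branch
field.

```shell
set -eu
# Throwaway repository; all names here are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch -M trunk

git checkout -q -b devel
echo devel > devel.txt
git add devel.txt
git commit -q -m "devel work"

git checkout -q trunk
echo trunk > trunk.txt
git add trunk.txt
git commit -q -m "trunk work"

git merge -q --no-edit devel
git branch -q -d devel   # the branch name is only a pointer; delete it

# The merge commit records two parent hashes and nothing else about them.
parents=$(git log -1 --format=%P HEAD)
# Header fields of the commit that was made on devel (stop at blank line).
headers=$(git cat-file -p HEAD^2 | sed '/^$/q')

printf 'parents of merge: %s\n' "$parents"
printf 'commit headers:\n%s\n' "$headers"
```

The headers are tree, parent, author, and committer. For comparison,
Mercurial stores the named branch in each changeset, and it can be
printed with the {branch} template keyword of hg log.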

Git advocates at this point rush to tell you that you didn't want to
do that, and you're wrong to have a long-running development branch,
and in general that you the developer should adapt to work around the
limitations of the tool, not expect the tool to support your work.

(Note that by "git advocates" I mean the people who crawl out of the
woodwork whenever anyone mentions some other tool to shout down the
idea that there might be any shortcomings in git.)

--
David A. Holland
dholland%netbsd.org@localhost


