Re: irt: Re: Core statement on version control systems

To: David Holland <dholland-tech%netbsd.org@localhost>
Subject: Re: irt: Re: Core statement on version control systems
From: Elliott Mitchell <ehem+netbsd%m5p.com@localhost>
Date: Mon, 8 Dec 2025 18:52:38 -0800

On Mon, Dec 08, 2025 at 05:57:34AM +0000, David Holland wrote:
> On Mon, Dec 01, 2025 at 02:43:20PM -0800, Greg A. Woods wrote:
> 
>  > I think I understand much better about what you are "complaining" about
>  > with Git's rename handling.
>  > 
>  > However you didn't really answer the direct question, i.e. is it a real
>  > requirement for the NetBSD repository?
>  > 
>  > I agree with all the points you made about taking care around renaming
>  > files and committing those changes.  I would posit that they make sense
>  > no matter how smart or dumb the VCS is.
> 
> Yes and no. You should always divide renames and content changes into
> separate commits; or at least, no tool I'm aware of readily handles
> the results if you don't, e.g. for annotate.
> 
> However, it's only git where you need to be super careful about
> merging or rebasing across renames, and only git where it's prudent to
> limit the number of files renamed in a single commit. hg just doesn't
> have those problems.
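Here is a minimal sketch of that practice in Git. It uses a throwaway repository, hypothetical file names (`foo.c`, `bar.c`) and a placeholder identity. The rename and the edit go into separate commits. `git log --follow`, which itself relies on rename detection, then traces the history back across the rename.

```shell
#!/bin/sh
# Sketch: a rename and a content change kept in separate commits.
# Repository, file names and identity are hypothetical placeholders.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'int a;\nint b;\nint c;\n' > foo.c
git add foo.c
git commit -q -m "add foo.c"

# Commit 1: a pure rename, contents untouched.
git mv foo.c bar.c
git commit -q -m "rename foo.c to bar.c"

# Commit 2: a content change only.
printf 'int d;\n' >> bar.c
git commit -q -a -m "add d to bar.c"

# Without --follow the history of bar.c stops at the rename;
# with it, Git continues into the history of foo.c.
git log --format=%s -- bar.c
git log --follow --format=%s -- bar.c
```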

So you're saying Git makes this bad thing worse.  Certainly that is
suboptimal, but isn't the real solution to simply not do the bad thing?

> There are two parts to this issue: one is detecting renames by
> comparison at commit time vs. having them indicated explicitly with a
> "mv" command. I think it's much more reliable and robust to have
> explicit "mv" (and "cp" too), because if you rename a whole bunch of
> very similar files in one commit it's possible for the detection
> heuristics to go off the rails. But in ordinary circumstances it
> doesn't make much difference.
> 
> The other part is whether renames are recorded in commit metadata or
> whether the information is thrown out, and the heuristics are
> re-engaged later to guess and maybe get unpredictably different
> results. _That's_ the problem with git's approach.

You've written that there are two issues, but I see only one issue,
stated twice.  Git does in fact have a `git mv` subcommand.  Git *never*
explicitly tracks renames; the moment `git mv` has completed, Git is
already using the heuristics to identify that the file has been renamed.
`git diff --staged` will immediately show you what the diff is going to
look like.

As such, you've simply said the same thing twice in different ways.
Repeating it doesn't make it any truer.
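Here is a minimal sketch of that claim, using a throwaway repository with hypothetical file names. Renaming with `git mv` leaves the same tree in the index as renaming with plain `mv` followed by `git add -A`, so no separate rename record is stored. Whether a rename is reported depends only on whether detection is switched on when the diff is computed.

```shell
#!/bin/sh
# Sketch: `git mv` records no rename; the staged tree is the same as
# after a plain mv plus `git add -A`. File names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\nline two\n' > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Variant 1: rename with git mv and record the staged tree.
git mv old.txt new.txt
tree_mv=$(git write-tree)

# Rename detection happens when the diff is computed, not before:
git diff --staged -M --name-status            # R100, old.txt, new.txt
git diff --staged --no-renames --name-status  # D old.txt and A new.txt

# Variant 2: discard that, rename with plain mv, and stage everything.
git reset -q --hard
mv old.txt new.txt
git add -A
tree_plain=$(git write-tree)

echo "git mv:   $tree_mv"
echo "plain mv: $tree_plain"
```

Both `echo` lines print the same tree hash.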

> Anyway, we have a thirty year backlog of pending tree reorganizations.
> While we've now done a fair number with cvs add/remove, that's mostly
> been limited to the most critical and least invasive changes, and
> mostly of 3rd party stuff where the history isn't a super important
> asset.
> 
> As I posted last weekend in another thread, cleaning up the mess with
> libc and kernel headers involves mass renaming, and does it right in
> the center of everything. That requires real rename support, and
> that's a big part of why it hasn't happened anytime in the past 15+
> years since I first started talking about it.

Yet both DragonFly BSD and FreeBSD have moved to Git with minimal issues.
OpenBSD has strongly hinted that it plans to move to Git.  I could
believe NetBSD's source tree has been rather poorly maintained and thus
needs mass renaming, which Git doesn't handle as well.  That seems a
pretty outrageous claim, though.

>  > Also it's trivial to embed the branch name into every commit message.
> 
> That doesn't help automated processes very much, unless it's _really_
> always there.

Other Git clients may not implement this functionality, or may implement
it differently, but Git itself supports hooks that run during commit
creation.  Several tools use these to add their own footers to commits
so they can keep track of commits as they move around.  So yes, it is
entirely possible to implement this if you want it.
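Here is a minimal sketch, assuming a throwaway repository. A `prepare-commit-msg` hook appends the current branch name as a trailer using `git interpret-trailers`. The `Branch:` key and the branch name `feature-x` are hypothetical choices, not Git conventions. Hooks are local to each clone and are not transferred by `git clone`, so every developer would have to install it.

```shell
#!/bin/sh
# Sketch: stamp the current branch name into every commit message.
# "Branch:" and "feature-x" are hypothetical names chosen for this example.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# The hook receives the path of the proposed commit message as $1.
mkdir -p .git/hooks
cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
branch=$(git symbolic-ref --short -q HEAD) || exit 0   # detached HEAD: skip
git interpret-trailers --in-place --trailer "Branch: $branch" "$1"
EOF
chmod +x .git/hooks/prepare-commit-msg

git commit -q --allow-empty -m "initial commit"
git checkout -q -b feature-x
echo hello > file.txt
git add file.txt
git commit -q -m "add file.txt"

# Print the full message of the latest commit, trailer included.
git log -1 --format=%B
```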

>  > I find Git branches almost unusable without making branch base tags
>  > anyway, so much so I'm thinking of figuring out how to automate it.
> 
> git branches _are_ pretty well unusable for traditional branch tasks :-|

Yet in several ways they really do work well.  Keeping local and remote
branch names in separate namespaces means you needn't worry about
collisions.  You might name a branch "project2501", but locally I might
call it "kusanagi".

This does seem to match Real World use pretty well.  Once a branch has
left your local machine and been picked up by someone else, why would
they care what /you/ named it?  The commit message is supposed to
indicate what any given commit does; it doesn't matter what project it
was originally intended for.
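Here is a minimal sketch of the naming point, using throwaway local repositories. A bare repository stands in for the shared server. Alice publishes `project2501`. Bob works on the same commits under the local name `kusanagi` and pushes back with an explicit `local:remote` refspec. The people and repository layout are hypothetical.

```shell
#!/bin/sh
# Sketch: one branch, two names. "project2501" and "kusanagi" are the
# names from the text; the repositories are throwaway local stand-ins.
set -e
top=$(mktemp -d)
cd "$top"
git init -q --bare shared.git               # stands in for the server

# Alice commits and publishes her work as "project2501".
git init -q alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "section 9" > notes.txt
git add notes.txt
git commit -q -m "start project"
git push -q ../shared.git HEAD:refs/heads/project2501
cd ..

# Bob clones it and works under his own local name, "kusanagi".
git clone -q -b project2501 shared.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
git checkout -q -b kusanagi origin/project2501  # tracks origin/project2501
echo "ghost" >> notes.txt
git commit -q -a -m "extend notes"

# Push the local "kusanagi" back under the published name.
git push -q origin kusanagi:project2501
git rev-parse --abbrev-ref 'kusanagi@{upstream}'  # prints origin/project2501
```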

Am I reading you correctly that this is all strictly theoretical?
You've mentioned how, in the presence of renames, `git merge` or
`git rebase` can fail.  Yet in those circumstances Git prints a
"CONFLICT" message, *telling* you something problematic has occurred.
If you're not careful when that happens, that really seems to be your
own fault.
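For comparison, here is a minimal sketch of the common case, using a throwaway repository with hypothetical names. One branch renames a file while another edits it, and `git merge` uses rename detection to apply the edit to the new name. If Git could not reconcile the two, `git merge` would exit non-zero after printing CONFLICT, and the script would list the unmerged paths instead.

```shell
#!/bin/sh
# Sketch: a merge across a rename. One branch renames a file, the other
# edits it; rename detection carries the edit over to the new name.
# File and branch names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\ntwo\nthree\nfour\nfive\n' > foo.c
git add foo.c
git commit -q -m "add foo.c"
base=$(git symbolic-ref --short HEAD)   # main or master, depending on config

# Topic branch: edit foo.c in place.
git checkout -q -b topic
sed 's/three/THREE/' foo.c > foo.tmp && mv foo.tmp foo.c
git commit -q -a -m "edit foo.c"

# Base branch: rename foo.c without touching its contents.
git checkout -q "$base"
git mv foo.c bar.c
git commit -q -m "rename foo.c to bar.c"

# A non-zero exit means CONFLICT; in that case list the unmerged paths.
if git merge -q --no-edit topic; then
	cat bar.c
else
	git diff --name-only --diff-filter=U
fi
```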

I can't help noticing you've spent all this time writing about Git using
rename detection rather than explicit rename tracking.  I'm aware this
is a real problem in theory, but it doesn't seem troublesome in practice.

If rename detection really were bad, why haven't I seen a single report
of genuine problems with it?  Truly *vast* amounts of hot air are being
generated, so why is that all I'm seeing?

While the projects out of Redmond, WA have a shorter history, they've got
*many* people working on them (vastly more commits than NetBSD).  Yet
they've somehow transitioned to Git without problems.  Google hasn't been
around nearly as long as NetBSD, but their issues with Git were
unrelated to rename detection.  Facebook^WMeta stuck with Mercurial for a
fair bit of time, but has now transitioned to Git.

If rename detection were truly that problematic, there should be a
*vast* number of trouble reports.  Instead, nothing: not a single report
of trouble with rename detection.  Everything was about other issues,
such as scaling gigantic monorepos.

The only conclusion I can reach is that rename detection is a major
problem in theory, but practice indicates it is fine.  Likewise,
Quicksort is troublesome in theory (its worst case is quadratic), but
the troubles are trivial to mitigate in practice.

In theory, there is no difference between theory and practice.  In
practice, they're different.  I can readily believe the core team was
anxious about rename detection in Git and therefore chose that as the
primary.

In the world of distributed version control software there is a VERY
important point: distributed version control software is really a
communication tool.  Instead of sending e-mail text messages, you're
trading source code (or that was the intent; it can be abused to
handle many other things too).  Since it is a communication tool, your
first question should be, "Can I use this to communicate with other
developers?"

With Git the answer is an unequivocal "yes".  90% of recent graduates
will have used it.  As you go up in age the percentage drops, but it
seems to remain pretty high through age 50.

For Mercurial the answer is generally "no".  Less than 1% of recent
graduates have used it.  I get the feeling that as you go up in age the
percentage may increase, but you won't see a greater share of Mercurial
users than Git users until around 70.  You'll also see a lot more people
with CVS experience in this group.

This actually seems a better justification for NetBSD catering more to
Mercurial users.  Due to the age and wide range of hardware NetBSD caters
to, you *need* that crowd.  This, though, leaves the severe issue of
Mercurial users being a shrinking population.

On Mon, Dec 01, 2025 at 08:29:11PM +0100, Rhialto wrote:
> On Sat 29 Nov 2025 at 23:18:04 +0000, David Holland wrote:
> > Github pull requests are a fine way to send ten-line patches to random
> > projects you aren't part of, provided you've already signed up for
> > Github.
> 
> I would even claim that the pull request / merge request workflow for
> random projects you aren't part of leaves a lot to be desired, at
> least compared to projects you *are* part of.
> 
> When you're part of a project, you can directly clone from the upstream,
> make your branch, push it to the upstream repo, and create the pull
> request. Any changes that are made upstream are pull-able right away.
> 
> If you don't have those permissions, it gets a lot more annoying.  You
> must create a fork of the repo on github to your own account. If you
> already cloned the repo to your local machine, you now have to switch
> its upstream to the fork (or add a new upstream for it). Then you do the
> branch + push (to your fork) + pull request dance.

I'm unsure what you're really pointing at here.  I can think of two
things you could be complaining about, and one of them is very much a
non-issue.

Have you gotten your mental model of distributed version control adjusted
right?  There really aren't "servers"; everything is "peers", just with
some differing settings.  It is unlikely you allow outside connections to
your development machine(s), so "servers" are simply drop-boxes which
hold copies and never compile or unpack the repository data.
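Here is a minimal sketch of both halves of that model, using throwaway local repositories with hypothetical names. One developer clones directly from another's working repository, with no server involved. The "server" is nothing more than a bare repository, which holds the history without a working tree.

```shell
#!/bin/sh
# Sketch: peers versus drop-boxes, using throwaway local repositories.
# The names "alice", "bob" and "dropbox.git" are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

# An ordinary working repository on a developer's machine.
git init -q alice
echo "hello" > alice/README
git -C alice add README
git -C alice commit -q -m "first commit"

# Peer to peer: a second developer clones straight from the first
# developer's working repository; no server is involved.
git clone -q alice bob

# A "server" is just a bare repository: it holds the history but has
# no working tree, so nothing is checked out or built there.
git clone -q --bare alice dropbox.git
git --git-dir=dropbox.git rev-parse --is-bare-repository   # prints "true"
ls dropbox.git
```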

Git calls these "remotes" for a reason.  You can configure 1 remote or
you can configure several remotes (hmm, this repository has 7).
`git clone` initially configures a single remote, but others can be
added.

If you initially cloned from the main repository, but now need to push
to GitHub for a pull request, you do:
`git remote add github git@github.com:${myuser}/${reponame}.git`

You then do `git push github ${branchname}`.  You can keep doing
`git fetch origin` (or `git pull origin`) without trouble.  You're simply
pulling/fetching from one place, then pushing to another.
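Here is the same workflow as a self-contained sketch. Local bare repositories stand in for the main repository and the GitHub fork, and `topic` is a hypothetical branch name. The branch goes to the fork only, while fetching from `origin` works as before.

```shell
#!/bin/sh
# Sketch: fetch from one remote, push to another. Local bare repositories
# stand in for the main repository and the GitHub fork; "topic" is a
# hypothetical branch name.
set -e
top=$(mktemp -d)
cd "$top"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL=user@example.com
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL=user@example.com

# Stand-ins for the main repository (seeded with one commit) and the fork.
git init -q seed
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed upstream.git
git init -q --bare fork.git

# Clone from the main repository, then add the fork as a second remote.
git clone -q upstream.git work
cd work
git remote add github ../fork.git
git remote -v                 # origin and github, each with fetch and push URLs

# Work on a branch and push it to the fork only.
git checkout -q -b topic
git commit -q --allow-empty -m "my change"
git push -q github topic

# Fetching from the main repository keeps working as before.
git fetch -q origin
```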

> If you just wanted to ever make this one pull request, it stops here.
> But if you want to keep following the project, you now have a fork which
> gets outdated quickly. Github finally added a button to update a branch
> from the original repo, but you'll have to switch through all branches to
> keep up to date completely. That can be a lot of work... and even with
> the command line I am not aware of a simple way to update the whole repo
> (including all branches).

```
git fetch origin
for branch in foo bar baz
do	git rebase origin/main "$branch" || break	# stop at the first conflict
done
```

Is that what you're thinking of?  You might also want a
`git push -f github "$branch"` too.  From your message I get the feeling
you haven't done much with either Git or Mercurial, and this may not be
your discussion.
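If the goal is instead to bring *every* branch of the fork up to date with the original repository, one way needs no checkouts at all. It assumes the remote names `origin` and `github` from the commands earlier in this message and pushes each `origin/*` remote-tracking branch straight to the fork. In this self-contained sketch, local bare repositories stand in for both remotes, and `dev` is a hypothetical second branch.

```shell
#!/bin/sh
# Sketch: update every branch of a fork from the original repository.
# Local bare repositories stand in for "origin" (the original) and
# "github" (the fork); the branch "dev" is a hypothetical example.
set -e
top=$(mktemp -d)
cd "$top"
export GIT_AUTHOR_NAME=Seed GIT_AUTHOR_EMAIL=seed@example.com
export GIT_COMMITTER_NAME=Seed GIT_COMMITTER_EMAIL=seed@example.com

# Original repository with two branches; the fork copies it as of now.
git init -q seed
git -C seed commit -q --allow-empty -m "initial"
git -C seed branch dev
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git

# The original moves on after the fork was made.
git -C seed commit -q --allow-empty -m "later"
git -C seed push -q --all ../upstream.git

# Local clone of the original, with the fork as a second remote.
git clone -q upstream.git work
cd work
git remote add github ../fork.git
git fetch -q origin

# Push each origin/<name> to <name> on the fork, skipping the
# symbolic origin/HEAD. No branch needs to be checked out.
for name in $(git for-each-ref --format='%(refname:strip=3)' refs/remotes/origin/)
do	case $name in HEAD) continue ;; esac
	git push -q github "refs/remotes/origin/$name:refs/heads/$name"
done
```

Plain (non-forced) pushes are used here, so a fork branch that has diverged from the original is rejected rather than overwritten.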

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg%m5p.com@localhost  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445