tech-kern archive


Re: Slightly off topic, question about git



On 2022-06-06 15:12, Mouse wrote:
[H]istory rewriting seems to be a favorite pastime of git users.
That's not a fault of git; that's a fault of how some people use
git.
Well, you could argue that it's a fault in git that it allows it.

If there is a way, then some people will use it that way.

But, if there isn't, some people will add it.  git rebase is very
little more than a loop containing git cherry-pick.

It is more than that, since a rebase can be done without leaving any hint that it actually happened.

Basically, what I've seen in git:


o       Some commit (1-Jan-2022)
|
|   o   Some branch commit (29-Dec-2021)
|   |
|   /
|  /
| /
|/
o        A commit (31-Dec-2021)
|
|


Now, how could "some branch commit" happen on Dec 29, when what it was based on was committed Dec 31 that same year? That's obviously not possible, and yet, that is what I see in git.

I assume (pretty sure actually) that the branch was created based on an earlier commit, and then rebased on to the later one. But it's not clear, nor is it clear which was the original commit it branched from. Sure, you could argue that this is not important, as it has been updated to be fully based on the newer commit, but I still find this disturbing, and there might be meta-understanding lost here. When the original branch was made, where was it made from? That might give some hint on why the branch was made, which is now lost. Not to mention the very strange view where the branch is older than where it stems from.
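A minimal Python sketch of how such a graph can come about, using a toy commit model rather than git's real data structures. It assumes the dates shown in the graph are author dates, which git rebase keeps by default while replacing the parent and the committer date. The `Commit` class, the helper names, the 20-Dec-2021 original base and the 2-Jan-2022 rebase date are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: only the fields needed for this illustration."""
    message: str
    author_date: date      # when the change was originally written
    committer_date: date   # when this commit object was created
    parent: Optional["Commit"]


def cherry_pick(commit: Commit, onto: Commit, today: date) -> Commit:
    # Replay the change on a new parent. The author date is carried over;
    # only the committer date and the parent are new.
    return Commit(commit.message, commit.author_date, today, onto)


def rebase(commits: List[Commit], onto: Commit, today: date) -> Commit:
    # Rebase modeled as a loop of cherry-picks, oldest commit first.
    tip = onto
    for commit in commits:
        tip = cherry_pick(commit, tip, today)
    return tip


# Hypothetical history: the branch was started from an older base.
old_base = Commit("Old base", date(2021, 12, 20), date(2021, 12, 20), None)
new_base = Commit("A commit", date(2021, 12, 31), date(2021, 12, 31), old_base)
branch_commit = Commit("Some branch commit", date(2021, 12, 29),
                       date(2021, 12, 29), old_base)

# Rebase the branch onto the newer commit a few days later.
rebased_tip = rebase([branch_commit], new_base, date(2022, 1, 2))

# The rebased commit is "older" than its own parent, and nothing in it
# points back to old_base as the original branch point.
print(rebased_tip.author_date < rebased_tip.parent.author_date)
print(rebased_tip.parent is new_base)
```

In this model the only trace of the rebase is the committer date on the new commit; the original parent is simply not recorded anywhere in the rewritten history.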

If this had been through cherry picking, in the normal sense, there would have been a commit with the cherry picked changes.

Same if a merge from another branch was brought in to update the code.
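For contrast, a small Python sketch of the merge case, again on a toy model and not git's actual implementation. A merge commit keeps both parents, so the point where the branch originally forked stays reachable. The `Commit` class, the `ancestors` helper and the commit named "Old base" are hypothetical.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit identified by name, with zero or more parents."""
    name: str
    parents: Tuple["Commit", ...] = ()


def ancestors(commit: Commit) -> Set[str]:
    # Collect the names of the commit and everything reachable from it.
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current.name in seen:
            continue
        seen.add(current.name)
        stack.extend(current.parents)
    return seen


old_base = Commit("Old base")
mainline = Commit("A commit", (old_base,))
branch = Commit("Some branch commit", (old_base,))

# Bringing mainline into the branch by merging creates a new commit
# with two parents; neither existing commit is altered.
merged = Commit("Merge mainline into branch", (branch, mainline))

# The original branch point is still what the two sides have in common.
fork_points = ancestors(branch) & ancestors(mainline)
print(fork_points)
print(len(merged.parents))
```

Here the branch commit still has "Old base" as its parent after the merge, so the question of where the branch was made from remains answerable.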

No VCS is ever truly never-change, [...]
Sure.  You can change history in CVS as well.  But you'll have to go
in there and muck with the data that is behind it.

And in git, that's significantly harder to do than it is in CVS (well,
as I recall CVS; it's been long enough since I used it that my memory
is fuzzy).  If you just change (say) a commit message in the underlying
data, the resulting repo will be corrupt and will be noticed as corrupt
by certain operations; git is built on a foundation of a
content-addressable data store, in which a data blob's name is its
SHA-1.  Pointers are to that SHA-1, so if you change the contents you
will change the SHA-1 (unless you can second-preimage SHA-1 to give the
content you want).
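A short Python sketch of the content-addressing idea described above. `git_blob_id` follows the way git names a blob in a SHA-1 repository (the hash of a "blob <size>" header, a NUL byte, and the content). `toy_commit_id` is a deliberately simplified, hypothetical stand-in for a commit object and is not git's real commit format; it only shows why editing a commit message changes the id that children point to.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # git hashes a small header ("blob <size>\0") followed by the content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_id(message: str, parent_id: str) -> str:
    # Simplified, hypothetical commit encoding: parent pointer plus message.
    payload = f"parent {parent_id}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


# Any change to the content gives a different name.
print(git_blob_id(b"hello\n") != git_blob_id(b"hello!\n"))

# A child commit records its parent's id.
root_id = toy_commit_id("Initial import", "")
child_id = toy_commit_id("Fix typo", root_id)

# Editing the root's message in place yields a different id, so the
# pointer stored in the child no longer matches any existing object.
edited_root_id = toy_commit_id("Initial import (edited)", "")
print(edited_root_id != root_id)
```

To keep the child valid after such an edit, the child would have to be recreated with the new parent id, which in turn changes the child's own id, and so on up the chain.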

I think you missed my point. Like I said, if you go around and start mucking with the underlying actual data, be that in a database or filesystem, or whatever, then sure. You can do pretty much anything.

But with git, you don't have to do that. You can create this through the normal tools and interfaces. While in CVS you need to go under the radar of the VCS to do it.

It's not like the UI itself allows you to work that way.

If the UI supports cherry-picks, the UI allows it.  As I remarked
above, rebasing is very little more than just a bunch of cherry-picks.

I disagree. If you cherry pick something, then you make a new commit, and things are pretty clear and straightforward. Rebasing doesn't really do it that way.

And I submit that a VCS that doesn't support cherry-picks is
significantly crippled.

I would agree. But in fact, cherry picking is just a fancy way of saying you modified a file based on something already existing instead of writing it from scratch. A VCS that doesn't support pulling out content from a previous commit is more than just significantly crippled. I would say it wouldn't be working as a VCS.

And I can sort of see why people want to go that way, since with
distributed VCSs, it becomes much harder to have a linear history.
But they still want to kind of fake it.

Some people do, perhaps.  Personally, I have no problem with merges.
My own repos, even those which have only me working on them, typically
include "Merge work from multiple machines" commits.

Not really; history doesn't _have_ to be rewritten.  That's what
merge commits are for.  People just choose to rebase work instead of
merging.
It sort of has to.  Since if you've done various work, and others
have done various work on the same files, and both have done commits,
it might not be possible to merge as is.

Yes, merging can require manual assistance.  git includes tools to make
it easier to handle manual-assist merges; others exist as addons.

Merging in itself is neither magic nor bad. It's just that your history gets broken when you have two different histories that need to be merged. Merging source code isn't the problem. But history that is in conflict can never be cleanly resolved. It has to be rewritten, and that is what I find distasteful.

The need for them is one of the prices of the distributed model, just
as needing to manually perform much the same operations before
committing is a price of the centralized model.

True. But forcing it to be resolved before doing a commit in a centralized model means that your history, as well as your code, has a true linear history.

And so you'll have to rewrite parts that you already committed in
order to get things back to a coherent state.

Merging two changesets that affect the same portions of the same files
inevitably will require that in some cases.

Yes. But there is a difference between sorting it out before or after the commit.

This is a nasty problem when you have separate VCSs.  Well, it
becomes nasty because somewhere in the end, you still have a master
VCS, which holds the source of truth.  Distributed VCSs are not truly
distributed.  There is still just one master.

Only if the humans involved insist on seeing it that way.  There is no
technical reason that has to be true.  git lends itself very well to
the "sure, fork it and see whose fork the userbase prefers" model.  Is
that a strength or a weakness?  Each use case has to decide that for
itself.

We do tend to see it that way. NetBSD has one master. Sure, you can fork it if you want. But that is no longer NetBSD.

Trying to regard this as a popularity-driven model is rather broken, I think.

If the repo in question is used to produce a product with a single
distribution channel, then there will inevitably be some kind of master
in the sense of the one used to produce the distribution.  But that's
inevitable in that case; it's an artifact of the use case, nothing
inherent to the underlying VCS.

Sure. If you want a completely fragmented world view, then a distributed VCS makes perfect sense. I didn't expect you to take that position, but I'm not going to try to change your mind. But I'm not in that camp. :-)

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

