I was looking at the git clone of the src repo
(https://github.com/netbsd/src) and I noticed that there are lots of
duplicate commits in there; some commits are even present 3 or 4 times.
At first I thought this occurred only with very old commits, but it is
the case for relatively recent ones as well.

Normally this isn't so easy to see, but with gitk and these settings it
is fairly obvious: choose menu View -> New View, select under
References: All refs, All (local) branches, All tags, All
remote-tracking branches. Lower down, select Strictly sort by date.

If you then scroll back just a few years of commits, you can find a
bunch below the time "2017-04-10 23:53:37".

Taking some random commits from 2017-03-22 23:37:41:

c75b502dcf23b51c8d2504be7a9b5dd7823e4a09
Author: sevan <sevan>  2017-03-22 23:37:41
Committer: sevan <sevan>  2017-03-22 23:37:41
Parent: 20d6933e4ccdf0811b2b11f64dd019c016cea33e (On second through, it may be possible to have a NULL kfs_v in read and write)
Child:  fa4a1a6573dcb68fb2675cb80653b446a3231bb9 (KDTRACE_HOOKS is enabled by default in GENERIC.common, remove references in)
Branch: remotes/origin/jdolecek_ncq

d595117d197582e247e9d5d89ea2c3327feb9e3c
Author: sevan <sevan%NetBSD.org@localhost>  2017-03-22 23:37:41
Committer: sevan <sevan%NetBSD.org@localhost>  2017-03-22 23:37:41
Parent: 058026589ba723ce74452748b5e78aa0a7cd15bc (On second through, it may be possible to have a NULL kfs_v in read and write)
Child:  b13c9c92f5f3fb3b6e010d31acd1b2a6bd1b1c22 (KDTRACE_HOOKS is enabled by default in GENERIC.common, remove references in)
Branches: netbsd-9, remotes/origin/ad-namecache,
    remotes/origin/bouyer-xenpvh, remotes/origin/is-mlppp,
    remotes/origin/isaki-audio2, remotes/origin/jdolecek-ncq,
    remotes/origin/jdolecek-ncqfixes, remotes/origin/matt-nb8-mediatek,
    remotes/origin/netbsd-8, remotes/origin/netbsd-9,
    remotes/origin/perseant-stdc-iso10646, remotes/origin/pgoyette-compat,
    remotes/origin/phil-wifi, remotes/origin/prg-localcount2,
    remotes/origin/trunk, trunk

Looking at the differences between these, I notice a different
conversion of the author/committer name. Also, the first one is on
branch "jdolecek_ncq". The second one has an improved author/committer
and mentions several branches, one of which is "jdolecek-ncq", with a
dash rather than an underscore.

With some other commits I saw, the branch names are "ROY" vs "roy".

Around 1999-12-05 you can see triple commits (but there are too many
branches and gitk doesn't show them, so analyzing that is more
difficult).

My guess here is that there was an incremental conversion, with
improvements in author and branch name conversion along the way. But
commits and branches from earlier processing stayed in the result, and
hence the duplicates.

Maybe it just needs a fresh conversion from the start to get rid of
these duplicates. Or if that is not feasible, removal of the outdated
branches from the origin repo would probably help a lot.

But it is cool to be able to look back all the way to 1992 to the first
commit!

-Olaf.
-- Olaf 'Rhialto' Seibert -- rhialto at falu dot nl
___ Anyone who is capable of getting themselves made President should on
\X/ no account be allowed to do the job.       --Douglas Adams, "THGTTG"
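P.S. For anyone who wants to list candidate duplicates without clicking through gitk: a minimal sketch in Python that groups commits by author timestamp and subject line. The author string is deliberately left out of the key, since that is exactly what differs between the two conversions shown above. The function names find_duplicates and read_log are made up for this illustration. The git invocation uses only `git log --all` with the `%H`, `%at` and `%s` format placeholders. Treat the output as candidates only: two unrelated commits could share a timestamp and subject.

```python
import subprocess
from collections import defaultdict

# One commit per line: hash, author timestamp (Unix seconds), subject,
# separated by tabs (%x09 is a literal tab byte in git's format language).
LOG_FORMAT = "%H%x09%at%x09%s"


def find_duplicates(log_text):
    """Return {(timestamp, subject): [hashes]} for keys seen more than once."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any tabs inside the subject intact.
        commit, timestamp, subject = line.split("\t", 2)
        groups[(int(timestamp), subject)].append(commit)
    return {key: hashes for key, hashes in groups.items() if len(hashes) > 1}


def read_log(repo="."):
    """Run git log over all refs of the repository at `repo` (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=" + LOG_FORMAT],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout
```

Calling find_duplicates(read_log("/path/to/src")) on a local clone should then give one entry per suspected duplicate, each with the list of commit hashes that share the same timestamp and subject.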