tech-repository archive


Re: Updates on Mercurial



On Thu, Jan 07, 2021 at 11:54:38AM +0000, Roy Marples wrote:
> On 24/12/2020 01:03, Joerg Sonnenberger wrote:
> > On Thu, Dec 24, 2020 at 12:40:54AM +0000, Roy Marples wrote:
> > > > I've decided that I'm done with analyzing the clone performance
> > > > for now. There are a few delays still in the process due to a
> > > > combination of the unusual history and somewhat high IO latency, but
> > > > they should no longer break anything and would disappear with a new
> > > > clonebundle.
> > > 
> > > So aside from getting Mercurial to perform better for us given our
> > > repository, what are the show stoppers for actually moving it to production
> > > and retiring CVS?
> > 
> > So the one big question that remains for me is whether we are willing to
> > do a second hg->hg migration at some point in the not so far future.
> > There are at least two big reasons for that, both breaking the hash
> > chain:
> > (1) "Bonsai" changesets
> > (2) SHA1 replacement
> > 
> > The former is a pretty significant change in the data model under some
> > discussion at the moment. It would address some of the big remaining
> > blocks in the clone time and also other topics. The latter should be
> > self-explanatory. The user impact of both is primarily the hash break.
> > It should be possible to fix up references in the commit messages
> > automatically, but if they leak into files that would be more tricky.
> 
> What is a bonsai changeset? Is there a link to this discussion?

The idea and name come from Eden, Facebook's Mercurial
clone/fork/whatever. For historic reasons, hg keeps a list of all files
with their associated version for every changeset. That's the manifest.
The hash over the manifest text is part of the changeset and is how
everything is linked together. Bonsai changesets, on the other hand, would
keep the changes, and only the changes, directly in the changeset,
replacing the current list of touched files. It will make the actual
changeset somewhat larger, but on the other hand it allows storing
manifests in much more efficient forms. This matters because the
manifest processing, especially the generation of intermediary
snapshots, is expensive, e.g. for src it accounts for 20% of the total
unbundle time during clone.
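
To make the difference concrete, here is a toy Python sketch of the two
layouts. It is not Mercurial's or Eden's actual on-disk format: the field
order, the separators, the 'removed' marker and all function names are
made up for illustration. The only point is which data each kind of
changeset has to hash.

```python
import hashlib
from typing import Dict, List, Optional

NULL = "0" * 40  # placeholder parent id for a root changeset (toy value)


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def manifest_text(files: Dict[str, str]) -> bytes:
    # Toy manifest: one "<path>\0<file hash>" line for EVERY tracked file.
    return "".join(
        f"{path}\0{node}\n" for path, node in sorted(files.items())
    ).encode()


def manifest_changeset(parent: str, files: Dict[str, str],
                       touched: List[str], message: str) -> bytes:
    # Current model (simplified): the changeset stores the hash of the
    # full manifest plus the list of touched file names.
    lines = [parent, sha1_hex(manifest_text(files))]
    lines += sorted(touched) + ["", message]
    return "\n".join(lines).encode()


def bonsai_changeset(parent: str, changes: Dict[str, Optional[str]],
                     message: str) -> bytes:
    # Bonsai-style model (simplified): the changeset stores the changes
    # themselves, path -> new file hash, or None for a removed file.
    entries = [
        f"{path}\0{node if node is not None else 'removed'}"
        for path, node in sorted(changes.items())
    ]
    return "\n".join([parent] + entries + ["", message]).encode()


# Hypothetical tree with 1000 files where one commit touches one file.
files = {f"src/file{i}.c": sha1_hex(str(i).encode()) for i in range(1000)}
new_node = sha1_hex(b"new content")
files["src/file7.c"] = new_node

old_style = manifest_changeset(NULL, files, ["src/file7.c"], "fix file7")
new_style = bonsai_changeset(NULL, {"src/file7.c": new_node}, "fix file7")
```

In the first form the changeset id cannot be computed or verified
without the full manifest text, which grows with the number of files in
the tree. In the second form the input only grows with the number of
files touched by the commit.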

> From what little I found googling for bonsai and mercurial, some work was
> done from 2017. If it's taking that long, can NetBSD still wait? But then
> it's not like we've been waiting long ;)

See earlier question about doing a hg->hg migration at some point in the
not so far future. Essentially this would be rehashing the repository,
but would otherwise not impact normal operation.

> I think the SHA1 replacement is important enough to wait as it's now been
> proven to collide I think.

Yes and no. The collision situation primarily affects operation from
untrusted sources. Since we have a trusted root repository, it is mostly
a concern for (3rd party) mirrors. It will also apply when we start
signing revisions, but that is also only a nice-to-have feature.

Joerg

