Re: is the proof in the pudding?

To: "Perry E. Metzger" <perry%piermont.com@localhost>
Subject: Re: is the proof in the pudding?
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Wed, 30 Jul 2008 05:29:24 +0000

On Tue, Jul 29, 2008 at 01:47:53PM -0400, Perry E. Metzger wrote:
 > > Nonetheless, many people report that svn uses unreasonable amounts of
 > > memory and/or disk space, especially when compared to other tools.
 > 
 > That may in fact be the case. I'd prefer for us to make the decision
 > based on experience by a number of NetBSD testers rather than based on
 > third party statements. In particular, SVN has gotten much better with
 > time on memory usage, and it may be that the consumption levels are
 > not an issue for current developers in any case.

Yes, and what exactly is the problem with noting that this might be an
issue?

 > > > > weird branch/tag semantics,
 > > > 
 > > > I see nothing at all weird about the semantics.
 > >
 > > Perhaps then you should explain how tagging works and let everyone
 > > decide. Based on how it's been explained to me, I would describe it as
 > > weird.
 > 
 > It isn't very weird.
 > 
 > [...]

Thank you for the explanation... but yes, it is weird to clone the
tree for every tag.

 > > > > and has in the past earned a bad reputation for reliability.
 > > > 
 > > > I don't think that's true.
 > >
 > > That reputation is real and it earned it.
 > 
 > No, it is not. I don't know how to say this politely, so I'll say it
 > frankly. You've clearly never used SVN -- your questions reveal
 > that. You have no basis on which to make this claim.

I have used SVN, just not very much.

If the reliability problems never existed, then why are they mentioned
even in old SVN release notes? And if no release version ever had the
problems, then why was FSFS not added until SVN 1.1, and why were the
problems with misusing Berkeley DB not solved until after SVN 1.3 was
released?

These points are easily found with Google; perhaps they're wrong, but
something doesn't seem to add up.

 > Then I suggest you try it. You have no idea what it will actually be
 > like. It is unreasonable of you to judge without actually using it.

I'm not, here, judging. I'm collating judgments and reports that other
people (yes, NetBSD developers, since you seem to be concerned about
"third parties") have expressed.

(I have, however, myself judged Subversion based on reports from
people whose opinions I respect. Some of this might be leaking
through, although it's not supposed to be. But whatever. Deal.)

 > > (Also, according to my notes, rename is implemented as delete and
 > > add, so while tree history is maintained, file history doesn't cross
 > > the rename, so that's only sort of "retaining history".)
 > 
 > No, that's not true.

You are correct. The NetBSD developer (since you seem to be concerned
about reports from "third parties") who reported this to me is in
error.

As a side note, this *is* broken in Mercurial.

 > > Would you please stop assuming I have no idea what I'm talking about?
 > 
 > You keep making it clear you haven't used SVN or git, and you keep
 > making mistakes when describing their properties.

I have used both. Just, again, not that much. I have been, again,
collecting and collating the reports that other people (yes, NetBSD
developers, since you seem to be concerned about "third parties")
have made to me.

Some of these are potentially wrong. For that matter, some of the
points I've ascertained myself about systems I use heavily are
potentially wrong as well. The point of posting this material to a
public mailing list is to thrash it out and correct any mistakes.

 > > I've been using Mercurial for a number of projects for some time,
 > 
 > Mercurial is not SVN and is not git.

Mercurial is not git, but, fancy that, they use changeset hashes the
same way and for the same purpose.

 > > They are required for a distributed version control system, yes.
 > > However, we don't need a distributed version control system for
 > > NetBSD.
 > 
 > I'm not sure we do or that we don't. My own favorite is SVN, which is
 > not distributed, but there are very strong arguments my friends who do
 > distributed VCS make for the benefits, and the small amount of work
 > I've done with distributed systems makes me think they may in fact be
 > correct.

No, we do not. We would like a system that supports disconnected
operation and private branches and all that, but we already have a
centralized setup and there's no particular reason that we should
dismantle it in favor of a fully distributed model like that used by
the Linux kernel developers.

This means that we do not need a distributed VCS.

Admittedly, at present you cannot (AFAIK) get a non-distributed VCS
that supports disconnected operation and private branches well, so
when the dust settles we may choose a distributed VCS. But these
features do not require a distributed VCS and so we have no particular
need for one.

You may not be willing to recognize it, but the list of requirements I
posted *was* carefully worded.

 > I'd like to see the distributed experiment tried, if only so that we
 > know what we're missing before we say, in advance, that it is
 > worthless.

Also, please remember that some people in this discussion have
experience with a fully distributed model.

 > > Meanwhile, they create a number of usability problems, some of
 > > which have already been touched on: one can't remember them,
 > 
 > One usually doesn't *need* to remember them.

I don't know about you, but when using CVS I often need to remember a
file version number in short-term memory in order to switch from a
mail-reader window to a source tree window, or to suspend an editor
and type in a CVS command, or the like. This necessity does not go
away with more advanced systems.

 > > they are a hassle to type in when you don't have cut and paste,
 > 
 > That's false. Git, for example, accepts shortest unique substring. You
 > can usually just type five characters and it will deal fine. If you
 > had used git (I have, though not extensively), you would know this.

I have used Mercurial extensively, it has the same property, and it's
still a hassle.
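
To make the abbreviation point concrete, here is a minimal Python sketch
of shortest-unique-prefix lookup. It is an illustration only, not how
git or Mercurial actually implement it; the function names and the
third changeset ID are made up for the example.

```python
# Hypothetical set of changeset IDs. The first two appear elsewhere in
# this message; the third is invented so that two IDs share a prefix.
KNOWN_IDS = [
    "0bca2f52e63e794076af",
    "39af757fe4ce89a1ca61",
    "39af00aa11bb22cc33dd",
]


def resolve_prefix(prefix, known_ids):
    """Return the one full ID that starts with prefix, or raise."""
    matches = [h for h in known_ids if h.startswith(prefix)]
    if not matches:
        raise KeyError(f"no changeset matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


def shortest_unique_prefix(full_id, known_ids, minimum=4):
    """Return the shortest prefix of full_id that matches only full_id."""
    for n in range(minimum, len(full_id) + 1):
        candidate = full_id[:n]
        if sum(1 for h in known_ids if h.startswith(candidate)) == 1:
            return candidate
    return full_id
```

Even in this toy case "39af" alone is ambiguous once a second changeset
shares the prefix, so how much you have to type depends on what else is
in the repository.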

 > > they're a (smaller) hassle to paste in even when you do, you can't
 > > determine ordering without digging in the SCM database,
 > 
 > But people generally don't use the hash numbers the way they seem to
 > use version numbers, 

Yes and no. They do post things like "Do you have changeset
0bca2f52e63e794076af?" or "Fixed in changeset 39af757fe4ce89a1ca61",
which is the same as saying "Fixed in -r1.35 of foo.c" except that
it's more trouble to check if you have the fix or not.

That, similar checks based on rcsid strings, and naming individual
revisions on the command line constitute most of what version numbers
are used for, AFAIK, so I don't know what else you might be thinking
of.
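
As a rough sketch of why the hash form is more trouble to check: with
dotted revision numbers on a single branch, "do I have the fix" is a
numeric comparison, while with changeset hashes it needs the ancestry
graph from the repository. The function names and the toy parent map
below are hypothetical, and the first function assumes both revisions
are on the same branch.

```python
def has_fix_by_revision(my_rev, fix_rev):
    """Compare dotted revisions such as "1.35" component by component.

    Only meaningful when both revisions are on the same branch.
    """
    def as_tuple(rev):
        return tuple(int(part) for part in rev.split("."))
    return as_tuple(my_rev) >= as_tuple(fix_rev)


def has_fix_by_changeset(head, fix_id, parents):
    """Walk the ancestry of head looking for fix_id.

    parents maps each changeset ID to a list of its parent IDs; this is
    the information that lives in the SCM database.
    """
    stack, seen = [head], set()
    while stack:
        node = stack.pop()
        if node == fix_id:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return False
```

The first check can be done by eye from an rcsid string; the second
cannot be done without the graph.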

 > > There are also some technical considerations, like tying yourself to a
 > > particular hash function,
 > 
 > That's not a big deal. You could (in theory) upgrade a giant repo to a
 > new hash function pretty quickly. It would require anyone regularly
 > updating to do the upgrade as well, of course.

You could, except that from then on you either use both hashes or
break compat with all other copies of the repo. And you lose compat
with all those postings and bug reports that refer to changeset
39af757fe4ce89a1ca61. (Or worse, an underspecified partial hash code
ends up naming a different version and everyone gets confused.)

It is probably solvable, but it's not trivial, and it isn't something
you'd want to roll out on an emergency basis (see below).

 > > or what happens if a collision occurs.
 > 
 > You would know *very very fast* because operations would fail loudly.

They might or might not depending where the collision occurred, and
also on how naive the code is about crosschecking. And yes, they're
astronomically unlikely, but that's not the same as saying it won't
happen.
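
A minimal sketch of the crosschecking point, in Python. The hash is
deliberately truncated to two hex digits so a collision is easy to
produce; the class and function names are made up, and this is not a
description of what any real VCS does.

```python
import hashlib


def weak_id(data):
    """Content ID truncated to 2 hex digits so collisions are easy to find."""
    return hashlib.sha1(data).hexdigest()[:2]


class NaiveStore:
    """Trusts the ID: a colliding object is silently dropped."""

    def __init__(self):
        self.objects = {}

    def put(self, data):
        key = weak_id(data)
        self.objects.setdefault(key, data)  # keeps whatever was there first
        return key


class CheckingStore(NaiveStore):
    """Compares content whenever the ID already exists, and fails loudly."""

    def put(self, data):
        key = weak_id(data)
        existing = self.objects.get(key)
        if existing is not None and existing != data:
            raise RuntimeError(f"hash collision on {key}")
        self.objects[key] = data
        return key


def find_collision():
    """Return two different byte strings with the same weak_id."""
    seen = {}
    for i in range(1000):  # 256 possible IDs, so a repeat is guaranteed
        data = str(i).encode()
        key = weak_id(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
    raise AssertionError("unreachable")
```

Only the second store fails loudly; the first one hands back the wrong
content later without complaint.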

 > > If you need a distributed SCM system, these costs are worth paying;
 > > but if you don't, and we don't,
 > 
 > That's not clear. There is a reason, I think, that all new VCSes are
 > distributed, and it isn't just because it is trendy. People find
 > they're really very convenient, and that they make programmers more
 > productive.

It is because setting up a centralized repository server, and
centralized systems in general, is a big hassle. But we already have
those.

 > > As for an integrity check - they're hardly the only way to do that,
 > > and arguably not a good way either.
 > 
 > They seem like a fine way to do it, and they're built in. I see no
 > arguments against them.

They're not a good way because if someone breaks the hash function it
undermines the functioning of the system, not just the integrity
check. If this happens and you're using the broken hash function only
for integrity checking, you can switch to another hash function with
minimal trouble. To change the hash function that you use for *naming*
objects is not trivial, as noted above.

Naming and integrity checking are distinct operations; combining them
leads to combined failure, which is an unnecessary risk.
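
Roughly what I mean by keeping them distinct, as a Python sketch. The
Repo class and its methods are hypothetical; names here are plain
sequential revision numbers, and the checksum is stored alongside and
used only for verification.

```python
import hashlib


class Repo:
    """Toy store: revisions are named by number, checked by a hash."""

    def __init__(self, algorithm="sha1"):
        self.algorithm = algorithm
        self.revisions = []   # revision N is stored at index N - 1
        self.checksums = []   # integrity data only, never used as a name

    def _digest(self, data):
        return hashlib.new(self.algorithm, data).hexdigest()

    def commit(self, data):
        self.revisions.append(data)
        self.checksums.append(self._digest(data))
        return len(self.revisions)   # the name of the new revision

    def verify(self, rev):
        return self._digest(self.revisions[rev - 1]) == self.checksums[rev - 1]

    def switch_algorithm(self, algorithm):
        # Recompute every checksum; no revision changes its name.
        # A real system would verify under the old algorithm first.
        self.algorithm = algorithm
        self.checksums = [self._digest(d) for d in self.revisions]
```

After switch_algorithm("sha256"), revision 2 is still revision 2, and
every posting that refers to it remains valid.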

 > For those not in the know, git provides two kinds of ways of typing in
 > commands.
 > 
 > You can type, for any command:
 > 
 > git-foo
 > 
 > or
 > 
 > git foo
 > 
 > If it bugs you that all the git-foo commands are lying about, then
 > shove them in a subdir in /usr/libexec and you'll never see them.

It used to be that this wouldn't always work. To the best of my
recollection, anyway. Maybe that's a long time ago now. Again, if it
actually works to drop all the crap into /usr/libexec/git, this is a
nonissue.

 > > *None* of the problems any of the systems have are non-negotiable.  We
 > > can always make our mind up to import perl into base, or cope with
 > > /usr/src taking a couple gigabytes, or do without being able to diff
 > > between release trees,
 > 
 > Which system would prevent us from being able to diff release trees?
 > I'm unaware of *any* VCS with that property.

Some people have suggested using distinct repositories to hold
distinct release branches. With Mercurial, and I believe also with git,
this prevents diffing between them.

The answer of course is "don't do that", but it's not clear if
Mercurial's support for named branches in a single repository is
adequate for releng purposes. (Git's is, AFAIK.)

 > BTW, our /usr/src takes 1.5G right now. Just the sources, not
 > including anything else. If you are angry about having a couple of gig
 > in /usr/src, the problem is not the VCS.

# cd /usr/src
# du -sk .
992404  .

For small values of 1.5G? That's a very recent -current. 1.5G total is
maybe acceptable. 3G probably isn't. We need to know how big a real
git tree is.
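
For the arithmetic: du -sk counts in 1024-byte units, so the figure
above converts as in this small Python sketch (the function name is
just for illustration).

```python
def kib_to_gib(kib):
    """Convert a `du -sk` figure (1024-byte units) to GiB."""
    return kib / (1024 * 1024)


# The tree measured above.
print(f"{kib_to_gib(992404):.2f} GiB")
```

That comes to a bit under 1G, well short of 1.5G.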

 > > However, making such decisions requires an informed analysis of the
 > > tradeoffs,
 > 
 > I've often seen this sort of discussion in bureaucracies. It is
 > usually a way of deliberately killing a project.

Um, bollocks. Requirements analysis is a real and necessary step in
any nontrivial effort that you want to have succeed.

 > > darcs would require importing ghc into base. That is a complete
 > > nonstarter.
 > 
 > Why? 

Have you checked how big ghc is?

But, you know, whatever. Go build ghc and darcs and try the import and
get back to us.

 > BIKESHEDDING?

Yes. Bikeshedding. I've been trying to collect a list of requirements
(and how the available systems meet those requirements, or not)
because that's vital information for figuring out what is and is not
going to work in the long run and at what cost.

Some people have been helping me with this list.

Other people have been running cvs2svn and other conversion tools, or
working on ways to make the conversion tools work better, or fixing
corrupted history in the repository, or otherwise moving things along.

As far as I can tell, all you've been doing is calling for other
people to do things, without taking any cognizance of what they may
have already done or tried, and also trying to suppress the
requirements analysis.

This is not helpful. Bikeshedding may or may not be technically the
proper term.

Well, that's not quite fair. You've also corrected one error in my
list so far. That's of some value. Only trouble is, in the process
you've nearly persuaded me to walk away.

 > Who besides you opposes this?

Nobody opposes trying things. Nobody's said that you or anyone else
shouldn't try to persuade us that your favorite system has so many
advantages that we'd be well served by learning to live with the
flaws. (Even if that system is SCCS.)

I have said that I think darcs is a nonstarter, yes. If you would like
to take that as a challenge, go right ahead. It does have quite a few
interesting properties that might be significant advantages.

-- 
David A. Holland
dholland%netbsd.org@localhost