tech-repository archive


Re: is the proof in the pudding?



David Holland <dholland-tech%netbsd.org@localhost> writes:
> On Tue, Jul 29, 2008 at 01:47:53PM -0400, Perry E. Metzger wrote:
>  > > Nonetheless, many people report that svn uses unreasonable amounts of
>  > > memory and/or disk space, especially when compared to other tools.
>  > 
>  > That may in fact be the case. I'd prefer for us to make the decision
>  > based on experience by a number of NetBSD testers rather than based on
>  > third party statements. In particular, SVN has gotten much better with
>  > time on memory usage, and it may be that the consumption levels are
>  > not an issue for current developers in any case.
>
> Yes, and what exactly is the problem with noting that this might be an
> issue?

Nothing. I just don't want to pre-judge. Certainly if there is a
concern about that, we should test it carefully. We shouldn't decide
in advance not to even test.

>  > > > > weird branch/tag semantics,
>  > > > 
>  > > > I see nothing at all weird about the semantics.
>  > >
>  > > Perhaps then you should explain how tagging works and let everyone
>  > > decide. Based on how it's been explained to me, I would describe it as
>  > > weird.
>  > 
>  > It isn't very weird.
>  > 
>  > [...]
>
> Thank you for the explanation... but yes, it is weird to clone the
> tree for every tag.

It is also weird to touch every file in the system for every tag
(CVS). "Weird" isn't an issue. SVN is comfortable to use. The
semantics are not hard to understand.

>  > > > > and has in the past earned a bad reputation for reliability.
>  > > > 
>  > > > I don't think that's true.
>  > >
>  > > That reputation is real and it earned it.
>  > 
>  > No, it is not. I don't know how to say this politely, so I'll say it
>  > frankly. You've clearly never used SVN -- your questions reveal
>  > that. You have no basis on which to make this claim.
>
> I have used SVN, just not very much.
>
> If the reliability problems never existed, then why are they mentioned
> even in old SVN release notes?

Show me release notes after the thing went beta that say anything
about reliability.

> And if no release version ever had the problems, then why was FSFS
> not added until SVN 1.1,

That doesn't make any sense. The Berkeley DB version didn't lose
data. FSFS wasn't an issue.

> and why were the problems with misusing Berkeley DB not solved until
> after SVN 1.3 was released?

What problems?

I do recall one thing: if you used a buggy version of Berkeley DB
(and the notes did tell you not to use that version) you could have
problems, but they weren't data loss problems.

> These points are easily found with Google;

Show me.

>  > Then I suggest you try it. You have no idea what it will actually be
>  > like. It is unreasonable of you to judge without actually using it.
>
> I'm not, here, judging.

Good. So hold off until we have tests in place. You can then play with
the thing and make up your own mind on issues like whether you find
the memory problem to be real or not.

>  > > (Also, according to my notes, rename is implemented as delete and
>  > > add, so while tree history is maintained, file history doesn't cross
>  > > the rename, so that's only sort of "retaining history".)
>  > 
>  > No, that's not true.
>
> You are correct. The NetBSD developer (since you seem to be concerned
> about reports from "third parties") who reported this to me is in
> error.
>
> As a side note, this *is* broken in Mercurial.

Well, then the person who champions Mercurial (if any) will have to
convince us that the problem is outweighed by the benefits (if there
is such a person, and if they succeed in getting the repository
imported).

> Some of these are potentially wrong. For that matter, some of the
> points I've ascertained myself about systems I use heavily are
> potentially wrong as well. The point of posting this material to a
> public mailing list is to thrash it out and correct any mistakes.

Okay. However, my main concern is that there seems to be this idea
that we can judge these systems without using them and without
conducting real world tests. I don't think we can.

We can't simply write down a list of criteria, match them against
existing systems based on documentation or even hearsay, and then say
"well, clearly we want this one" or even "clearly we want none".

>  > > I've been using Mercurial for a number of projects for some time,
>  > 
>  > Mercurial is not SVN and is not git.
>
> Mercurial is not git, but, fancy that, they use changeset hashes the
> same way and for the same purpose.

Not quite identically, and the "not quite" is always an issue.

>  > > They are required for a distributed version control system, yes.
>  > > However, we don't need a distributed version control system for
>  > > NetBSD.
>  > 
>  > I'm not sure we do or that we don't. My own favorite is SVN, which is
>  > not distributed, but there are very strong arguments my friends who do
>  > distributed VCS make for the benefits, and the small amount of work
>  > I've done with distributed systems makes me think they may in fact be
>  > correct.
>
> No, we do not. We would like a system that supports disconnected
> operation and private branches and all that,

Well, as it turns out, that's what the distributed VCSes do. SVN (the
only non-distributed open source one of note) does not do those
things.

> but we already have a centralized setup and there's no particular
> reason that we should dismantle it in favor of a fully distributed
> model like that used by the Linux kernel developers.

The Xorg people also use a centralized setup, modeled on the way they
used CVS. However, they use git to do it. They didn't dismantle their
way of doing work, they kept it. They do not operate the way the Linux
kernel developers do.

It is perfectly possible to use these tools in a variety of workflows.

> This means that we do not need a distributed VCS.
>
> Admittedly, at present you cannot (AFAIK) get a non-distributed VCS
> that supports disconnected operation and private branches well,

Exactly.

> so when the dust settles we may choose a distributed VCS. But these
> features do not require a distributed VCS and so we have no
> particular need for one.

This makes no sense.

1) The features are only supported by distributed VCSes.
2) The fact that a non-distributed VCS could in theory support the
   features is immaterial if none actually do. We can only use a
   VCS that exists, not a hypothetical one.

I will repeat that I'm in favor of SVN, which does not support these
features (which I would like), but I'm not going to sneer at git.

>  > I'd like to see the distributed experiment tried, if only so that we
>  > know what we're missing before we say, in advance, that it is
>  > worthless.
>
> Also, please remember that some people in this discussion have
> experience with a fully distributed model.

I'm not suggesting a "fully distributed development model". I'm
suggesting we want to try out the distributed VCSes. That's quite
different.

>  > > or what happens if a collision occurs.
>  > 
>  > You would know *very very fast* because operations would fail loudly.
>
> They might or might not depending where the collision occurred, and
> also on how naive the code is about crosschecking. And yes, they're
> astronomically unlikely, but that's not the same as saying it won't
> happen.

If you're going to start worrying about astronomically small
probability events, then clearly you should start with the highest
probability ones first. I assume that means you are now living inside
a concrete bunker. You see, the odds of a hash collision are orders of
magnitude lower than that of a meteor striking you and killing you.
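For scale, the birthday bound makes the claim easy to check. A rough sketch, where the object count is a made-up assumption (not a real NetBSD figure) and only the 160-bit SHA-1 digest length comes from the discussion above:

```python
# Birthday-bound estimate of an accidental SHA-1 collision among n objects.
n = 10**7            # hypothetical: ten million hashed objects in the repo
bits = 160           # SHA-1 digest length in bits

# Probability of at least one collision is approximately n*(n-1)/2 / 2^bits
# for small probabilities.
p = n * (n - 1) / 2 / 2**bits
print(f"collision probability ~ {p:.1e}")   # → collision probability ~ 3.4e-35
```

Even at ten million objects, the estimate is dozens of orders of magnitude below any everyday risk.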

>  > > If you need a distributed SCM system, these costs are worth paying;
>  > > but if you don't, and we don't,
>  > 
>  > That's not clear. There is a reason, I think, that all new VCSes are
>  > distributed, and it isn't just because it is trendy. People find
>  > they're really very convenient, and that they make programmers more
>  > productive.
>
> It is because setting up a centralized repository server, and
> centralized systems in general, is a big hassle. But we already have
> those.

No, that's not the reason people find them convenient. They find them
convenient for a very wide variety of reasons. They let you do private
branches. They let you develop using the VCS effectively even though
you don't have commit access to the central repository. They let you
do commits even though you're on an airplane. They have far better
merge tracking. Those are just the most obvious benefits.

>  > > As for an integrity check - they're hardly the only way to do that,
>  > > and arguably not a good way either.
>  > 
>  > They seem like a fine way to do it, and they're built in. I see no
>  > arguments against them.
>
> They're not a good way because if someone breaks the hash function it
> undermines the functioning of the system,

Well, then you won't be any worse off than you would be with a system
without hashes.

Anyway, if someone breaks SHA-1 any time soon, worry about your bank
account first -- the certs you use to log in to it are SHA-1
protected.

>  > > *None* of the problems any of the systems have are non-negotiable.  We
>  > > can always make our mind up to import perl into base, or cope with
>  > > /usr/src taking a couple gigabytes, or do without being able to diff
>  > > between release trees,
>  > 
>  > Which system would prevent us from being able to diff release trees?
>  > I'm unaware of *any* VCS with that property.
>
> Some people have suggested using distinct repositories to hold
> distinct release branches. With mercurial and also I believe with git
> this prevents diffing between them.

That's not true on either. Diff works just fine.

Think about this rationally. At worst, you can always check out both
trees and do diff -r, correct? How could a VCS *prevent* you from
doing a diff?
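The fallback is mechanical: whatever the VCS, two working trees on disk can always be compared with plain diff(1). A minimal sketch, using throwaway directories with made-up contents in place of real checkouts:

```shell
# Stand-ins for two checked-out release trees (hypothetical contents).
mkdir -p tree-a tree-b
echo 'int main(void) { return 0; }' > tree-a/main.c
echo 'int main(void) { return 1; }' > tree-b/main.c

# Recursive diff works regardless of which VCS produced the trees.
diff -r tree-a tree-b || true   # diff exits nonzero when the trees differ
```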

> The answer of course is "don't do that", but it's not clear if
> Mercurial's support for named branches in a single repository is
> adequate for releng purposes. (Git's is, AFAIK.)

Then perhaps we don't use Mercurial.

>
>  > BTW, our /usr/src takes 1.5G right now. Just the sources, not
>  > including anything else. If you are angry about having a couple of gig
>  > in /usr/src, the problem is not the VCS.
>
> # cd /usr/src
> # du -sk .
> 992404  .

perry@snark:/usr/src$ du -sh .
1.4G    .

Perhaps I've got some big core dump in there or something. In any
case, we're already around a gig even if we believe your number.
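The two figures are closer than they look once the units are normalized: du -sk reports kilobytes, while du -sh rounds to gigabytes. A quick check of the conversion:

```python
# Convert the "du -sk ." figure (kilobytes) to GiB for comparison with
# the "du -sh ." output, which rounds to one decimal place.
kb = 992404                  # the du -sk number quoted above
gib = kb / (1024 ** 2)       # KiB -> GiB
print(f"{gib:.2f} GiB")      # → 0.95 GiB, i.e. just under a gigabyte
```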

> For small values of 1.5G? That's a very recent -current. 1.5G total is
> maybe acceptable. 3G probably isn't. We need to know how big a real
> git tree is.

3G seems fine to me. That's so little disk space you can't even
measure the cost any more. On older machines, you can NFS mount the
repo if need be, or sshfs mount it, or something similar.

Sure, it is a consideration, but if it makes development far far
easier, it is a cheap price to pay.

>  > > However, making such decisions requires an informed analysis of the
>  > > tradeoffs,
>  > 
>  > I've often seen this sort of discussion in bureaucracies. It is
>  > usually a way of deliberately killing a project.
>
> Um, bollocks. Requirements analysis is a real and necessary step in
> any nontrivial effort that you want to have succeed.

We'll have to agree to disagree.

>  > > darcs would require importing ghc into base. That is a complete
>  > > nonstarter.
>  > 
>  > Why? 
>
> Have you checked how big ghc is?

So? Let's say that it worked out well. Maybe it would still be a good
idea.

> But, you know, whatever. Go build ghc and darcs and try the import and
> get back to us.

I don't know darcs and I'm not a partisan for it. I just won't rule it
out. If someone wants to try, let them make the case.

>  > BIKESHEDDING?
>
> Yes. Bikeshedding. I've been trying to collect a list of requirements
> (and how the available systems meet those requirements, or not)
> because that's vital information for figuring out what is and is not
> going to work in the long run and at what cost.

What I'm doing hasn't been "Bikeshedding" by any reasonable
definition. I'm not arguing over trivia.

I'm not even arguing against collecting informal requirements lists.

What I've argued against is CRUCIAL. You've said we should exclude
systems we haven't even tried out. Arguing against that isn't
"bikeshedding". If you feel it is, well, then ANYTHING is "bikeshedding"
and the term is meaningless.

> Some people have been helping me with this list.

As did I.

> As far as I can tell, all you've been doing is calling for other
> people to do things, without taking any cognizance of what they may

If you want to get personal, then I think your agenda is to prevent
any work from being done by ruling out, in advance, every possibility
other than the status quo.

If you would prefer not to get personal, then quit with the crappy ad
hominem attacks.

>  > Who besides you opposes this?
>
> Nobody opposes trying things.

Actually, you did. You stated, explicitly, that we should exclude
systems before we even try them.


Perry
-- 
Perry E. Metzger                perry%piermont.com@localhost

