Subject: Re: anoncvs problems
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Eric Haszlakiewicz <erh@jodi.nimenees.com>
List: current-users
Date: 02/06/2005 23:55:32
On Sun, Feb 06, 2005 at 10:43:08PM -0500, Thor Lancelot Simon wrote:
> On Sun, Feb 06, 2005 at 07:17:32PM -0500, Alec Berryman wrote:
> > 
> > No.  Perhaps I should have been more clear - yes, you've lost a lot if
> > you're using BDB, but if you're using FSFS you're only losing one
> > commit.  FSFS uses a file to represent each revision.  Yes, that will
> 
> Oof.  Tens of thousands of files, times one filesystem fragment per
> revision (many of our files have hundreds of revisions by now).  I can see
> where _that's_ heading (and it's sure not pretty).  The BDB backend made our
> repository four times as big -- it sounds like FSFS would be much, much
> worse.

	That's a misleading estimate of the impact of FSFS.  The number of
FSFS files is considerably less than CVS files * CVS revisions, because
per-file commits with the same commit message that are close together in
time should end up collected into a single svn revision.  Furthermore, it
appears that FSFS files use some kind of compression scheme: e.g. I did a
test checkin of a 25k text file, and it resulted in a 14k FSFS file.  On
the other hand, the per-revision-per-file overhead can be a bit high
(1k-2k in some random samples) and seems to be dependent on the number
of files in each directory that contains a file that was checked in.
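	To illustrate the grouping, here is a rough sketch of how per-file
CVS revisions might collapse into changesets.  This is not the actual
converter logic: the FileRev record, the 300 second window, and matching
on author as well as message are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision (not a real cvs/svn API).
FileRev = namedtuple("FileRev", "path author message timestamp")

def group_changesets(file_revs, window=300):
    """Group per-file revisions that share author and message and that
    follow the previous member of the group within `window` seconds.
    The window size is an assumed value, not one taken from any tool."""
    changesets = []
    latest = {}  # (author, message) -> most recent changeset for that key
    for rev in sorted(file_revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        current = latest.get(key)
        fits = (
            current is not None
            and rev.timestamp - current[-1].timestamp <= window
            # a file can appear only once in a single changeset
            and all(r.path != rev.path for r in current)
        )
        if fits:
            current.append(rev)
        else:
            current = [rev]
            changesets.append(current)
            latest[key] = current
    return changesets

revs = [
    FileRev("a.c", "erh", "fix typo", 1000),
    FileRev("b.c", "erh", "fix typo", 1005),
    FileRev("c.c", "erh", "fix typo", 1010),
    FileRev("a.c", "tls", "add feature", 5000),
    FileRev("a.c", "erh", "fix typo", 9000),
]
sets = group_changesets(revs)
print(len(revs), "per-file revisions ->", len(sets), "changesets")
```

Here five per-file revisions end up as three changesets, so the file
count tracks commits rather than files * revisions.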
	Also, FWIW, I did a comparison between FSFS and BDB for a small
(1500 revision) svn repository I have:
	BDB  - 170 MB
	FSFS -  75 MB
Clearly, FSFS is more space efficient than BDB.  Based on this, the on-disk
size growth of all the NetBSD repositories might end up somewhere around
4 * (75/170) = 1.76 times the CVS size, and the in-memory size would be
some unknown amount less.
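
The arithmetic, spelled out as a quick sketch.  It assumes the FSFS/BDB
ratio from my small repository carries over to a much larger one, which
is untested; the variable names are just for illustration.

```python
# Figures quoted in this thread.
bdb_growth_vs_cvs = 4.0   # "The BDB backend made our repository four times as big"
bdb_mb = 170.0            # my small repository under BDB
fsfs_mb = 75.0            # the same repository under FSFS

# Assumption: the FSFS/BDB size ratio is the same at NetBSD's scale.
fsfs_to_bdb = fsfs_mb / bdb_mb
fsfs_growth_vs_cvs = bdb_growth_vs_cvs * fsfs_to_bdb

print("FSFS/BDB ratio: %.2f" % fsfs_to_bdb)
print("estimated FSFS growth over CVS: %.2fx" % fsfs_growth_vs_cvs)
```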

> I will repeat my question of some months ago: can anyone actually give an
> example of a repository with anywhere near as many files and revisions as
> ours, with a couple of hundred active developers who semi-regularly
> check in, and well over 100 simultaneous checkouts at peak periods, that
> is managed by Subversion?
	Probably not, but just because no-one has done it doesn't mean it won't
work.  It just means we'd need to run some tests to see if it will, and if
the initial tests were done with BDB, it'd be a good idea to redo them.

eric