Subject: Re: Chatted with Ben Collins-Sussman at OSCON
To: None <>
From: Alan Barrett <>
List: tech-repository
Date: 07/31/2007 08:33:09

On Mon, 30 Jul 2007, David Maxwell wrote:
> 2) On the topic of repository size, he said that native svn repos should
> not be significantly larger than native cvs repos. However, repos
> converted by cvs2svn suffer with every branch. Since cvs does not record
> branch creation time, cvs2svn does not use svn branching functionality
> during the conversion, but instead copies each file. Hence the resulting
> repo will be larger than needed by a copy of every file for every branch
> it is on.

That's "copies" as in "each file on the branch gets a small amount
of metadata containing a pointer to where it was copied from", not
"each file on the branch gets a complete copy of the text that it was
copied from".  The ideal situation, which cvs2svn does not do, would be
"the top level directory in the branch gets a small amount of metadata
containing a pointer to where it was copied from; individual files just
inherit that."
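
To make the difference concrete, here is a toy sketch in Python that
just counts copy-from records under the two schemes.  It is only an
illustration, not how svn or cvs2svn actually store anything; the
function names, the branch name "branches/netbsd-4" and the file list
are all made up for the example.

```python
# Toy model only: counts "copied from" metadata records, nothing more.
# All names and paths here are hypothetical, chosen for illustration.

def per_file_copy_records(branch, files):
    """Per-file scheme: one copy-from pointer for every file on the branch."""
    return [(branch + "/" + path, "trunk/" + path) for path in files]

def directory_copy_records(branch, files):
    """Ideal scheme: one copy-from pointer on the branch's top directory;
    the files underneath simply inherit it."""
    return [(branch, "trunk")] if files else []

files = ["src/bin/ls/ls.c", "src/bin/ls/print.c", "src/bin/cat/cat.c"]
per_file = per_file_copy_records("branches/netbsd-4", files)
per_dir = directory_copy_records("branches/netbsd-4", files)

# Number of metadata records needed under each scheme.
print(len(per_file), len(per_dir))
```

In both cases no file text is duplicated; the per-file scheme just
needs one small record per file instead of one per branch.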

> I suggested that perhaps if we determined the branch information
> manually (and from the BRANCHES file) that someone might be able to
> modify cvs2svn to use that input and generate an optimal svn repo. He
> agreed that should be possible, but seemed surprised that we cared
> about the history that much. Apparently some other projects just
> converted the head of development and say 'look back in CVS for the
> old stuff'.

The other big problem is dealing with copied and renamed files.  Our
repository contains files that were renamed via "cvs add" + "cvs
remove", and files that were renamed via repository copy + "cvs remove".
Ideally, cvs2svn would be modified to detect that and DTRT.
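
As a rough idea of what "detect that" could mean for the "cvs add" +
"cvs remove" case, here is a small Python sketch.  It assumes we
already have, for one changeset, the contents of the removed files and
of the added files, and it pairs them up when the contents are
identical.  This is my own guess at a heuristic, not anything cvs2svn
does today; guess_renames and its arguments are hypothetical, and
a rename that also edited the file would not be caught.

```python
import hashlib

def guess_renames(removed, added):
    """Pair removed and added files that have identical contents.

    removed, added: dicts mapping path -> file contents (bytes) for a
    single changeset.  Returns a list of (old_path, new_path) tuples.
    Hypothetical heuristic, for illustration only.
    """
    # Index the removed files by a digest of their contents.
    by_digest = {}
    for old_path, data in sorted(removed.items()):
        digest = hashlib.sha1(data).hexdigest()
        by_digest.setdefault(digest, []).append(old_path)

    renames = []
    for new_path, data in sorted(added.items()):
        candidates = by_digest.get(hashlib.sha1(data).hexdigest(), [])
        if candidates:
            # Use each removed file at most once.
            renames.append((candidates.pop(0), new_path))
    return renames

renames = guess_renames(
    {"old/a.c": b"int x;\n"},
    {"new/a.c": b"int x;\n", "new/b.c": b"int y;\n"},
)
```

Files renamed via repository copy would need a different check (for
example, comparing the early history in the two RCS files), which this
sketch does not attempt.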

Also, svn doesn't yet have true renames.  It simulates them via copy
+ delete, which makes some history harder to follow than it otherwise
could be.  I don't know when true renames might be added to svn.

--apb (Alan Barrett)