Re: Moving pkgsrc-wip away from SourceForge

To: Mayuresh <mayuresh%acm.org@localhost>
Subject: Re: Moving pkgsrc-wip away from SourceForge
From: Thomas Orgis <thomas.orgis%uni-hamburg.de@localhost>
Date: Mon, 6 Jul 2015 10:38:52 +0200

On Mon, 6 Jul 2015 08:28:26 +0530,
Mayuresh <mayuresh%acm.org@localhost> wrote:

> pkgsrc is a "packaging spec". Revision control requirements for such spec
> are much simpler than those for the package itself. pkgsrc is not a
> "development" per se. Hence the notions of distributed development etc.
> are an overkill for pkgsrc.

Let me chime in here with experience from a similar project: I happen
to be involved in Source Mage GNU/Linux, which is a source-based
distribution. Meaning: It is a large set of small files that contain
"packaging spec". Currently, that's around 41000 files, all together
372 MiB on disk (inflated by file system block rounding) in a git
checkout. The whole repository (the .git directory) is 169 MiB.
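For anyone who wants to see how much of such a number is block rounding, here is a minimal Python sketch. It is not how the figures above were obtained; the function name `tree_usage` and the `skip_dirs` parameter are made up for illustration, and it assumes a Unix-like system where `st_blocks` counts 512-byte units.

```python
import os
import stat


def tree_usage(root, skip_dirs=()):
    """Return (file_count, apparent_bytes, allocated_bytes) for regular files.

    apparent_bytes sums the file lengths; allocated_bytes sums what the
    file system reports as allocated (st_blocks * 512, Unix only).
    Hard-linked files are counted once per name.
    """
    count = apparent = allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories such as ".git" in place so os.walk skips them.
        dirnames[:] = [name for name in dirnames if name not in skip_dirs]
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets and the like
            count += 1
            apparent += info.st_size
            allocated += info.st_blocks * 512
    return count, apparent, allocated


def mib(n_bytes):
    """Format a byte count as MiB with one decimal place."""
    return f"{n_bytes / (1024 * 1024):.1f} MiB"
```

Calling `tree_usage("/path/to/checkout", skip_dirs=(".git",))` and comparing the second and third values shows the overhead of storing many tiny files; the path is a placeholder.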

Ages ago, we switched from Perforce (there were reasons) to git.

This was around August 2006. Wow, nearly 10 years ago. Git was the hot
new thing back then (now it's really too late to speak of it being a
fad). An entry point into the concluding discussion thread is this:

http://lists.ibiblio.org/pipermail/sm-discuss/2006-June/014665.html

I need to say that I am still not really on good terms with git. I
personally prefer subversion, which has come a long way regarding
performance and features since then (merging works fine enough since
1.8.x, and that's why I'm installing svn from pkgsrc on CentOS
systems;-). I like the consistent use of the file system metaphor
without the need to talk about tags and branches as anything different
than copies with differing paths. And modern svn actually can merge
between those effortlessly.

But: Git really won back then because of its technical merits. It was
the only tool able to efficiently handle the thousands of little files.
The whole repository in the working copy was even smaller than the svn
checkout which doubled all the files just to have the one unchanged
revision around. Subversion improved a lot since then, but git still
rules on the performance front, I guess.

Something like github was probably needed to get git to the masses, as
it really is harder to grok (for me, at least). For SVN you just need
to know what a filesystem is.

I just ran a quick test to see how bad svn is in comparison when simply
creating a repo with the current state. I'm checking in our 41000 files
to a repo on the local SSD as I write this. The repo for this one revision
is at 38 MiB. Not that bad. The single top-level .svn directory with
the pristine copies is 171 MiB. Even larger than the .git directory,
which happens to contain the whole history. You cannot argue about that
technical feat. You get full history and commits while your internet is
down, without paying extra.

A run of `svn status` needs 0.8 seconds compared to `git status` with
0.3 seconds. This is with everything on an SSD; asking an internet server
for each operation must be horrible. Doing `cvs status` on a freshly
extracted pkgsrc tarball needs … wow … lots of output for unchanged
files, redirecting to /dev/null … looking at all the files … ah,
finally: 3 minutes. I guess it works when you always call it for
individual package directories.

But a `cvs up` (from 2014Q3) takes … wait for it … 1:42 minutes (1:30
for the second run), faster than status. But anyway, that's a couple of
seconds with git, not much different with svn, I presume. How patient
are you lot?

Btw., I tested pkgsrc's cvs also with an SSD (everything probably cached
in RAM anyway), but on a node with an idle 6-core Sandy Bridge Xeon
doing the work, and, more importantly, wired Gigabit connection to the
internet. The other tests were with a 6-year-old Thinkpad and Wifi
connection to a domestic DSL router.
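If you want to repeat such timings on your own tree, a small Python sketch along these lines should do. It is only an illustration, not the method used for the numbers above; `time_command` is a made-up helper name. It runs the command several times because, as the `cvs up` figures show, the first run and later runs can differ.

```python
import subprocess
import time


def time_command(cmd, cwd=None, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run.

    Output is discarded, like redirecting to /dev/null, so that printing
    thousands of lines to a terminal does not distort the measurement.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, check=False,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Example use (paths are placeholders for your own working copies):
#   time_command(["git", "status"], cwd="/path/to/git/checkout")
#   time_command(["svn", "status"], cwd="/path/to/svn/checkout")
#   time_command(["cvs", "status"], cwd="/path/to/pkgsrc")
```

The first entry of the returned list is the run most affected by cold caches; the later ones show what you get once everything sits in RAM.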

There might be one argument for staying with CVS, though. If you always
have internet, you don't have to pay anything extra for eventual
disconnected operation since CVS doesn't keep any other version of the
files on disk and always talks to the server. But it's _slow_ to do so.
I am really amazed how you endured with such a demanding source tree
with CVS for decades.

Anyhow, I see pkgsrc+wip with 103145 files; extracting pkgsrc tarball is a
heavy stress-test on IOPS. So, you can compare to our experience and
roughly use a factor of two to amplify any performance problems we
encountered with SMGL;-)

And if you really want to go for tried & tested, there is the revived
SCCS: http://sccs.sourceforge.net/ . You cannot beat that;-)

> A local repository to which you can commit sounds a fancy thing (and in
> turn extra baggage) than necessity for such a small spec under revision
> control. I am not sure what people mean by "do more off-line" in the
> context of pkgsrc, that CVS' model restricts them from doing -
> specifically in pkgsrc.

You test several package updates together (stuff with dependencies),
have the changes divided between several commits. After testing them
together, you push your commits. You have the atomic change to the
repository (the individual updates could break things without the
others following suit) and still have a clear history with the commits
and commit messages you thought of while editing the local packages
over a week. You even have your mistakes and reversions documented as
commits;-)

You could work on a remote branch (dunno how they work in CVS,
actually), of course. But in git you work on your local branch
naturally without bothering the server.
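To make that workflow concrete, here is a rough Python sketch that drives git through `subprocess`. Everything in it is invented for illustration: the package names `libfoo` and `foo-app`, the file contents, the committer identity and the helper names. It assumes a `git` binary on the PATH and uses a throwaway bare repository as the stand-in for the central server.

```python
import pathlib
import shutil
import subprocess

# Hypothetical identity, so commits work without any global git config.
GIT = ["git", "-c", "user.name=Example",
       "-c", "user.email=example@example.invalid"]


def git(cwd, *args):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(GIT + list(args), cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def batch_then_push(root):
    """Commit two dependent updates locally, then publish both in one push.

    Returns the commit subjects the central repository knows afterwards.
    """
    root = pathlib.Path(root)
    central = root / "central.git"   # stand-in for the project's server
    work = root / "work"             # the developer's local clone
    central.mkdir()
    git(central, "init", "--bare", "--quiet")
    git(root, "clone", "--quiet", str(central), str(work))

    for pkg, version in [("libfoo", "1.1"), ("foo-app", "2.0")]:
        pkg_dir = work / pkg
        pkg_dir.mkdir()
        (pkg_dir / "Makefile").write_text(f"VERSION= {version}\n")
        git(work, "add", pkg)
        # Local commit: the central repository is not contacted here.
        git(work, "commit", "--quiet", "-m", f"{pkg}: update to {version}")

    # This is the point where you would build and test both updates together.

    # One push delivers both commits, each with its own message.
    git(work, "push", "--quiet", "origin", "HEAD")
    return git(central, "log", "--all", "--format=%s").splitlines()
```

Run it against an empty scratch directory, for example one from `tempfile.TemporaryDirectory()`, after checking `shutil.which("git")`. Until the final push, the central repository sees nothing of the two updates.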

> IMHO git is suited for a bazaar model of development where the development
> is less centralized, more distributed, with merges being complex and
> challenging.

Yes and no. We at Source Mage do have a central repository. We push all
our stuff there, rather than publishing our repos on github and sending
pull requests.
There once was (still is?) discussion about going to github, but with
mixed responses. I'm one of those with reservations about that move: A
project like SMGL/pkgsrc should be able to host its own version control
system. Even if folks want to use github to fork it and then make pull
requests, that should still be possible using the plain git protocol.
No need for the Web GUI of github for that. It's in the basic design of
git.

But in any case, I hope that simply sending in a plain-text patch will
always work to get a change across.* That's the lowest barrier of
entry, really. Or is it easier for folks to click around github's
website and use a clumsy online text editing field than to call the diff
command? For folks using NetBSD/pkgsrc, of all sorts?


Alrighty then,

Thomas

* Please without bitching about the detailed patch format, as long as
  it can be applied to a working copy.

-- 
Dr. Thomas Orgis
Universität Hamburg
RRZ / Zentrale Dienste / HPC
Schlüterstr. 70
20146 Hamburg
Tel.: 040/42838 8826
Fax: 040/428 38 6270

