Subject: Distributing bulk package building (was mac68k Packages for 1.6)
To: None <port-mac68k@netbsd.org>
From: John Klos <john@sixgirls.org>
List: port-mac68k
Date: 10/31/2002 19:54:23
Hi,

> How about it:  Another distributed computing project, kindof a
> bin-builds@home...
>
> How tough a programming job would it be, to write a perl script/ daemon
> that would let a central server assign compilation tasks, and collect
> the results when done?  Those of us who are not too concerned about
> security, could let our boxes join the project.

This is something that I've discussed with a number of other people.
Some of the framework for this is already in place, but it would place a
number of requirements on the client end.

Namely, we imagined that it'd have to be secure, so it couldn't just be
anonymous. Therefore, ssh keys would need to be exchanged between the
server and clients. The clients would have to be responsible for
contacting the server when they are available for more work, as many
machines might be behind IP NAT or on non-static connections.

Disk space would be necessary, as bulk builds require a specially set up
environment that would leave a typical machine not very useful for
anything else. Therefore, a chrooted environment would be recommended on
any machine that is not going to be dedicated to bulk builds.

The work on the client end is pretty much done; there's very little logic
there. The client, when run, contacts the server, is told what package to
build, auto-fetches any binary packages on which the package is dependent,
builds the package, uses scp to copy back the finished binary package,
then asks for another package. If the server doesn't have anything, the
client sleeps for an hour and queries again later.
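
As a rough sketch of that client loop (Python here purely for
illustration; this is not the existing client code): the four callables
are hypothetical stand-ins for contacting the server, fetching the binary
packages a build depends on, running the build, and the scp copy back.
The max_idle_polls parameter is my own addition so the loop can be
stopped in a test.

```python
import time
from typing import Callable, List, Optional


def run_client(
    request_job: Callable[[], Optional[str]],
    fetch_dependencies: Callable[[str], None],
    build_package: Callable[[str], str],
    upload_package: Callable[[str], None],
    idle_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    max_idle_polls: Optional[int] = None,
) -> List[str]:
    """Ask the server for work until stopped; return the packages built.

    All callables are hypothetical hooks, not part of any existing tool.
    """
    built: List[str] = []
    idle_polls = 0
    # With max_idle_polls=None the loop runs forever, like a daemon.
    while max_idle_polls is None or idle_polls < max_idle_polls:
        package = request_job()           # the client always initiates contact
        if package is None:               # server has nothing for us right now
            idle_polls += 1
            sleep(idle_seconds)           # sleep for an hour, then query again
            continue
        fetch_dependencies(package)       # pull prebuilt binary dependencies
        artifact = build_package(package)
        upload_package(artifact)          # e.g. scp the binary package back
        built.append(package)
    return built
```

Because the client always makes the first move, this works the same from
behind NAT or on a non-static connection.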

The server is much more complicated. It needs to build a dependency tree
of the latest pkgsrc (there's already code to do that in the bulk package
scripts), then it needs to parse out jobs in order of complexity:

packages without dependencies
packages with no dependencies but that are used to build other packages
packages with no dependencies but that are necessary to run other packages

packages with dependencies that are not used to build other packages
packages with dependencies that are used to build other packages
packages with dependencies that are necessary to run other packages
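
One simplified way to turn a dependency tree into a job order is to
group packages into "waves", where everything in a wave depends only on
packages from earlier waves. The sketch below does just that; it is an
assumption about how the server could work, not the bulk build scripts'
code, and it does not yet separate build-time from run-time dependencies
as the list above does. The package names in the usage note are made up.

```python
from typing import Dict, List, Set


def build_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group packages so each wave depends only on earlier waves.

    deps maps a package name to the set of packages it depends on.
    """
    remaining = {pkg: set(d) for pkg, d in deps.items()}
    # Packages that only ever appear as dependencies have none of their own.
    for wanted in deps.values():
        for dep in wanted:
            remaining.setdefault(dep, set())

    waves: List[List[str]] = []
    while remaining:
        # Everything whose dependencies are all built can be handed out now.
        ready = sorted(p for p, d in remaining.items() if not d)
        if not ready:
            raise ValueError(
                "dependency cycle among: " + ", ".join(sorted(remaining))
            )
        waves.append(ready)
        for pkg in ready:
            del remaining[pkg]
        for d in remaining.values():
            d.difference_update(ready)
    return waves
```

For example, with deps = {"app-d": {"lib-c", "base-b"}, "lib-c":
{"base-a"}}, the waves come out as base-a and base-b first, then lib-c,
then app-d. Every package within one wave can go to a different client
at the same time.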

There has to be logic to allow for the weighting of machines - a 68060
would have more weight than a 68030, and a machine with more memory would
have more weight than one with less... Then logic for deciding how long to
wait for a client to return work before deciding that the client is no
longer a part of the build team, and so on.
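
To make the weighting and timeout idea concrete, here is a minimal
sketch. Every number in it is an assumption picked for illustration (the
CPU factors, the 128 MB memory cap, the grace multiplier); real values
would have to be measured. reference_seconds is a hypothetical estimate
of how long the package takes on a machine of weight 1.0.

```python
from dataclasses import dataclass

# Hypothetical relative CPU speeds; not measured figures.
CPU_FACTOR = {"68030": 1.0, "68040": 3.0, "68060": 6.0}


@dataclass
class Client:
    name: str
    cpu: str
    memory_mb: int


def weight(client: Client) -> float:
    """Higher weight means a faster machine. Unknown CPUs count as 1.0."""
    cpu = CPU_FACTOR.get(client.cpu, 1.0)
    # Assumed: memory helps up to 128 MB and accounts for half the score.
    mem = min(client.memory_mb, 128) / 128
    return cpu * (0.5 + 0.5 * mem)


def deadline_seconds(
    client: Client, reference_seconds: float, grace: float = 3.0
) -> float:
    """How long to wait for this client before giving up on the job."""
    return reference_seconds / weight(client) * grace


def is_overdue(
    client: Client, assigned_at: float, now: float, reference_seconds: float
) -> bool:
    """True once the client has held the job longer than its deadline."""
    return now - assigned_at > deadline_seconds(client, reference_seconds)
```

When is_overdue() returns true, the server could put the job back in the
queue and stop counting that client as part of the build team until it
checks in again.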

This all should be in a big database, as I'd like to imagine that one
pkgsrc tree and one set of code can be used to manage as many slow
architectures as NetBSD has (VAX is a good example of a port that needs
this). This would also be good for faster architectures where a dedicated
build machine isn't set up: any updated packages can be automatically built
by the next machine very shortly after they get updated in the tree.

So here are some of the basic ideas. Who wants to start helping to
organise and code the server? Who's good with database code?

BTW - I have a 10-processor Quadra cluster that I have been building for
more than a year now which would be dedicated to package building using
this client / server setup. I'll have to put up pictures of it soon - it's
very cool looking...

Thanks,
John Klos
Sixgirls Computing Labs