tech-pkg archive


Re: Distributed bulk building for slow machines



Essentially, a sandbox would be created on worker machines where sshd is run inside of the sandbox with ssh keys which give the master the ability to send commands and rsync files. The master would iterate through the package list and remotely run a "make package" for each, noting any failures, then possibly sync files afterwards.
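As a rough illustration of the loop described above (not how distbb does it), here is a minimal Python sketch. The function names, the worker hostname, and the injectable `run` parameter are all hypothetical; it assumes the pkgsrc tree lives in /usr/pkgsrc inside the sandbox and that the master's ssh key is already accepted by the sandboxed sshd.

```python
import shlex
import subprocess
from typing import Callable, Dict, List


def make_package_cmd(worker: str, pkgpath: str,
                     pkgsrc_dir: str = "/usr/pkgsrc") -> List[str]:
    """Build the ssh argv that runs "make package" in one pkgsrc directory."""
    # Quote the directory so an odd package path cannot inject shell syntax.
    remote = "cd " + shlex.quote(pkgsrc_dir + "/" + pkgpath) + " && make package"
    return ["ssh", worker, remote]


def run_bulk(worker: str, pkgpaths: List[str],
             run: Callable[[List[str]], int] = subprocess.call) -> Dict[str, int]:
    """Run each build in turn and return {pkgpath: exit status} for failures."""
    failures: Dict[str, int] = {}
    for pkgpath in pkgpaths:
        status = run(make_package_cmd(worker, pkgpath))
        if status != 0:
            failures[pkgpath] = status  # note the failure and keep going
    return failures
```

Calling `run_bulk("worker1.example.org", ["lang/perl5"])` would attempt a real ssh connection; passing a stub as `run` lets the loop be exercised without any workers. Syncing the resulting binary packages back (e.g. with rsync over ssh) is left out.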

This one I'm not so sure of, as this would prevent users sitting behind NAT boxes from participating as clients or slaves. It could perhaps be better if the clients periodically reported in their status, and if a client isn't heard from in a long while, the corresponding work could be released and farmed out to another client.
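The check-in idea in the paragraph above could look something like the following Python sketch. The `LeaseTable` class, its method names, and the timeout value are assumptions made up for illustration; nothing like this has been specified in the thread.

```python
import time
from typing import Dict, List, Optional, Tuple


class LeaseTable:
    """Track which client holds which package and when it last reported in."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # seconds of silence before work is released
        self.leases: Dict[str, Tuple[str, float]] = {}  # pkgpath -> (client, last_seen)

    def assign(self, pkgpath: str, client: str, now: Optional[float] = None) -> None:
        self.leases[pkgpath] = (client, time.time() if now is None else now)

    def heartbeat(self, client: str, now: Optional[float] = None) -> None:
        """Record a status report: refresh every lease held by this client."""
        now = time.time() if now is None else now
        for pkgpath, (owner, _) in list(self.leases.items()):
            if owner == client:
                self.leases[pkgpath] = (owner, now)

    def release_stale(self, now: Optional[float] = None) -> List[str]:
        """Drop leases whose client has been silent too long; return their packages."""
        now = time.time() if now is None else now
        stale = sorted(p for p, (_, seen) in self.leases.items()
                       if now - seen > self.timeout)
        for pkgpath in stale:
            del self.leases[pkgpath]
        return stale
```

The packages returned by `release_stale()` are the ones that could be farmed out to another client. Because the client initiates every report, this arrangement would also work from behind NAT.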

Between port forwarding and IPv6, I don't think this is a problem. If someone is stuck behind NAT and can't get a port forwarded, perhaps we can figure something out. But because we're talking about building binaries which will be officially offered on NetBSD servers, it'd have to be ssh (or otherwise encrypted).

I had a separate smtp-like client/server protocol in mind which
could be used for reporting liveness/status and also to pull new
work from the master.

So long as file transfer is always done via ssh... I can't think of any way someone could compromise security by telling a worker to build some other package than what it's supposed to.

Similar to the other comment in this thread, this sounds to me like a bad idea. If we're using slow machines, there's no gain to be had by duplicating work. Instead, keep one machine building a given package until it's finished or not heard from in a while.

I don't want to duplicate any work. Perhaps I was misunderstood. What I was getting at is if there are 1,000 packages which are dependencies for other packages, we'd need 1,001 machines before we'd have to worry about that 1,001st machine building a package for which a dependency doesn't already exist, and therefore it tries to build that dependency itself.

But it sounds like this isn't anything we need to worry about because distbb takes care of all of that.

I think that there are a sufficient number of "independent" or "initial dependencies" packages that you need a large number of clients before you run out of initial work to farm out to the clients, and you have to leave clients idling waiting for results.

Exactly.
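To make the "initial work" point concrete, here is a small Python sketch of how a master might pick the packages that can be handed out right now. The `ready_packages` function and the dependency-map format are hypothetical, and distbb presumably has its own way of doing this.

```python
from typing import Dict, List, Set


def ready_packages(deps: Dict[str, Set[str]], built: Set[str],
                   in_progress: Set[str]) -> List[str]:
    """Return packages whose dependencies are all built and that nobody holds yet.

    deps maps each package path to the set of package paths it depends on.
    """
    return sorted(
        pkg for pkg, needed in deps.items()
        if pkg not in built
        and pkg not in in_progress
        and needed <= built  # every dependency already has a binary package
    )
```

At the start of a bulk build, `built` is empty, so the result is exactly the set of packages with no dependencies. Clients only have to idle when this list is shorter than the number of clients asking for work.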

One thing to keep in mind, though, is this: all the machines
participating in such a distributed effort need to run the same
base OS release, and also run only the official release for the
duration of its contribution.

That's why I think a sandbox would be preferred. It might also be nice to trick the build tools into reporting whatever kernel version we want, based on the OS version in the sandbox rather than on the booted kernel. (If I remember correctly, this is possible, but I forget how.)

Thought also needs to be given to how the (presumed local) pkgsrc repository is to be kept in sync, and what it would mean that there might be slight version skews even if all were updated to e.g. 2010Q3, since the update may have been done at different times. In my thoughts this gives rise to the possibility of a request from the master saying "build package so-and-so version x.y.z" and the client ending up saying "sorry, my pkgsrc doesn't have version x.y.z of package so-and-so (yet)".

In my imagined world, the master would tell the slave to cd to /usr/pkgsrc/lang/perl5 and make package, not make perl-5.12.2nb1. Not sure how distbb does it.

Part of the rationale behind this entire thing is that packages will be built slowly, so the tree would get updated often relative to the number of packages built. It'd be almost assumed that some archs would never finish. Therefore, there'd be times when builds would be stopped, the pkgsrc tree updated, and building would be restarted.

Imagine, for instance, what might happen if there's a security issue in perl. Will we wait for the entire 10,000 packages to be finished, then restart building? Heck no. We'd update the pkgsrc tree, make sure all of the "priority" packages which depend on perl are rebuilt, then either restart or resume the bulk build.
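Working out which packages need rebuilding after such an update is a reverse-dependency walk. A minimal Python sketch follows, assuming the same hypothetical dependency map as before (package path mapped to the set of package paths it depends on); the function name and the package names in the usage note are made up for illustration.

```python
from collections import deque
from typing import Dict, Set


def dependents_of(deps: Dict[str, Set[str]], changed: str) -> Set[str]:
    """Return every package that depends, directly or indirectly, on `changed`."""
    # Invert the map: dependency -> packages that need it.
    reverse: Dict[str, Set[str]] = {}
    for pkg, needed in deps.items():
        for dep in needed:
            reverse.setdefault(dep, set()).add(pkg)

    found: Set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over the inverted map
        current = queue.popleft()
        for pkg in reverse.get(current, ()):
            if pkg not in found:
                found.add(pkg)
                queue.append(pkg)
    return found
```

With a map in which "www/example-app" depends on "lang/perl5" and "www/example-site" depends on "www/example-app", `dependents_of(deps, "lang/perl5")` returns both of them. Intersecting that set with a "priority" list would give the packages to rebuild before resuming the bulk build.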

Then there's of course also the issue of ... um... security; up till now packages on ftp.netbsd.org have been built by developers, and a change to allow anonymous contributions (not a given...) could be considered problematic. Therefore, it's a question of whether we need some administration of who contributes cycles, and whether there needs to be some token-based authentication scheme for a client to be able to contribute.

I'd assume that all machines would be in the control of NetBSD developers and we'd exchange ssh keys on one of the NetBSD project servers.

If someone has a good amount of usable system resources, we'd probably have to deputize them and have them agree to the same kinds of things we developers agree to. We can figure that out when we get there.


Also... note that trying to build some packages is prone to cause panics on at least m68k hosts. The message seen on my hp300 systems is (from memory) "out of address space", and I believe this has to do with the pmap implementation. So when doing distributed bulk builds for m68k, some packages need to be marked "no way" (and/or someone needs to take a hard look at re-doing the m68k pmap implementation...). I seem to recall that some of the clisp implementations would cause this, as will trying to build icu, which is needed for lang/parrot, so no parrot (or parrot-based perl6) for m68k unless this problem is fixed...

This is more justification for sandboxes. I've updated my m68k kernels based on Michael Hitch's recommendations to mitigate pmap issues for the moment, but the builds should reflect a generic OS version number, and the userland in which they're built should be the same as what people download from ftp.NetBSD.org. Yes, we don't want to panic machines. More important, though, is that the underlying problem gets fixed sometime soon!

John

