
Re: Distributed bulk building for slow machines



> I'm interested in setting up a simple shared bulk build
> system for slower architectures which allows for
> developers' systems anywhere on the Internet to
> participate.

I've been thinking about such a setup and what some of its
features might be, but so far nothing more has come of it.  Let
me therefore use this opportunity to air my thoughts.

> My basic setup would be:
>
> 1) A binary package tree on the main master server for
> any given architecture which starts empty and which is
> rsynced over ssh in both directions (either via cron or
> after the creation of a new package)

Yep.
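
For illustration, the cron-driven variant of that sync could be a
tiny Python wrapper around rsync over ssh; the host and paths
below are made up:

    import subprocess

    # Hypothetical locations; adjust for the real master and tree.
    MASTER = "builder@master.example.org:/pub/packages/"
    LOCAL = "/usr/pkgsrc/packages/"

    def sync(src, dst):
        # -a preserves metadata; -e ssh runs the transfer over ssh.
        subprocess.check_call(["rsync", "-a", "-e", "ssh", src, dst])

    sync(MASTER, LOCAL)    # pull packages built elsewhere
    sync(LOCAL, MASTER)    # push our own results back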

> 2) A list of priority packages which is made before a
> bulk (old fashioned or pbulk) build is started

This could be a good idea.  Either this, or have the priority
list affect the scheduling of the packages to build.

> 3) A method to use a fast machine to create a
> dependency tree which is used after the priority
> package list is finished

Yep, this is really important, even if the dependency graph
calculation is parallelized (ref. later entries in this thread).
The time it takes to compute the dependency graph is significant,
and even if it is parallelized, when the individual machines are
slow enough (as some of those on your list definitely are), the
gains from doing this on a more modern machine would be huge.
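
To make this (and the priority-scheduling point above) concrete,
here is a minimal Python sketch, with invented package data
rather than actual pkgsrc output, of a master that computes the
ready set from a precomputed dependency graph and lets the
priority list steer which ready package goes out first:

    # deps maps each package to the set of packages it depends on;
    # this graph would be computed once, on the fast machine.
    deps = {
        "textproc/icu": set(),
        "lang/parrot": {"textproc/icu"},
        "devel/libtool": set(),
    }
    priority = {"lang/parrot"}     # from the priority list in 2)

    built, failed, in_progress = set(), set(), set()

    def next_package():
        """Pick a buildable package, preferring the priority list."""
        ready = [p for p, d in deps.items()
                 if p not in built | failed | in_progress
                 and d <= built]
        if not ready:
            return None
        # Packages named on the priority list are handed out first.
        ready.sort(key=lambda p: (p not in priority, p))
        return ready[0]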

> Essentially, a sandbox would be created on worker
> machines where sshd is run inside of the sandbox with
> ssh keys which give the master the ability to send
> commands and rsync files. The master would iterate
> through the package list and remotely run a "make
> package" for each, noting any failures, then possibly
> sync files afterwards.

This one I'm not so sure of, as this would prevent users sitting
behind NAT boxes from participating as clients or slaves.  It
could perhaps be better if the clients periodically reported in
their status, and if a client isn't heard from in a long while,
the corresponding work could be released and farmed out to
another client.

I had a separate SMTP-like client/server protocol in mind which
could be used for reporting liveness/status and for pulling new
work from the master.
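
To sketch the client half of such a protocol in Python (host,
port, and command verbs are all invented for the example): the
client always connects outward, so it also works from behind NAT,
and every contact doubles as a liveness report:

    import socket
    import time

    MASTER = ("master.example.org", 9999)   # hypothetical host/port

    def ask(command):
        """One command, one reply line, SMTP-style."""
        with socket.create_connection(MASTER) as s:
            f = s.makefile("rw")
            f.write(command + "\r\n")
            f.flush()
            return f.readline().strip()

    def build(pkgpath):
        # Placeholder: run "make package" in the sandbox here.
        print("would build", pkgpath)

    while True:
        ask("STATUS idle")             # liveness report
        reply = ask("PULL")            # ask the master for work
        if reply.startswith("WORK "):  # e.g. "WORK lang/parrot"
            build(reply.split(" ", 1)[1])
        time.sleep(60)                 # poll again later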

> It is expected that there are so many packages which
> are dependencies of others that unless we had an
> incredibly large number of volunteer machines we'd
> never have two machines trying to build the same
> package at the same time. Therefore, I'm not too
> worried about the package selection logic because we
> can just use the results of a dependency tree from the
> old bulk build scripts.

Similar to the other comment in this thread, this sounds to me
like a bad idea.  If we're using slow machines, there's no gain
to be had by duplicating work.  Instead, keep one machine
building a given package until it's finished or not heard from
in a while.

I think there is a sufficient number of "independent" or
"initial dependency" packages that you would need a large number
of clients before you ran out of initial work to farm out and had
to leave clients idle, waiting for results.
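
The "not heard from in a while" rule on the master side could be
a simple lease table, sketched below (the timeout is an arbitrary
placeholder).  Each package is leased to exactly one client, so
no work is duplicated while that client is still alive:

    import time

    LEASE = 6 * 3600               # placeholder timeout, in seconds
    leases = {}                    # pkgpath -> (client, deadline)

    def assign(pkgpath, client):
        leases[pkgpath] = (client, time.time() + LEASE)

    def renew(client):
        """Called whenever a client reports in."""
        for pkgpath, (owner, _) in leases.items():
            if owner == client:
                leases[pkgpath] = (owner, time.time() + LEASE)

    def reap():
        """Release leases whose builders have gone silent."""
        now = time.time()
        for pkgpath in [p for p, (_, d) in leases.items() if d < now]:
            del leases[pkgpath]    # becomes assignable again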


One thing to keep in mind, though, is this: all the machines
participating in such a distributed effort need to run the same
base OS release, and to run only the official release for the
duration of their contribution.

Thought also needs to be given to how the (presumably local)
pkgsrc repository is to be kept in sync, and what it would mean
if there were slight version skews even when all trees were
updated to e.g. 2010Q3, since the updates may have been done at
different times.  In my mind this gives rise to the possibility
of a request from the master saying "build package so-and-so
version x.y.z" and the client ending up saying "sorry, my pkgsrc
doesn't have version x.y.z of package so-and-so (yet)".
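
That exchange could amount to the client checking its own tree
before accepting a job.  The sketch below assumes the tree lives
in /usr/pkgsrc and uses pkgsrc's show-var convenience target:

    import subprocess

    def local_version(pkgpath):
        """Ask the local pkgsrc tree which version it carries."""
        out = subprocess.check_output(
            ["make", "show-var", "VARNAME=PKGVERSION"],
            cwd="/usr/pkgsrc/" + pkgpath)
        return out.decode().strip()

    def accept(pkgpath, wanted):
        have = local_version(pkgpath)
        if have != wanted:
            return "SORRY have %s, master wants %s" % (have, wanted)
        return "OK"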

Then there's of course also the issue of ... um... security; up
until now, packages on ftp.netbsd.org have been built by
developers, and a change to allow anonymous contributions (not a
given...) could be considered problematic.  Therefore, it's a
question of whether we need some administration of who
contributes cycles, and whether there needs to be some token-
based authentication scheme for a client to be able to
contribute.
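
If it comes to that, even a small HMAC token scheme would keep
anonymous hosts out while staying simple to administer; the
secret and message layout below are invented for the example:

    import hashlib
    import hmac

    SECRET = b"per-client-secret"      # issued out-of-band by an admin

    def sign(command):
        mac = hmac.new(SECRET, command.encode(), hashlib.sha256)
        return command + " " + mac.hexdigest()

    def verify(line):
        command, _, mac = line.rpartition(" ")
        want = hmac.new(SECRET, command.encode(),
                        hashlib.sha256).hexdigest()
        return hmac.compare_digest(mac, want)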



Also... note that some packages are prone to causing panics on
at least m68k hosts when built; the message seen on my hp300
systems is (from memory) "out of address space", and I believe
this has to do with the pmap implementation.  So when doing
distributed bulk builds for m68k, some packages need to be marked
"no way" (and/or someone needs to take a hard look at re-doing
the m68k pmap implementation...).  I seem to recall that some of
the clisp implementations would cause this, as will trying to
build icu, which is needed for lang/parrot, so no parrot (or
parrot-based perl6) for m68k unless this problem is fixed...
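
Marking them off could be as simple as a per-architecture skip
list on the master, checked before any package is handed out (the
entries below just mirror the examples above):

    # Packages known to panic particular architectures; the master
    # consults this before leasing a package to a client.
    SKIP = {
        "m68k": {"textproc/icu", "lang/parrot", "lang/clisp"},
    }

    def skippable(arch, pkgpath):
        return pkgpath in SKIP.get(arch, set())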


Best regards,

- Håvard

