Subject: easy mechanism for parallel builds
To: None <tech-pkg@netbsd.org>
From: Geert Hendrickx <geert.hendrickx@ua.ac.be>
List: tech-pkg
Date: 06/15/2005 16:32:27
There has been a discussion about parallel build algorithms here a while
ago.  Someone proposed a mechanism to split up the pkg dependency tree
in "independent" parts, and to distribute these independent trees across
the participating machines.  I think this could be done much easier.  

The entire cluster should have write access to a central ftp package
repository, and to a "TODO" list.  This TODO list consists of a variable
for each package, with possible values "todo", "doing", "done", and
"broken" (if the build failed).  e.g.: 

graphics/freetype2=	done
x11/gtk2=		doing
www/firefox=		todo
www/xpdf=		todo

(simplified -- say firefox depends on freetype2 and gtk2, and xpdf
depends only on freetype2)

When a build machine is ready, it fetches the TODO list, and scans for
the first "todo" package.  In this case, firefox.  It checks the
dependencies: freetype2 and gtk2.  freetype2 is "done" (i.e. the binary
package is available in the repository), but gtk2 is "doing" (another
machine is busy building it).  So it shouldn't start building this
dependency, and it can't start building firefox either because not all
dependencies are ready yet.  So it skips over to the next "todo"
package: xpdf.  All its dependencies (freetype2) are done, so it can
download and install them, and start building xpdf.  xpdf must then be
marked as "doing", so that no other machine starts building the same
package, or any package depending on it.  firefox will be picked up in
another iteration, as soon as gtk2 is uploaded and marked as "done".  

This mechanism requires no advanced algorithms, is easy to implement
(the TODO file could simply be a shell-script where the "doing" and
"done" assignments are just appended to the end), and it allows adding
and removing build machines from the cluster at all times (when a
machine is taken out, it should simply mark its current "doing" package
again as "todo").  

The only extra requirement would be some kind of locking, so that no
build machine can fetch the TODO list as long as another one is parsing
it.  The other machine should first upload the modified TODO list (with
a new package in "doing" state), before the next machine can parse it.  

What do you think?  

GH

-- 
:wq