Subject: SoC Part I: pbulk
To: None <tech-pkg@netbsd.org>
From: Joerg Sonnenberger <joerg@britannica.bec.de>
List: tech-pkg
Date: 05/16/2007 18:27:53
--sdtB3X0nJg68CQEu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi all,
attached is a summary of the parallel bulk build system. Feel free to
ask for clarifications or enhancements.

Joerg

--sdtB3X0nJg68CQEu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="pbulk.txt"

The parallel bulk build system
==============================

Overview
--------

A pbulk run consists of three different phases: tree-scanning/prebuild,
build and post-build.

The pbulk system is modular and allows customisation of each phase. This
is used to handle full vs. limited bulk builds, but also to handle
environmental differences for parallel builds.

Tree-scanning and prebuild phase
--------------------------------

The heart of the tree-scanning code consists of the pbulk-index and
pbulk-index-item make targets. For full bulk builds, a list of all
directories is compiled and the pbulk-index target is called in each.
Optionally, the scan can be parallelised using a client/master mode in
which the output is forwarded to the master over a socket.
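
For illustration, a minimal full scan could look like the following
sketch (paths are assumptions, and the real system follows the SUBDIR
lists rather than a glob):

    #!/bin/sh
    # Run the pbulk-index target in every package directory and collect
    # the output in a single scan file.
    PKGSRCDIR=/usr/pkgsrc
    SCANFILE=/tmp/pbulk-scan

    cd "$PKGSRCDIR"
    for makefile in */*/Makefile; do
        dir=${makefile%/Makefile}
        ( cd "$dir" && bmake pbulk-index ) >> "$SCANFILE"
    done
    # In client/master mode a scan client would instead forward this
    # output to the master over the socket.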

The entries are sorted by global scanning order, i.e. the SUBDIR list in
the main Makefile and the category SUBDIR lists. Duplicate entries for a
PKGNAME are ignored. The scan output specifies the variables used by
pbulk for building, filtering the upload and creating the reports.
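
As a purely illustrative example (the exact record format and variable
set are an assumption here, not the authoritative scan output), an
entry could carry something like:

    PKGNAME=png-1.2.18
    DEPENDS=zlib>=1.1.4:../../devel/zlib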

After all packages and dependencies have been extracted, the global
dependency tree is built by resolving the listed dependencies. Packages
with missing dependencies are marked as broken. The directories in
_ALL_DEPENDS are used as hints, but can be overridden.
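
A sketch of that resolution step, assuming the illustrative
PKGNAME=/DEPENDS= records from above and pkg_admin's pattern matching:

    #!/bin/sh
    # Check every dependency pattern against the scanned PKGNAMEs;
    # anything left unmatched would mark the depending package as broken.
    SCANFILE=/tmp/pbulk-scan
    PKGLIST=/tmp/pbulk-pkgnames

    sed -n 's/^PKGNAME=//p' "$SCANFILE" > "$PKGLIST"

    sed -n 's/^DEPENDS=//p' "$SCANFILE" | while read dep; do
        pattern=${dep%%:*}           # strip the ../../category/dir hint
        satisfied=no
        while read pkg; do
            if pkg_admin pmatch "$pattern" "$pkg"; then
                satisfied=yes
                break
            fi
        done < "$PKGLIST"
        [ "$satisfied" = yes ] || echo "unresolved dependency: $pattern"
    done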

For partial builds, two different mechanisms could be used; I'm not sure
which is better. For both, a list of directories to scan is given. The
first approach calls pbulk-index, which gives all possible packages to
create, and filters them by a pattern. The second approach is to list
the options directly and call pbulk-index-item instead.
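
A minimal sketch of the first approach, reusing the illustrative scan
format from above (directory list and pattern are just examples):

    #!/bin/sh
    # Scan only the listed directories, then keep the packages whose
    # PKGNAME matches the requested pattern.
    PKGSRCDIR=/usr/pkgsrc
    PATTERN='p5-*'

    for dir in databases/p5-DBI www/p5-libwww; do
        ( cd "$PKGSRCDIR/$dir" && bmake pbulk-index )
    done | sed -n 's/^PKGNAME=//p' | while read pkg; do
        pkg_admin pmatch "$PATTERN" "$pkg" && echo "$pkg"
    done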

Dependencies are resolved for partial builds as well, but missing
dependencies are searched for by calling pbulk-index in the given
directories. Those that fulfill the patterns are added to the list and
the process is repeated.

In preparation for the tree-scanning, ${PREFIX} will be removed and
recreated from the bootstrap kit. Alternatively, pkg_delete -r \* followed
by a check for unlisted files and directories could be used, which is a
lot slower though. The bootstrap kit is preferred for the build phase
anyway.
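
A sketch of that preparation step, assuming the bootstrap kit is a
tarball extracted relative to / (paths are assumptions):

    #!/bin/sh
    # Throw away the installed packages and recreate a pristine ${PREFIX}.
    PREFIX=/usr/pkg
    BOOTSTRAP_KIT=/usr/pbulk/bootstrap.tar.gz

    rm -rf "$PREFIX"
    # (If the package database lives outside ${PREFIX}, it has to be
    # wiped as well.)
    ( cd / && tar xzpf "$BOOTSTRAP_KIT" )

    # Slower alternative mentioned above:
    #   pkg_delete -r \*
    #   followed by a check for unlisted files and directories.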

Build phase
-----------

Based on the dependency tree, the master process creates an internal job
list and hands out the information from the tree-scanning (e.g. which
parameters to set) to clients on request. The clients pull jobs and
report back when a build has finished (or failed).
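
Conceptually, a build client behaves like the loop below; get-next-job,
build-one-package and report-result are hypothetical placeholders for
the client/master protocol, not actual pbulk commands:

    #!/bin/sh
    # Pull jobs from the master until none are left, build each one and
    # report the result back.
    MASTER=buildmaster.example.org

    while job=$(get-next-job "$MASTER"); do
        if build-one-package "$job"; then
            report-result "$MASTER" "$job" ok
        else
            report-result "$MASTER" "$job" failed
        fi
    done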

For normal bulk builds, ${PREFIX} will be removed before the build, all
dependencies added via pkg_add and the package removed after it was
successfully packaged. Depending on the environment, pkg_add can use FTP,
or PACKAGES can be resynced beforehand. Similar requirements apply to
writing back the build log. The recommended configuration is a mostly
read-only system.
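
A sketch of one build job on a client; the variable names are
illustrative, and PKGDIR, PKGNAME and the dependency list would come
from the tree-scanning data handed out by the master:

    #!/bin/sh
    # PREFIX, BOOTSTRAP_KIT, PACKAGES and PKGSRCDIR are assumed to be
    # set in the client's configuration.
    rm -rf "$PREFIX"                       # start from a clean prefix
    ( cd / && tar xzpf "$BOOTSTRAP_KIT" )  # recreate from bootstrap kit

    for dep in $DEPENDENCY_PKGS; do
        # Depending on the environment this can also be an FTP URL,
        # or PACKAGES is resynced before the build.
        pkg_add "$PACKAGES/All/$dep.tgz" || exit 1
    done

    ( cd "$PKGSRCDIR/$PKGDIR" && bmake package ) || exit 1
    pkg_delete "$PKGNAME"                  # remove it again once packaged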

Post-build phase
----------------

Once the build is done, four different tasks are left:
- gather all failed packages, upload the build logs and send mail to the
admin
- create the pkg_summary file for all packages
- create signatures for the index/all packages
- upload all (unrestricted) packages

This might need human intervention.
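
The mechanical part of these tasks could look roughly like this (paths
are assumptions; pkg_info -X emits the pkg_summary format):

    #!/bin/sh
    # Build the pkg_summary file for all binary packages.
    PACKAGES=/packages/All
    cd "$PACKAGES"

    pkg_info -X *.tgz | gzip -9 > pkg_summary.gz

    # Signing and uploading (e.g. with gpg and rsync) would follow here;
    # restricted packages have to be filtered out first, and the failure
    # report is mailed to the admin.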

--sdtB3X0nJg68CQEu--