pkgsrc-Users archive


Re: Setting up bulkbuild



* On 2023-09-03 at 21:00 BST, Jan-Benedict Glaw wrote:

 It's now in the "Scanning..." phase and manages to "scan" about 250
packages per day, with a total of nearly 20k. So it'll be scanning for
about three months, give or take. Are there ways to speed this up?

The general approaches to speeding up the scan phase, regardless of operating system, are:

1. Run multiple pbulk-scan processes:

  There is no native support for this as none of my proposed patches
  have been accepted, but as a quick hack you can literally just run
  more copies of the "pbulk-scan -c ..." process, ideally up to as many
  CPUs as you have.

  If you only have a single CPU you may still find that running two
  processes makes things faster.  Otherwise, given that you are running
  virtual machines, simply spin up more VMs.

  If you want to go the whole way and support chrooted scans and builds
  then my full patchset is available here:

    https://github.com/NetBSD/pkgsrc/compare/trunk...TritonDataCenter:pkgsrc:feature/pbulk/trunk
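
  As a rough sketch of the quick hack, the helper below starts N copies
  of whatever command it is given and waits for all of them.  The name
  run_copies is made up for illustration, and the echo is only a
  stand-in: substitute your own "pbulk-scan -c ..." command line, whose
  arguments depend on your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of pbulk): run N copies of a command
# in the background, then wait until every copy has exited.
run_copies() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &              # start one copy in the background
        i=$((i + 1))
    done
    wait                    # block until all copies have finished
}

# Stand-in command; replace with your real scan client invocation.
run_copies 4 echo "scan worker would start here"
```

  Pick N to match the number of CPUs in the machine, or 2 on a
  single-CPU host as suggested above.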

2. Enable options cache:

  This one is very straightforward: set PBULK_CACHE_DIRECTORY=/var/tmp
  or similar in your mk.conf and pbulk will re-use any options that have
  already been calculated, as these are quite expensive to compute.

3. Enable reuse_scan_results:

  If you are never going to change any configuration for your builds
  then it may be safe to set reuse_scan_results=yes in pbulk.conf and
  any subsequent scans will be faster.  However this is not suitable if
  you make any changes, and so I generally have this turned off.
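
  For reference, and assuming your pbulk.conf is read as a shell
  fragment (as in a stock pbulk setup), the setting is a single
  assignment.  The comment is mine and simply records the caveat above:

```shell
# pbulk.conf (shell syntax)
# Only safe while the build configuration stays unchanged between runs;
# set this back to "no" after changing anything.
reuse_scan_results=yes
```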

4. Reduce forked commands:

  The biggest impact to scan speed is the fact that running 'bmake
  pbulk-index' in every single pkgsrc directory recomputes a whole bunch
  of variables that require running external commands.  This is seen
  most clearly by running e.g. dtrace when a pbulk-scan is happening and
  all you see is the same commands being executed over and over and over
  again.

For this last one I have a few approaches to improving it:

4a. Pre-compute builtin variables:

  If you are using the same OS for each build, then you can pre-compute
  the results of builtin variables to avoid having to recalculate them
  each time.  For example, for my SmartOS builds I have the following
  file included into my mk.conf:

    https://github.com/TritonDataCenter/pkgbuild/blob/master/include/varcache/20210826.mk

  Note that this is somewhat similar to the existing bsd.makevars.mk,
  however that is not used for scans as there is no work area, and an
  interesting fact is that bsd.makevars.mk actually makes things slower
  due to its use of shell!  A stark warning to avoid forks if ever there
  was one.

4b. Hardcode system variables:

  This one requires modifying mk/bsd.prefs.mk as the variables must be
  set early, but for example I use this on my macOS builds to avoid
  things like uname being called for every single 'make' invocation:

    https://github.com/TritonDataCenter/pkgsrc/commit/0084220e1b283401093db7efc4cdab08453dfa47

  Obviously things like this need careful updating every time you change
  anything on the host OS.
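
  To illustrate the idea (this is not the contents of the commit above),
  the sketch below runs uname once and prints the results as make
  assignments that can be reviewed and pasted by hand.  OPSYS,
  OS_VERSION and MACHINE_ARCH are existing pkgsrc variables, but which
  ones you hardcode is your choice, and the raw uname output shown here
  may need the same normalisation that bsd.prefs.mk applies on your
  platform.  The function name print_host_vars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compute host facts once and print them in make
# assignment form.  Values are raw uname output, so check them against
# what bsd.prefs.mk would normally compute before hardcoding anything.
print_host_vars() {
    printf 'OPSYS=\t\t%s\n' "$(uname -s)"
    printf 'OS_VERSION=\t%s\n' "$(uname -r)"
    printf 'MACHINE_ARCH=\t%s\n' "$(uname -m)"
}

print_host_vars
```

  Re-run it after any host OS update and compare the output with what
  you have hardcoded.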

4c. Avoid expensive GCC calculations:

  One of my major objections to the whole GCC_REQD thing is that it
  slows everything down, due to having to run 'pkg_admin pmatch' for
  every single version listed in GCC_REQD for every package, along with
  pointless `gcc -dumpversion` commands to get version strings we don't
  use.

  I avoid some of this by hardcoding _GCC_REQD in my SmartOS version of
  the bsd.prefs.mk patch:

    https://github.com/TritonDataCenter/pkgsrc/commit/702ba199aadc56f5f58b7b3cb6880e430dd59266

  but there's definitely more to do here.  For example we don't even
  bother using MAKEFLAGS for some of the computed variables in gcc.mk!

As a guide, with these approaches in place, most of my scans take around 10 minutes, though of course that's on modern hardware, not VAX.

--
Jonathan Perkin   -   mnx.io   -   pkgsrc.smartos.org
Open Source Complete Cloud   www.tritondatacenter.com

