Re: A new approach to package building

To: Taylor R Campbell <campbell%mumble.net@localhost>, tech-pkg <tech-pkg%netbsd.org@localhost>
Subject: Re: A new approach to package building
From: Jason Bacon <jtocino%gmx.com@localhost>
Date: Sat, 8 Jun 2024 19:01:40 -0500

On 6/8/24 09:03, Taylor R Campbell wrote:

>> Date: Mon, 3 Jun 2024 15:17:54 -0500
>> From: Jason Bacon <jtocino%gmx.com@localhost>
>>
>> Last time I used pbulk (which was a while ago), there was a tradeoff
>> between running multiple package builds in parallel vs using multiple
>> cores for a single build.  Things may have changed since then, but a
>> quick look at the pbulk docs didn't reveal anything.
>>
>> There was an option for using multiple servers that would help, but
>> processors would still be left idle much of the time, while builds sat
>> in the queue.


> I think most users have settled on using pbulk with MAKE_JOBS=1 or 2
> and many parallel workers, plus tweaking with PBULK_WEIGHT to get
> large packages like firefox and libreoffice started as early as
> possible.  In my anecdotal experience that leaves all CPUs pretty well
> utilized for bulk builds from scratch.
>
> Incremental bulk builds don't do as well -- e.g., if a leaf package
> like firefox is updated on a branch, then you only get MAKE_JOBS
> parallelism, not parallelism across package builds.  But then if
> someone updates a major dependency like cairo or something, the CPU
> utilization is good again.


This makes sense.  I was thinking more of package testing, which is
basically the same as incremental bulk builds.  If we need to build 5
packages, and one of them is LLVM, then we want to maximize MAKE_JOBS
for LLVM, but increase the number of workers when LLVM is not building.
This is where LPJS comes in.
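
To make the idea concrete, here is a rough Python sketch of how cores
might be divided among the packages that are ready to build.  Nothing
here is LPJS or pbulk code: the function name and the per-package
weights are hypothetical, and the weights are assumed to be rough
positive estimates of build size.

```python
def allocate_make_jobs(total_cores, ready):
    """Split total_cores among ready packages.

    ready maps a package name to a hypothetical weight (a rough estimate
    of build size).  Returns a dict of package name -> MAKE_JOBS value.
    """
    if not ready or total_cores < 1:
        return {}
    # Never start more builds than there are cores; prefer the heaviest.
    names = sorted(ready, key=ready.get, reverse=True)[:total_cores]
    # Every build gets one core; spare cores are shared out by weight.
    alloc = {name: 1 for name in names}
    spare = total_cores - len(names)
    total_weight = sum(ready[name] for name in names) or 1
    for name in names:
        alloc[name] += spare * ready[name] // total_weight
    # Cores lost to integer rounding go to the heaviest package.
    alloc[names[0]] += total_cores - sum(alloc.values())
    return alloc


if __name__ == "__main__":
    # LLVM plus four small packages on a 16-core machine.
    print(allocate_make_jobs(16, {"llvm": 100, "a": 1, "b": 1, "c": 1, "d": 1}))
    # Once LLVM is done, the same cores are spread evenly.
    print(allocate_make_jobs(16, {"a": 1, "b": 1, "c": 1, "d": 1}))
```

With these made-up weights, LLVM gets 12 of 16 cores while it is
building, and the four small packages get 4 cores each when it is not.
A real scheduler would also have to rebalance as builds start and
finish, which this sketch does not attempt.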

Note that LPJS could either be part of an alternative to pbulk, or a
back-end to pbulk to improve hardware utilization.  Both options should
be explored.


> That sounds reasonable to explore, with the caveat that getting
> MAKE_JOBS wired up to LPJS _inside a package build_ might be a lot of
> work; you'll have to teach bmake, and gmake, and meson, and ninja, and
> whatever else, to all talk to the job server the same way that pbulk
> does.


Some complexity, yes, but doable, and only has to be solved once +
maintenance.

> All that said: the pbulk scan and resolve logic gives you the graph I
> believe you're looking for (and pbulk scan itself is parallelized
> across workers, rather effectively in my anecdotal experience).


As I recall, pbulk scan takes a loooooong time.  It might not be
necessary to actually build the entire graph ahead of time, though.  A
more dynamic approach, just descending the tree for one package at a
time might suffice.  Packages already present from a previous build can
be skipped.
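
As a rough illustration of that dynamic approach, here is a small
Python sketch that walks the dependencies of a single target and
returns a build order, dependencies first.  The names are hypothetical:
get_depends stands in for whatever would query one package's direct
dependencies on demand, and installed is the set of packages left over
from a previous build.  It assumes that if a package is already
present, so is everything it depends on.

```python
def build_order(target, get_depends, installed):
    """Return the packages to build for target, dependencies first.

    get_depends(pkg) returns the direct dependencies of pkg and is only
    called for packages that actually need building.  Packages listed in
    installed are skipped, along with everything beneath them.
    """
    order = []
    seen = set(installed)

    def visit(pkg):
        if pkg in seen:
            return  # already present or already scheduled
        seen.add(pkg)
        for dep in get_depends(pkg):
            visit(dep)
        order.append(pkg)  # appended only after all of its dependencies

    visit(target)
    return order


if __name__ == "__main__":
    # Toy dependency data, not taken from pkgsrc.
    deps = {
        "firefox": ["cairo", "nss"],
        "cairo": ["pixman", "libpng"],
        "nss": ["nspr"],
    }
    print(build_order("firefox", lambda p: deps.get(p, []), {"libpng", "nspr"}))
```

Only the packages reachable from the target and not already present
are ever looked at, so no full scan is needed up front.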

I'll have to add job dependencies to LPJS before this is feasible.
That's a ways off, after the core functionality is well-debugged, but I
wanted to start chewing on this.

Thanks to everyone for the input...

	J

--
Life is a game.  Play hard.  Play fair.  Have fun.
