Subject: Re: pkgsrc on SMP machines
To: Hubert Feyrer <feyrer@cs.stevens.edu>
From: Lars Nordlund <lars.nordlund@hem.utfors.se>
List: tech-pkg
Date: 12/17/2005 13:40:57
On Sat, 17 Dec 2005 04:10:12 +0100 (CET)
Hubert Feyrer <feyrer@cs.stevens.edu> wrote:
> I'm sort of bored of repeating this again and again.

This is an example of the level of parallelism that exists in pkgsrc.

On my machine I have 395 packages installed. But I am not using KDE, so
most of the KDE packages are not installed. Now suppose I want to
install meta-pkgs/kde3 on my system. And also suppose that each package
takes 4 seconds to build. :-)

I do not have the machines required to run real tests right now.

# cd /usr/pkgsrc/meta-pkgs/kde3; make parallel

will give me the dependency tree in makefile syntax. I save it to a
file and use sed to change build commands to '@echo $@; sleep 4'.
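
A minimal sketch of that rewrite step, for anyone who wants to repeat
the test. It swaps in awk for sed, because the generated recipes span
continuation lines, and it assumes every recipe line starts with
whitespace, as in the generated output. The function name fake_recipes
is hypothetical.

```shell
# Sketch only: replaces every real build recipe with a 4 second dummy.
# Reads a generated makefile on stdin, writes the rewritten one on stdout.
fake_recipes() {
    awk '
        /^[ \t]/ { next }           # drop every original recipe line
        { print }                   # keep target/dependency lines unchanged
        /^[^#]*:/ && !/^all:/ {     # after each target except "all"
            print "\t@echo $@; sleep 4"
        }
    '
}
```

With the dependency tree saved as, say, deps.mk, running
"fake_recipes < deps.mk > fake.mk" and then "make -f fake.mk -j8" under
time should give numbers comparable to the ones below.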

make -j32  0.11s user 0.14s system 0% cpu 28.210 total
make -j16  0.12s user 0.15s system 0% cpu 28.218 total
make -j8  0.09s user 0.14s system 0% cpu 32.189 total
make -j4  0.13s user 0.10s system 0% cpu 44.226 total
make -j2  0.09s user 0.13s system 0% cpu 1:12.28 total
make  0.09s user 0.10s system 0% cpu 2:08.33 total

Do you want to wait 28 seconds for KDE, or more than 2 minutes? :-)
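
The timings can be read a bit further. The sketch below is only
arithmetic on the numbers above, assuming make's own overhead is
negligible next to the 4 second sleeps. The function name build_stats is
hypothetical.

```shell
# Sketch only: derives rough figures from the measured wall-clock times.
# Arguments: serial time (s), best parallel time (s), seconds per package.
build_stats() {
    awk -v s="$1" -v b="$2" -v p="$3" 'BEGIN {
        printf "targets built:   about %.0f\n", s / p
        printf "longest chain:   about %.0f packages\n", b / p
        printf "best speedup:    %.1fx\n", s / b
    }'
}

# 2:08.33 for plain make, 28.210 for make -j32, 4 s per fake build.
build_stats 128.33 28.210 4
```

This works out to about 32 targets, a longest dependency chain of about
7 packages, and a speedup of roughly 4.5x. Since -j16 and -j32 give the
same time, that chain is what limits the build, not the number of jobs.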

I agree that none of the packages in the KDE dependency chain build in
4 s. There is a "dependency waist" at kdelibs3 and kdebase3. These
will (probably, hard to tell from this test) be built alone.

There is also some time involved in generating the Makefile. On my
machine, with the current selection of already installed packages, it
takes:
make parallel  1100.64s user 711.73s system 129% cpu 23:19.47 total


Q: Why generate the makefile on the fly all the time?
A: To get an updated view of the current machine. If a leaf package
that is already installed gets built again, its install step will fail
and the top-level make will consider that part of the build tree a
failure.

Q: Why does it take so long to generate the Makefile?
A: The check for whether a package is installed takes most of the
time. Furthermore, due to implementation details, this is done twice
for every package involved.

Q: How can this be adapted for normal users?
A: Hook it into pkg_chk or pkg_select. Let users mark a bunch of
packages they want built and when they have selected them all, run
fetch-list to download them and then create a dummy meta-pkg that
depends on the selected packages. Launch a make parallel in there and
wait.

Q: Why run fetch-list? Why can't I let the pkgsrc system download the
distfiles when they are needed?
A: There is a problem with multiple packages wanting to download the
same distfile. Some kind of lock mechanism must be implemented.
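
One possible shape for such a lock, as a minimal sketch: it relies on
mkdir being atomic, so only one process at a time fetches a given
distfile. Nothing here is part of pkgsrc. The function name
fetch_locked, the lock directory naming, and the use of DISTDIR as a
plain shell variable are all assumptions made for illustration.

```shell
# Sketch only: serialize downloads of one distfile with a lock directory.
# Usage: fetch_locked <distfile> <fetch command and its arguments...>
fetch_locked() {
    distfile=$1; shift
    lock="$DISTDIR/.lock.$distfile"

    # mkdir either creates the directory or fails; only one caller wins.
    until mkdir "$lock" 2>/dev/null; do
        sleep 1
    done

    status=0
    # Whoever held the lock before us may already have fetched the file.
    if [ ! -f "$DISTDIR/$distfile" ]; then
        "$@" || status=$?
    fi

    rmdir "$lock"
    return $status
}
```

A real implementation would also have to deal with stale locks left
behind by killed processes, and with partially downloaded files.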

Q: I think make parallel takes too much time!
A: On packages with all their dependencies installed, or with few
dependencies to begin with, it is quick. In archivers/zoo it takes:
     make parallel  1.18s user 0.54s system 125% cpu 1.367 total
On "longer" packages like KDE it takes a long time. But I argue that
the time saved by using extra CPUs during the actual build will more
than make up for it.

Q: What does a generated makefile look like?
A: Like this (long line cut):
$ make parallel
all: archivers/zoo
archivers/zoo: 
	sh -c "cd /usr/pkgsrc/archivers/zoo/../../archivers/zoo; \
	env MAKEFLAGS= /usr/bin/make -X install clean"

Q: But I want free CPUs all around the globe to compile code for me!
A: There are no silver bullets.


Best regards,
	Lars Nordlund