pkgsrc-Users archive


Re: pkgsrc scanning performance benchmarks

On 12/2/2016 04:12, Joerg Sonnenberger wrote:
On Thu, Dec 01, 2016 at 07:54:27PM -0600, John Marino wrote:
I do not believe the pkgsrc framework is 28 times more complex than the
Ports Collection framework.  It's just much more inefficient.  I know such
statements rankle some pkgsrc devs, but numbers don't lie.

If you compare apples and oranges, numbers do lie. It might surprise
you, but it is a well-known fact that the tree scanning, i.e. as part of
the bulk build, is a very time-consuming component.

This is a direct apples-to-apples comparison. Both trees are given the exact same task to do. It's on pkgsrc if it uses a poor implementation to do the same task.

There have been hacks proposed in the past to replace the make extraction,
but none of the proposals actually work properly, because they disable
important functional parts. This *is* a case where pkgsrc is actually
significantly more complex than ports.

Architecturally, there are three bigger parts that slow things down as
far as the scan phase is concerned:
(1) Finding the builtins and computing the resulting versions.
(2) Reducing patterns by merging ranges.
(3) Include recursion via

I think there are more mundane causes that compound as well. But for the sake of argument, all this means is that the architecture is fundamentally flawed with regard to performance. This has been demonstrated by ports, which achieves the same result an order of magnitude faster.

The first part could be optimized to avoid needless recomputation for a
bulk build, but it requires figuring out a reliable caching
mechanism and reviewing the side effects of existing files.
AFAIK, Ports doesn't have anything like this or at least only for very
isolated items.

Apples-to-apples, right? As you said, FPC has the needed recalc too, but 27,000 times (12,000 more times than pkgsrc).

The second part is done with the help of some external scripts because
doing it in make internally is pretty much impossible. A single
monolithic program would be faster than the repeated pkg_admin pmatch
calls, but I don't think the total time spent on this justifies the

I suspect that pkg_admin (which incidentally severely limits portability of pkgformat) is one of the prime culprits. Thus this technical requirement of pkgsrc might be unique and cause what I would call an unacceptable performance hit.

Anyway, the solution would be the opposite -- not requiring an external program at all -- because the external program causes multiple problems in addition to a large performance hit.
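To make the "single monolithic program" idea concrete, here is a minimal sketch of reducing a batch of dependency patterns in one process instead of spawning a matcher per comparison. It is an illustration only, not how pkgsrc or pkg_admin actually works: the pattern shape (`name>=version`), the purely numeric dotted version comparison, and the names `parse_version` and `reduce_patterns` are all assumptions made for the example.

```python
import re

# Assumed pattern shape for this sketch: "<name>>=<dotted numeric version>".
# Anything else (globs, upper bounds, ...) is passed through untouched.
_LOWER_BOUND = re.compile(r"^(?P<name>[A-Za-z0-9_+-]+?)>=(?P<ver>\d+(?:\.\d+)*)$")


def parse_version(text):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def reduce_patterns(patterns):
    """Keep only the strictest lower bound per package name.

    Runs entirely in-process: no external command is spawned per pattern.
    """
    strictest = {}      # name -> (parsed version, original pattern)
    passthrough = []    # patterns this sketch does not understand
    for pattern in patterns:
        match = _LOWER_BOUND.match(pattern)
        if match is None:
            passthrough.append(pattern)
            continue
        name = match.group("name")
        version = parse_version(match.group("ver"))
        if name not in strictest or version > strictest[name][0]:
            strictest[name] = (version, pattern)
    reduced = sorted(pattern for _, pattern in strictest.values())
    return reduced + passthrough
```

With this sketch, `reduce_patterns(["foo>=1.0", "foo>=1.2", "bar>=2.0"])` collapses the two `foo` bounds into the stricter one, and every comparison happens without a fork or exec.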

The last one is far more tricky. The includes hit a number of
scalability problems in make; some of them might be fixable in the
implementation, but many are likely unavoidable without actually
introducing e.g. lists into the core language. The history of
mk/ should be illuminating. Ports doesn't have this
problem due to the flat dependencies. There have been discussions about
potential ways to improve the situation. One change in the past was to
improve the include guards as found by cube. See mk/

I've seen a single change remove minutes from a full tree scan. The numbers I've shown here are already better than those from a full scan done with another method, because the internal make.conf (mk.conf) predefines several variables to avoid unnecessary spawning (e.g. uname -s). Without those tricks, the scan would take much longer on both package systems.
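The effect of predefining a value instead of spawning a process for it on every package can be sketched as follows. This is an analogy in Python, not make syntax and not code from either framework: the names `query_opsys`, `cached_opsys`, `scan` and `spawn_counter` are hypothetical, and a Python child process stands in for `uname -s` so the sketch runs anywhere.

```python
import functools
import subprocess
import sys

# Counts how many child processes the sketch has spawned.
spawn_counter = {"n": 0}


def query_opsys():
    """Ask a child process for the OS name (stand-in for `uname -s`)."""
    spawn_counter["n"] += 1
    result = subprocess.run(
        [sys.executable, "-c", "import platform; print(platform.system())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


@functools.lru_cache(maxsize=None)
def cached_opsys():
    """Spawn once, then reuse the answer: the analogue of a predefined variable."""
    return query_opsys()


def scan(package_dirs, lookup):
    """Pretend scan: every package needs the OS name once."""
    return {pkg: lookup() for pkg in package_dirs}
```

Calling `scan(dirs, query_opsys)` spawns one process per package, while `scan(dirs, cached_opsys)` spawns exactly one for the whole tree. With tens of thousands of packages and several such variables, that difference is where the minutes go.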

So yes, it's worth making a policy to evaluate each implementation in terms of performance, revamp the existing slow ones, and reject slow proposals. Even small ones matter.

I think 5x slower for an architecture you want is reasonable, but 28x slower is beyond reasonable.

As I said, I hoped these apples-to-apples numbers would finally shed light on this, but step 1 to sanity is to admit you have a problem.


