pkgsrc-Users archive


Re: pkgsrc scanning performance benchmarks



> On 02.12.2016 at 11:12, Joerg Sonnenberger <joerg%bec.de@localhost> wrote:
> 
> On Thu, Dec 01, 2016 at 07:54:27PM -0600, John Marino wrote:
>> - a killer: using ${MAKE} to call itself to get the value of a makefile
>> variable.  As an example, even grepping for a constant is a couple of
>> orders of magnitude faster than ${MAKE} -C ${dir} -VMYVAR
> 
> "I do not understand what is going on, but I claim that it is
> unnecessary and wrong".

Someone is coming with data and gets flamed in response.

FWIW, I started working on a pkgsrc scanning library in Go a while ago. The main performance optimization I used back then (and which could probably be brought into pkgsrc’s scanning too) was this:

Most variable checks do not involve any processing by make. I grepped the Makefile (only) for a declaration of the variable and checked whether it was a simple "FOO=bar" line without metacharacters such as $. If so, I used the value directly; in all other cases, I forked make. This simple heuristic covered about 3/4 of all lookups and was much faster.
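The following Go sketch illustrates that heuristic under stated assumptions; it is not taken from that library. The names `simpleValue` and `lookupVar` are hypothetical, and the exact rules for "simple" (exactly one unconditional `NAME=value` line, plain `=` operator, no `$`, `#` or backslash continuation) are assumptions. Like the heuristic described above, it only looks at the Makefile itself, so a value overridden from an `.include`d file would be missed. The fallback runs the same `make -C dir -V NAME` invocation quoted above; `makeCmd` is assumed to name NetBSD make (bmake).

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"regexp"
	"strings"
)

// assignRe splits an assignment line into NAME, operator and value.
// The operator group also catches +=, ?=, := and != so they can be rejected.
var assignRe = regexp.MustCompile(`^([A-Za-z0-9_.-]+)[ \t]*([+?:!]?=)[ \t]*(.*)$`)

// condOpen and condClose match directives that open or close a
// conditional or loop block, e.g. ".if", ".  ifdef", ".for", ".endif".
var (
	condOpen  = regexp.MustCompile(`^\.[ \t]*(if|ifdef|ifndef|ifmake|ifnmake|for)\b`)
	condClose = regexp.MustCompile(`^\.[ \t]*(endif|endfor)\b`)
)

// simpleValue returns the value of name if the Makefile text assigns it
// exactly once, outside any .if/.for block, as a plain NAME=value line
// with no '$', '#' or trailing backslash. Otherwise ok is false and the
// caller has to ask make.
func simpleValue(makefile, name string) (value string, ok bool) {
	depth, found, continued := 0, 0, false
	for _, line := range strings.Split(makefile, "\n") {
		if continued { // tail of a backslash-continued line; skip it
			continued = strings.HasSuffix(line, "\\")
			continue
		}
		continued = strings.HasSuffix(line, "\\")
		if condOpen.MatchString(line) {
			depth++
			continue
		}
		if condClose.MatchString(line) {
			depth--
			continue
		}
		m := assignRe.FindStringSubmatch(line)
		if m == nil || m[1] != name {
			continue
		}
		found++
		v := strings.TrimRight(m[3], " \t")
		if m[2] != "=" || depth != 0 || continued || strings.ContainsAny(v, "$#") {
			return "", false
		}
		value = v
	}
	if found != 1 { // missing or declared more than once: let make decide
		return "", false
	}
	return value, true
}

// lookupVar tries the textual shortcut first and otherwise forks make,
// i.e. "make -C dir -V name" as in the invocation quoted above.
func lookupVar(makeCmd, dir, name string) (string, error) {
	if data, err := os.ReadFile(filepath.Join(dir, "Makefile")); err == nil {
		if v, ok := simpleValue(string(data), name); ok {
			return v, nil
		}
	}
	out, err := exec.Command(makeCmd, "-C", dir, "-V", name).Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSuffix(string(out), "\n"), nil
}

func main() {
	mk := "COMMENT=\tExample package\nDISTNAME=\tfoo-${VERSION}\n"
	fmt.Println(simpleValue(mk, "COMMENT"))  // shortcut applies
	fmt.Println(simpleValue(mk, "DISTNAME")) // contains '$': needs make
}
```

A call such as `lookupVar("bmake", dir, "COMMENT")` only pays for a fork when the shortcut refuses; being conservative about what counts as simple is what keeps the shortcut from returning a wrong value.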

> 
>> I do not believe the pkgsrc framework is 28 times more complex than the
>> Ports Collection framework.  It's just much more inefficient.  I know such
>> statements rankle some pkgsrc devs, but numbers don't lie.
> 
> If you compare Apples and Oranges, numbers do lie. It might surprise
> you, but it is a well known fact that the tree scanning i.e. as part of
> the bulk build is a very time consuming component.

It surprises no one.

Jonathan Perkin has done some work eliminating the lowest-hanging fruit in scanning. I suspect that more gains can be had by looking carefully at how buildlink files are evaluated.

FWIW, scan performance improvements would be very welcome. Current scan times are bordering on ridiculous.


> There have been hacks proposed in the past to replace the make extraction,
> but none of the proposals actually work properly, because they disable
> important functional parts. This *is* a case where pkgsrc is actually
> significantly more complex than ports.

The trick is recognizing when to use the full make invocation.

> Architecturally, there are three bigger parts that slow things down as
> far as the scan phase is concerned:
> (1) Finding the builtins and computing the resulting versions.

If this is a significant part of the scan time (I have not checked), one way to fix it would be:

After bootstrapping, evaluate all the builtin.mk files (there are a hundred or so) once and write the resulting variables into mk.conf.

> The second part is done with the help of some external scripts because
> doing it in make internally is pretty much impossible. A single
> monolithic program would be faster than the repeated pkg_admin pmatch
> calls, but I don't think the total time spent on this justifies the
> cost.

Again, we need to measure first, then fix things.
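For a sense of scale, in-process matching of the common `name>=version` style patterns is small. The Go sketch below is a toy subset, not a replacement for pkg_admin pmatch: it handles only `>=`, `>`, `<=` and `<` against purely numeric dotted versions, and ignores `nb` revisions, alpha/beta/rc suffixes, version ranges, and glob or `{a,b}` alternatives. The function names and the package list in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.10.2" into [1 10 2]. Only purely numeric,
// dot-separated versions are supported in this sketch.
func parseVersion(s string) ([]int, bool) {
	parts := strings.Split(s, ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, false
		}
		nums[i] = n
	}
	return nums, true
}

// compareVersions returns -1, 0 or 1; missing components count as 0,
// so "1.2" equals "1.2.0".
func compareVersions(a, b []int) int {
	for i := 0; i < len(a) || i < len(b); i++ {
		x, y := 0, 0
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// match reports whether pkg ("name-version") satisfies a pattern of the
// form "name>=ver", "name>ver", "name<=ver" or "name<ver".
func match(pattern, pkg string) bool {
	dash := strings.LastIndex(pkg, "-")
	if dash < 0 {
		return false
	}
	name, ver := pkg[:dash], pkg[dash+1:]
	for _, op := range []string{">=", "<=", ">", "<"} { // two-char ops first
		i := strings.Index(pattern, op)
		if i < 0 {
			continue
		}
		have, ok1 := parseVersion(ver)
		want, ok2 := parseVersion(pattern[i+len(op):])
		if pattern[:i] != name || !ok1 || !ok2 {
			return false
		}
		c := compareVersions(have, want)
		switch op {
		case ">=":
			return c >= 0
		case ">":
			return c > 0
		case "<=":
			return c <= 0
		default: // "<"
			return c < 0
		}
	}
	return false
}

func main() {
	installed := []string{"perl-5.34.0", "zlib-1.2.11"}
	for _, p := range []string{"perl>=5.30", "zlib<1.2"} {
		for _, pkg := range installed {
			if match(p, pkg) {
				fmt.Println(p, "->", pkg)
			}
		}
	}
}
```

All patterns are checked against all candidates inside one process, so there is no fork/exec per query, which is the cost the repeated pmatch calls incur. Whether that saving matters is exactly the measurement question.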

