tech-pkg archive


Re: Extracting versions from pkgsrc tree taking hours - how to address?



On Sun, Sep 09, 2012 at 07:46:51PM +0100, David Brownlee wrote:
 > There are various tools which need to extract PKGNAME or similar
 > information from a pkgsrc tree, typically to compare against installed
 > versions or a set of binary packages.
 > 
 > The canonical way to do this is to run "make show-var
 > VARNAME=PKGNAME", which for a sample package on my Thinkpad T500
 > (pkgtools/lintpkgsrc) takes around 0.65 seconds.
 > 
 > For a full pkgsrc+wip tree (~14000 packages) this process would
 > probably take around 2 hours 45 minutes.

It's worse than that. Packages with deep bl3 trees take a lot longer.

valkyrie% time make show-var VARNAME=PKGNAME
lintpkgsrc-4.85
0.110u 0.083s 0:00.19 100.0%    0+0k 0+2io 0pf+0w

valkyrie% time make show-var VARNAME=PKGNAME
eog-2.32.1nb9
0.913u 0.725s 0:01.62 100.6%    0+0k 0+3io 0pf+0w

That's with a warm cache, too; on a machine that doesn't have enough
RAM to hold the pkgsrc tree it's a lot worse.
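A rough serial estimate can be worked out from the timings quoted in this thread alone: number of packages times seconds per make invocation. This is only a sketch. It assumes every package costs the same, which the eog case shows is not true, so the two rates below bracket a plausible range rather than predict the real total.

```python
# Back-of-the-envelope estimate: total serial time if every package
# directory needs one "make show-var VARNAME=PKGNAME" run.
# The per-package timings are the ones quoted in this thread (warm cache).


def serial_hours(n_packages: int, seconds_per_package: float) -> float:
    """Return total time in hours for n_packages sequential make runs."""
    return n_packages * seconds_per_package / 3600.0


if __name__ == "__main__":
    n = 14000  # approximate pkgsrc + wip package count from the thread
    for label, secs in [("lintpkgsrc on the T500", 0.65),
                        ("eog, deep bl3 tree", 1.62)]:
        print(f"{label}: {serial_hours(n, secs):.1f} h")
```

At 0.65 s per package this comes to about 2.5 hours; if every package cost as much as eog (1.62 s) it would be about 6.3 hours, before any cold-cache penalty.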

 > Some tools try to mitigate this by embedding partial Makefile parsers
 > (lintpkgsrc is a particularly gregarious example) and produce a result

I think you mean "egregious"

 > in around 4 1/2 minutes (somewhat over 30 times faster), though the
 > knotty perl code that does this is difficult to maintain and would
 > make the world a better place by its absence.
 > 
 > Aleksey has a pkg_micro_src_summary which can extract PKGNAME and
 > similar in a subset of cases.
 > 
 > Caching information about the pkgsrc tree could help, but is troublesome
 > given that a pkgdir can generate different PKGNAMEs based on just about
 > anything from installed packages, mk.conf and environment variables to
 > the phase of the moon (*).

It would be a good thing to have a standard cache mechanism, and to
make things like lintpkgsrc and pbulk use it, to reduce the amount of
repetitive grinding they all do.
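One way such a shared cache could be keyed, as a sketch only: hash the inputs believed to affect the result (here, the contents of a caller-supplied list of files plus selected environment variables) and reuse a stored PKGNAME only while that hash is unchanged. The class name PkgnameCache, the choice of inputs, and the compute callback are all hypothetical. As noted above, PKGNAME can depend on much more than this (included .mk and buildlink3.mk files, installed packages), so a real cache would need a wider key or would have to accept occasional staleness.

```python
import hashlib
import os
import tempfile
from typing import Callable, Dict, Iterable, Tuple


class PkgnameCache:
    """Hypothetical in-memory cache of PKGNAME values, one per package dir.

    A cached value is reused only while the hash of its declared inputs
    (file contents plus selected environment variables) is unchanged.
    """

    def __init__(self, env_vars: Iterable[str] = ()) -> None:
        self.env_vars = tuple(env_vars)
        # pkgdir -> (input hash, PKGNAME)
        self._entries: Dict[str, Tuple[str, str]] = {}

    def _key(self, input_files: Iterable[str]) -> str:
        """Hash the contents of the input files and the chosen env vars."""
        h = hashlib.sha256()
        for path in sorted(input_files):
            h.update(path.encode() + b"\0")
            try:
                with open(path, "rb") as f:
                    h.update(f.read())
            except FileNotFoundError:
                h.update(b"\0missing")  # a missing file is part of the state
            h.update(b"\0")
        for name in self.env_vars:
            h.update(f"{name}={os.environ.get(name, '')}\0".encode())
        return h.hexdigest()

    def get(self, pkgdir: str, input_files: Iterable[str],
            compute: Callable[[str], str]) -> str:
        """Return PKGNAME for pkgdir, calling compute() only on a miss."""
        key = self._key(input_files)
        cached = self._entries.get(pkgdir)
        if cached is not None and cached[0] == key:
            return cached[1]
        pkgname = compute(pkgdir)  # e.g. run "make show-var VARNAME=PKGNAME"
        self._entries[pkgdir] = (key, pkgname)
        return pkgname


if __name__ == "__main__":
    calls = []

    def fake_compute(pkgdir: str) -> str:
        calls.append(pkgdir)
        return "example-1.0"

    with tempfile.TemporaryDirectory() as pkgdir:
        makefile = os.path.join(pkgdir, "Makefile")
        with open(makefile, "w") as f:
            f.write("DISTNAME=\texample-1.0\n")
        cache = PkgnameCache()
        cache.get(pkgdir, [makefile], fake_compute)
        cache.get(pkgdir, [makefile], fake_compute)  # served from the cache
        print("compute calls:", len(calls))
```

Hashing contents rather than comparing mtimes means a file that is touched but not changed does not invalidate the entry, at the cost of reading every input file on each lookup.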

 > Some questions/thoughts:
 > 
 > summary information format:
 > - If anything does generate summary information for the pkgsrc tree,
 > is there any reason not to use the same pkg_summary format used for
 > binary packages?

Good question... where's that format documented?
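For reference, the binary package summary format is, as far as I know, described in the pkg_summary(5) manual page: each package is a block of VARIABLE=VALUE lines, blocks are separated by blank lines, and multi-line fields such as DESCRIPTION repeat the variable name on every line. Assuming that layout, a source-tree summary could be consumed by a reader as simple as this sketch (parse_pkg_summary is a hypothetical name, not part of any existing pkgsrc tool):

```python
from typing import Dict, Iterator, List


def parse_pkg_summary(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per package entry.

    Values are lists because variables such as DESCRIPTION or DEPENDS may
    appear on several lines within a single entry.
    """
    entry: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current entry.
            if entry:
                yield entry
                entry = {}
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip malformed lines rather than failing
        entry.setdefault(name, []).append(value)
    if entry:
        yield entry  # the last entry may lack a trailing blank line
```

Reading every value as a list keeps single- and multi-line fields uniform; callers that know a field is single-valued (PKGNAME, PKGPATH) can simply take the first element.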

 > - If not, I'll at least update lintpkgsrc to ingest and
 > excr--...export in that format
 > 
 > generating summary information:
 > - Should we have a specific in-tree tool for generating pkgsrc summary
 > information, which in the longer term lintpkgsrc and other tools should
 > depend upon (even if they only use the output directly and there is
 > no cache)?

Probably.

At the very least there should be a single library for doing ad hoc
makefile parsing rather than having the logic cut and pasted into a
lot of tools.
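A shared helper of this kind might start from something like the sketch below. It is deliberately naive and is not how make evaluates a Makefile: it joins backslash-continued lines, strips # comments, honours ?= and +=, treats assignments inside .if blocks as unconditional, skips != and does no ${VAR} expansion. The name simple_assignments is hypothetical.

```python
import re
from typing import Dict

# NAME, optional modifier (?, +, :, !), "=", then the value.
_ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)[ \t]*([?+:!]?)=[ \t]*(.*)$")


def simple_assignments(makefile_text: str) -> Dict[str, str]:
    """Collect variable assignments from a Makefile without running make."""
    result: Dict[str, str] = {}
    logical = makefile_text.replace("\\\n", " ")  # join continued lines
    for raw in logical.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # naive comment stripping
        m = _ASSIGN.match(line)  # directives (.if, .include) never match
        if not m:
            continue
        name, op = m.group(1), m.group(2)
        value = " ".join(m.group(3).split())  # normalise whitespace
        if op == "!":
            continue  # != runs a shell command; not evaluated here
        if op == "?" and name in result:
            continue  # ?= only assigns when the variable is unset
        if op == "+" and name in result:
            result[name] = f"{result[name]} {value}".strip()
        else:
            result[name] = value
    return result
```

Anything whose value still contains "${" after this pass would have to be handed to make; the point of a shared helper is that every tool then agrees on which cases those are.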

Fixing make to go faster is highly desirable... but not at all
trivial. :(

 > speeding up parsing:
 > - Would it make sense to adjust pkgsrc to help quick mechanical
 > parsing, for example if we moved to a default of defining PKGNAME and
 > deriving DISTNAME from PKGNAME,
 > - then the majority of pkgsrc Makefiles could be "quick parsed" on the
 > rule "contains PKGNAME= with no $", the remaining being passed through
 > make as normal?

I'm not convinced this would actually help with this particular
problem. However, it might be tidier to work with in the long run...
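The proposed quick-parse rule, with make as the fallback, could look roughly like this. Assumptions worth stating: the helper names are hypothetical; the sketch also falls back when PKGREVISION is set, since pkgsrc appends it to the package name as an nbN suffix (as in eog-2.32.1nb9 above); and it cannot see overrides coming from included files or .if conditionals, so any disagreement with make would have to be caught some other way.

```python
import os
import re
import subprocess
from typing import Callable, Optional

# A literal "PKGNAME=value" line (only spaces/tabs around "=").
_PKGNAME_LINE = re.compile(r"^PKGNAME[ \t]*=[ \t]*(\S+)[ \t]*$", re.MULTILINE)
_PKGREVISION = re.compile(r"^PKGREVISION[ \t]*[?+:]?=", re.MULTILINE)


def quick_pkgname(makefile_text: str) -> Optional[str]:
    """Return PKGNAME if the Makefile sets it once, literally; else None.

    None means "not safe to guess": no PKGNAME line, more than one, a value
    containing '$', or a PKGREVISION that would add an nbN suffix.
    """
    matches = _PKGNAME_LINE.findall(makefile_text)
    if len(matches) != 1 or "$" in matches[0]:
        return None
    if _PKGREVISION.search(makefile_text):
        return None
    return matches[0]


def make_show_pkgname(pkgdir: str, make: str = "make") -> str:
    """Slow path: ask make itself (pass make="bmake" outside NetBSD)."""
    result = subprocess.run(
        [make, "show-var", "VARNAME=PKGNAME"],
        cwd=pkgdir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def pkgname_for(pkgdir: str,
                fallback: Callable[[str], str] = make_show_pkgname) -> str:
    """Try the quick parse first and only run make when it gives up."""
    with open(os.path.join(pkgdir, "Makefile")) as f:
        name = quick_pkgname(f.read())
    return name if name is not None else fallback(pkgdir)
```

The fallback is a parameter so that a tool can substitute a cached lookup or a batch runner without touching the quick-parse logic.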

-- 
David A. Holland
dholland%netbsd.org@localhost

