tech-pkg archive

Re: texlive install scripts are unbearably slow



On Fri, Sep 02, 2022 at 03:40:07PM +0000, Taylor R Campbell wrote:
 > > Date: Fri, 2 Sep 2022 14:44:27 +0100
 > > From: Jonathan Perkin <jperkin%mnx.io@localhost>
 > > 
 > > This has bugged me for a while.  I spend quite a bit of time on pkgsrc 
 > > performance to make it as fast as possible, but I'm told there's no fix 
 > > for the fact that texlive-collection-all (a meta package that literally 
 > > only runs pkg_add) takes nearly 3 hours to build simply due to the 
 > > texlive install scripts rebuilding various databases.
 > > 
 > > Is there really nothing we can do to solve this?  I don't know tex at 
 > > all, and don't know why these can't be provided by individual packages 
 > > and combined at install time, or some other caching mechanism.
 > > 
 > > If someone can point me to a potential solution I'm more than happy to 
 > > work on it.
 > 
 > I suspect the main problem is that the INSTALL script builds a cache
 > that takes O(n) time for n packages, so it takes O(n^2) time overall.
 > At least part of it, if not all of it, is mktexlsr (rebuilding output
 > of `ls -R') in print/kpathsea/INSTALL.  There might be something
 > similar for fonts, or mktexlsr might do that too, not sure.

A few years back someone did something that improved this situation a
lot, but I was never clear on what. (Before then it was far worse.)

 > My idea was to teach pkg_add to batch up the install scripts that do
 > this kind of caching so that this runs only once at the end.

That won't help (for builds at least) unless there's a way to save up
the pending work across pkg_add runs and flush it once when
everything's done. :-(

(and then also it needs to be able to know that O(n) runs of mktexlsr
at once are equivalent to just one)

I think this can be done without needing to add explicit support to
pkg_install as follows:

(1) add some machinery that the mktexlsr package can use to cause two
extra scripts to go into its pkgdb dir:
	$PKGDB/mktexlsr-2020/REBUILD.dirty
	$PKGDB/mktexlsr-2020/REBUILD.clean

Running the former creates $PKGDB/mktexlsr-2020/REBUILD.needed;
running the latter runs mktexlsr and deletes this file, but only if
the file exists. Maybe it should run mktexlsr unconditionally if
invoked with -f.

(2) add some machinery that texlive/package.mk can use to cause the
generated INSTALL and DEINSTALL scripts to run
$PKGDB/mktexlsr-2020/REBUILD.dirty.

(3) also add a way for pkgsrc packages to cause
$PKGDB/mktexlsr-2020/REBUILD.clean to be run before building.
(This is not needed for tex but there are other rebuild cases we'll
want it for.)

(4) Probably the hooks should also have names rather than be
global-per-package. mktexlsr isn't good for anything else, but one
could imagine indexing tools where there are multiple indexes to
rebuild and it's not desirable to rebuild them all every time. This
doesn't really require more than sticking the names into REBUILD.dirty
and having REBUILD.clean accept them.

(5) I'm not sure how to set up the pkgsrc-level machinery so that
referring to nonexistent or misspelled rebuild hooks will fail
immediately instead of at pkg_add time. Maybe it's sufficient to have
the client packages do .include "../../print/mktexlsr/rebuild.mk" to
centralize the definitions.
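
As a rough illustration of points (1) and (2), the marker-file logic
could look something like the sketch below. This is only a sketch under
assumptions: the function names rebuild_dirty, rebuild_clean and
rebuild_cmd are made up for the example, PKGDB is pointed at a temporary
directory, and the real mktexlsr is replaced by a counter so the script
runs without TeX installed. In the actual proposal these would be two
separate scripts in the pkgdb dir rather than shell functions.

```shell
#!/bin/sh
# Sketch only. PKGDB is a throwaway directory here (assumption), not the
# real package database.
PKGDB=$(mktemp -d)
HOOKDIR="$PKGDB/mktexlsr-2020"
mkdir -p "$HOOKDIR"

# Hypothetical stand-in for the real mktexlsr: just count invocations.
RUNS=0
rebuild_cmd() {
    RUNS=$((RUNS + 1))
}

# What REBUILD.dirty would do: cheap, called from every generated
# INSTALL/DEINSTALL script. It only records that a rebuild is pending.
rebuild_dirty() {
    : > "$HOOKDIR/REBUILD.needed"
}

# What REBUILD.clean would do: run the expensive rebuild only if
# something marked it pending, or unconditionally when given -f.
rebuild_clean() {
    if [ "${1:-}" = "-f" ] || [ -e "$HOOKDIR/REBUILD.needed" ]; then
        rebuild_cmd
        rm -f "$HOOKDIR/REBUILD.needed"
    fi
}

# Simulate 100 package installs followed by a single flush.
i=0
while [ "$i" -lt 100 ]; do
    rebuild_dirty
    i=$((i + 1))
done
rebuild_clean
rebuild_clean   # nothing is pending any more, so this does nothing

echo "rebuild ran $RUNS time(s)"
```

With this arrangement the 100 simulated installs cost one rebuild
instead of 100, and a later rebuild_clean -f would force another one.
Named hooks as in (4) are not handled here.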


TBH I think it would be better to add explicit support to pkg_install
than to continue abusing autogenerated install/deinstall scripts for
yet more purposes, but it's a lot more work that way plus a bunch of
compat headaches.

-- 
David A. Holland
dholland%netbsd.org@localhost

