pkgsrc-Users archive


Re: wip/pkgchkxx: a complete reimplementation of pkg_chk and pkg_rolling-replace



On 7/23/23 01:55, Greg Troxel wrote:
PHO <pho%cielonegro.org@localhost> writes:

I just created a new package wip/pkgchkxx. This is a complete
reimplementation of pkgtools/pkg_chk and pkgtools/pkg_rolling-replace
in C++17.

WOW!  As much as I like to hate on C++, this is very cool and I'm
looking forward to trying it.

Yeah, please try it!

I wonder -- why did you decide on C++17 vs 14 vs 11?  This is a
foundational tool for package management, and I'm not really clear where
C++17 isn't ok with the base system compiler.  Is it just ancient LTS
(RHEL/CentOS)?  Also NetBSD 8?

At the start of this work I tried hard to stick with C. Yes, plain old C. But I quickly got tired of its low-level nature. I am a Haskeller who really doesn't like writing anything complicated in a language that is only slightly more convenient than assembly. So I decided to go with C++11, which is still not as non-portable as Haskell, but... C++11 without Boost is really a nightmare, and of course I didn't want the tool to depend on Boost! So I switched to C++14 and realized that std::shared_ptr and std::filesystem didn't exist in that version. This is how I ended up on C++17. I knew people would hate me for requiring C++17 for the tool, but I thought it was okay as long as it stayed in wip :D

We currently have a pkglint problem, where the main pkglint needs go,
and there is deficient pkglint for non-go machines.  That's different
in a key way, which is that pkglint is only really needed where people
do pkgsrc development, and pkg_rr is needed on any machine where people
build packages.

Yeah, I don't intend to replace pkg_chk or pkg_rr with my implementation for that reason.

True, but the biggest time sink is pkg_chk for -u.

* pkgchkxx(8) takes options fully compatible with pkg_chk(8).

You didn't say, and probably this is future, but the biggest problem
right now in the pkg_rr/pkg_chk world is failing to deal with
multiversion packages like pyNN-foo, where it basically errors if NN is
not PYTHON_VERSION_DEFAULT.  That requires pkg_chk to output both
PKGPATH and some version kv objects, and carry that around as the thing
that needs doing, rather than just PKGPATH.

My implementation does something better than the original code, but I can't remember what changes I made, which is why I didn't say anything about it.
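
To make the "PKGPATH plus version key/value" idea above concrete, here is a minimal C++17 sketch. It is not taken from pkgchkxx; the `WorkItem` type and the `make_args` helper are hypothetical names, and using PYTHON_VERSION_REQD as the selecting variable is an assumption for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical work item: a PKGPATH together with the variable settings
// that select one variant of a multiversion package.
struct WorkItem {
    std::string pkgpath;                      // e.g. "devel/py-foo" (made-up path)
    std::map<std::string, std::string> vars;  // e.g. PYTHON_VERSION_REQD=310 (assumed)
};

// Build the argument vector for a make(1) run in the package directory,
// passing every recorded variable as VAR=value ahead of the target.
std::vector<std::string> make_args(const WorkItem& item, const std::string& target)
{
    std::vector<std::string> args{"make"};
    for (const auto& kv : item.vars) {
        args.push_back(kv.first + "=" + kv.second);
    }
    args.push_back(target);
    return args;
}
```

Carrying the variables alongside PKGPATH means a later step can rebuild the variant that is actually installed rather than whatever the default version would produce.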

* pkgrrxx(8) takes options fully compatible with pkg_rolling-replace(8).
* Unlike pkg_rr, you can run pkgrrxx as a non-root user. It makes use
   of ${ROOT_CMD} whenever it needs root access.

Amusingly I have an extra "sudo" in one place in my installed copy.  I
should get that committed, as I think we're going to be keeping the
shell implementation for a while, especially depending on how burdensome
the language requirement is.

Yup, probably forever.

* "pkgrrxx -u" runs roughly N times faster than "pkg_rolling-replace
   -u" where N is the number of CPUs you have.

I assume you mean the wall clock time of pkg_rr itself, not builds
that it calls.

Right, that's what I mean.

* "pkgchkxx -aur -b" runs 11x faster than "pkg_chk -aur -b" when
   pkg_summary file is available.

I don't follow this.  I didn't think pkg_chk used pkg_summary, and
pkg_summary is generally collected metadata for a bunch of binary
packages.  But this is about comparing installed packages to the source
tree.

It's about comparing installed packages to the set of available binary packages. When pkg_summary is available, it only needs to read the summary; otherwise it has to decompress the binary packages and see what is in them.
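
As a rough illustration of why reading the summary is cheap, here is a minimal C++17 sketch that scans an already-decompressed pkg_summary stream. It assumes the usual layout of VARIABLE=VALUE lines with a blank line between packages; `SummaryEntry` and `read_summary` are hypothetical names, not pkgchkxx code, and only PKGNAME and PKGPATH are kept.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record holding the two fields this sketch cares about.
struct SummaryEntry {
    std::string pkgname;
    std::string pkgpath;
};

// Read VARIABLE=VALUE lines; a blank line ends the current package entry.
std::vector<SummaryEntry> read_summary(std::istream& in)
{
    std::vector<SummaryEntry> entries;
    SummaryEntry cur;
    auto flush = [&] {
        if (!cur.pkgname.empty()) entries.push_back(cur);
        cur = SummaryEntry{};
    };

    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) { flush(); continue; }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip malformed lines
        const std::string key = line.substr(0, eq);
        const std::string val = line.substr(eq + 1);
        if (key == "PKGNAME") cur.pkgname = val;
        else if (key == "PKGPATH") cur.pkgpath = val;
    }
    flush();  // the last entry may not be followed by a blank line
    return entries;
}
```

A single pass over one text file like this replaces opening and decompressing every binary package, which is presumably where most of the difference comes from.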

* "pkgchkxx -l" runs 185x faster than "pkg_chk -l" when pkg_summary
   file is available, and runs 24.8x faster when it's unavailable (and
   needs to scan archives).

"archives"?

I meant binary packages.

[be afraid and backup]

Fair enough, but is there anything that this does other than "make
replace package clean" and pkg_admin set?  It really should be pretty
low risk.

Yup, the only things it does are "make replace" and "pkg_admin set". But if it replaces packages in the wrong order, your system will surely become unusable.
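
One common way to get a safe order is a topological sort of the dependency graph, so that every package comes after the packages it depends on. The following C++17 sketch shows that technique (Kahn's algorithm). It is an illustration only, not a description of how pkgrrxx actually orders its work; `replace_order` and the package names in the usage note are made up.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// `deps` maps each package to the set of packages it depends on.
// Returns an order in which every package follows all of its dependencies.
std::vector<std::string> replace_order(
    const std::map<std::string, std::set<std::string>>& deps)
{
    std::map<std::string, std::size_t> pending;             // unmet dependency count
    std::map<std::string, std::vector<std::string>> users;  // reverse edges

    for (const auto& [pkg, ds] : deps) {
        pending.emplace(pkg, 0);
        for (const auto& d : ds) pending.emplace(d, 0);
    }
    for (const auto& [pkg, ds] : deps) {
        for (const auto& d : ds) {
            ++pending[pkg];
            users[d].push_back(pkg);
        }
    }

    std::set<std::string> ready;  // a sorted set keeps the output deterministic
    for (const auto& [pkg, n] : pending) {
        if (n == 0) ready.insert(pkg);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string pkg = *ready.begin();
        ready.erase(ready.begin());
        order.push_back(pkg);
        for (const auto& u : users[pkg]) {
            if (--pending[u] == 0) ready.insert(u);
        }
    }
    if (order.size() != pending.size()) {
        throw std::runtime_error("dependency cycle");
    }
    return order;
}
```

For example, if "app" depends on "libfoo" and "libbar", and "libfoo" depends on "libbar", the result is libbar, libfoo, app.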

Also do not question the value of faster pkg_rr when the most
time-consuming part of its job is to actually rebuild packages!

I don't follow this.

If you mean "even if the most time is the actual rebuild, it's still a
big improvement for pkg_rr to be faster", I agree.  Indeed for a pkg_rr
run that is going to take 24 hours, it doesn't really matter.  But it
does matter for things that are going to take 5 minutes.

Yeah that's exactly what I mean.

The other thing that is wrong with pkg_rr (and always has been, but it
has been increasingly painful) is that many packages have lots of
build-only dependencies.  pkg_rr goes over the installed set and
extracts the dependencies (which are recorded) but doesn't extract the
build deps (which need a make invocation) until building each one.
However it would probably be better to get the TOOL_DEPENDS and
BUILD_DEPENDS dependencies up front, and this can be parallelized.

My implementation parallelizes it. I mean, it doesn't extract build dependencies up front, but it at least invokes make(1) in parallel in the "checking if xxx has new depends" phase.
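
The general shape of running one check per package across several worker threads can be sketched as below, in C++17. This is an assumption-laden illustration, not pkgrrxx code: `check_in_parallel` is a hypothetical helper, and the `check` callback stands in for whatever actually invokes make(1) for a package.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Apply `check` to every package path using up to `jobs` worker threads.
// Results are returned in the same order as the input.
// Note: `check` must not throw; an exception escaping a thread terminates.
std::vector<bool> check_in_parallel(
    const std::vector<std::string>& pkgpaths,
    const std::function<bool(const std::string&)>& check,
    unsigned jobs = std::thread::hardware_concurrency())
{
    if (jobs == 0) jobs = 1;  // hardware_concurrency() may return 0

    // One char per result: unlike vector<bool>, distinct elements can be
    // written from different threads without a data race.
    std::vector<char> results(pkgpaths.size(), 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);  // claim the next index
            if (i >= pkgpaths.size()) return;
            results[i] = check(pkgpaths[i]) ? 1 : 0;
        }
    };

    const std::size_t n = std::min<std::size_t>(jobs, pkgpaths.size());
    std::vector<std::thread> pool;
    for (std::size_t j = 0; j < n; ++j) pool.emplace_back(worker);
    for (auto& t : pool) t.join();

    return std::vector<bool>(results.begin(), results.end());
}
```

Since each check is independent and mostly waits on a child process, spreading the checks over the available CPUs is what would give a speedup roughly proportional to the CPU count.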

Also in the brave new world of cross, it would be nice to be able to
somehow DTRT about TOOL/BUILD when running pkg_rr for a cross build.  I
have so far not contemplated or tried.

I haven't considered that. Don't know what to do in that case.

