Re: infrastructure change needed for python modules



(again to all concerned people; I hit the wrong reply first)

On Thu, 25 Nov 2021 23:03:33 +0100,
Thomas Klausner <wiz%netbsd.org@localhost> wrote:

> I've also started a discussion about best practice here:
> 
>      https://discuss.python.org/t/bootstrap-suggestions/12202  

You get some hero points from me for that. I see that you are having
trouble getting through. Maybe one needs to emphasize that pkgsrc is
not a _Python_ distro but a distro of _everything_, one that wants to
be able to do minimal bootstraps from source while still managing its
own binary package format.

Fully binary Linux distros cheat a lot regarding bootstrapping … they
bootstrap with earlier binaries. That might be why people don't really
understand the issues when you want to stay pure with the source
(hello, mrustc!).

Also, the importance of having separate control over the
time-consuming part (building) and the system-modifying part
(installing into DESTDIR) gets lost on people who only see the use
case of a user wanting some magic to get the package and all its
dependencies without worrying much about how.
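To make that split concrete for Python: pip can already be coaxed
into it. A minimal sketch, with a made-up package name and paths
(pip's --root/--prefix options are real):

    # Sketch: keep the time-consuming build separate from the
    # system-modifying install, which is redirected into DESTDIR.
    # "somepkg" and the paths are illustrative only.
    import subprocess

    DESTDIR = "/tmp/destdir"   # staging area, not the live system
    PREFIX = "/usr/pkg"        # final prefix recorded in the package

    # Phase 1: build a wheel; touches nothing outside the work area.
    subprocess.run(
        ["pip", "wheel", "--no-deps", "--wheel-dir", "wheels/",
         "somepkg"],
        check=True,
    )

    # Phase 2: the install, staged into DESTDIR for packaging.
    subprocess.run(
        ["pip", "install", "--no-deps", "--no-index",
         "--find-links", "wheels/", "--root", DESTDIR,
         "--prefix", PREFIX, "somepkg"],
        check=True,
    )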

We might have to build wheels and unpack them to convert to pkgsrc
binary packages, if a wheel is what everyone assumes you want as the
result of a build. Testing just disappears. I guess tests would
matter less if there weren't the numerous C bindings in the Python
world that can break in funny ways depending on the compiler
infrastructure.
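Since a wheel is just a zip archive with a metadata directory inside,
the unpacking itself is not the hard part. A sketch, with made-up
file names, that also ignores the .data/ handling a real installer
would do:

    # Sketch: unpack a wheel into a DESTDIR-like tree for
    # repackaging. File names are made up; real installers also
    # process the .data/ subdirectory (scripts, headers), which
    # this skips.
    import zipfile
    from pathlib import Path

    wheel = Path("wheels/somepkg-1.2.3-py3-none-any.whl")
    site_packages = Path(
        "/tmp/destdir/usr/pkg/lib/python3.9/site-packages")

    site_packages.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(wheel) as zf:
        zf.extractall(site_packages)

    # The *.dist-info/RECORD lists every file with a checksum --
    # pretty much a ready-made PLIST.
    record = next(site_packages.glob("*.dist-info/RECORD"))
    print(record.read_text().splitlines()[:5])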

On the upside: if we could get a clean bootstrap process for pip
running, it would be nice to employ its help to actually define the
pkgsrc packages. After all, it's mostly a translation of the metadata
the Python world already organizes.
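The PyPI JSON API already serves most of what a pkgsrc package needs.
A toy translation; the endpoint is real, but the variable mapping is
just my guess at what a generator could emit:

    # Sketch: derive a pkgsrc Makefile skeleton from PyPI metadata.
    # LICENSE would still need mapping to pkgsrc license names.
    import json
    from urllib.request import urlopen

    project = "requests"   # example project
    with urlopen(f"https://pypi.org/pypi/{project}/json") as r:
        info = json.load(r)["info"]

    print(f"DISTNAME=\t{project}-{info['version']}")
    print("CATEGORIES=\tdevel python")
    print(f"HOMEPAGE=\t{info['home_page'] or info['package_url']}")
    print(f"COMMENT=\t{info['summary']}")
    print(f"LICENSE=\t{info['license']}")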

Maybe we could even have a mechanism to snapshot the state of PyPI
matching a pkgsrc release (except for specific, explicitly packaged
modules that have differing versions for a reason) and then a way to
install Python packages outside the curated space of pkgsrc (we can
never vet them all!) via pip and ad-hoc package creation from the
PyPI metadata, in a reproducible manner. So: installing packages in a
controlled way, managed by pkgsrc, but without having them all
manually packaged in pkgsrc CVS.
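Such a snapshot would not need to be more than names, versions and
checksums. Something along these lines, with a file format invented
on the spot:

    # Sketch: pin name/version/sha256 for a set of PyPI projects
    # so the exact source artifacts can be re-fetched years later.
    import json
    from urllib.request import urlopen

    projects = ["numpy", "requests"]   # whatever the release pins

    with open("pypi-snapshot.txt", "w") as snap:
        for name in projects:
            with urlopen(f"https://pypi.org/pypi/{name}/json") as r:
                meta = json.load(r)
            version = meta["info"]["version"]
            # take the sdist to stay source-based
            sdist = next(u for u in meta["urls"]
                         if u["packagetype"] == "sdist")
            snap.write(
                f"{name} {version} {sdist['digests']['sha256']}\n")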

Disclaimer: my use case is freezing a pkgsrc release and installing
the versions from that release, even if I do so years later, so that
I can also reasonably reproduce the identical setup on another
machine at a later time.

I'd love such an integration for the other packaging ecosystems,
too. We do package bits of the Rust, Go, Perl, Python, Ruby, npm,
etc. worlds, but we could never package and verify the sanity of all
of them; indeed, large parts will be broken anyway.

I am wondering about the same issue for source-based Linux distros.
Duplicating the work of the package indices is insanity, yet we're
attempting it. But we want some control: pinning versions that at
least have been tested to work together, and so on. So far I have
only encountered tooling that makes it easier to create a package
with the help of these indices, but you still have to create all
those packages beforehand. The sheer number of upstream software
modules/libraries with interdependencies just wears one out. The
manual churn kills all the joy for package maintainers.

There are 341,427 projects on PyPI (CPAN has ‘just’ 43,020
distributions). We are never going to package all of that explicitly.
Maybe that's also why the Python community doesn't get the idea that
anyone would want to package even parts of it outside pip prefixes.

Giving the user tooling for behind-the-scenes packaging of modules
that are not dependencies of other packages in the system, while
still keeping everything under control, would be really nice. You'd
need some extra magic for pkgin pyXY-foo/1.2.3 (yes, maybe even with
choosing versions) to map to the actual upstream module and its
environment.
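At least the name mapping is mechanical, given pkgsrc's pyXY- prefix
convention. A sketch over the usual PKGNAME form; everything beyond
the prefix parsing is hypothetical:

    # Sketch: map a pkgsrc-style name like "py39-requests-2.26.0"
    # back to the upstream PyPI project. The pyXY- prefix is the
    # real pkgsrc convention; PyPI project renames are not handled.
    import re

    def parse_py_pkg(pkgname):
        m = re.fullmatch(r"py(\d)(\d+)-(.+)-([\d.]+\w*)", pkgname)
        if not m:
            raise ValueError(f"not a python package name: {pkgname}")
        py_major, py_minor, project, version = m.groups()
        return {
            "python": f"{py_major}.{py_minor}",
            "project": project,
            "version": version,
            "pypi": f"https://pypi.org/pypi/{project}/{version}/json",
        }

    print(parse_py_pkg("py39-requests-2.26.0"))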

Do you think such a thing is feasible/desirable at all? The
alternative is to install only basic stuff from pkgsrc and then have
either per-user CPAN, R, Python, etc. prefixes or have the admin mess
around with common ones on top of pkgsrc, with a different tracking
mechanism. For the per-user case it should be fine that e.g. pip
always pulls the latest and greatest; for the admin it may be more
troublesome to keep things consistent and repeatable in the sense I
indicated above for my use case.
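For the admin case, replaying the snapshot from above via pip's
hash-checking mode would be one way to get that repeatability. Again
assuming the snapshot format I invented above, and that the snapshot
covers the full dependency closure:

    # Sketch: replay pypi-snapshot.txt with pip's hash-checking
    # mode (--require-hashes refuses unpinned dependencies, hence
    # the full-closure assumption); --no-binary forces the sdists
    # whose hashes we recorded.
    import subprocess

    with open("pypi-snapshot.txt") as snap, \
         open("requirements.txt", "w") as req:
        for line in snap:
            name, version, sha256 = line.split()
            req.write(f"{name}=={version} --hash=sha256:{sha256}\n")

    subprocess.run(
        ["pip", "install", "--require-hashes",
         "--no-binary", ":all:", "-r", "requirements.txt"],
        check=True,
    )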

Right now, I am just trying to provide a base of Python with things
like NumPy built in a controlled manner. The users can install the
heap of (hopefully mostly pure-Python) modules in their HOMEs. If
many users have similar requirements, this is wasteful, of course.
But then, people often use Anaconda and/or containers, bypassing the
whole business of central software installations and making things a
lot more wasteful, to the point that our central storage is
overwhelmed by having to deliver each user's separate heavy software
stack each time a compute job starts. Big Data starts with piles of
software in little pieces that amount to a heap of unmanageable data
themselves.

A question for me indeed may be how pkgsrc stacks up not only
against Spack, but also against the Anaconda distribution. I wonder
how many person-hours the people behind that need to spend on
building their packages.


Alrighty then,

Thomas

-- 
Dr. Thomas Orgis
HPC @ Universität Hamburg
