Re: [changed subject to] pkgsrc in scientific computing
You present some interesting ideas...
I've given some thought to many of the issues you mention below like
Intel compilers, MKL, commercial MPI implementations, etc. and decided
(for now) to keep them separate from our pkgsrc use.
In the enterprise Linux environment, pkgsrc easily serves to solve one
As you're probably aware, but I'll state for everyone's benefit, the
standard RHEL/CentOS Yum repository is meant to ensure system stability
and long-term binary compatibility for commercial enterprise software,
not support running the latest open source. Hence, it provides older
versions of tools and libraries that are back-patched for security
holes, but otherwise not updated for years in some cases. It's also a
very small package collection compared to pkgsrc, Debian packages,
FreeBSD Ports, MacPorts, etc.
Trying to build the latest scientific apps against Yum RPMs is therefore
In my view, what pkgsrc does best is allow us to very easily manage the
latest open source apps in an environment that's virtually independent
of the Yum repos. This is easily done by bootstrapping with the
Running ldd on most of our pkgsrc binaries reveals that the only Yum
libraries they use are libc and libm.
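As a rough illustration of that check, here is a minimal Python sketch that scans ldd output and lists the libraries resolved from outside the pkgsrc prefix. The function name, the /usr/pkg prefix, and the sample output are all assumptions made up for the example, not taken from our actual trees.

```python
import re

def non_prefix_libs(ldd_output, prefix="/usr/pkg"):
    """Return names of resolved libraries that live outside `prefix`.

    `prefix` is an assumed pkgsrc install prefix; adjust to your bootstrap.
    """
    root = prefix.rstrip("/") + "/"
    outside = []
    for line in ldd_output.splitlines():
        # Typical glibc ldd line: "libm.so.6 => /lib64/libm.so.6 (0x...)"
        m = re.match(r"\s*(\S+)\s+=>\s+(/\S+)", line)
        if m and not m.group(2).startswith(root):
            outside.append(m.group(1))
    return outside

# Made-up sample in the usual ldd layout, for illustration only.
sample = """\
\tlinux-vdso.so.1 =>  (0x00007ffc00000000)
\tlibR.so => /usr/pkg/lib/R/lib/libR.so (0x00007f0000000000)
\tlibm.so.6 => /lib64/libm.so.6 (0x00007f0000100000)
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f0000300000)
"""

print(non_prefix_libs(sample))
```

In practice the input would come from running ldd on a real binary; lines without a resolved path (the vDSO and the dynamic loader) are skipped on purpose.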
In a nutshell:
1) We use pkgsrc to quickly deploy mainstream versions of open source
software like R, genomics tools, etc. built with the stock GCC suite.
2) Commercial applications are supported by Yum, as intended.
3) For the few open source apps that need to be highly optimized (e.g.
WRF weather model), we still do a few caveman installations using ICC,
MKL, etc. This seems to be how most software is installed in HPC, by
the way, so imagine the time savings pkgsrc could provide in theory.
I attempted to bootstrap a pkgsrc tree with ICC recently, and decided I
couldn't justify the time it would take to make it work well. I think
this would be the route to use if one wanted to incorporate
closed-source components like MKL into pkgsrc packages, though.
On the other hand, an option in the R package to use something like
openblas/goto or atlas would be worth pursuing in my view. This should
be fairly easy and it would be mostly portable.
I've also been working with another packager on developing MPI packages
that install in $PREFIX/openmpi, $PREFIX/mpich, etc. so that they can
coexist. The same install prefix is used for libraries and apps that
depend on them, so you could have, for instance, multiple fftw packages
installed under those same prefixes.
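To make the coexistence idea concrete, here is a small Python sketch of how a wrapper might select one MPI flavor by adjusting the search paths. The bin and lib subdirectories and the function name are assumptions for illustration; the real package layout under $PREFIX/openmpi or $PREFIX/mpich may differ.

```python
import os

def mpi_env(prefix, flavor, base_env=None):
    """Return a copy of the environment with one MPI flavor's directories first.

    Assumes a hypothetical layout of <prefix>/<flavor>/bin and
    <prefix>/<flavor>/lib on a Unix-like system.
    """
    env = dict(os.environ if base_env is None else base_env)
    root = f"{prefix.rstrip('/')}/{flavor}"
    for var, sub in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        new_dir = f"{root}/{sub}"
        old = env.get(var, "")
        # Prepend so the chosen flavor wins over anything already present.
        env[var] = f"{new_dir}:{old}" if old else new_dir
    return env

env = mpi_env("/usr/pkg", "openmpi", {"PATH": "/usr/bin"})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The returned dictionary could be passed as env= to subprocess.run, so that two jobs on the same node can use different MPI implementations without touching the global environment.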
In my experience, I.T. man-hours are the scarcest and most costly
resource, even in HPC. Most of our users don't benefit in any
meaningful way from the speedups that come from using ICC, MKL, etc.
Running software from ordinary GCC-based pkgsrc packages on a cluster or
grid reduces months or years of computation to hours or days, and
another 20% speedup isn't worth even a modest investment of our time.
The cost of the extra core-hours is a fraction of the cost of our time
to optimize every build, plus we can usually deploy things much sooner
using existing packages.
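The trade-off can be written down as a back-of-the-envelope calculation. The Python sketch below is only that: the function and parameter names are hypothetical, and it interprets an "N% speedup" as the optimized build doing the same work in 1/(1+N) of the time, which is one possible reading.

```python
def break_even_core_hours(staff_hours, staff_rate, core_hour_cost, speedup):
    """Baseline core-hours a workload must consume before tuning pays off.

    speedup: fractional throughput gain, e.g. 0.2 for a 20% speedup
             (assumed to mean runtime shrinks to 1 / (1 + speedup)).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    staff_cost = staff_hours * staff_rate
    # Fraction of baseline core-hours that the optimized build saves.
    saved_fraction = 1.0 - 1.0 / (1.0 + speedup)
    return staff_cost / (core_hour_cost * saved_fraction)

# All figures here are invented placeholders; substitute local costs.
needed = break_even_core_hours(staff_hours=40, staff_rate=50.0,
                               core_hour_cost=0.05, speedup=0.2)
print(f"break-even at roughly {needed:,.0f} baseline core-hours")
```

If a given application will never consume that many core-hours at a site, the hand-optimized build costs more than it saves under these assumptions.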
So, my focus is on creating portable pkgsrc packages that can be quickly
deployed on our CentOS clusters and at the same time can be leveraged by
users of NetBSD, DragonFly BSD, Darwin, etc. I think the larger the
collection of scientific packages becomes, the more people in the
research community will be encouraged to join the cause, which will
ultimately benefit everyone, regardless of which OS they use.
On 7/7/15 4:24 PM, Thomas Orgis wrote:
On Tue, 07 Jul 2015 15:28:49 -0500,
Jason Bacon <jwbacon%tds.net@localhost> wrote:
On our HPC clusters, I simply bootstrap a whole new tree about every 6
to 12 months to make newer software versions available.
Ah, so we're indeed on the same page there. We are deploying our first
big setup based on pkgsrc for common software, but always thinking
about other stuff on top.
One tricky thing is how to handle differing compilers, especially since
C++ and Fortran modules are not compatible between them. One solution
is simply not to use those and write wrappers over C in your own code,
but do you happen to deal with getting stuff like HDF5 from pkgsrc with
Then we have differing MPI implementations. Various commercial software
on top. We want to offer the whole deal and are having endless debates
on how to do it best. Perhaps we should at some point have a longer
discussion with you, too. Now, we're really busy getting a fresh system up
and running, of course with an elaborate structure of environment
Older trees are
left in place so researchers can finish up projects using the same
version of a package, but eventually deprecated.
We will never delete user software for the lifetime of the system (unless
there is a _really_ nasty security risk from just having it around).
But well, we won't carry all old versions onto the next setup.
I have a lot of scientific packages in wip and more coming, but too
little time to devote to it.
Ah, so you helped us get some of the geography stuff going? ;-)
Btw.: I wonder if it makes sense having thousands of TeX Live packages
in pkgsrc. It's such a huge collection of packages that actually comes
with its own package manager. It lends itself well to installation in a
separate prefix anyway. In our world, there are various separate
packages in addition to pkgsrc anyway. Pkgsrc takes the place of the
normal GNU/Linux userspace, on top of which specialist software is
One of my colleagues here is learning to
package and may join pkgsrc-wip soon.
Yes, if this really works out for us in the long term, I might start
contributing packages, too. Though, there probably always will be
standalone packages we build in-house. I see the need for pkgsrc with
the wildly interdependent stuff.
There's a lot of work to be done
in categories like math and biology, though. Fortran support needs some
work as well.
Do you have R built with proper BLAS (perhaps even Intel MKL?), and
possibly MPI from pkgsrc? Folks are using this software more and more,
as the field of application of HPC clusters widens.
But, well, let's continue that on a separate thread perhaps, in some
weeks when I can breathe again (*preparing yet another compute node
Jason W. Bacon
If a problem can be solved,
there's no need to worry.
If it cannot be solved, then
worrying will do no good.