Hi,
I intend to merge the wip/OpenBLAS and wip/openblas-devel packages into
a single wip/openblas that shall be included in pkgsrc proper, so that
I can also start submitting patches for packages to optionally use it
(that is, numpy, R, octave, …).
There is one basic question to have a decision on: Do we install
multiple variants of the library for single-threaded and multi-threaded
builds? And if we install only one variant, which one should be the
default?
The wip/OpenBLAS package currently does a rather normal install that
includes only one library build and some other files around it
(headers, build helpers like cmake files, but most importantly
probably cblas.h), while wip/openblas-devel installs bare library
files, but multiple builds of them, namely a serial libopenblas.so and
a parallel libopenblasp.so. Parallel means either using pthreads
directly or OpenMP.
One could extend the wip/openblas-devel approach to build all three
variants of the library with differing names, but that would mean that
the installed cmake file can only point to one of them (not sure anyone
uses it) and also it makes the dependency of other packages on OpenBLAS
difficult. Last but not least, part of the expected speedup from using
something like OpenBLAS or MKL comes from using all CPU cores when you
throw big matrices around. For us here, it is one way to make people
use the 16 cores of a compute node even if they only run a ‘simple’ R
script.
As you have to decide at build time which BLAS to use, the availability
of multiple variants doesn't help end users who use the packaged R,
octave, or numpy builds. People who build their own applications
linking libopenblas
explicitly could also rather easily build that library themselves. It
is basically as complicated as typing `make`.
So, I would like to keep the approach of wip/OpenBLAS, offering a
choice of single/pthread/openmp build. The question now is: Which one
should be the default? This is complicated by the fact that the
parallel builds are not compatible with all workflows. Certain ways of
parallelizing code at higher levels (multiprocessing in Python, R) can
fail horribly with the pthread and openmp builds. In the openmp case,
the cause is GCC's OpenMP runtime not liking forks with active OpenMP
contexts, which is understandable, but a roadblock that Intel MKL
apparently avoids.
Modern Python has modes of parallelization that do not interfere with
OpenMP in OpenBLAS (forkserver), and the case of parallelization in R
is a mess anyway. We intend to provide R with OpenMP in OpenBLAS and
the Rmpi package, a combination that should be safe to use and not
significantly slower than the other (horribly memory-inefficient) ways
of parallelizing R scripts.
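For illustration, the forkserver mode mentioned above can be selected
roughly like this. It is only a minimal sketch, assuming a platform
where Python offers the forkserver start method (so not Windows). The
names sum_of_squares and run_with_forkserver are made up for this mail,
and the worker is a stand-in for code that would call into numpy and
thereby OpenBLAS.

```python
import multiprocessing as mp


def sum_of_squares(n):
    # Stand-in for real work; in practice this would be a numpy call
    # that ends up in OpenBLAS inside the worker process.
    return sum(i * i for i in range(n))


def run_with_forkserver(sizes, workers=2):
    # With "forkserver", workers are forked from a small helper process
    # that was started fresh, not from the parent, which may already
    # have an active OpenMP context.
    # Call this from under an `if __name__ == "__main__":` guard, since
    # the main module gets imported again on the worker side.
    ctx = mp.get_context("forkserver")
    with ctx.Pool(processes=workers) as pool:
        return pool.map(sum_of_squares, sizes)
```

Using a context object instead of mp.set_start_method() keeps the
choice local to this one pool, so other code in the same program is
not affected.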
But when you are building binary packages for an unspecific userbase,
you might prefer the safe route of installing only a serial build. That
still gives speedups of up to around a factor of 8 using SIMD, which is
nice. And it's safe in any application. An HPC site with highly
parallel machines will have to build its own packages and select
pthread or openmp instead. I think this is not a big burden.
Debian and derived distributions have this mechanism of dropping in
alternate libblas/liblapack per user choice via symlinks at runtime.
This would not fly with us on multiuser systems where different users
might have different preferences. We would need to provide separate
install prefixes of pkgsrc with the differing library setup anyway.
So … may I have some input on this two-pronged question:
1. Build one or multiple libopenblas?
2. If one, which one should be default: single/serial, pthread, or
openmp?
Alrighty then,
Thomas
PS: You see that I slipped with the naming of the non-multithreaded
variant. I used single before, but probably the name should be changed
to serial.