Hi,

I intend to merge the wip/OpenBLAS and wip/openblas-devel packages into a single wip/openblas that shall be included in pkgsrc proper, so that I can also start submitting patches for packages to optionally use it (that is numpy, R, octave, …).

There is one basic question that needs a decision: Do we install multiple variants of the library for single-threaded and multi-threaded builds? And if we install only one variant, which one should be the default?

The wip/OpenBLAS package currently does a rather normal install that includes only one library build and some other files around it (headers, build helpers like cmake files, but most importantly probably cblas.h), while wip/openblas-devel installs bare library files, but multiple builds of them, namely a serial libopenblas.so and a parallel libopenblasp.so. Parallel means either using pthreads directly or OpenMP.

One could extend the wip/openblas-devel approach to build all three variants of the library under differing names, but that would mean that the installed cmake file can only point to one of them (not sure anyone uses it), and it also makes the dependency of other packages on OpenBLAS difficult.

Last but not least, the expected speedup of using something like OpenBLAS or MKL may include the use of all CPU cores when you throw big matrices around. For us here, it is one way to make people use the 16 cores of a compute node even if they only run a ‘simple’ R script.

As you have to decide at build time which BLAS to use, the availability of multiple variants doesn't help end users who use R, octave, or numpy builds. People who build their own applications and link libopenblas explicitly could also rather easily build that library themselves. It is basically as complicated as typing `make`.

So, I would like to keep the approach of wip/OpenBLAS, offering a choice of single/pthread/openmp build. The question now is: Which one should be the default?
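To make the three variants concrete, here is a minimal sketch of how each could map onto an OpenBLAS build. USE_THREAD and USE_OPENMP are the make variables described in OpenBLAS's Makefile.rule; the helper name `make_command` and the variant names are only illustrative and are not the actual pkgsrc option names.

```python
# Illustrative only: map a build variant name to OpenBLAS make variables.
# USE_THREAD / USE_OPENMP come from OpenBLAS's Makefile.rule; the variant
# names and this helper are hypothetical, not taken from the pkgsrc package.
VARIANT_FLAGS = {
    "serial": ["USE_THREAD=0"],                   # no threading inside the library
    "pthread": ["USE_THREAD=1"],                  # threading via pthreads directly
    "openmp": ["USE_THREAD=1", "USE_OPENMP=1"],   # threading via OpenMP
}


def make_command(variant):
    """Return the make invocation (as an argument list) for one variant."""
    try:
        flags = VARIANT_FLAGS[variant]
    except KeyError:
        known = ", ".join(sorted(VARIANT_FLAGS))
        raise ValueError(f"unknown variant {variant!r}, expected one of: {known}")
    return ["make"] + flags


if __name__ == "__main__":
    # Print the command line for each variant.
    for name in VARIANT_FLAGS:
        print(f"{name}: {' '.join(make_command(name))}")
```

Exactly one of these would be selected per package build, which is why the default matters for binary packages.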
The choice of default is complicated by the fact that the parallel builds are not compatible with all workflows. Certain ways of parallelizing code at higher levels (multiprocessing in Python, R) can fail horribly with pthread and openmp builds. In the openmp case, the problem is that GCC does not like forks with active OpenMP contexts, which is understandable, but a roadblock that Intel MKL apparently avoids.

Modern Python has modes of parallelization that do not interfere with OpenMP in OpenBLAS (forkserver), and the case of parallelization in R is a mess anyway. We intend to provide R with OpenMP in OpenBLAS and the Rmpi package, which should be safe to use and is not significantly slower than the other (horribly memory-inefficient) ways of parallelizing R scripts.

But when you are building binary packages for an unspecific userbase, you might prefer the safe route of installing only a serial build. That still gives speedups of up to around a factor of 8 using SIMD, which is nice. And it's safe in any application. An HPC site with highly parallel machines will have to build its own packages and select pthread or openmp as appropriate. I think this is not a big burden.

Debian and derived distributions have a mechanism for dropping in alternate libblas/liblapack per user choice via symlinks at runtime. This would not fly with us on multiuser systems where different users might have different preferences. We would need to provide separate install prefixes of pkgsrc with the differing library setup anyway.

So … may I have some input on this two-pronged question:

1. Build one or multiple libopenblas?
2. If one, which one should be default: single/serial, pthread, or openmp?

Alrighty then,

Thomas

PS: You see that I slipped with the naming of the non-multithreaded variant. I used single before, but the name should probably be changed to serial.

-- 
Dr. Thomas Orgis
Universität Hamburg
RRZ / Basisinfrastruktur / HPC
Schlüterstr. 70
20146 Hamburg
Tel.: 040/42838 8826
Fax: 040/428 38 6270