pkgsrc-Users archive


Re: BLAS: Does cblas interface work with BLAS types other than netlib



On Mon, Sep 26, 2022 at 12:29:52PM +0530, Mayuresh wrote:
> Finally, I find that the choice at the time of compiling dlib-cpp doesn't
> matter, as long as the build process knows CBLAS exists. The one at the
> time of compiling the final executable gives me MT and the speed up
> required.

For those interested in BLAS, dlib, or just multi-threaded (MT) performance,
some observations:

On a NetBSD 9.3 amd64 VPS with 4 cores, here are some timings for a certain
deep learning job.

With openblas_pthread (times in seconds):

    NUM_THREADS        real        user         sys
              1    14666.77    14661.62        2.86
              2    13104.87    14918.68     8052.21
              4    12417.81    15674.54    24096.69

With openblas_openmp (times in seconds):

    OMP_NUM_THREADS    real        user         sys
                  1    14548.69    14539.60        3.65
                  2    13785.67    15064.01      302.84
                  4    13621.12    15465.07     1320.44
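A quick way to quantify how little the extra threads buy is to compute speedup and parallel efficiency from the real (wall-clock) times above. This is a minimal sketch: the numbers are copied from the runs above, and the definitions used (speedup = 1-thread real time / n-thread real time, efficiency = speedup / n) are a standard framing added here, not something measured separately.

```python
# Timings in seconds from the runs above, as (real, user, sys) per thread count.
PTHREAD = {
    1: (14666.77, 14661.62, 2.86),
    2: (13104.87, 14918.68, 8052.21),
    4: (12417.81, 15674.54, 24096.69),
}
OPENMP = {
    1: (14548.69, 14539.60, 3.65),
    2: (13785.67, 15064.01, 302.84),
    4: (13621.12, 15465.07, 1320.44),
}


def scaling(runs):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base_real = runs[1][0]
    result = {}
    for threads, (real, _user, _sys) in sorted(runs.items()):
        speedup = base_real / real          # wall-clock speedup vs 1 thread
        result[threads] = (speedup, speedup / threads)  # efficiency per thread
    return result


if __name__ == "__main__":
    for name, runs in (("openblas_pthread", PTHREAD), ("openblas_openmp", OPENMP)):
        for threads, (speedup, eff) in scaling(runs).items():
            print(f"{name:17} {threads} threads: "
                  f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Run as a script, it reports about 1.18x at 4 threads for the pthread build and about 1.07x for the OpenMP build, i.e. roughly 30% and 27% parallel efficiency on 4 cores, far from the ideal 4x.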

With dlib's stock BLAS (i.e. without OpenBLAS), which is single-threaded:

    12261 real

So, on this workload:
    - the pthread build of OpenBLAS seems to scale better than the OpenMP
      build (12417.81 vs 13621.12 real at 4 threads), but spends far more
      time in the kernel (24096.69 vs 1320.44 sys)

    - dlib's single-threaded stock BLAS (12261 real) seems to beat every
      OpenBLAS run, single- or multi-threaded, in either mode
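Both conclusions can be checked directly against the measurements. The sketch below is a minimal illustration using only the real and sys times reported above; the helper names (`fastest`, `percent_slower`) are hypothetical, chosen for this example.

```python
# Real (wall-clock) times in seconds, copied from the measurements above.
STOCK_DLIB_REAL = 12261.0
OPENBLAS_REAL = {
    ("pthread", 1): 14666.77,
    ("pthread", 2): 13104.87,
    ("pthread", 4): 12417.81,
    ("openmp", 1): 14548.69,
    ("openmp", 2): 13785.67,
    ("openmp", 4): 13621.12,
}
# Kernel (sys) time in seconds at 4 threads for each OpenBLAS build.
SYS_AT_4 = {"pthread": 24096.69, "openmp": 1320.44}


def fastest(runs):
    """Return the (mode, threads) key with the lowest real time."""
    return min(runs, key=runs.get)


def percent_slower(real, reference):
    """How much slower `real` is than `reference`, in percent."""
    return (real / reference - 1.0) * 100.0


if __name__ == "__main__":
    mode, threads = fastest(OPENBLAS_REAL)
    gap = percent_slower(OPENBLAS_REAL[(mode, threads)], STOCK_DLIB_REAL)
    print(f"fastest OpenBLAS run: {mode}, {threads} threads, "
          f"{gap:.1f}% slower than dlib's stock BLAS")
    ratio = SYS_AT_4["pthread"] / SYS_AT_4["openmp"]
    print(f"sys time at 4 threads, pthread vs openmp: {ratio:.1f}x")
```

It shows the stock dlib BLAS run is faster than every OpenBLAS run, though only by about 1.3% against the best one (pthread, 4 threads), while that best run spends roughly 18x as much sys time as the OpenMP build at the same thread count.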

Workload details:

    - A convolutional neural network with window size 5x5, 1.5 million
      samples, 6 dense layers, 10000 iterations

-- 
Mayuresh

