
Re: NetBSD # actual cores



On 1/5/24 12:41, Greg Troxel wrote:
Jason Bacon <jtocino%gmx.com@localhost> writes:

Yes, we can, for the purpose of py-joblib and many other scenarios.

How about we just make py-joblib return 1, always?

We already have a system of command-line arguments like `make -j42'
and make/environment variables like MAKE_JOBS=42 to configure how many
jobs a program should try to use.  It's better for programs to respect
that than to rummage around and try to find extra CPU resources
they're not supposed to be using -- maybe it's better to discourage
those shenanigans.
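
As an illustration of respecting such settings, a program can check for
an explicit request before probing the hardware.  This is only a
sketch; the JOBS variable name below is a placeholder rather than an
established convention:

    import os

    def job_count(default=1):
        # Honor an explicit user setting first ("JOBS" is hypothetical,
        # standing in for MAKE_JOBS and friends).
        setting = os.environ.get("JOBS")
        if setting is not None:
            return max(1, int(setting))
        # Conservative default: don't saturate the machine unasked.
        return default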

I think you're looking at cores used for a program build.  How would
this affect a python application using joblib at runtime?

Joblib, and most parallelization tools, allow the user to control the
number of cores using the API and/or environment variables.  Typically,
the total number of hardware cores/hyperthreads is used as a default.
See OMP_NUM_THREADS as an example.
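
For example, with joblib the caller can pin the worker count explicitly
instead of relying on the detect-everything default; a minimal sketch:

    from joblib import Parallel, delayed

    def square(x):
        return x * x

    if __name__ == "__main__":
        # n_jobs=-1 would ask joblib for every core it can detect;
        # an explicit n_jobs=4 caps the pool at 4 workers.
        results = Parallel(n_jobs=4)(delayed(square)(i) for i in range(100))
        print(results[:5])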

Anyway, I don't think it's a good idea for us to decide for every user
how they should utilize available resources.  This is contrary to the
Unix "trust the user" tenet.  I would make all potentially useful
information available and let users decide for themselves what to do
with it.  Withholding information that we think they might misuse will
only encourage them to go somewhere else.

I think there's more agreement here than not.  Taylor is, as I see it,
only suggesting that by default, programs not saturate all cores.
Essentially, honor make's "-j1" as a default, and let people turn it up.

I suspect everyone is fine with APIs that return what they are
documented to return.  The issue is default behavior, and when someone
writes a program, they often overestimate its importance compared to a
non-author, and thus we end up with "X should use all resources"
defaults.

(We're seeing this in either go or rust builds; I have been having
issues with compilers having excessive numbers of running threads,
leading to MAKE_JOBS=4 on 8 cores producing close to 32 competing
threads.  I need to track down the details.)

Yeah, bad behavior is common among scientific apps as well.  The SLURM
scheduler uses some additional measures to try to confine jobs to the
number of cores they requested, to the extent that it's possible, but
there are many low-quality scientific apps that unintentionally bypass
normal checks.
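
A well-behaved job can at least respect the allocation SLURM advertises
in the environment; a sketch, assuming the job was submitted with
--cpus-per-task (otherwise SLURM_CPUS_PER_TASK is unset):

    import os

    def allocated_cores():
        # SLURM exports the per-task allocation when --cpus-per-task
        # is requested; fall back to the whole machine otherwise.
        setting = os.environ.get("SLURM_CPUS_PER_TASK")
        if setting is not None:
            return int(setting)
        return os.cpu_count() or 1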

Some good points have been made about the complexity of the issue, but
after giving it some thought, I don't see any downside to simply
exposing the physical core count.  More complex cases like asymmetric
CPU packages will require more information in order to optimize jobs
fully, but I think that's a separate issue.

As for something like py-joblib, physical core count is an improvement
over either existing alternative on NetBSD (a sketch comparing them
follows the list):

1. Use only 1 core, which defeats the purpose of py-joblib (a
parallelization library)
2. Use all available hyperthreads, which is more likely to oversubscribe
the machine than the physical core count
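
To make the comparison concrete, here is how the two alternatives and
the proposed improvement look from Python, assuming psutil is
available; note that psutil's physical count can come back None on
platforms that don't expose it, which is exactly the gap being
discussed:

    import os
    import psutil

    logical = os.cpu_count()                    # all hyperthreads
    physical = psutil.cpu_count(logical=False)  # None where the OS
                                                # exposes no physical count
    n_jobs = physical or logical or 1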

My first comment on py-joblib, BTW, would be "if you want performance,
use a compiled language rather than parallelizing interpreted scripts".
The reality, though, is that a python package might be the only tool
available for important research analyses, so we just have to cringe and
support them as well as possible.  Wasting computing resources stinks,
but it beats delaying life-saving medical research.

There are lots of cases where "X should use all resources" is correct,
like Joe Scientist running an analysis on his private workstation, or on
an HPC cluster where the job is allocated entire nodes without any
requirement for a specific number of cores.  The scheduler might assign
a node with any number of cores, and we want the job to utilize all of
them so it finishes as quickly as possible and makes the node available
to the next user.

Cheers,

	J

