tech-userlevel archive


Re: PATCH libatomic

On 08.05.2020 02:14, Thor Lancelot Simon wrote:
> On Fri, May 08, 2020 at 01:51:16AM +0200, Kamil Rytarowski wrote:
>> A runtime detection could be a part of ifunc (is it ready for NetBSD?).
>> The standard C/C++ feature is to detect whether atomic operations are
>> real (lock-free) through atomic_is_lock_free(). This is a feature, not a
>> bug (as claimed by some people). atomic_is_lock_free() can be overloaded
>> in libatomic and detect CPU type at runtime and redirect either to real
>> CPU intrinsic or lock-free fallback.
> Not without performance penalty for every atomic operation, unless you propose
> to do this by binary patch as is done in the kernel.
> Thor

There is a penalty, but that is the contract and design of this (C
and C++) feature. Atomics can legitimately be lock-free or non-lock-free,
and this is intentional.

I consider this performance problem already solved: whenever
performance or a lock-free implementation matters in third-party
software, there are predefined constants, such as ATOMIC_LLONG_LOCK_FREE,
to detect whether atomic operations on a given type are
never, sometimes, or always lock-free.
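
As a minimal sketch of how third-party code can act on these constants
(the helper names lock_free_name and llong_counter_is_lock_free are
made up for illustration and are not part of any library):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* C11 defines exactly three values for the ATOMIC_*_LOCK_FREE macros:
 * 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free. */
static_assert(ATOMIC_LLONG_LOCK_FREE >= 0 && ATOMIC_LLONG_LOCK_FREE <= 2,
              "unexpected ATOMIC_LLONG_LOCK_FREE value");

/* Hypothetical helper: human-readable name for a lock-free level. */
static const char *lock_free_name(int level)
{
    switch (level) {
    case 0:  return "never";
    case 1:  return "sometimes";
    case 2:  return "always";
    default: return "unknown";
    }
}

/* Hypothetical helper: decide at compile time where possible, and ask
 * at run time only in the "sometimes" case. */
static bool llong_counter_is_lock_free(void)
{
#if ATOMIC_LLONG_LOCK_FREE == 2
    return true;                 /* no runtime query needed */
#elif ATOMIC_LLONG_LOCK_FREE == 0
    return false;                /* always goes through a lock */
#else
    /* "Sometimes": the answer depends on the running CPU. On some
     * targets this query is itself a call into libatomic. */
    atomic_llong probe;
    atomic_init(&probe, 0);
    return atomic_is_lock_free(&probe);
#endif
}
```

Software that cares can pick a different algorithm when the answer is
not "always"; software that does not care simply pays the fallback cost.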

Furthermore, in C we can use atomic operations on arbitrarily large
types, so the libatomic locking fallback is always needed.

On 08.05.2020 02:48, wrote:
> ifuncs are pretty cool in enabling to do this everywhere, not just in a
> kernel with binary patching.
> It lets you write a "resolver function" which is run by in the
> first run of the function, and it returns which variant of the function
> should be used.

GNU already uses similar logic and detects CPU capabilities at runtime
(in libgcc). GNU ifunc was rejected for AArch64:

 * The problem that we are trying to solve is operating system deployment
 * of ARMv8.1-Atomics, also known as Large System Extensions (LSE).
 * There are a number of potential solutions for this problem which have
 * been proposed and rejected for various reasons.  To recap:
 * (1) Multiple builds.  The dynamic linker will examine /lib64/atomics/
 * if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
 * However, not all Linux distributions are happy with multiple builds,
 * and anyway it has no effect on main applications.
 * (2) IFUNC.  We could put these functions into, and have
 * a single copy of each function for all DSOs.  However, ARM is concerned
 * that the branch-to-indirect-branch that is implied by using a PLT,
 * as required by IFUNC, is too much overhead for smaller cpus.
 * (3) Statically predicted direct branches.  This is the approach that
 * is taken here.  These functions are linked into every DSO that uses them.
 * All of the symbols are hidden, so that the functions are called via a
 * direct branch.  The choice of LSE vs non-LSE is done via one byte load
 * followed by a well-predicted direct branch.  The functions are compiled
 * separately to minimize code size.

Detection of LSE Atomics is done this way:

static void __attribute__((constructor))
init_have_lse_atomics (void)
{
  unsigned long hwcap = getauxval (AT_HWCAP);
  __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
}
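
To illustrate approach (3) as a whole, here is a rough, portable sketch
of the "flag set by a constructor, then a well-predicted direct branch"
pattern. It is not the libgcc code: have_fast_atomics, probe_cpu_feature
and demo_fetch_add are hypothetical names, the probe is a placeholder
that always reports "not available", and the constructor attribute is a
GCC/Clang extension.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, playing the role of __aarch64_have_lse_atomics. */
static bool have_fast_atomics;

/* Placeholder probe (assumption): real code would test a hardware
 * capability bit reported by the operating system. */
static bool probe_cpu_feature(void)
{
    return false;
}

/* Runs once before main(), so the flag is set before any caller. */
static void __attribute__((constructor))
init_have_fast_atomics(void)
{
    have_fast_atomics = probe_cpu_feature();
}

/* Variant standing in for the single-instruction path. */
static int fetch_add_single_insn(atomic_int *p, int v)
{
    return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
}

/* Variant standing in for the older path: a compare-and-swap loop. */
static int fetch_add_cas_loop(atomic_int *p, int v)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + v,
               memory_order_seq_cst, memory_order_relaxed))
        ;   /* on failure 'old' is reloaded; retry with the new value */
    return old;
}

/* One flag load plus a direct branch selects the variant; no PLT or
 * indirect call is involved. Returns the value before the addition. */
int demo_fetch_add(atomic_int *p, int v)
{
    assert(p != NULL);
    if (have_fast_atomics)
        return fetch_add_single_insn(p, v);
    return fetch_add_cas_loop(p, v);
}
```

Because the flag never changes after startup, the branch is cheap to
predict, which is the trade-off the quoted comment describes.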

There is still some room for improvement: we could have a
kernel-assisted libatomic and gain some performance. However, I propose
to simply board the same ship as the LLVM and GCC developers rather than
reinvent the wheel. Doing better for all old and upcoming CPUs is
unfortunately not realistic.

