NetBSD-Bugs archive


kern/56412: lwp_dtor() causes cross-call storm



>Number:         56412
>Category:       kern
>Synopsis:       lwp_dtor() causes cross-call storm
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 20 15:25:00 +0000 2021
>Originator:     Jason Thorpe
>Release:        NetBSD 9.99.82 (and many releases prior)
>Organization:
RISCy Business
>Environment:
NetBSD the-ripe-vessel 9.99.82 NetBSD 9.99.82 (GENERIC) #0: Tue May 18 17:05:45 UTC 2021  mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
The pool cache infrastructure provides a weak memory type-stability model for objects that use it. This feature is relied upon by kern_mutex.c and kern_rwlock.c in order to check if a lock owner is currently running on a CPU.

In order to ensure that mutex_oncpu() and rw_oncpu() are no longer referencing an LWP object that is about to be freed back to the system (and thus lose its type-stable property), lwp_dtor() performs an xc_barrier(0).

The problem is that lwp_dtor() is called once for each LWP in a page that's being released back to the system, so the barrier is issued once per LWP object.  The per-object destructor call is necessary in order to properly tear down the LWP object, but a per-object barrier is NOT needed to ensure the type stability relied upon by mutex_oncpu() and rw_oncpu(); only **one** xc_barrier() is needed, issued before the destructor is called for each LWP object in the page.

The upshot of the current implementation is that freeing a page that happens to back LWP objects causes a brief cross-call storm.  On systems with a small number of CPUs, this is probably not very noticeable.  However, on a system with a large number of CPUs, this could constitute an intermittent performance problem whenever the system comes under even slight memory pressure.
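
To put rough numbers on the above, here is a back-of-the-envelope model in plain C.  It is not kernel code: the function names are made up for illustration, and it assumes that each barrier is delivered to every CPU, so that the work scales with the CPU count.

```c
#include <stddef.h>

/*
 * Back-of-the-envelope model, not kernel code.  All names here are
 * hypothetical.  Assumption: each barrier is delivered to every CPU.
 */

/* Barriers issued when every per-object destructor issues its own barrier. */
size_t
barriers_per_object(size_t objs_in_page)
{
	return objs_in_page;
}

/* Barriers issued when a single barrier precedes all destructors for a page. */
size_t
barriers_per_page(size_t objs_in_page)
{
	return objs_in_page > 0 ? 1 : 0;
}

/* Total per-CPU deliveries caused by the given number of barriers. */
size_t
cross_call_deliveries(size_t barriers, size_t ncpu)
{
	return barriers * ncpu;
}
```

With purely illustrative figures of 16 objects in a page and 64 CPUs, the per-object scheme works out to 16 * 64 = 1024 deliveries for one page, versus 64 for a single barrier.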
>How-To-Repeat:
This was noticed during code inspection; constructing a reproducer is left as an exercise for the reader.
>Fix:
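Provide a mechanism to register a pre-DTOR hook for the pool cache layer to invoke once, before it calls the destructor for each object being released.  lwp_dtor() can then leave the xc_barrier() to that hook.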
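
One possible shape for such a hook is sketched below as a small user-space model.  None of these names are existing NetBSD interfaces: struct model_cache, model_cache_set_pre_dtor() and model_cache_destruct_batch() are hypothetical, and the hook only bumps a counter where a real one would presumably call xc_barrier(0).

```c
#include <stddef.h>

/* Hypothetical user-space model; none of these names are NetBSD APIs. */
struct model_cache {
	void	(*mc_pre_dtor)(void *arg);		/* run once per batch */
	void	(*mc_dtor)(void *arg, void *obj);	/* run once per object */
	void	*mc_arg;				/* passed to both */
};

/* Register the pre-destruct hook (NULL means "no hook"). */
void
model_cache_set_pre_dtor(struct model_cache *mc, void (*hook)(void *))
{
	mc->mc_pre_dtor = hook;
}

/* Destruct a batch of objects: hook once, then the destructor per object. */
void
model_cache_destruct_batch(struct model_cache *mc, void **objs, size_t n)
{
	if (n == 0)
		return;
	if (mc->mc_pre_dtor != NULL)
		(*mc->mc_pre_dtor)(mc->mc_arg);
	for (size_t i = 0; i < n; i++)
		(*mc->mc_dtor)(mc->mc_arg, objs[i]);
}

/* Counters standing in for the real side effects. */
struct model_counters {
	unsigned	barriers;	/* stand-in for xc_barrier(0) calls */
	unsigned	teardowns;	/* stand-in for per-LWP teardown */
};

/* Model of the once-per-batch hook: this is where the barrier would go. */
void
model_lwp_pre_dtor(void *arg)
{
	((struct model_counters *)arg)->barriers++;
}

/* Model of the per-object destructor, now without a barrier of its own. */
void
model_lwp_dtor(void *arg, void *obj)
{
	(void)obj;
	((struct model_counters *)arg)->teardowns++;
}
```

In this model, destructing a batch of 8 objects with the hook registered bumps the barrier counter once and the teardown counter 8 times.  Whether the real hook should run per page, per batch, or at some other granularity is a design decision this sketch does not try to settle.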


