tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: RFC: PERCPU_LIST to fix PR kern/52515
Quick summary of the problem:
percpu_alloc(9) may move existing per-CPU objects to different
locations in memory. This means you can't reliably store pointers
to the per-CPU objects anywhere.
psref(9) currently uses queue.h LIST, with a per-CPU list head --
but LIST stores back-pointers into the list head, which breaks when
percpu_alloc(9) moves the list head.
Possible solutions. I'm leaning toward (6), to open-code the linked
list operations for this special purpose, with compile-tested patch
attached. This changes the text of psref.h, but shouldn't change the
ABI. Comments?
1. Make psref(9) store a pointer to a separately kmem_alloc-allocated
list head. (This is what Nakahara-san considered but rejected.)
struct psref_cpu {
- struct psref_head pcpu_head;
+ struct psref_head *pcpu_head;
};
Problem: We need to initialize these before psref_acquire/release.
We could initialize it in psref_class_create, but for various
applications of psref, we may not know how many CPUs there are at
that time.
To fix this properly we need to teach percpu(9) how to initialize
and finalize per-CPU objects as CPUs come online, which as a bonus
would make it work with dynamic CPU attachment and detachment. And
that's a lot of work.
2. Make percpu(9) always store pointers into separately kmem_alloc-
allocated memory, like userland TLS and pthread keys do.
Problem: It's a bit of work to basically rewrite subr_percpu.c, and
may have performance implications. I started last night, spent an
hour rewriting it, and didn't finish before bedtime came.
3. Invent a new variant of LIST called PERCPU_LIST that doesn't
require back-pointers into the head. (This is what Nakahara-san
suggested before I vanished into a EuroBSDcon vortex.)
Problem: This is almost the same as LIST -- supports essentially all
the same operations with the same performance characteristics --
and has very limited utility. So unless we find other applications
that need exactly this too, I'm reluctant to expose it as a
general-purpose kernel API right now.
It can't be a drop-in replacement for LIST, because we have
LIST_REMOVE(obj, field) but we need MHLIST_REMOVE(head, obj, field)
in order to allow the head to move.
And we can't just add LIST_REMOVE_MAYBEHEAD(head, obj, field) or
something to LIST, because there's no way to distinguish with
standard LIST whether an object is at the head of the list or not,
without knowing where the head *was* in memory when the object (or
one of its predecessors) was inserted.
4. Teach the percpu(9) API to move objects with an operation other
than memcpy. This would look like
percpu_t *percpu_alloc(size_t);
+ percpu_t *percpu_alloc_movable(size_t,
+ void *(*)(void *, const void *, size_t));
with percpu_alloc(sz) := percpu_alloc_movable(sz, &memcpy). Then
for psref(9), we could use LIST_MOVE to migrate the old list head
to the new list head and fix the one back-pointer that would have
been invalidated.
Problem: This is again an operation of limited utility, since most
per-CPU objects (like counters) don't need special operations to
move in memory. And it would compete with extending the API for
dynamic CPU initialization/finalization operations, so I'm not sure
I want to preemptively step on the toes of that API design space at
the moment.
5. Violate the abstraction of LIST in psref(9), as I proposed as a
stop-gap measure in
<https://mail-index.netbsd.org/tech-kern/2017/09/19/msg022345.html>.
Problem: It's gross, makes maintenance of the LIST implementation
more difficult, makes understanding the psref(9) code more
difficult, breaks QUEUEDEBUG, &c.
6. Just open-code it in subr_psref.c. If we ever find more
applications of this idiom, we can always upgrade it without
trouble to an explicit PERCPU_LIST abstraction.
Problem: We don't get to share any common LIST code. This seems
like a small price for a single-use gizmo like this.
Index: sys/sys/psref.h
===================================================================
RCS file: /cvsroot/src/sys/sys/psref.h,v
retrieving revision 1.2
diff -p -u -r1.2 psref.h
--- sys/sys/psref.h 16 Dec 2016 20:12:11 -0000 1.2
+++ sys/sys/psref.h 2 Oct 2017 13:47:30 -0000
@@ -69,7 +69,8 @@ struct psref_target {
* written only on the local CPU.
*/
struct psref {
- LIST_ENTRY(psref) psref_entry;
+ struct psref *psref_next;
+ struct psref **psref_prevp;
const struct psref_target *psref_target;
struct lwp *psref_lwp;
struct cpu_info *psref_cpu;
Index: sys/kern/subr_psref.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.7
diff -p -u -r1.7 subr_psref.c
--- sys/kern/subr_psref.c 1 Jun 2017 02:45:13 -0000 1.7
+++ sys/kern/subr_psref.c 2 Oct 2017 13:47:30 -0000
@@ -78,8 +78,6 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c
#include <sys/queue.h>
#include <sys/xcall.h>
-LIST_HEAD(psref_head, psref);
-
static bool _psref_held(const struct psref_target *, struct psref_class *,
bool);
@@ -103,7 +101,7 @@ struct psref_class {
* Not exposed by the API.
*/
struct psref_cpu {
- struct psref_head pcpu_head;
+ struct psref *pcpu_first;
};
/*
@@ -135,7 +133,7 @@ psref_cpu_drained_p(void *p, void *cooki
const struct psref_cpu *pcpu = p;
bool *retp = cookie;
- if (!LIST_EMPTY(&pcpu->pcpu_head))
+ if (pcpu->pcpu_first != NULL)
*retp = false;
}
@@ -194,7 +192,9 @@ psref_check_duplication(struct psref_cpu
bool found = false;
struct psref *_psref;
- LIST_FOREACH(_psref, &pcpu->pcpu_head, psref_entry) {
+ for (_psref = pcpu->pcpu_first;
+ _psref != NULL;
+ _psref = _psref->psref_next) {
if (_psref == psref &&
_psref->psref_target == target) {
found = true;
@@ -250,7 +250,11 @@ psref_acquire(struct psref *psref, const
#endif
/* Record our reference. */
- LIST_INSERT_HEAD(&pcpu->pcpu_head, psref, psref_entry);
+ if ((psref->psref_next = pcpu->pcpu_first) != NULL)
+ psref->psref_next->psref_prevp = &psref->psref_next;
+ psref->psref_prevp = NULL; /* head of list */
+ pcpu->pcpu_first = psref;
+
psref->psref_target = target;
psref->psref_lwp = curlwp;
psref->psref_cpu = curcpu();
@@ -297,12 +301,33 @@ psref_release(struct psref *psref, const
/*
* Block interrupts and remove the psref from the current CPU's
- * list. No need to percpu_getref or get the head of the list,
- * and the caller guarantees that we are bound to a CPU anyway
- * (as does blocking interrupts).
+ * list.
*/
s = splraiseipl(class->prc_iplcookie);
- LIST_REMOVE(psref, psref_entry);
+ if (psref->psref_next != NULL)
+ psref->psref_next->psref_prevp = psref->psref_prevp;
+
+ /*
+ * If it's at the head of the list, we must modify the head
+ * explicitly. We can't use a back-pointer into the head
+ * because percpu_alloc may move the per-CPU objects around in
+ * memory.
+ */
+ if (psref->psref_prevp == NULL) { /* head of list */
+ struct psref_cpu *pcpu = percpu_getref(class->prc_percpu);
+ KASSERTMSG((pcpu->pcpu_first == psref),
+ "psref %p marked as head of list but head of list is %p",
+ psref, pcpu->pcpu_first);
+ pcpu->pcpu_first = psref->psref_next;
+ } else { /* middle of list */
+ KASSERTMSG((*psref->psref_prevp == psref),
+ "psref %p prevp points at %p",
+ psref, *psref->psref_prevp);
+ *psref->psref_prevp = psref->psref_next;
+ }
+
+ psref->psref_prevp = (void *)1; /* poison */
+ psref->psref_next = (void *)2; /* poison */
splx(s);
/* If someone is waiting for users to drain, notify 'em. */
@@ -353,7 +378,11 @@ psref_copy(struct psref *pto, const stru
pcpu = percpu_getref(class->prc_percpu);
/* Record the new reference. */
- LIST_INSERT_HEAD(&pcpu->pcpu_head, pto, psref_entry);
+ if ((pto->psref_next = pcpu->pcpu_first) != NULL)
+ pto->psref_next->psref_prevp = &pto->psref_next;
+ pto->psref_prevp = NULL; /* head of list */
+ pcpu->pcpu_first = pto;
+
pto->psref_target = pfrom->psref_target;
pto->psref_lwp = curlwp;
pto->psref_cpu = curcpu();
@@ -474,7 +503,9 @@ _psref_held(const struct psref_target *t
pcpu = percpu_getref(class->prc_percpu);
/* Search through all the references on this CPU. */
- LIST_FOREACH(psref, &pcpu->pcpu_head, psref_entry) {
+ for (psref = pcpu->pcpu_first;
+ psref != NULL;
+ psref = psref->psref_next) {
/* Sanity-check the reference's CPU. */
KASSERTMSG((psref->psref_cpu == curcpu()),
"passive reference transferred from CPU %u to CPU %u",
Home |
Main Index |
Thread Index |
Old Index