tech-kern archive


Re: RFC: PERCPU_LIST to fix PR kern/52515



Quick summary of the problem:

   percpu_alloc(9) may move existing per-CPU objects to different
   locations in memory.  This means you can't reliably store pointers
   to the per-CPU objects anywhere.

   psref(9) currently uses queue.h LIST, with a per-CPU list head --
   but LIST stores back-pointers into the list head, which breaks when
   percpu_alloc(9) moves the list head.

Possible solutions.  I'm leaning toward (6), to open-code the linked
list operations for this special purpose, with compile-tested patch
attached.  This changes the text of psref.h, but shouldn't change the
ABI.  Comments?


1. Make psref(9) store a pointer to a separately kmem_alloc-allocated
   list head.  (This is what Nakahara-san considered but rejected.)

	struct psref_cpu {
-		struct psref_head	pcpu_head;
+		struct psref_head	*pcpu_head;
	};

   Problem: We need to initialize these before psref_acquire/release.
   We could initialize it in psref_class_create, but for various
   applications of psref, we may not know how many CPUs there are at
   that time.

   To fix this properly we need to teach percpu(9) how to initialize
   and finalize per-CPU objects as CPUs come online, which as a bonus
   would make it work with dynamic CPU attachment and detachment.  And
   that's a lot of work.

2. Make percpu(9) always store pointers into separately kmem_alloc-
   allocated memory, like userland TLS and pthread keys do.

   Problem: It's a bit of work to basically rewrite subr_percpu.c, and
   may have performance implications.  I started last night, spent an
   hour rewriting it, and didn't finish before bedtime came.

3. Invent a new variant of LIST called PERCPU_LIST that doesn't
   require back-pointers into the head.  (This is what Nakahara-san
   suggested before I vanished into a EuroBSDcon vortex.)

   Problem: This is almost the same as LIST -- supports essentially all
   the same operations with the same performance characteristics --
   and has very limited utility.  So unless we find other applications
   that need exactly this too, I'm reluctant to expose it as a
   general-purpose kernel API right now.

   It can't be a drop-in replacement for LIST, because we have
   LIST_REMOVE(obj, field) but we need MHLIST_REMOVE(head, obj, field)
   in order to allow the head to move.

   And we can't just add LIST_REMOVE_MAYBEHEAD(head, obj, field) or
   something to LIST, because there's no way to distinguish with
   standard LIST whether an object is at the head of the list or not,
   without knowing where the head *was* in memory when the object (or
   one of its predecessors) was inserted.

4. Teach the percpu(9) API to move objects with an operation other
   than memcpy.  This would look like

	percpu_t *percpu_alloc(size_t);
+	percpu_t *percpu_alloc_movable(size_t,
+	    void *(*)(void *, const void *, size_t));

   with percpu_alloc(sz) := percpu_alloc_movable(sz, &memcpy).  Then
   for psref(9), we could use LIST_MOVE to migrate the old list head
   to the new list head and fix the one back-pointer that would have
   been invalidated.

   Problem: This is again an operation of limited utility, since most
   per-CPU objects (like counters) don't need special operations to
   move in memory.  And it would compete with extending the API for
   dynamic CPU initialization/finalization operations, so I'm not sure
   I want to preemptively step on the toes of that API design space at
   the moment.

5. Violate the abstraction of LIST in psref(9), as I proposed as a
   stop-gap measure in
   <https://mail-index.netbsd.org/tech-kern/2017/09/19/msg022345.html>.

   Problem: It's gross, makes maintenance of the LIST implementation
   more difficult, makes understanding the psref(9) code more
   difficult, breaks QUEUEDEBUG, &c.

6. Just open-code it in subr_psref.c.  If we ever find more
   applications of this idiom, we can always upgrade it without
   trouble to an explicit PERCPU_LIST abstraction.

   Problem: We don't get to share any common LIST code.  This seems
   like a small price for a single-use gizmo like this.

Index: sys/sys/psref.h
===================================================================
RCS file: /cvsroot/src/sys/sys/psref.h,v
retrieving revision 1.2
diff -p -u -r1.2 psref.h
--- sys/sys/psref.h	16 Dec 2016 20:12:11 -0000	1.2
+++ sys/sys/psref.h	2 Oct 2017 13:47:30 -0000
@@ -69,7 +69,8 @@ struct psref_target {
  *	written only on the local CPU.
  */
 struct psref {
-	LIST_ENTRY(psref)		psref_entry;
+	struct psref			*psref_next;
+	struct psref			**psref_prevp;
 	const struct psref_target	*psref_target;
 	struct lwp			*psref_lwp;
 	struct cpu_info			*psref_cpu;
Index: sys/kern/subr_psref.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.7
diff -p -u -r1.7 subr_psref.c
--- sys/kern/subr_psref.c	1 Jun 2017 02:45:13 -0000	1.7
+++ sys/kern/subr_psref.c	2 Oct 2017 13:47:30 -0000
@@ -78,8 +78,6 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c
 #include <sys/queue.h>
 #include <sys/xcall.h>
 
-LIST_HEAD(psref_head, psref);
-
 static bool	_psref_held(const struct psref_target *, struct psref_class *,
 		    bool);
 
@@ -103,7 +101,7 @@ struct psref_class {
  *	Not exposed by the API.
  */
 struct psref_cpu {
-	struct psref_head	pcpu_head;
+	struct psref		*pcpu_first;
 };
 
 /*
@@ -135,7 +133,7 @@ psref_cpu_drained_p(void *p, void *cooki
 	const struct psref_cpu *pcpu = p;
 	bool *retp = cookie;
 
-	if (!LIST_EMPTY(&pcpu->pcpu_head))
+	if (pcpu->pcpu_first != NULL)
 		*retp = false;
 }
 
@@ -194,7 +192,9 @@ psref_check_duplication(struct psref_cpu
 	bool found = false;
 	struct psref *_psref;
 
-	LIST_FOREACH(_psref, &pcpu->pcpu_head, psref_entry) {
+	for (_psref = pcpu->pcpu_first;
+	     _psref != NULL;
+	     _psref = _psref->psref_next) {
 		if (_psref == psref &&
 		    _psref->psref_target == target) {
 			found = true;
@@ -250,7 +250,11 @@ psref_acquire(struct psref *psref, const
 #endif
 
 	/* Record our reference.  */
-	LIST_INSERT_HEAD(&pcpu->pcpu_head, psref, psref_entry);
+	if ((psref->psref_next = pcpu->pcpu_first) != NULL)
+		psref->psref_next->psref_prevp = &psref->psref_next;
+	psref->psref_prevp = NULL;	/* head of list */
+	pcpu->pcpu_first = psref;
+
 	psref->psref_target = target;
 	psref->psref_lwp = curlwp;
 	psref->psref_cpu = curcpu();
@@ -297,12 +301,33 @@ psref_release(struct psref *psref, const
 
 	/*
 	 * Block interrupts and remove the psref from the current CPU's
-	 * list.  No need to percpu_getref or get the head of the list,
-	 * and the caller guarantees that we are bound to a CPU anyway
-	 * (as does blocking interrupts).
+	 * list.
 	 */
 	s = splraiseipl(class->prc_iplcookie);
-	LIST_REMOVE(psref, psref_entry);
+	if (psref->psref_next != NULL)
+		psref->psref_next->psref_prevp = psref->psref_prevp;
+
+	/*
+	 * If it's at the head of the list, we must modify the head
+	 * explicitly.  We can't use a back-pointer into the head
+	 * because percpu_alloc may move the per-CPU objects around in
+	 * memory.
+	 */
+	if (psref->psref_prevp == NULL) {	/* head of list */
+		struct psref_cpu *pcpu = percpu_getref(class->prc_percpu);
+		KASSERTMSG((pcpu->pcpu_first == psref),
+		    "psref %p marked as head of list but head of list is %p",
+		    psref, pcpu->pcpu_first);
+		pcpu->pcpu_first = psref->psref_next;
+	} else {				/* middle of list */
+		KASSERTMSG((*psref->psref_prevp == psref),
+		    "psref %p prevp points at %p",
+		    psref, *psref->psref_prevp);
+		*psref->psref_prevp = psref->psref_next;
+	}
+
+	psref->psref_prevp = (void *)1; /* poison */
+	psref->psref_next = (void *)2;	/* poison */
 	splx(s);
 
 	/* If someone is waiting for users to drain, notify 'em.  */
@@ -353,7 +378,11 @@ psref_copy(struct psref *pto, const stru
 	pcpu = percpu_getref(class->prc_percpu);
 
 	/* Record the new reference.  */
-	LIST_INSERT_HEAD(&pcpu->pcpu_head, pto, psref_entry);
+	if ((pto->psref_next = pcpu->pcpu_first) != NULL)
+		pto->psref_next->psref_prevp = &pto->psref_next;
+	pto->psref_prevp = NULL;	/* head of list */
+	pcpu->pcpu_first = pto;
+
 	pto->psref_target = pfrom->psref_target;
 	pto->psref_lwp = curlwp;
 	pto->psref_cpu = curcpu();
@@ -474,7 +503,9 @@ _psref_held(const struct psref_target *t
 	pcpu = percpu_getref(class->prc_percpu);
 
 	/* Search through all the references on this CPU.  */
-	LIST_FOREACH(psref, &pcpu->pcpu_head, psref_entry) {
+	for (psref = pcpu->pcpu_first;
+	     psref != NULL;
+	     psref = psref->psref_next) {
 		/* Sanity-check the reference's CPU.  */
 		KASSERTMSG((psref->psref_cpu == curcpu()),
 		    "passive reference transferred from CPU %u to CPU %u",

