tech-net archive

panic: "(ln->la_flags & LLE_VALID) != 0" failed



I recently upgraded to netbsd-9, and I've been seeing this panic every
couple of days, sometimes more than once a day:

panic: kernel diagnostic assertion "(ln->la_flags & LLE_VALID) != 0" failed: file "/home/riastradh/netbsd/9/src/sys/netinet6/nd6.c", line 2412

This is at:

https://nxr.netbsd.org/xref/src/sys/netinet6/nd6.c#2426

(The line number is slightly different in HEAD, but I think the logic
is essentially the same.)

I suspect what happened is:

1. Thread 0 issued nd6_lookup which:
   (a) acquired IF_AFDATA_RLOCK(ifp),
   (b) looked up lle and acquired LLE_WLOCK(lle), and then
   (c) released IF_AFDATA_RLOCK(ifp); meanwhile,

2. Thread 1 did something which called lltable_unlink_entry without
   holding LLE_WLOCK, perhaps llentries_unlink either via
   lltable_purge_entries or via lltable_prefix_free ->
   htable_prefix_free.  lltable_unlink_entry -> htable_unlink_entry
   clears LLE_VALID.

3. Thread 0 chokes on the cleared LLE_VALID.

Since thread 0 no longer holds IF_AFDATA_*LOCK, thread 1 can take it
and proceed, and since thread 1 _doesn't need_ LLE_*LOCK, the fact
that thread 0 is holding it doesn't prevent thread 1 from unlinking
lle.
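To make the suspected interleaving concrete, here is a minimal
single-threaded userland sketch that replays steps 1-3 in order.  It is
not kernel code: the struct layouts, the MODEL_* flag values, and every
model_* function are hypothetical stand-ins, and the locks are modelled
as plain fields so that the ordering is deterministic.  The
take_lle_wlock parameter selects whether the unlinking side also needs
the per-entry lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Model flag bits; the values are arbitrary, not taken from the kernel. */
#define MODEL_LLE_VALID   0x1
#define MODEL_LLE_LINKED  0x2

struct model_ifp {
	int  afdata_readers;   /* stands in for IF_AFDATA_RLOCK holders */
};

struct model_lle {
	int  la_flags;
	bool wlocked;          /* stands in for LLE_WLOCK held by thread 0 */
};

/* Thread 0: steps (a)-(c) of the lookup described above. */
static void
model_lookup(struct model_ifp *ifp, struct model_lle *lle)
{
	ifp->afdata_readers++;   /* (a) take the table read lock */
	lle->wlocked = true;     /* (b) find the entry and write-lock it */
	ifp->afdata_readers--;   /* (c) drop the table read lock */
}

/*
 * Thread 1: try to unlink the entry.  Returns false where a real
 * thread would have to block instead of proceeding.
 */
static bool
model_unlink(struct model_ifp *ifp, struct model_lle *lle,
    bool take_lle_wlock)
{
	if (ifp->afdata_readers != 0)
		return false;    /* table lock still held by a reader */
	if (take_lle_wlock && lle->wlocked)
		return false;    /* entry lock held by thread 0 */
	lle->la_flags &= ~(MODEL_LLE_VALID | MODEL_LLE_LINKED);
	return true;
}

/* Replay the interleaving; report whether thread 0 still sees a valid entry. */
static bool
model_entry_stays_valid(bool take_lle_wlock)
{
	struct model_ifp ifp = { 0 };
	struct model_lle lle = { MODEL_LLE_VALID | MODEL_LLE_LINKED, false };

	model_lookup(&ifp, &lle);                        /* step 1 */
	assert(lle.wlocked);
	(void)model_unlink(&ifp, &lle, take_lle_wlock);  /* step 2 */
	return (lle.la_flags & MODEL_LLE_VALID) != 0;    /* step 3 */
}
```

In this model, model_entry_stays_valid(false) reports that the flag was
cleared underneath thread 0, while model_entry_stays_valid(true) reports
that it survived because thread 1 had to wait.  That only illustrates
the hypothesis; it says nothing about whether this is the path that
actually fires in the kernel.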

I haven't proven that lltable_purge_entries or lltable_prefix_free
happened at the time of the panic -- perhaps they are a red herring.
Anecdotally the system seems to start dropping packets for a few
seconds before it panics.  I'm not the only one who has seen this
symptom.  Has anyone dug into this?


The attached patch changes llentries_unlink to acquire LLE_WLOCK
before calling lltable_unlink_entry, and changes lltable_unlink_entry
to assert that the LLE_WLOCK is held before modifying the lle in case
there are other code paths I haven't found that need LLE_WLOCK but
lack it.  I haven't tested it yet.

(It is unclear whether *_link_entry needs the same treatment -- the two
callers, in_lltable_create and in6_lltable_create, both acquire
LLE_WLOCK immediately after lltable_link_entry but could acquire it
immediately before instead, I think.)
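The ordering question for *_link_entry can be illustrated with a
similar userland sketch.  Everything here is hypothetical (the
sketch_* names, the single-slot table, the boolean standing in for
LLE_WLOCK), and it deliberately ignores any other lock the real
callers may hold, so it does not show that such a window is reachable
in the kernel.  It only shows what changes when the entry is locked
before it is linked rather than after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_entry {
	bool wlocked;       /* stands in for LLE_WLOCK held by the creator */
	bool initialized;   /* creator has finished setting up the entry */
};

struct sketch_table {
	struct sketch_entry *slot;   /* non-NULL means the entry is linked */
};

/*
 * What a lookup running at this instant could do: true if it can find
 * an entry that is linked, not locked by its creator, and not yet
 * fully initialized.
 */
static bool
sketch_lookup_sees_unfinished(const struct sketch_table *t)
{
	return t->slot != NULL && !t->slot->wlocked &&
	    !t->slot->initialized;
}

/* Create an entry with either ordering; report whether it was exposed. */
static bool
sketch_create(struct sketch_table *t, struct sketch_entry *e,
    bool lock_first)
{
	bool exposed = false;

	assert(t != NULL && e != NULL);
	t->slot = NULL;
	e->wlocked = false;
	e->initialized = false;

	if (lock_first) {
		e->wlocked = true;   /* lock, then link */
		t->slot = e;
		exposed = sketch_lookup_sees_unfinished(t);
	} else {
		t->slot = e;         /* link, then lock */
		exposed = sketch_lookup_sees_unfinished(t);
		e->wlocked = true;
	}
	e->initialized = true;
	return exposed;
}
```

With lock_first set to false the probe between linking and locking
finds an unlocked, unfinished entry; with lock_first set to true it
does not.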

Does this sound plausible?
From d5190af30272ef99c07d2e239b4a8def01507055 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Sun, 19 Apr 2020 00:13:02 +0000
Subject: [PATCH] Ensure we hold LLE_WLOCK around unlinking the table entry.

Candidate fix for

panic: kernel diagnostic assertion "(ln->la_flags & LLE_VALID) != 0" failed: file "/home/riastradh/netbsd/9/src/sys/netinet6/nd6.c", line 2412.
---
 sys/net/if_llatbl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/sys/net/if_llatbl.c b/sys/net/if_llatbl.c
index 143f8241e74b..44346c4df562 100644
--- a/sys/net/if_llatbl.c
+++ b/sys/net/if_llatbl.c
@@ -231,6 +231,8 @@ static void
 htable_unlink_entry(struct llentry *lle)
 {
 
+	LLE_WLOCK_ASSERT(lle);
+
 	if ((lle->la_flags & LLE_LINKED) != 0) {
 		IF_AFDATA_WLOCK_ASSERT(lle->lle_tbl->llt_ifp);
 		LIST_REMOVE(lle, lle_next);
@@ -303,8 +305,11 @@ llentries_unlink(struct lltable *llt, struct llentries *head)
 {
 	struct llentry *lle, *next;
 
-	LIST_FOREACH_SAFE(lle, head, lle_chain, next)
+	LIST_FOREACH_SAFE(lle, head, lle_chain, next) {
+		LLE_WLOCK(lle);
 		llt->llt_unlink_entry(lle);
+		LLE_WUNLOCK(lle);
+	}
 }
 
 /*

