NetBSD-Bugs archive


Re: kern/54009: "l->l_pcu_cpu[id] == NULL" panic on aarch64



>> I guess the cause is the lack of a memory barrier.
>> Will the following patches fix it?
>
>The bug annoyed me so much that I turned that server off.
>But I recently turned it back on to test 9.0_BETA.
>
>Two tor relays running on the server are still in a ramp up phase
>and it will take about a month to get them running at full speed.
>Once they run at full speed, the chance of hitting the panic will
>be much higher.


With only this verification patch applied, the assertion failure was confirmed to be a false positive.

cvs -q diff -aup .
Index: subr_pcu.c
===================================================================
RCS file: /src/cvs/cvsroot-netbsd/src/sys/kern/subr_pcu.c,v
retrieving revision 1.21
diff -a -u -p -r1.21 subr_pcu.c
--- subr_pcu.c	16 Oct 2017 15:03:57 -0000	1.21
+++ subr_pcu.c	29 Aug 2019 05:53:35 -0000
@@ -336,6 +336,13 @@ pcu_load(const pcu_ops_t *pcu)
 		s = splpcu();
 		curci = curcpu();
 	}
+#if 1
+	if (l->l_pcu_cpu[id] != NULL) {
+		printf("false positive?: l->l_pcu_cpu[id] == NULL? id=%u, l=%p, l->l_pcu_cpu[id]=%p\n", id, l, l->l_pcu_cpu[id]);
+		__asm __volatile ("dsb sy");
+		printf("check again: l->l_pcu_cpu[id] == NULL? id=%u, l=%p, l->l_pcu_cpu[id]=%p\n", id, l, l->l_pcu_cpu[id]);
+	}
+#endif
 	KASSERT(l->l_pcu_cpu[id] == NULL);
 
 	/* Save the PCU state on the current CPU, if there is any. */


[    46.812281] false positive?: l->l_pcu_cpu[id] == NULL? id=0, l=0xffffffc004ba2300, l->l_pcu_cpu[id]=0xffffffc000a29580
[    46.812281] check again: l->l_pcu_cpu[id] == NULL? id=0, l=0xffffffc004ba2300, l->l_pcu_cpu[id]=0x0

It's almost certainly a memory barrier problem.
I'll commit the fix. If you can still reproduce it, please let me know.

Thanks,
-- 
ryo shimizu

