NetBSD-Bugs archive


Re: kern/54009: "l->l_pcu_cpu[id] == NULL" panic on aarch64



The following reply was made to PR kern/54009; it has been noted by GNATS.

From: Ryo Shimizu <ryo%nerv.org@localhost>
To: Alexander Nasonov <alnsn%yandex.ru@localhost>
Cc: Ryo Shimizu <ryo%nerv.org@localhost>, gnats-bugs%NetBSD.org@localhost,
    kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
    netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/54009: "l->l_pcu_cpu[id] == NULL" panic on aarch64
Date: Thu, 05 Sep 2019 18:08:40 +0900

 >> I guess the cause is a missing memory barrier.
 >> Do the following patches fix it?
 >
 >The bug annoyed me so much that I turned that server off.
 >But I recently turned it back on to test 9.0_BETA.
 >
 >Two tor relays running on the server are still in a ramp-up phase,
 >and it will take about a month to get them running at full speed.
 >Once they run at full speed, the chance of hitting the panic will
 >be much higher.
 
 
 With only the verification patch below applied, the KASSERT failure was confirmed to be a false positive.
 
 cvs -q diff -aup .
 Index: subr_pcu.c
 ===================================================================
 RCS file: /src/cvs/cvsroot-netbsd/src/sys/kern/subr_pcu.c,v
 retrieving revision 1.21
 diff -a -u -p -r1.21 subr_pcu.c
 --- subr_pcu.c	16 Oct 2017 15:03:57 -0000	1.21
 +++ subr_pcu.c	29 Aug 2019 05:53:35 -0000
 @@ -336,6 +336,13 @@ pcu_load(const pcu_ops_t *pcu)
  		s = splpcu();
  		curci = curcpu();
  	}
 +#if 1
 +	if (l->l_pcu_cpu[id] != NULL) {
 +		printf("false positive?: l->l_pcu_cpu[id] == NULL? id=%u, l=%p, l->l_pcu_cpu[id]=%p\n", id, l, l->l_pcu_cpu[id]);
 +		__asm __volatile ("dsb sy");
 +		printf("check again: l->l_pcu_cpu[id] == NULL? id=%u, l=%p, l->l_pcu_cpu[id]=%p\n", id, l, l->l_pcu_cpu[id]);
 +	}
 +#endif
  	KASSERT(l->l_pcu_cpu[id] == NULL);
  
  	/* Save the PCU state on the current CPU, if there is any. */
 
 
 [    46.812281] false positive?: l->l_pcu_cpu[id] == NULL? id=0, l=0xffffffc004ba2300, l->l_pcu_cpu[id]=0xffffffc000a29580
 [    46.812281] check again: l->l_pcu_cpu[id] == NULL? id=0, l=0xffffffc004ba2300, l->l_pcu_cpu[id]=0x0
 
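 The first read returned a stale, non-NULL pointer, while the re-read after
 the "dsb sy" barrier returned NULL, so this is almost certainly a memory
 barrier problem rather than genuinely inconsistent PCU state.
 I'll commit the fix. If you can still reproduce the panic, please let me know.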
 
 Thanks,
 -- 
 ryo shimizu
 

