NetBSD-Bugs archive


Re: kern/58666: panic: lock error: Reader / writer lock: rw_vector_enter,357: locking against myself



The following reply was made to PR kern/58666; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Havard Eidnes <he%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost,
	Chuck Silvers <chs%NetBSD.org@localhost>
Subject: Re: kern/58666: panic: lock error: Reader / writer lock:
	rw_vector_enter,357: locking against myself
Date: Sun, 8 Sep 2024 12:04:36 +0000

 > Date: Sun, 08 Sep 2024 13:42:24 +0200 (CEST)
 > From: Havard Eidnes <he%NetBSD.org@localhost>
 >
 > Doesn't look like addr2line wants to play ball:
 >
 > fxxxx# /usr/tools/bin/i486--netbsdelf-addr2line -e ./netbsd.2.gdb -i pmap_ctor+0xb7
 > ??:0
 
 Huh, I guess something was fixed in binutils between netbsd-10 and
 HEAD.  Anyway, gdb gives this to us.
 
 > fxxxx# gdb -q netbsd.2.gdb
 > ...
 > (gdb) info line *(pmap_ctor+0xb7)
 > Line 2702 of "/usr/src/sys/arch/x86/x86/pmap.c"
 >    starts at address 0xc049e659 <pmap_ctor+174>
 >    and ends at 0xc049e664 <pmap_ctor+185>.
 
 This is:
 
    2688 static void
    2689 pmap_pdp_init(pd_entry_t *pdir)
    2690 {
 ...
    2702 	memset(PAGE_ALIGNED(pdir), 0, PDP_SIZE * PAGE_SIZE);
 
 https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c?r=1.423#2702
 
 Together with the uvm_fault_internal(map, va, flags) line in the stack
 trace, where the second argument (va) is 0, we see this is a null
 pointer dereference:
 
 uvm_fault_internal(ce5820e8,0,2,0,ffffffff,ffffffff,4000,1723,f4023cd4,ce5820e8) at uvm_fault_internal+0xcf
 
 > OK, this isn't disass from pmap_ctor+0xb7, but I thought this
 > would be easier to read (marked +0xb7 with <--):
 >
 > (gdb) disass pmap_ctor
 > Dump of assembler code for function pmap_ctor:
 > [...]
 >    0xc049e659 <+174>:     mov    $0x1000,%ecx   <-- +0xb7
 >    0xc049e65e <+179>:     mov    %ebx,%edi
 >    0xc049e660 <+181>:     xor    %eax,%eax
 >    0xc049e662 <+183>:     rep stos %eax,%es:(%edi)
 
 What you marked is 0xae=174, not 0xb7=183.  0xb7=183 is the REP STOS
 instruction, which is to say, memset.
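
 The offset arithmetic can be double-checked with a few lines of C.
 This is only a sketch: the constants are copied from the gdb output
 quoted above, and the helper names (pmap_ctor_start, pmap_ctor_addr)
 are made up for illustration, not anything in the tree.

```c
#include <stdint.h>

/* Copied from the quoted gdb output: line 2702 starts at <pmap_ctor+174>. */
#define LINE_2702_START_ADDR	0xc049e659u
#define LINE_2702_START_OFF	174u

/* Hypothetical helper: start address of pmap_ctor, derived from that pair. */
static uint32_t
pmap_ctor_start(void)
{
	return LINE_2702_START_ADDR - LINE_2702_START_OFF;
}

/* Hypothetical helper: absolute address of pmap_ctor+off. */
static uint32_t
pmap_ctor_addr(uint32_t off)
{
	return pmap_ctor_start() + off;
}
```

 With these, pmap_ctor_addr(0xae) lands on the MOV at 0xc049e659 and
 pmap_ctor_addr(0xb7) lands on the REP STOS at 0xc049e662.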
 
 In other words, this is consistent with my hypothesis that:
 
 1. the fault is memset of null
 
 2. the pointer which is null came from pool_get(&pmap_pdp_pool,
    PR_WAITOK) which should never return null, unless...
 
 3. the backing allocator for pmap_pdp_pool fails and returns null
    despite being passed PR_WAITOK, which...
 
 4. can happen if this is a PAE kernel (as it is) so the backing
    allocator is pmap_pdp_alloc which calls uvm_km_alloc without
    UVM_KMF_WAITVA, and...
 
 5. RAM is currently short so uvm_km_alloc would have to sleep and wait
    for the pagedaemon to free pages before it can return, in which
    case it returns null instead of sleeping because pmap_pdp_alloc
    didn't pass UVM_KMF_WAITVA.
 
 The patch attempts to fix this by passing UVM_KMF_WAITVA in
 pmap_pdp_alloc so it sleeps instead of failing in this case.
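
 As a userland illustration of the failure mode in steps 1-5, here is
 a toy model.  It is not kernel code and not the patch: every name
 with a toy_/TOY_ prefix is hypothetical, malloc stands in for the
 real backing allocator, and "sleeping" is simulated by a function
 that pretends the pagedaemon freed a page.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_KMF_WAITVA	0x1	/* hypothetical stand-in for UVM_KMF_WAITVA */

static int toy_free_pages = 0;	/* pretend RAM is currently short */

/* Stand-in for the pagedaemon freeing a page while the caller sleeps. */
static void
toy_wait_for_pagedaemon(void)
{
	toy_free_pages++;
}

/* Toy backing allocator: when short, it fails unless allowed to wait. */
static void *
toy_km_alloc(size_t size, int flags)
{
	if (toy_free_pages == 0) {
		if ((flags & TOY_KMF_WAITVA) == 0)
			return NULL;	/* caller did not say it may sleep */
		toy_wait_for_pagedaemon();
	}
	toy_free_pages--;
	return malloc(size);
}

/*
 * Toy constructor step.  Returns 0 on success, or -1 at the point
 * where, in the scenario described above, memset would be handed a
 * null pointer.
 */
static int
toy_pdp_init(size_t size, int flags)
{
	void *pdir = toy_km_alloc(size, flags);

	if (pdir == NULL)
		return -1;
	memset(pdir, 0, size);
	free(pdir);
	return 0;
}
```

 With toy_free_pages at 0, toy_pdp_init(4096, 0) reports the failure,
 while toy_pdp_init(4096, TOY_KMF_WAITVA) waits and succeeds, which is
 the behaviour the patch is aiming for.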
 
 I also started to review the tree for other cases of uvm_km_alloc that
 don't pass either UVM_KMF_WAITVA or UVM_KMF_NOWAIT, and, hoo boy,
 there's a lot of potential issues here in obscure corners like acorn32
 which I have no hope of testing myself.
 

