NetBSD-Bugs archive
Re: kern/58666: panic: lock error: Reader / writer lock: rw_vector_enter,357: locking against myself
The following reply was made to PR kern/58666; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Havard Eidnes <he%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost,
Chuck Silvers <chs%NetBSD.org@localhost>
Subject: Re: kern/58666: panic: lock error: Reader / writer lock:
rw_vector_enter,357: locking against myself
Date: Sun, 8 Sep 2024 12:04:36 +0000
> Date: Sun, 08 Sep 2024 13:42:24 +0200 (CEST)
> From: Havard Eidnes <he%NetBSD.org@localhost>
>
> Doesn't look like addr2line wants to play ball:
>
> fxxxx# /usr/tools/bin/i486--netbsdelf-addr2line -e ./netbsd.2.gdb -i pmap_ctor+0xb7
> ??:0
Huh, I guess something was fixed in binutils between netbsd-10 and
HEAD. Anyway, gdb gives this to us.
> fxxxx# gdb -q netbsd.2.gdb
> ...
> (gdb) info line *(pmap_ctor+0xb7)
> Line 2702 of "/usr/src/sys/arch/x86/x86/pmap.c"
> starts at address 0xc049e659 <pmap_ctor+174>
> and ends at 0xc049e664 <pmap_ctor+185>.
This is:
2688 static void
2689 pmap_pdp_init(pd_entry_t *pdir)
2690 {
...
2702 memset(PAGE_ALIGNED(pdir), 0, PDP_SIZE * PAGE_SIZE);
https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c?r=1.423#2702
Together with the uvm_fault_internal(map, va, flags) line in the stack
trace, we see this is a null pointer dereference:
uvm_fault_internal(ce5820e8,0,2,0,ffffffff,ffffffff,4000,1723,f4023cd4,ce5820e8) at uvm_fault_internal+0xcf
> OK, this isn't disass from pmap_ctor+0xb7, but I thought this
> would be easier to read (marked +0xb7 with <--):
>
> (gdb) disass pmap_ctor
> Dump of assembler code for function pmap_ctor:
> [...]
> 0xc049e659 <+174>: mov $0x1000,%ecx <-- +0xb7
> 0xc049e65e <+179>: mov %ebx,%edi
> 0xc049e660 <+181>: xor %eax,%eax
> 0xc049e662 <+183>: rep stos %eax,%es:(%edi)
What you marked is 0xae=174, not 0xb7=183. 0xb7=183 is the REP STOS
instruction, which is to say, memset.
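For anyone who wants to double-check the offset arithmetic, here is a
small sketch in C. The helper names sym_base and insn_addr are made up
for illustration; the only inputs are the address and offset pairs
printed by gdb above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recover a symbol's base address from one "<symbol+offset>" line of
 * gdb output.  (Hypothetical helper, not part of any NetBSD tool.)
 */
static uint32_t
sym_base(uint32_t addr, uint32_t offset)
{
	assert(addr >= offset);
	return addr - offset;
}

/* Address of the instruction at <symbol+offset>. */
static uint32_t
insn_addr(uint32_t base, uint32_t offset)
{
	return base + offset;
}
```

With 0xc049e659 at <+174>, sym_base gives the start of pmap_ctor, and
insn_addr of that base plus 0xb7 lands on 0xc049e662, the address of
the REP STOS line in the disassembly.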
In other words, this is consistent with my hypothesis that:
1. the fault is memset of null
2. the pointer which is null came from pool_get(&pmap_pdp_pool,
PR_WAITOK) which should never return null, unless...
3. the backing allocator for pmap_pdp_pool fails and returns null
despite being passed PR_WAITOK, which...
4. can happen if this is a PAE kernel (as it is) so the backing
allocator is pmap_pdp_alloc which calls uvm_km_alloc without
UVM_KMF_WAITVA, and...
5. RAM is currently short so uvm_km_alloc would have to sleep and wait
for the pagedaemon to free pages before it can return, in which
case it returns null instead of sleeping because pmap_pdp_alloc
didn't pass UVM_KMF_WAITVA.
The patch attempts to fix this by passing UVM_KMF_WAITVA in
pmap_pdp_alloc so it sleeps instead of failing in this case.
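To illustrate steps 3 to 5 and the intent of the patch, here is a toy
userland model in C. It is not kernel code: every model_* name and the
MODEL_WAITVA flag are hypothetical stand-ins, and the model only
encodes the behaviour described in this message (fail with null when
short of pages unless the caller asked to wait).

```c
#include <stddef.h>

#define MODEL_WAITVA 0x1	/* hypothetical stand-in for UVM_KMF_WAITVA */

static int model_free_pages;	/* pages available right now */
static char model_page[4096];	/* the one page this model hands out */

/* Stand-in for sleeping until the pagedaemon has freed a page. */
static void
model_wait_for_pagedaemon(void)
{
	model_free_pages++;
}

/* Models the allocator: null when short on pages unless told to wait. */
static void *
model_km_alloc(int flags)
{
	if (model_free_pages == 0) {
		if ((flags & MODEL_WAITVA) == 0)
			return NULL;	/* caller did not ask to wait */
		model_wait_for_pagedaemon();
	}
	model_free_pages--;
	return model_page;
}

/* Backing allocator as described in step 4: no wait flag passed. */
static void *
model_pdp_alloc_unpatched(void)
{
	return model_km_alloc(0);
}

/* Backing allocator with the wait flag, as the patch intends. */
static void *
model_pdp_alloc_patched(void)
{
	return model_km_alloc(MODEL_WAITVA);
}
```

With model_free_pages at zero, the unpatched variant returns null (and
the caller's memset would then fault), while the patched variant waits
and returns a usable page.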
I also started to review the tree for other cases of uvm_km_alloc that
don't pass either UVM_KMF_WAITVA or UVM_KMF_NOWAIT, and, hoo boy,
there are a lot of potential issues here in obscure corners like acorn32
which I have no hope of testing myself.