NetBSD-Bugs archive


Re: kern/58666: panic: lock error: Reader / writer lock: rw_vector_enter,357: locking against myself



> Date: Sun, 08 Sep 2024 13:42:24 +0200 (CEST)
> From: Havard Eidnes <he%NetBSD.org@localhost>
> 
> Doesn't look like addr2line wants to play ball:
> 
> fxxxx# /usr/tools/bin/i486--netbsdelf-addr2line -e ./netbsd.2.gdb -i pmap_ctor+0xb7
> ??:0

Huh, I guess something was fixed in binutils between netbsd-10 and
HEAD.  Anyway, gdb gives this to us.

> fxxxx# gdb -q netbsd.2.gdb
> ...
> (gdb) info line *(pmap_ctor+0xb7)
> Line 2702 of "/usr/src/sys/arch/x86/x86/pmap.c"
>    starts at address 0xc049e659 <pmap_ctor+174>
>    and ends at 0xc049e664 <pmap_ctor+185>.

This is:

   2688 static void
   2689 pmap_pdp_init(pd_entry_t *pdir)
   2690 {
...
   2702 	memset(PAGE_ALIGNED(pdir), 0, PDP_SIZE * PAGE_SIZE);

https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c?r=1.423#2702

Together with the uvm_fault_internal(map, va, flags) line in the stack
trace, where the second argument (va) is 0, we see this is a null
pointer dereference:

uvm_fault_internal(ce5820e8,0,2,0,ffffffff,ffffffff,4000,1723,f4023cd4,ce5820e8) at uvm_fault_internal+0xcf

> OK, this isn't disass from pmap_ctor+0xb7, but I thought this
> would be easier to read (marked +0xb7 with <--):
> 
> (gdb) disass pmap_ctor
> Dump of assembler code for function pmap_ctor:
> [...]
>    0xc049e659 <+174>:     mov    $0x1000,%ecx   <-- +0xb7
>    0xc049e65e <+179>:     mov    %ebx,%edi
>    0xc049e660 <+181>:     xor    %eax,%eax
>    0xc049e662 <+183>:     rep stos %eax,%es:(%edi)

What you marked is 0xae=174, not 0xb7=183.  0xb7=183 is the REP STOS
instruction, which is to say, memset.
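
As a sanity check on the offsets, here is a minimal sketch in C.  The
helper name insn_offset is made up for illustration, and the function
start address is inferred from gdb reporting 0xc049e659 as
<pmap_ctor+174>; only the two instruction addresses come from the
disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of an instruction from its function start. */
static uint32_t
insn_offset(uint32_t func_start, uint32_t insn_addr)
{
	assert(insn_addr >= func_start);
	return insn_addr - func_start;
}

/*
 * Start of pmap_ctor, inferred from gdb showing 0xc049e659 as
 * <pmap_ctor+174>.  This is an assumption, not read from the binary.
 */
static const uint32_t pmap_ctor_start = 0xc049e659u - 174u;
```

With these, insn_offset(pmap_ctor_start, 0xc049e659) is 0xae (the MOV
that was marked) and insn_offset(pmap_ctor_start, 0xc049e662) is 0xb7
(the REP STOS).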

In other words, this is consistent with my hypothesis that:

1. the fault is memset of null

2. the pointer which is null came from pool_get(&pmap_pdp_pool,
   PR_WAITOK) which should never return null, unless...

3. the backing allocator for pmap_pdp_pool fails and returns null
   despite being passed PR_WAITOK, which...

4. can happen if this is a PAE kernel (as it is) so the backing
   allocator is pmap_pdp_alloc which calls uvm_km_alloc without
   UVM_KMF_WAITVA, and...

5. RAM is currently short so uvm_km_alloc would have to sleep and wait
   for the pagedaemon to free pages before it can return, in which
   case it returns null instead of sleeping because pmap_pdp_alloc
   didn't pass UVM_KMF_WAITVA.

The patch attempts to fix this by passing UVM_KMF_WAITVA in
pmap_pdp_alloc so it sleeps instead of failing in this case.
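
To make the chain of reasoning concrete, here is a toy userland model
in C.  It is only a sketch of the hypothesis, not the kernel code:
every model_* name and the MODEL_WAITVA value are invented for
illustration, the "sleep" is simulated by simply succeeding, and the
real uvm_km_alloc takes different arguments and has more failure
modes than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for UVM_KMF_WAITVA; the value is arbitrary. */
#define MODEL_WAITVA	0x1

static char model_page[4096];

/*
 * Toy stand-in for the allocator in the hypothesis: if memory is
 * short, it can only succeed by waiting, which it does only when the
 * caller passed MODEL_WAITVA.
 */
static void *
model_km_alloc(bool ram_short, int flags)
{
	if (!ram_short)
		return model_page;
	if (flags & MODEL_WAITVA)
		return model_page;	/* real code would sleep first */
	return NULL;			/* fail instead of sleeping */
}

/* Backing allocator as hypothesized before the patch: no wait flag. */
static void *
model_pdp_alloc_old(bool ram_short)
{
	return model_km_alloc(ram_short, 0);
}

/* Backing allocator with the patch idea applied: always willing to wait. */
static void *
model_pdp_alloc_new(bool ram_short)
{
	return model_km_alloc(ram_short, MODEL_WAITVA);
}

/* Models the memset in pmap_pdp_init; the assert stands in for the fault. */
static void
model_pdp_init(void *pdir)
{
	assert(pdir != NULL);
	memset(pdir, 0, sizeof(model_page));
}
```

In this model, model_pdp_alloc_old(true) returns NULL, so passing its
result to model_pdp_init trips the assertion, which plays the role of
the memset fault; model_pdp_alloc_new(true) returns a usable pointer.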

I also started to review the tree for other cases of uvm_km_alloc that
don't pass either UVM_KMF_WAITVA or UVM_KMF_NOWAIT, and, hoo boy,
there are a lot of potential issues here in obscure corners like
acorn32 which I have no hope of testing myself.

