kern/56952: UVM deadlock in madvise vs. munmap

To: kern-bug-people@netbsd.org,gnats-admin@netbsd.org,netbsd-bugs@netbsd.org
Subject: kern/56952: UVM deadlock in madvise vs. munmap
From: dholland@NetBSD.org
Date: Wed, 3 Aug 2022 20:20:01 +0000 (UTC)

>Number:         56952
>Category:       kern
>Synopsis:       UVM deadlock in madvise vs. munmap
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 03 20:20:00 +0000 2022
>Originator:     David A. Holland
>Release:        NetBSD 9.99.97 (20220602)
>Organization:
>Environment:
System: NetBSD valkyrie 9.99.97 NetBSD 9.99.97 (VALKYRIE_LOCKDEBUG) #1: Wed Jun 22 23:56:00 EDT 2022  dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE_LOCKDEBUG amd64
Architecture: x86_64
Machine: amd64
>Description:

I have a few times hit a deadlock while running some database stress
tests, and today caught it with UVM_PAGE_TRKOWN enabled.

The dead state is as follows:

Thread 1 is in madvise(MADV_DONTNEED) and is holding a read lock on
the process's map. It is waiting in putpages to chuck one of the pages.

Thread 2 is in uvm_fault_internal; it is holding the page and trying
to get a read lock on the map.

Thread 3 is in munmap; it is waiting for a write lock on the map.
Because a waiting writer blocks new readers, thread 2 cannot get its
read lock ahead of thread 3, and that converts this into a deadlock.

(This is all in one process.)

Taylor constructed the following narrative for how it got this way
(any transcription errors are my fault):

<Riastradh> Presumably you have an object foo which is mapped at
   0xdeadbee000 in the address space
<Riastradh> 1. Someone tried to read from page 0xdeadbef000, say,
   which is the range [0x1000, 0x2000) in foo.
<Riastradh> They consulted the map, which resolved that address to that range in foo.
<Riastradh> They released the map lock, then allocated a page and
   punched it into foo, and they want to reacquire the map lock to
   punch it into the pmap.
<Riastradh> 2. Someone else tried to madvise(MADV_DONTNEED) some
   range, say [0xdeadbee000, 0xdeadbf6000), in foo, and chuck all the
   pages.
<Riastradh> Took the map read lock to determine that 0xdeadbef000 is
   mapped to foo@0x1000, entered genfs_io_chuck_all_the_pages or
   whatever, and then started waiting for the page that (1) allocated
   for foo@0x1000.
<Riastradh> Except I got the order wrong again and this last player
   actually started first, but whatever.
<Riastradh> 3. At the same time, someone else tried to unmap
   0xdeadbef000, which requires taking a _write_ lock.
<Riastradh> which threw a wrench in the whole thing
<Riastradh> So, one obvious possibility is: make uvm_map_clean drop
   the map lock while doing genfs_io_chuck_all_the_pages.
<Riastradh> (pgo_put)


>How-To-Repeat:

>Fix:

Oof.
