Another UVM wire page count underflow panic

To: current-users%netbsd.org@localhost
Subject: Another UVM wire page count underflow panic
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Sat, 16 Mar 2019 02:11:10 +0700

Kamil sent me a pointer to this ...

https://syzkaller.appspot.com/bug?id=71f9271d761f5b6ed517a18030dc04f0135e6179

That's a syzkaller generated test that is triggering the new KASSERT()
added to detect the (lower layer) wired count on a pmap entry underflowing
(going from 0 to 65535).

They also have a C code sequence which reproduces this issue, which
is built out of syscall(SYS_...) but to paraphrace it in regular
sys calls (nb: this code only exists in this e-mail, don't expect it
to compile...)

	mmap(0x20000000, 0x1000000, PROT_READ|PROT_WRITE,
		MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0);
	munmap(0x20ffc000, 0x4000);

ie: mmap a big area, and then unmap the last 16KB of that.   This setup
might be important to trigger the bug, but I cannot see how.   I haven't
tried omitting it, but it really doesn't seem to be important.

	id = shmget(IPC_PRIVATE, 0x4000, 0x232);

Not sure what the 0x232 maps into in terms of flags, but I also don't
think it matters.   Their code actually has a 4th arg, 0x20ffc000,
which is maybe used on linux?   We don't use it...

	shmat(id, 0x20ffe000, SHM_RDONLY|SHM_RND);

that is, map in the page as the last page of what used to me the mmap()
block (which was now just unmapped VA space).

	shmctl(id, SHM_LOCK, NULL);

Wire down the shm segment.

	munlockall();

unwire all wired memory

	shmctl(id, SHM_UNLOCK, NULL);
	shmctl(id, SHM_UNLOCK, NULL);

unwire the shared memory segment.   Twice.

It is those last 4 calls (and the mixing of an mlock(0 related call
with shm*() calls, which creates the problem.

What's happening this time is:

shmctl(SHM_LOCK) does 2 things (asied from boilerplate, finding the
entry, verifying permission, blah blah) in shm_memlock (kern/sysv_shm.c)

                        error = uvm_obj_wirepages(shmseg->_shm_internal,
                            0, size, NULL);
                        error = uvm_map_pageable(&p->p_vmspace->vm_map,
                            shmmap_se->va, shmmap_se->va + size, false, 0);

(with checking for error != 0 etc).

The first of those grabs a reference to the pages in the region,
increasing their wired count.

The second finds the UVM map entry for the object, makes that wired,
and as its wired count is just moving from 0 to 1, calls uvm_obj_wirepages()
to wire doen the pages.

That's correct I think, and reflects a difference bewteen shm and mlock().
One munlock() undoes any number of mlock(), but each SHM_LOCK needs a
matching SHM_UNLOCK to undo the wiring.   Hence, the page wire count
needs to increase in mlock() only when it is the first mlock() for the
region (uvm_map_pageable() takes care of that).   But for SHM_LOCK
we need one wired_count for each SHM_LOCK - so one uvm_obj_wirepages()
for each SHM_LOCK.   Or at least that is how I presume it is intended to
work based upon the way the code is written - our man page says nothing
about it, one way or the other, and POSIX has no SHM_*LOCK operations in
its shmctl() function - they're an extension.

Anyway, I have no issues with what happens there.

Then munlockall() goes through the uvm map, unwiring every region
that has been wired, including the one that is backing the shm page.
That clears the UVM map entry wired_count entry (ie: sets it to 0)
and decrements the wired_count on teh pmap entries at the lower layer.
(which causes them to go from 2 to 1 - shm still has its reference).

Then shmctl(SHM_UNLOCK) unwires the region.   Its code is (effectively)

		if (shmseg->shm_perm.mode & SHMSEG_WIRED) != 0) {
                        uvm_obj_unwirepages(shmseg->_shm_internal, 0, size);
                        error = uvm_map_pageable(&p->p_vmspace->vm_map,
                            shmmap_se->va, shmmap_se->va + size, true, 0);
                        if (error)
                                return EIO;
                        shmseg->shm_perm.mode &= ~SHMSEG_WIRED;
		}

(again from shm_memlock() in kern/sysv_shm.c)

The uvm_obj_unwirepages() drops the reference count on all the
pages (pmap entries) that back the shared memory segment.  That will
make them go from 1 to 0.

uvm_map_pageable() unwires the UVM map entry.  If it was not wired,
it returns an error, and that is what has happens here, as the entry
was already unwired by the munlockall() operation.

The test program (the crash reproducer) ignores the error.

Then the next shmctl(SHM_UNLOCK) - this one does through the same
code, finds SHMSEG_WIRED still set, and calls uvm_obj_unwirepages()
again, which (would have, before the new KASSERT) changed those
pages wired_count from 0 to 65525.   Not we panic() instead.  And
that is what happened.

I think the problem here is that the shm_memlock() code is misinterpreting
the error return from uvm_map_pageable().   When unwiring pages, the
only things that can cause that to fail (return an error code) are if
the addresses are bogus (there is no UVM map entry, or entries, for them)
which I think cannot happen here, as shm is managing its own entries
(what would happen should the code munmap() the region assigned by shm()
I have no idea - I doubt it would be good) - of if the pages are not
(still) wired.   In both of those cases, there is no wired shm
region any more, so what I plan on doing to fix this is:

Index: sysv_shm.c
===================================================================
RCS file: /cvsroot/src/sys/kern/sysv_shm.c,v
retrieving revision 1.133
diff -u -r1.133 sysv_shm.c
--- sysv_shm.c	21 Feb 2019 03:37:19 -0000	1.133
+++ sysv_shm.c	15 Mar 2019 18:05:05 -0000
@@ -294,9 +294,9 @@
 			uvm_obj_unwirepages(shmseg->_shm_internal, 0, size);
 			error = uvm_map_pageable(&p->p_vmspace->vm_map,
 			    shmmap_se->va, shmmap_se->va + size, true, 0);
+			shmseg->shm_perm.mode &= ~SHMSEG_WIRED;
 			if (error)
 				return EIO;
-			shmseg->shm_perm.mode &= ~SHMSEG_WIRED;
 		}
 	}
 
That is, regardless of the error, turn off the SHMSEG_WIRED flag, so we
never attempt to unwire it again.   Since we have done the
uvm_obj_unwirepages() and decremented the pmap's wired_counts already.
That is, the wiring is undone and an error return from uvm_map_pageable()
doesn't alter that.

That seems to prevent this particular crash anyway, so unless anyone has
any good reason why not, I will make that change quite soon.

However, the relationship between the shm() functions and the uvm
(m*() functions) looks to be something of a mess - we really cannot keep
a state where by doing bizarre combinations of calls from those two
sets, which internally seem to not know much about each other, can
either make a mess of the kernel data structures, or crash the kernel.

kre

ps: I also suspect that after a wire/unwire operation has been done
to a found shm segment, the code should probably just break out of the
loop - it should not find two segments with the same ID should it?
Not breaking just means scanning the rest of the list if I am right
about that - harmless - but it would be a bit of a waste of time,
especially when there are many shm segments around.

Follow-Ups:
- Re: Another UVM wire page count underflow panic
  - From: Robert Elz
- Re: Another UVM wire page count underflow panic
  - From: Robert Elz
- Re: Another UVM wire page count underflow panic
  - From: Jason Thorpe

Prev by Date: Re: strange libc symbol renaming issue
Next by Date: Re: Another UVM wire page count underflow panic
Previous by Thread: strange libc symbol renaming issue
Next by Thread: Re: Another UVM wire page count underflow panic
Indexes:

Home | Main Index | Thread Index | Old Index