Subject: Re: uvm_vslock / uvm_vsunlock problems
To: Stephan Uphoff <ups@stups.com>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 07/20/2003 08:30:50
hi,

On Mon, Jul 14, 2003 at 11:52:00AM -0400, Stephan Uphoff wrote:
> > On Thu, Jul 10, 2003 at 07:30:37PM -0400, Stephan Uphoff wrote:
> > > uvm_vslock() is used in physio and sys___sysctl to wire user process memory.
> > > uvm_vsunlock() later unwires the memory if it is not in a wired map entry.
> > > 
> > > Problem 1:
> > > -----------
> > > uvm_vslock assumes that once a page is wired - it will stay wired until
> > > uvm_vsunlock (potentially) unwires it.
> > > This is not true for mmaped file pages when the file is truncated.
> > > (All managed mappings are released - page is freed)
> > 
> > yup, that's a problem.  one approach would be to have uvm_vslock() leave
> > the pages PG_BUSY in addition to wiring them (and of course to have
> > uvm_vsunlock() unbusy them as well).  it looks like all the callers of
> > uvm_fault_wire(VM_FAULT_WIRE) would be ok with this behaviour
> > (possibly modulo assertions), so I'd suggest changing the behaviour
> > of uvm_fault() in the VM_FAULT_WIRE case.  there would need to be another
> > argument to uvm_fault_unwire() / uvm_fault_unwire_locked() to indicate
> > the fault type used in uvm_fault_wire() (so it can tell whether the pages
> > should be unbusied or not).
> > 
> > this has the potential to introduce deadlocks but I can't think of
> > a problem off the top of my head.
> 
> Unfortunately your deadlock suspicions are correct.
> Examples:
> uvm_vslock tries to lock a VA range that is currently mapped (in order) to 
> pages p1,p2.
> - This would cause a self-deadlock with p1 == p2.
> - If p1 and p2 belong to the same vm-object 
>   and object offset of p1 > object offset of p2
>   uvm_vslock can deadlock with various uvm fault and flushing functions.
>   ( Page busy order violation )

ah right, there is it.  :-)

I suspect we'll have to move from locking pages just with PG_BUSY
to using some shared/exclusive style of page locking.
more thought is needed.

I looked to see what freebsd does in this case, they've added a "hold_count"
field in the page structure which keeps the page from being allocated to
another use when freed until the last hold is dropped.  their mechanism
for handling truncation of wired pages doesn't appear to use that, but
instead the page is left attached to the object.  I assume that this state
is detected if the file is later grown so that the page can be zeroed,
but I couldn't figure out how that part works.  at any rate, the hold_count
idea seems like it would take care of this set of problems with uvm_vslock().


> > > Problem 3:
> > > ----------
> > > uvm_vslock indirectly calls uvm_fault.
> > > If uvm_fault is called without an VM_PROT_WRITE 
> > > bit in accesstype to wire a loaned page it will not
> > > break the loan.
> > > This can cause the buffer created by physio to eventual map to the
> > > wrong pages and could break a KASSERT in uvm_pagefree.
> > 
> > callers of uvm_vslock() must include VM_PROT_WRITE in the access_type
> > if they're going to modify the pages, otherwise the caller has a bug.
> > physio() does the right thing here.
> 
> Sorry - this needs a better explanation.
> Probably the easiest way is an example:
> 
> 	Process A and B both map the same page P without COW.
>         ( Changes to P form A will be visible in B)
> 
> 	Process A writes P to a TCP socket causing P to be on loan to 
>         the kernel (TCP layer).
>  
> 	Process A writes P to a tape.
>         uvm_vslock calls uvm_fault without VM_PROT_WRITE.
> 
> 	< P is now wired and still loaned out >
> 
> 	While the tape write is in progress:
> 
> 		Process B tries to write to P -> this causes a page fault
> 		 breaking the loan. 
> 	
> 		P is now without an owner - but sill loaned out.
> 
> 		The kernel (TCP layer) releases the loan causing the page 
> 		to be on the free list.
> 	
> 		The in kernel buffer prepared by physio for the tape write 
> 		in Process A still points to P. (now on the free list)
> 
>          ...


ah, yea, this is a problem.  physio's reference to the page is only detectable
in that the page is wired, but that's not enough to keep the page from being
freed.  for this case it would suffice to use the loaning mechanism to keep
the page around, but that wouldn't help the physio read case.  it would be
best to use the same mechanism to handle both, ie. whatever we use to solve
the first problem above.

-Chuck