Subject: Re: uvm_vslock / uvm_vsunlock problems
To: Chuck Silvers <chuq@chuq.com>
From: Stephan Uphoff <ups@stups.com>
List: tech-kern
Date: 07/14/2003 11:52:00
Hi,

Chuck Silvers wrote:
> hi,
> 
> 
> On Thu, Jul 10, 2003 at 07:30:37PM -0400, Stephan Uphoff wrote:
> > Hi,
> > 
> > uvm_vslock() is used in physio and sys___sysctl to wire user process memory.
> > uvm_vsunlock() later unwires the memory if it is not in a wired map entry.
> > 
> > Problem 1:
> > -----------
> > uvm_vslock assumes that once a page is wired - it will stay wired until
> > uvm_vsunlock (potentially) unwires it.
> > This is not true for mmaped file pages when the file is truncated.
> > (All managed mappings are released - page is freed)
> 
> yup, that's a problem.  one approach would be to have uvm_vslock() leave
> the pages PG_BUSY in addition to wiring them (and of course to have
> uvm_vsunlock() unbusy them as well).  it looks like all the callers of
> uvm_fault_wire(VM_FAULT_WIRE) would be ok with this behaviour
> (possibly modulo assertions), so I'd suggest changing the behaviour
> of uvm_fault() in the VM_FAULT_WIRE case.  there would need to be another
> argument to uvm_fault_unwire() / uvm_fault_unwire_locked() to indicate
> the fault type used in uvm_fault_wire() (so it can tell whether the pages
> should be unbusied or not).
> 
> this has the potential to introduce deadlocks but I can't think of
> a problem off the top of my head.

Unfortunately your deadlock suspicions are correct.
Examples:
uvm_vslock tries to lock a VA range that is currently mapped (in order) to 
pages p1,p2.
- This would cause a self-deadlock with p1 == p2.
- If p1 and p2 belong to the same vm-object 
  and object offset of p1 > object offset of p2
  uvm_vslock can deadlock with various uvm fault and flushing functions.
  ( Page busy order violation )

> 
> 
> > Problem 2:
> > ----------
> > The uvm_vslock() / uvm_vsunlock() pair assumes only one thread 
> > of control per address space.
> > Because of clone(2) or scheduler activations (lwps) this is no longer always
> > true.
> 
> this would be taken care of by the above.
> 
> 
> > Problem 3:
> > ----------
> > uvm_vslock indirectly calls uvm_fault.
> > If uvm_fault is called without an VM_PROT_WRITE 
> > bit in accesstype to wire a loaned page it will not
> > break the loan.
> > This can cause the buffer created by physio to eventual map to the
> > wrong pages and could break a KASSERT in uvm_pagefree.
> 
> callers of uvm_vslock() must include VM_PROT_WRITE in the access_type
> if they're going to modify the pages, otherwise the caller has a bug.
> physio() does the right thing here.

Sorry - this needs a better explanation.
Probably the easiest way is an example:

	Process A and B both map the same page P without COW.
        ( Changes to P form A will be visible in B)

	Process A writes P to a TCP socket causing P to be on loan to 
        the kernel (TCP layer).
 
	Process A writes P to a tape.
        uvm_vslock calls uvm_fault without VM_PROT_WRITE.

	< P is now wired and still loaned out >

	While the tape write is in progress:

		Process B tries to write to P -> this causes a page fault
		 breaking the loan. 
	
		P is now without an owner - but sill loaned out.

		The kernel (TCP layer) releases the loan causing the page 
		to be on the free list.
	
		The in kernel buffer prepared by physio for the tape write 
		in Process A still points to P. (now on the free list)

         ...



Stephan