Subject: UVM faults due to use of coda
To: None <tech-kern@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 04/07/2007 17:20:09
For a long time, I've had occasional crashes when copying files from
coda to ffs.  Because I was running X, and didn't have crash dumps set
up right, I was never clear on why.  I set up a crashbox with DDB and no
X, and was able to get a clean backtrace.

The Coda client lives mostly in the userspace program venus, and the
kernel talks to venus through a puffs-like interface (but just for
coda).  On open, venus passes the kernel the device/inode of a container
file that holds the actual contents in the cache; the kernel does a
VOP_OPEN on that file and redirects read/write to it.
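
As a rough userspace illustration of that redirection (not the kernel
code): the struct and function names below are hypothetical, and a path
stands in for the device/inode pair that venus actually hands over.

```c
#include <sys/types.h>

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a coda node; not the kernel's own structure. */
struct toy_coda_node {
	int container_fd;	/* open container file, or -1 */
};

/* "open": remember the container file that holds the cached contents. */
int
toy_coda_open(struct toy_coda_node *cn, const char *container_path)
{
	assert(cn != NULL && container_path != NULL);
	cn->container_fd = open(container_path, O_RDONLY);
	return cn->container_fd == -1 ? -1 : 0;
}

/* "read": forwarded unchanged to the container file. */
ssize_t
toy_coda_read(struct toy_coda_node *cn, void *buf, size_t len, off_t off)
{
	assert(cn != NULL && cn->container_fd != -1);
	return pread(cn->container_fd, buf, len, off);
}

/* "close": drop the reference to the container file. */
void
toy_coda_close(struct toy_coda_node *cn)
{
	assert(cn != NULL);
	if (cn->container_fd != -1)
		close(cn->container_fd);
	cn->container_fd = -1;
}
```

The point is only that the coda node carries no data of its own; every
read lands on the container.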

I can provoke crashes in two ways:

  copy a file from coda to ffs.  This usually works but sometimes
  crashes.  cp is using mmap.  The crash backtrace showed uvm_fault
  hitting an assertion during a fault in ufs_write.  I think this comes
  from taking a read fault on the mapped coda file while doing the
  copyin for the write system call on the regular file.  My theory is
  that this happens when the file's blocks are not already in vm; if one
  has just cat'd the file or read it over the network, they will be in
  vm.

  execute a file in coda.  This seems to crash 100%.
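
For anyone wanting to try the first case without cp, the access pattern
can be sketched as below: map the source and hand the mapping straight
to write(2), so that the kernel's copyin is what touches the source
pages.  This is a minimal sketch, not cp's actual code; copy_via_mmap is
a made-up name.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy src to dst by mapping src and passing the mapping to write(2).
 * Returns 0 on success, -1 on error.
 */
int
copy_via_mmap(const char *src, const char *dst)
{
	struct stat st;
	char *p;
	size_t len, done = 0;
	int ifd, ofd, rv = -1;

	assert(src != NULL && dst != NULL);
	if ((ifd = open(src, O_RDONLY)) == -1)
		return -1;
	if ((ofd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
		close(ifd);
		return -1;
	}
	if (fstat(ifd, &st) == -1)
		goto out;
	len = (size_t)st.st_size;
	if (len == 0) {		/* nothing to map or copy */
		rv = 0;
		goto out;
	}
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, ifd, 0);
	if (p == MAP_FAILED)
		goto out;
	/* Source pages not yet resident are faulted in during write(). */
	while (done < len) {
		ssize_t n = write(ofd, p + done, len - done);
		if (n <= 0)
			break;
		done += (size_t)n;
	}
	if (done == len)
		rv = 0;
	munmap(p, len);
out:
	close(ofd);
	close(ifd);
	return rv;
}
```

Calling copy_via_mmap() with a source path in coda and a destination on
ffs gives the same pattern; if the theory above holds, a crash would be
expected only when the source is not already in vm.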


The following patch to sys/uvm/uvm_fault.c, which turns two KASSERTs
into printfs, makes it possible to execute programs.

Running a really simple program with one call to printf results in:

uvm_fault 1 curpg->uobject 0xcb5a0558 uobj 0xcbaba350
uvm_fault 2 uobj 0xcbaba350 uobjpage->uobject 0xcb5a0558

Running a larger program (fsx from FreeBSD) results in:

uvm_fault 1 curpg->uobject 0xcb59d0bc uobj 0xcbaba3f8
uvm_fault 1 curpg->uobject 0xcb59d0bc uobj 0xcbaba3f8
uvm_fault 1 curpg->uobject 0xcb59d0bc uobj 0xcbaba3f8
uvm_fault 1 curpg->uobject 0xcb59d0bc uobj 0xcbaba3f8
uvm_fault 1 curpg->uobject 0xcb59d0bc uobj 0xcbaba3f8
uvm_fault 2 uobj 0xcbaba3f8 uobjpage->uobject 0xcb59d0bc

I don't understand this code fully, but I think uvm_fault is objecting
to pages from the coda vnode being filled from the container vnode.
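
To make that guess concrete, here is a toy model of the ownership
check.  Everything in it is hypothetical (toy_object and toy_page are
not the real uvm structures), and it assumes, as guessed above, that a
page request on the coda object is satisfied with a page still owned by
the container object.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; not the real struct uvm_object / struct vm_page. */
struct toy_object {
	const char *name;
};

struct toy_page {
	struct toy_object *uobject;	/* object that owns this page */
};

/*
 * Assumed behavior: the request was made against the coda object, but
 * the page that comes back belongs to the container object.
 */
struct toy_page *
toy_getpage_forwarded(struct toy_object *container, struct toy_page *pg)
{
	assert(container != NULL && pg != NULL);
	pg->uobject = container;
	return pg;
}

/*
 * Simplified form of the condition in both KASSERTs: the page's owner
 * must be the object the fault was taken on.  Returns 1 if it holds.
 */
int
toy_owner_matches(const struct toy_object *uobj, const struct toy_page *pg)
{
	assert(pg != NULL);
	return pg->uobject == uobj;
}
```

With a page filled this way, toy_owner_matches() fails for the coda
object and succeeds for the container, which matches the two differing
pointers in the printf output.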

Any clues as to how to fix this cleanly? 



--- uvm_fault.c.~1.119.~	2007-03-02 08:11:36.000000000 -0500
+++ uvm_fault.c	2007-04-07 10:23:25.000000000 -0400
@@ -1075,7 +1075,13 @@ ReFault:
 				if (curpg == NULL || curpg == PGO_DONTCARE) {
 					continue;
 				}
+#if 0
 				KASSERT(curpg->uobject == uobj);
+#else
+				if (curpg->uobject != uobj)
+					printf("uvm_fault 1 curpg->uobject %p uobj %p\n",
+					       curpg->uobject, uobj);
+#endif
 
 				/*
 				 * if center page is resident and not
@@ -1607,7 +1613,13 @@ Case2:
 	 *  - at this point uobjpage could be PG_WANTED (handle later)
 	 */
 
+#if 0
 	KASSERT(uobj == NULL || uobj == uobjpage->uobject);
+#else
+	if (uobj != NULL && uobj != uobjpage->uobject)
+		printf("uvm_fault 2 uobj %p uobjpage->uobject %p\n",
+		       uobj, uobjpage->uobject);
+#endif
 	KASSERT(uobj == NULL || !UVM_OBJ_IS_CLEAN(uobjpage->uobject) ||
 	    (uobjpage->flags & PG_CLEAN) != 0);
 	if (promote == false) {