Subject: Re: nfs vs pagedaemon
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 04/08/2003 10:00:40
--FL5UXtIhxfXey3p5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

hi,

yea, this makes sense.  the more general description of the problem
is that nfs_strategy() should only use the commit mechanism when it's
being called from VOP_PUTPAGES() on an NFS vnode, but there are other
ways that nfs_strategy() can be called.  NFS swap files is one way we
can go wrong, but there are others, such as using an NFS file for
a vnd backing file.  one way to fix this more completely would be to
have the NFS VOP_PUTPAGES() path add an extra flag to its buffers
that nfs_strategy() can look at to decide if it should use stable writes.
I had written this up a while back but didn't commit it for some reason.
the diff is attached if you want to pursue it.

to allow unstable writes for NFS swap files, it may be possible to
transfer ownership of the pages being paged out from the original
anon or aobj to the NFS vnode before sending the data to the server.
I haven't thought it through though, so this may not work out very well.

-Chuck


On Mon, Apr 07, 2003 at 10:50:16PM +0900, YAMAMOTO Takashi wrote:
> i found the problem.
> because the file offset that corresponds to the page changes
> in the case of swap and we're using byte-range of the file
> to track commited/uncommited, we can't do unstable writes+commit for swap.
> 
> the attached patch simply attempts stable writes in the case of pageout.
> i'll commit this if no one objects.
> thanks.
> 
> YAMAMOTO Takashi


--FL5UXtIhxfXey3p5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="diff.nfs-unstable"

Index: kern/vfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.85
diff -u -r1.85 vfs_bio.c
--- kern/vfs_bio.c	2002/09/06 13:23:52	1.85
+++ kern/vfs_bio.c	2003/01/01 23:18:51
@@ -424,7 +424,7 @@
 	}
 
 	/* Otherwise, the "write" is done, so mark and release the buffer. */
-	CLR(bp->b_flags, B_NEEDCOMMIT|B_DONE);
+	CLR(bp->b_flags, B_DONE);
 	splx(s);
 
 	brelse(bp);
Index: miscfs/genfs/genfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/miscfs/genfs/genfs_vnops.c,v
retrieving revision 1.68
diff -u -r1.68 genfs_vnops.c
--- miscfs/genfs/genfs_vnops.c	2002/11/15 14:01:57	1.68
+++ miscfs/genfs/genfs_vnops.c	2003/01/01 23:18:53
@@ -1392,7 +1392,8 @@
 	mbp->b_bufsize = npages << PAGE_SHIFT;
 	mbp->b_data = (void *)kva;
 	mbp->b_resid = mbp->b_bcount = bytes;
-	mbp->b_flags = B_BUSY|B_WRITE|B_AGE| (async ? (B_CALL|B_ASYNC) : 0);
+	mbp->b_flags = B_BUSY | B_WRITE | B_AGE | B_PAGEIO |
+	    (async ? (B_CALL|B_ASYNC) : 0);
 	mbp->b_iodone = uvm_aio_biodone;
 	mbp->b_vp = vp;
 	LIST_INIT(&mbp->b_dep);
@@ -1430,7 +1431,8 @@
 			bp->b_data = (char *)kva +
 			    (vaddr_t)(offset - pg->offset);
 			bp->b_resid = bp->b_bcount = iobytes;
-			bp->b_flags = B_BUSY|B_WRITE|B_CALL|B_ASYNC;
+			bp->b_flags = B_BUSY | B_WRITE | B_CALL | B_ASYNC |
+			    B_PAGEIO;
 			bp->b_iodone = uvm_aio_biodone1;
 			bp->b_vp = vp;
 			LIST_INIT(&bp->b_dep);
@@ -1633,7 +1635,7 @@
 	bp = pool_get(&bufpool, PR_WAITOK);
 	splx(s);
 
-	bp->b_flags = B_BUSY | B_WRITE | B_AGE;
+	bp->b_flags = B_BUSY | B_WRITE | B_AGE | B_PAGEIO;
 	bp->b_vp = vp;
 	bp->b_lblkno = offset >> vp->v_mount->mnt_fs_bshift;
 	bp->b_data = (char *)kva;
Index: nfs/nfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_bio.c,v
retrieving revision 1.85
diff -u -r1.85 nfs_bio.c
--- nfs/nfs_bio.c	2002/10/29 10:15:16	1.85
+++ nfs/nfs_bio.c	2003/01/01 23:18:57
@@ -985,21 +985,25 @@
 		bp->b_error = error;
 	    }
 	} else {
-	    int i, npages = bp->b_bufsize >> PAGE_SHIFT;
+	    int i, npages = (bp->b_bcount + PAGE_MASK) >> PAGE_SHIFT;
 	    struct vm_page *pgs[npages];
 	    boolean_t needcommit = TRUE;
 
-	    if ((bp->b_flags & B_ASYNC) != 0 && NFS_ISV3(vp)) {
+	    if ((~bp->b_flags & (B_ASYNC | B_PAGEIO)) == 0 &&
+		NFS_ISV3(vp)) {
 		    iomode = NFSV3WRITE_UNSTABLE;
 	    } else {
 		    iomode = NFSV3WRITE_FILESYNC;
+		    needcommit = FALSE;
 	    }
 
-	    for (i = 0; i < npages; i++) {
-		    pgs[i] = uvm_pageratop((vaddr_t)bp->b_data +
-					   (i << PAGE_SHIFT));
-		    if ((pgs[i]->flags & PG_NEEDCOMMIT) == 0) {
-			    needcommit = FALSE;
+	    if (needcommit) {
+		    for (i = 0; i < npages; i++) {
+			    pgs[i] = uvm_pageratop((vaddr_t)bp->b_data +
+						   (i << PAGE_SHIFT));
+			    if ((pgs[i]->flags & PG_NEEDCOMMIT) == 0) {
+				    needcommit = FALSE;
+			    }
 		    }
 	    }
 	    if (!needcommit && iomode == NFSV3WRITE_UNSTABLE) {
Index: sys/buf.h
===================================================================
RCS file: /cvsroot/src/sys/sys/buf.h,v
retrieving revision 1.55
diff -u -r1.55 buf.h
--- sys/buf.h	2002/10/06 17:05:56	1.55
+++ sys/buf.h	2003/01/01 23:18:57
@@ -193,7 +193,7 @@
  * These flags are kept in b_flags.
  */
 #define	B_AGE		0x00000001	/* Move to age queue when I/O done. */
-#define	B_NEEDCOMMIT	0x00000002	/* Needs committing to stable storage */
+#define	B_PAGEIO	0x00000002	/* Data is file system page cache */
 #define	B_ASYNC		0x00000004	/* Start I/O, do not wait. */
 #define	B_BAD		0x00000008	/* Bad block revectoring in progress. */
 #define	B_BUSY		0x00000010	/* I/O in progress. */

--FL5UXtIhxfXey3p5--