Subject: Re: kern/34959
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Julio M. Merino Vidal <jmmv@NetBSD.org>
List: netbsd-bugs
Date: 11/01/2006 18:10:03
The following reply was made to PR kern/34959; it has been noted by GNATS.

From: "Julio M. Merino Vidal" <jmmv@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/34959
Date: Wed, 1 Nov 2006 18:06:33 +0000

 Upon further investigation, I have found that the panic occurs whenever
 the NFS server decides to call uvm_loanuobjpages, which in turn uses
 tmpfs_getpages.  As a result, an easier way to trigger the error is to
 copy, e.g., an executable file to the exported file system and then try
 to execute it from within the NFS mount point.  Following my previous
 example:
 
 	$ cp /bin/cp /mnt/tmpfs
 	$ cd /mnt/remote
 	$ ./cp
 	<< machine crashes >>
 
 I've been looking at several parts of the code and I suspect the problem
 is somewhere in the tmpfs_getpages function, perhaps because it fails to
 handle some corner case.  I may be completely wrong, though, and the
 problem may lie elsewhere.
 
 Anyway.  Let's assume for a moment that tmpfs_getpages is correct and
 that it needn't handle that corner case.  Then: uvm_loanuobjpages is
 called from nfsrv_read.  That routine does the following:
 
 	m = m_get(M_WAIT, MT_DATA);
 	MCLAIM(m, &nfs_mowner);
 	pgpp = m->m_ext.ext_pgs;
 
 	error = uvm_loanuobjpages(&vp->v_uobj, pgoff, npages, pgpp);
 
 I'm not sure at all, but this looks incorrect.  The code passes
 m_ext.ext_pgs to uvm_loanuobjpages, yet the array seems to be
 uninitialized: at least, M_EXT_PAGES is not set in m->m_flags and
 ext_pgs[0] is always 0x30000000.  This apparently bogus value is later
 handed to uao_get, which, seeing that the page pointer is not NULL,
 assumes it is valid and does not fetch the page (see uvm_aobj.c 1.81
 around line 1022).  As a result, the rest of uvm_loanuobjpages operates
 on this invalid pointer and ends up crashing.  Is this a bug?  Note that
 it will be exposed whenever that NFS code path ends up calling
 uao_get without PGO_LOCKED (because it will then enter step 2).
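 
 To make the suspected failure mode concrete, here is a small userland
 model of it.  This is only a sketch and not kernel code: fake_page,
 backing, toy_get and count_bogus are names I made up for illustration,
 and toy_get only imitates the "skip any slot that is already non-NULL"
 behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vm_page; not the kernel definition. */
struct fake_page {
	int resident;
};

/* Pretend backing store for the object: one page per index (max 4). */
static struct fake_page backing[4];

/*
 * Toy model of a getter that fills only the empty slots.  Any slot
 * that is already non-NULL is trusted and left alone.  Returns the
 * number of slots it actually filled.
 */
static int
toy_get(struct fake_page **pps, int npages)
{
	int i, filled = 0;

	assert(pps != NULL && npages >= 0 && npages <= 4);
	for (i = 0; i < npages; i++) {
		if (pps[i] != NULL)
			continue;	/* caller-supplied value kept as-is */
		backing[i].resident = 1;
		pps[i] = &backing[i];
		filled++;
	}
	return filled;
}

/* Count the slots that do not point at the expected backing page. */
static int
count_bogus(struct fake_page **pps, int npages)
{
	int i, bogus = 0;

	for (i = 0; i < npages; i++)
		if (pps[i] != &backing[i])
			bogus++;
	return bogus;
}
```

 If slot 0 is preloaded with a garbage value such as 0x30000000 and
 slot 1 is NULL, toy_get fills only slot 1 and the garbage pointer
 survives for the caller to dereference later.  With both slots set to
 NULL beforehand, both get filled and nothing bogus remains.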
 
 For now I've been able to work around the problem by making the
 nfsrv_read function initialize the mbuf's ext_pgs field to NULL
 pointers and later passing PGO_ALLPAGES to the pgo_get call in
 uvm_loanuobjpages.  The former may be fine, but the latter is most
 likely not.
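 
 The first half of that workaround amounts to something like the
 following.  Again this is a userland sketch under my own assumptions,
 not the actual change: toy_ext, toy_page, TOY_NPAGES and toy_clear_pgs
 are hypothetical names, and the array size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 4	/* hypothetical array size, not the kernel's */

struct toy_page;	/* opaque stand-in for struct vm_page */

/* Hypothetical stand-in for the page array embedded in an mbuf. */
struct toy_ext {
	struct toy_page *ext_pgs[TOY_NPAGES];
};

/* Set the first npages slots to NULL so a getter sees them as empty. */
static void
toy_clear_pgs(struct toy_ext *ext, size_t npages)
{
	size_t i;

	assert(ext != NULL && npages <= TOY_NPAGES);
	for (i = 0; i < npages; i++)
		ext->ext_pgs[i] = NULL;
}
```

 The idea is simply to call this on the page array before handing it
 to the loan routine, so that no stale value can be mistaken for a
 valid page pointer.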
 
 Oh, and I've just tried implementing the PGO_LOCKED case in
 tmpfs_getpages (based on genfs_getpages) and it hasn't solved the
 problem.  I thought it would, because normal operation does not use
 uao_get's step 2, but rather terminates quickly in step 1.
 
 -- 
 Julio M. Merino Vidal <jmmv@NetBSD.org>