Subject: panic in nfs_bioread
To: None <tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 01/27/2005 13:42:58
With latest kernel sources on a diskless MIPS board, I'm seeing panics
like this:

trap: TLB miss (load or instr. fetch) in kernel mode
status=0x7f03, cause=0x8, epc=0x80074380, vaddr=0xc9475ff8
pid=346 cmd=tcsh usp=0x7fff78a8 ksp=0xcaf11cd0
Stopped in pid 346.1 (tcsh) at  netbsd:nfs_bioread+0x830: lw a1,-8(v0)

when I try to log in remotely with ssh.

This is being caused by the "invalid cache: ..." printf in
nfs_bioread().  It looks like NFS_GETCOOKIE() is trying to do a read
past the end of currently mapped kernel memory and failing.  In the
check on the previous line, we don't use NFS_GETCOOKIE(pdp) unless en is
greater than zero, so I've just added that check before trying to print
the cookie, and that seems to have fixed the problem so far.
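Here is a minimal user-space sketch of the guard, for illustration only. struct rec, get_cookie() and format_invalid_cache() are hypothetical stand-ins. The real struct dirent layout and NFS_GETCOOKIE() definition in nfs_bio.c are not reproduced. en is assumed to count directory entries already consumed, so pdp points at a real record only when en > 0. As in the patch at the end of this message, the cookie is read only on that path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical directory record.  The real struct dirent and the
 * NFS_GETCOOKIE() macro are different; here the cookie is just a field.
 */
struct rec {
	uint32_t reclen;
	int64_t cookie;
};

/* Hypothetical analogue of NFS_GETCOOKIE(): dereferences the record. */
static int64_t
get_cookie(const struct rec *r)
{
	return r->cookie;
}

/*
 * Format the "invalid cache" diagnostic into buf.  pdp is dereferenced
 * only when en > 0, i.e. when a previous entry really exists; with
 * en == 0 it may point at the end of the buffer (or be NULL).
 * Returns the snprintf()-style length of the full message.
 */
static int
format_invalid_cache(char *buf, size_t len, const struct rec *pdp,
    const void *dp, const void *edp, unsigned long off, int en)
{
	int n = snprintf(buf, len, "invalid cache: %p %p %p off %lx",
	    (const void *)pdp, dp, edp, off);

	if (n < 0 || (size_t)n >= len)
		return n;		/* error or already truncated */
	if (en > 0) {
		assert(pdp != NULL);
		n += snprintf(buf + n, len - n, " %lx",
		    (unsigned long)get_cookie(pdp));
	} else {
		n += snprintf(buf + n, len - n, " no cookie");
	}
	return n;
}
```

Because of the en > 0 test, the en == 0 path never touches pdp. That matters for lines like the ones below, where pdp, dp and edp are all the same address.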

After this change, I now see the following while booting to the multi-user login prompt:

invalid cache: 0xc93f6000 0xc93f6000 0xc93f6000 off 80000 no cookie
192.168.0.42:/tftpboot/rhone.root: inaccurate wcc data (ctime) detected, disabling wcc
invalid cache: 0xc93f8000 0xc93f8000 0xc93f8000 off 80000 no cookie
invalid cache: 0xc943c000 0xc943c000 0xc943c000 off 80000 no cookie
invalid cache: 0xc9440000 0xc9440000 0xc9440000 off 80000 no cookie

and then when I log in remotely using ssh:

invalid cache: 0xc946a000 0xc946a000 0xc946a000 off a0000 no cookie
invalid cache: 0xc946c000 0xc946c000 0xc946c000 off 20000 no cookie
invalid cache: 0xc946e000 0xc946e000 0xc946e000 off a0000 no cookie
invalid cache: 0xc9470000 0xc9470000 0xc9470000 off 20000 no cookie
invalid cache: 0xc9478000 0xc9478000 0xc9478000 off 20000 no cookie
invalid cache: 0xc947a000 0xc947a000 0xc947a000 off 40000 no cookie
invalid cache: 0xc9484000 0xc9484000 0xc9484000 off 1a0000 no cookie
invalid cache: 0xc947c000 0xc947c000 0xc947c000 off a0000 no cookie
invalid cache: 0xc947e000 0xc947e000 0xc947e000 off 20000 no cookie
invalid cache: 0xc9480000 0xc9480000 0xc9480000 off 20000 no cookie
invalid cache: 0xc9482000 0xc9482000 0xc9482000 off 40000 no cookie
invalid cache: 0xc948e000 0xc948e000 0xc948e000 off 1a0000 no cookie
invalid cache: 0xc949e000 0xc949e000 0xc949e000 off 80000 no cookie
invalid cache: 0xc9492000 0xc9492000 0xc9492000 off a0000 no cookie
invalid cache: 0xc9494000 0xc9494000 0xc9494000 off 20000 no cookie
invalid cache: 0xc9496000 0xc9496000 0xc9496000 off 20000 no cookie
invalid cache: 0xc9498000 0xc9498000 0xc9498000 off 40000 no cookie
invalid cache: 0xc94ae000 0xc94ae000 0xc94ae000 off 120000 no cookie


I recall seeing messages like this in the dim and distant past, but
haven't seen them for a long time, and I can't recall the conditions
when I saw them previously.  Is there anything we can do to fix the
problem, or maybe put them under a separate NFS_DEBUG?
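A separate debug option could take roughly this shape, sketched here as plain C. The option name NFS_DEBUG_DIRCACHE, the nfs_dircache_debug and nfs_dircache_printed variables, and the DIRCACHE_DPRINTF() macro are all hypothetical, not existing NetBSD options or interfaces. The messages can be compiled out entirely, and when compiled in they stay quiet until a runtime knob is set. The macro is a statement rather than an expression, so it does not depend on printf()'s return value.

```c
#include <stdio.h>

/*
 * Hypothetical option: build with -DNFS_DEBUG_DIRCACHE=0 to compile the
 * directory-cache messages out entirely.  Defaults to compiled in here.
 */
#ifndef NFS_DEBUG_DIRCACHE
#define NFS_DEBUG_DIRCACHE 1
#endif

#if NFS_DEBUG_DIRCACHE
int nfs_dircache_debug = 0;		/* hypothetical knob, off by default */
unsigned long nfs_dircache_printed = 0;	/* messages actually emitted */

#define DIRCACHE_DPRINTF(...)						\
	do {								\
		if (nfs_dircache_debug) {				\
			nfs_dircache_printed++;				\
			printf(__VA_ARGS__);				\
		}							\
	} while (/* CONSTCOND */ 0)
#else
/* Compiled out: the arguments are discarded without being evaluated. */
#define DIRCACHE_DPRINTF(...)	do { } while (/* CONSTCOND */ 0)
#endif
```

The existing #ifdef DEBUG printf()s in nfs_bioread() could then call DIRCACHE_DPRINTF() instead. An ordinary DEBUG kernel would stay quiet unless the knob is turned on.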

Cheers,
Simon.
--
Simon Burge                                   <simonb@wasabisystems.com>
NetBSD Development, Support and Service:   http://www.wasabisystems.com/

Index: nfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_bio.c,v
retrieving revision 1.125
diff -d -p -u -r1.125 nfs_bio.c
--- nfs_bio.c	26 Jan 2005 10:30:58 -0000	1.125
+++ nfs_bio.c	27 Jan 2005 02:22:39 -0000
@@ -322,10 +322,14 @@ diragain:
 		if ((caddr_t)dp >= edp || (caddr_t)dp + dp->d_reclen > edp ||
 		    (en > 0 && NFS_GETCOOKIE(pdp) != ndp->dc_cookie)) {
 #ifdef DEBUG
-		    	printf("invalid cache: %p %p %p off %lx %lx\n",
-				pdp, dp, edp,
-				(unsigned long)uio->uio_offset,
-				(unsigned long)NFS_GETCOOKIE(pdp));
+		    	printf("invalid cache: %p %p %p off %lx", pdp, dp, edp,
+			    (unsigned long)uio->uio_offset);
+			if (en > 0)
+				printf(" %lx",
+				    (unsigned long)NFS_GETCOOKIE(pdp));
+			else
+				printf(" no cookie");
+			printf("\n");
 #endif
 			nfs_putdircache(np, ndp);
 			brelse(bp);