Subject: root-on-nfs configs fail on busy Ethernets?
To: Frank van der Linden <frank@wins.uva.nl>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 05/05/1997 14:56:33
I recently set up a couple of P5/133s to boot `diskless' off a floppy
containing a kernel with a
	 "root ? type nfs"
config.  I've tried this floppy repeatedly on machines with:

	* a de-500
	* a 3c595-TX
	* both a de-500 and a 3c595-TX

and in all cases I regularly, but not _always_, get panics.  I'm
guessing the problem is at least partly due to some other network
traffic that upsets the NFS-boot code.

Building with DDB and examining the traceback shows that nfs_reply() is
dereferencing a bad pointer around line 667 of nfs_socket.c.

The following patch fixes that for me, but I have no idea whether it's
really correct, or what the offending traffic is.  (ntp chimes, perhaps?)

I've sent a PR, but I'd like to hear if anyone else has experienced
anything similar, or has any clue what's going on in the NFS code.

(And apologies to Matt Thomas, it seems this wasn't a problem in the
de driver after all.)


*** nfs_socket.c.DIST	Wed Apr  9 04:23:02 1997
--- nfs_socket.c	Fri May  2 18:34:50 1997
***************
*** 663,668 ****
--- 663,679 ----
  		if (nam)
  			m_freem(nam);
  	
+ 
+ 		/* XXX multihomed machines lose? */
+ 		if (mrep == 0) {
+ 			printf("nfs_reply: null mbuf from nfs_receive()\n");
+ #if 0
+ 			return (0);
+ #else
+ 			continue;
+ #endif
+ 		}
+ 
  		/*
  		 * Get the xid and check that it is an rpc reply
  		 */
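The guard above amounts to: check the mbuf pointer before parsing the RPC
header, and keep waiting if it is null.  Here is a minimal user-space sketch
of that loop, assuming packets arrive as flat byte buffers rather than real
mbuf chains.  `struct pkt`, `get_be32()` and `find_reply()` are hypothetical
names, not kernel interfaces, and this is not the actual nfs_reply() code.
The xid and msg_type layout (first two 32-bit XDR words, REPLY == 1) follows
the ONC RPC message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a received mbuf: a flat byte buffer
 * holding the start of an RPC message in network byte order. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

#define RPC_REPLY 1u  /* msg_type value for a reply in ONC RPC */

/* Decode a big-endian (XDR) 32-bit word. */
static uint32_t
get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Scan received packets for the RPC reply matching want_xid.
 * A NULL entry models nfs_receive() handing back no mbuf: it is
 * logged and skipped (like the patch's "continue" branch) instead
 * of being dereferenced.  Returns the index of the match, or -1.
 */
static int
find_reply(const struct pkt *const *pkts, size_t n, uint32_t want_xid,
    size_t *nulls_skipped)
{
    assert(pkts != NULL || n == 0);
    *nulls_skipped = 0;
    for (size_t i = 0; i < n; i++) {
        const struct pkt *m = pkts[i];
        if (m == NULL) {
            printf("find_reply: null packet, skipping\n");
            (*nulls_skipped)++;
            continue;
        }
        if (m->len < 8)
            continue;       /* too short for xid + msg_type */
        uint32_t xid = get_be32(m->data);
        uint32_t mtype = get_be32(m->data + 4);
        if (mtype != RPC_REPLY || xid != want_xid)
            continue;       /* not the reply we are waiting for */
        return (int)i;
    }
    return -1;
}
```

Skipping rather than returning mirrors the `#else` branch of the patch.  The
sketch says nothing about why nfs_receive() produced no mbuf in the first
place.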