Subject: NFS lossage with diskless machines perhaps found
To: None <current-users@NetBSD.ORG>
From: Matthias Drochner <drochner@zelux6.zel.kfa-juelich.de>
List: current-users
Date: 06/11/1997 21:16:32

There were problems with diskless machines crashing during
bootstrap. I couldn't reproduce the problem for a long time,
but today, while the NFS server was somehow overloaded, it
happened. It happened again when I ran heavy jobs on the
server, so it's probably related to the request-reply latency
(which is not too far away from the previous assumptions like
strange packets aimed at the booting host, multiple interfaces
or odd netmasks...).
Here is how it happens:

-a process (syslogd in my case) calls nfs_reply(),
  this (line 650) calls nfs_receive(), this (line 599) calls
  soreceive()
-no data available in socket buffer (triggered by the latency),
  so it sleeps waiting for socket data (line 607)
-in the meantime, in /etc/rc, "mount -a -t nfs" is called
  which remounts the nfs file systems
-parameters have changed, so nfs_decode_args() (line 546)
  makes a new socket connection, calling nfs_disconnect()
  first
-this (line 327) calls soshutdown(), this (line 813) calls
  sorflush(), this (line 832) calls socantrcvmore()
-here the socket is set to SS_CANTRCVMORE and
  the process waiting for socket data is awakened
-soreceive() terminates without data (line 588), it's
  an end-of-file condition
-nfs_receive() returns without data and without error
-nfs_reply() dereferences the NULL mbuf pointer
  (in line 675)
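
To make the last three steps concrete, here is a small userland
model of them. It is only a sketch: the model_* names, the two
structs and the choice of EPIPE are made up for illustration and
are not the kernel's code. It shows a receive that returns
"no error, no data" after the socket has been shut down, and a
caller that checks for that case instead of dereferencing NULL.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mbuf; not the real structure. */
struct model_mbuf {
	int len;
};

/* Hypothetical socket model: only the fields needed for this sketch. */
struct model_socket {
	int cant_rcv_more;          /* models SS_CANTRCVMORE being set */
	struct model_mbuf *pending; /* queued receive data, or NULL */
};

/*
 * Models the receive path described above. With no data queued:
 *  - if the socket was shut down, return 0 and leave *mp NULL
 *    (the end-of-file case);
 *  - otherwise the real code would sleep; the model just reports
 *    EWOULDBLOCK.
 */
int
model_receive(struct model_socket *so, struct model_mbuf **mp)
{
	*mp = NULL;
	if (so->pending == NULL)
		return so->cant_rcv_more ? 0 : EWOULDBLOCK;
	*mp = so->pending;
	so->pending = NULL;
	return 0;
}

/*
 * Defensive caller: "no error, no data" is turned into an error
 * rather than a NULL dereference. EPIPE is an arbitrary choice
 * made for this sketch.
 */
int
model_reply(struct model_socket *so, int *len_out)
{
	struct model_mbuf *m = NULL;
	int error = model_receive(so, &m);

	if (error != 0)
		return error;
	if (m == NULL)
		return EPIPE;
	*len_out = m->len;
	return 0;
}
```

A check like this would only hide the crash, of course; the
disconnect still happens underneath a sleeping receiver.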

So the reason is that nfs_disconnect() is called without
acquiring the locks on the socket. Note that nfs_sndlock(),
as suggested in the comments on nfs_reconnect(), is not
sufficient: both send and receive locks are necessary because
a process can be blocked in receive on the socket.

I hope this is the whole story. Too bad that the existing
nfs_rcvlock() and nfs_sndlock() cannot be directly
called from nfs_decode_args() because they need a
struct nfsreq* argument - so I can't offer a nice small patch.
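
Just to illustrate the locking rule (this is not a patch), here
is a toy model in which the disconnect is refused unless both
the send and the receive lock can be taken. All names and bit
values (MODEL_SNDLOCK, MODEL_RCVLOCK, model_*) are hypothetical,
and where the real code would sleep until the lock is free the
model simply returns EBUSY.

```c
#include <errno.h>

/* Placeholder lock bits for this sketch; not the kernel's values. */
#define MODEL_SNDLOCK 0x1
#define MODEL_RCVLOCK 0x2

/* Hypothetical miniature of the mount structure. */
struct model_nfsmount {
	int nm_flag;    /* lock bits */
	int connected;  /* 1 while the socket is usable */
};

/* Take both locks, or fail with EBUSY if either is already held. */
int
model_lock_both(struct model_nfsmount *nmp)
{
	if (nmp->nm_flag & (MODEL_SNDLOCK | MODEL_RCVLOCK))
		return EBUSY;
	nmp->nm_flag |= MODEL_SNDLOCK | MODEL_RCVLOCK;
	return 0;
}

/* Release both locks. */
void
model_unlock_both(struct model_nfsmount *nmp)
{
	nmp->nm_flag &= ~(MODEL_SNDLOCK | MODEL_RCVLOCK);
}

/*
 * Disconnect only while holding both locks, so that no sender or
 * receiver can be inside the socket code at that moment.
 */
int
model_safe_disconnect(struct model_nfsmount *nmp)
{
	int error = model_lock_both(nmp);

	if (error != 0)
		return error;
	nmp->connected = 0;
	model_unlock_both(nmp);
	return 0;
}
```

With a receiver holding MODEL_RCVLOCK the call fails and the
connection is left alone, which is the behaviour that would
have avoided the sequence above.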

It's at least easy to check whether this is the same problem
other people are seeing:
place a
printf("nfs_disconnect: flag=%x\n", nmp->nm_flag);
in nfs_disconnect(). If the crash (or this printout with the
recent workaround) correlates 1:1 with the nm_flag bit
NFSMNT_RCVLOCK being set, it's the same problem.
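
If reading the bit out of the hex value by eye is tedious, the
check can be spelled out. The following is a userland sketch
only: MODEL_RCVLOCK is a placeholder value (the real
NFSMNT_RCVLOCK has to come from the kernel headers), and
struct model_nfsmount and model_disconnect_check() are made-up
names for illustration.

```c
#include <stdio.h>

/*
 * Placeholder bit value for this sketch only; take the real
 * NFSMNT_RCVLOCK definition from the kernel headers.
 */
#define MODEL_RCVLOCK 0x02000000

/* Hypothetical miniature of the mount structure: just the flag word. */
struct model_nfsmount {
	int nm_flag;
};

/*
 * Prints the flag word as in the printf suggested above and
 * returns 1 if the receive lock bit is set at disconnect time,
 * 0 otherwise.
 */
int
model_disconnect_check(const struct model_nfsmount *nmp)
{
	int held = (nmp->nm_flag & MODEL_RCVLOCK) != 0;

	printf("nfs_disconnect: flag=%x%s\n", (unsigned int)nmp->nm_flag,
	    held ? " (receive lock held)" : "");
	return held;
}
```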

best regards
Matthias Drochner