NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

To: tech-kern%netbsd.org@localhost
Subject: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
From: Edgar Fuß <ef%math.uni-bonn.de@localhost>
Date: Wed, 31 Jul 2019 10:45:21 +0200

Thanks to riastradh@, this turned out to be caused by a (UDP, hard) NFS mount combined with a mis-configured IPFilter that blocked all but the first fragment of a fragmented NFS reply (e.g., readdir), combined with a NetBSD design error (or so Taylor says) that a vnode lock may be held across I/O, in this case, network I/O.

It should be reproducible with a default NFS mount and a
	block in all with frag-body
IPFilter rule and then trying to readdir.
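
To illustrate why such a rule is fatal for larger replies, here is a toy model (not kernel code, just arithmetic). It assumes an Ethernet MTU of 1500, a 20-byte IPv4 header without options, and a hypothetical reply of 8200 bytes of UDP datagram; it further assumes that "with frag-body" matches exactly the fragments with a nonzero offset. The function names are made up for the sketch.

```python
# Toy model: split one UDP datagram into IPv4 fragments, drop every
# fragment with a nonzero offset, and check whether the receiver could
# still reassemble the datagram.

def fragment(payload_len, mtu=1500, ip_header=20):
    """Return (offset, length, more_fragments) tuples for one datagram."""
    # Every fragment but the last carries a multiple of 8 data bytes.
    per_frag = (mtu - ip_header) // 8 * 8
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(per_frag, payload_len - offset)
        more = offset + length < payload_len
        frags.append((offset, length, more))
        offset += length
    return frags

def passes_filter(frag):
    """Assumed filter behaviour: only the first fragment (offset 0) passes."""
    offset, _length, _more = frag
    return offset == 0

def reassembled(frags, payload_len):
    """True if the fragments cover the whole datagram without gaps."""
    covered = 0
    for offset, length, _more in sorted(frags):
        if offset != covered:
            return False
        covered += length
    return covered == payload_len

if __name__ == "__main__":
    size = 8200  # hypothetical reply size, UDP header included
    frags = fragment(size)
    survivors = [f for f in frags if passes_filter(f)]
    print(len(frags), "fragments sent,", len(survivors), "passed the filter")
    print("reassembly possible:", reassembled(survivors, size))
```

Under these assumptions, only one of the six fragments gets through, so the reply never completes however often the server resends it, while small replies that fit in a single packet are unaffected.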

Now, in some cases, the machine in question recovered after fixing the filter rules, in others, it didn't, forcing a reboot. This strikes me as a bug because the same lock-up could as well have been caused by network problems instead of ipfilter mis-configuration.

It looks like the operation whose reply was lost sometimes doesn't get retried. Do we have some weird bug where the first fragment arriving stops the timeout but the blocking of the remaining fragments causes it to wedge?

Follow-Ups:
- Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
  - From: Hauke Fath
- Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
  - From: Jason Thorpe
- Re: NFS lockup after UDP fragments getting lost
  - From: Greg Troxel

References:
- 8.1 tstile lockup after nfs send error 51
  - From: Edgar Fuß
