tech-kern archive


NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)



Thanks to riastradh@, this turned out to be caused by a combination of three things: a (UDP, hard) NFS mount; a mis-configured IPFilter that blocked all but the first fragment of a fragmented NFS reply (e.g., readdir); and a NetBSD design error (or so Taylor says) that a vnode lock may be held across I/O, in this case network I/O.

It should be reproducible with a default NFS mount and a
	block in all with frag-body
IPFilter rule and then trying to readdir.
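
To illustrate why such a rule is enough to lose a whole reply, here is a rough Python sketch of IPv4 fragmentation. It is only a model under stated assumptions, not the kernel's or IPFilter's code: the 1500-byte MTU, the 20-byte IP header, the 8 KiB reply size and the helper names are all made up for the example, and "frag-body" is modelled simply as "fragment offset is non-zero".

```python
# Illustrative model only; sizes and names are assumptions, not measured values.

def fragment(datagram_len, mtu=1500, ip_header=20):
    """Split a UDP datagram (UDP header + payload) into IPv4 fragments.

    Returns a list of (offset_bytes, length, more_fragments) tuples.
    Every fragment but the last carries a multiple of 8 bytes.
    """
    per_frag = (mtu - ip_header) // 8 * 8
    frags = []
    offset = 0
    while offset < datagram_len:
        length = min(per_frag, datagram_len - offset)
        more = offset + length < datagram_len
        frags.append((offset, length, more))
        offset += length
    return frags


def passes_filter(frag):
    """Model of 'block in all with frag-body': drop non-first fragments."""
    offset, _length, _more = frag
    return offset == 0


# Hypothetical readdir reply: 8-byte UDP header plus 8 KiB of RPC data.
reply_len = 8 + 8192
frags = fragment(reply_len)
delivered = [f for f in frags if passes_filter(f)]
complete = sum(length for _off, length, _more in delivered) == reply_len
```

In this model only the fragment at offset 0 gets through, so the receiver can never reassemble the datagram and the NFS client never sees the reply. A reply that fits into a single packet is not affected, which would fit with only the larger replies such as readdir being lost.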

Now, in some cases the machine in question recovered after the filter rules were fixed; in others it didn't, forcing a reboot. This strikes me as a bug, because the same lock-up could just as well have been caused by network problems instead of an ipfilter mis-configuration.

It looks like the operation whose reply was lost sometimes doesn't get retried. Do we have some weird bug where the arrival of the first fragment stops the timeout, but the blocking of the remaining fragments causes it to wedge?

