Re: NFS lockup after UDP fragments getting lost

To: Edgar Fuß <ef%math.uni-bonn.de@localhost>
Subject: Re: NFS lockup after UDP fragments getting lost
From: Greg Troxel <gdt%lexort.com@localhost>
Date: Wed, 31 Jul 2019 08:54:20 -0400

Edgar Fuß <ef%math.uni-bonn.de@localhost> writes:

> Thanks to riastradh@, this turned out to be caused by a (UDP, hard)
> NFS mount combined with a misconfigured IPFilter that blocked all but
> the first fragment of a fragmented NFS reply (e.g., readdir), combined
> with a NetBSD design error (or so Taylor says): a vnode lock may
> be held across I/O, in this case network I/O.

Holding a vnode lock across I/O seems like a bug to me too.  Marking the
vnode as having an operation in progress, so that others can lock it,
read and report that status, and unlock it, seems OK.  But I'm sure you
already know that vnode locking is hard.
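
A minimal sketch of that alternative, written in Python for brevity rather than as kernel C. Everything in it is hypothetical (the `Vnode` class, `begin_op`, `end_op`, `status`, `do_rpc`) and is not NetBSD's vnode interface. The mutex guards only the node's fields and is released before the (simulated) network I/O starts; a `busy` flag records that an operation is in flight, so another thread can lock, read that status, and unlock without waiting for a reply that may never come.

```python
import threading


class Vnode:
    """Hypothetical stand-in for a vnode; not NetBSD's actual API."""

    def __init__(self):
        self._lock = threading.Lock()               # guards the fields below only
        self._cv = threading.Condition(self._lock)  # shares the same mutex
        self.busy = False                           # True while an operation is in flight
        self.data = None

    def begin_op(self):
        """Mark the node busy; if another op is running, sleep with the mutex released."""
        with self._cv:
            while self.busy:
                self._cv.wait()                     # wait() drops the mutex while sleeping
            self.busy = True

    def end_op(self, result):
        """Record the result, clear the busy flag and wake any waiters."""
        with self._cv:
            self.data = result
            self.busy = False
            self._cv.notify_all()

    def status(self):
        """Lock, read the status, unlock: never blocks behind in-progress I/O."""
        with self._lock:
            return "busy" if self.busy else "idle"


def do_rpc(vn, rpc):
    """Run a (simulated) network request without holding the vnode mutex."""
    vn.begin_op()
    result = None
    try:
        result = rpc()                              # I/O happens with the mutex free
    finally:
        vn.end_op(result)
    return result
```

A second operation on the same node still waits in `begin_op`, but it sleeps on the condition variable with the mutex released, so status queries and other short critical sections are not stuck behind the outstanding request.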

> It looks like the operation to which the reply was lost sometimes
> doesn't get retried. Do we have some weird bug where the first
> fragment arriving stops the timeout but the blocking of the remaining
> fragments causes it to wedge?

Probably not.  Fragments sit in the reassembly queue until the whole
datagram has been reassembled, and only then is the packet passed up
the stack.  So the NFS code is almost certainly totally unaware of the
arrival of the first fragment.
