tech-kern archive

Re: NFS lockup after UDP fragments getting lost

Edgar Fuß writes:

> Thanks to riastradh@, this turned out to be caused by a (UDP, hard)
> NFS mount combined with a misconfigured IPFilter that blocked all but
> the first fragment of a fragmented NFS reply (e.g., readdir), combined
> with a NetBSD design error (or so Taylor says): a vnode lock may
> be held across I/O, in this case network I/O.

Holding a vnode lock across I/O seems like a bug to me too.  Marking the
vnode as having an in-progress operation, so that others can
lock/read/report-that-status/unlock, seems ok.  But I'm sure you already
know that vnode locking is hard.
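
To make the "mark it busy instead of holding the lock" idea concrete,
here is a minimal user-space sketch in Python.  It is a toy model, not
NetBSD's vnode code: the Vnode class, the busy flag, and the run_io()
and status() names are all made up for illustration.  The point is
only that the short-term lock protects the flag and is released before
the slow I/O starts.

```python
import threading


class Vnode:
    """Toy stand-in for a vnode (hypothetical, not NetBSD's structure)."""

    def __init__(self):
        self._lock = threading.Lock()            # short-term lock, never held across I/O
        self._idle = threading.Condition(self._lock)
        self.busy = False                        # "operation in progress" marker

    def status(self):
        # lock / read / report-that-status / unlock
        with self._lock:
            return "busy" if self.busy else "idle"

    def run_io(self, io):
        # Claim the vnode under the lock, then drop the lock.
        with self._idle:
            while self.busy:
                self._idle.wait()
            self.busy = True
        try:
            return io()                          # slow I/O runs with the lock released
        finally:
            # Clear the marker and wake anyone waiting to start an operation.
            with self._idle:
                self.busy = False
                self._idle.notify_all()
```

With this shape, a stuck io() leaves the vnode marked busy, but a
caller of status() still gets an answer right away instead of
blocking behind the lock.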

> It looks like the operation whose reply was lost sometimes
> doesn't get retried. Do we have some weird bug where the first
> fragment arriving stops the timeout but the blocking of the remaining
> fragments causes it to wedge?

Probably not.  Fragments sit in the reassembly queue until they form a
complete packet, and only then is the packet passed up the stack.  So
the NFS code is almost certainly totally unaware of the arrival of the
first fragment.
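
A rough illustration of why the upper layer cannot see a lone first
fragment, again as a toy Python model and not the kernel's code.  The
Reassembler class, its receive() signature, and the deliver callback
are hypothetical names.  Offsets are in bytes here, and the reassembly
timeout that a real stack uses to discard incomplete datagrams is left
out.

```python
class Reassembler:
    """Toy IP reassembly queue (hypothetical names, not NetBSD code)."""

    def __init__(self, deliver):
        self.deliver = deliver   # upper-layer callback, e.g. UDP and then NFS
        self.pending = {}        # datagram id -> {offset: (payload, more_fragments)}

    def receive(self, dgram_id, offset, payload, more_fragments):
        frags = self.pending.setdefault(dgram_id, {})
        frags[offset] = (payload, more_fragments)

        # Walk from offset 0 and stop at the first hole.
        data, pos = b"", 0
        while pos in frags:
            chunk, more = frags[pos]
            data += chunk
            if not more:
                # Last fragment reached with no holes: hand the whole packet up.
                del self.pending[dgram_id]
                self.deliver(data)
                return True
            if not chunk:
                break            # empty non-final fragment: no progress possible
            pos += len(chunk)
        return False             # still incomplete; the upper layer saw nothing
```

If the filter drops every fragment after the first, receive() only
ever returns False for that datagram and deliver() is never called, so
from the NFS client's point of view the reply simply never arrived.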
