tech-kern archive

Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr



On Sat, Jun 20, 2015 at 10:09:23PM +0200, Emmanuel Dreyfus wrote:
> Christos Zoulas <christos%astron.com@localhost> wrote:
> 
> > Ok, what is it supposed to do? Does it fail? Give up? Get interrupted
> > and keep looping?
> 
> The process stuck in tstile waiting for vnode lock is a consequence of
> the initial problem: ioflush stuck in cv_wait().
> 
> What about this: we introduce a mnt_timeo in struct mount, and use
> cv_timedwait() instead of cv_wait() in genfs_do_putpages(). If timeout
> expires, we get a failure: the page was not put to storage. 
> 
> mnt_timeo should have a sane default (which one?) for all filesystems,
> not only NFS: that way we fix processes stuck in tstile because of hardware
> failure (I already saw that).  For NFS we use the NFS timeout.
> 
> That way ioflush never holds a vnode lock forever, and umount -f should
> work.
> 
> Opinion?


we shouldn't need to change the genfs code to make "soft" work.
if the underlying RPCs time out and all the retries are exhausted,
the NFS code should report the error back to the genfs code by doing
the usual B_ERROR/b_error thing with the buffer, and the genfs code
should handle that by unlocking pages, etc, just like it would for
a failed write to a SCSI or ATA device, and eventually that should
percolate back up the stack until the cv_wait() returns.
does this not work currently?
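
to make the expected flow concrete, here is a minimal user-space sketch
of it, not the actual kernel code. everything prefixed sk_ or SK_ is a
hypothetical stand-in invented for illustration (the real struct buf,
the NFS write path and genfs are considerably more involved), and the
retry count "retrans" is simply passed in as a parameter. the point is
only that a soft mount gives up after a bounded number of attempts,
marks the buffer as failed, and still completes it, so the waiter wakes
up and sees the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for buffer flags; not the kernel definitions. */
#define SK_B_ERROR 0x01   /* I/O on this buffer failed */
#define SK_B_DONE  0x02   /* I/O on this buffer has completed */

struct sk_buf {
    int b_flags;
    int b_error;          /* errno value, valid when SK_B_ERROR is set */
};

/* Pretend server: every attempt times out until attempt number
 * succeed_on is reached; succeed_on == 0 means it never answers. */
struct sk_server {
    int attempts;
    int succeed_on;
};

static int sk_rpc_write(struct sk_server *srv)
{
    srv->attempts++;
    if (srv->succeed_on != 0 && srv->attempts >= srv->succeed_on)
        return 0;
    return ETIMEDOUT;
}

/* Soft-mount behaviour (assumed): try at most retrans times, then
 * report the failure through the buffer instead of retrying forever. */
static void sk_nfs_write_soft(struct sk_server *srv, struct sk_buf *bp,
                              int retrans)
{
    int error = ETIMEDOUT;

    assert(bp != NULL && retrans > 0);
    for (int i = 0; i < retrans && error != 0; i++)
        error = sk_rpc_write(srv);

    if (error != 0) {
        bp->b_flags |= SK_B_ERROR;
        bp->b_error = error;
    }
    /* Completion happens on success AND on failure; this is the step
     * that would wake whoever is sleeping on the I/O. */
    bp->b_flags |= SK_B_DONE;
}

/* What the page-writing caller would look at once it has been woken. */
static int sk_putpages_result(const struct sk_buf *bp)
{
    assert(bp->b_flags & SK_B_DONE);
    return (bp->b_flags & SK_B_ERROR) ? bp->b_error : 0;
}
```

with a server that never answers and retrans set to 3, the sketch makes
exactly three attempts and sk_putpages_result() returns ETIMEDOUT. if
the real code path never reaches the "done" step after the retries are
exhausted, that would explain a waiter stuck forever.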

the I/O timeout policy for disk-backed file systems lives in the disk drivers,
and I don't think any of them are currently administratively tunable.
a generic I/O timeout-tuning interface which would encompass everything
(NFS, local disks, SAN-attached storage, iSCSI, virtio, etc.)
could be possible; it's worth a look at the drivers to see how much
commonality there is between their timeout/retry policy logic.
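
as a rough idea of what a shared policy description might contain, here
is a small sketch. it is purely hypothetical: the struct, its fields and
the helper are invented for illustration and do not correspond to any
existing driver or NFS interface. it assumes a policy can be reduced to
a first timeout, a total attempt count, a backoff multiplier and a cap,
and computes the worst-case time before an error would be reported.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical common description of a timeout/retry policy. */
struct sk_io_policy {
    uint32_t timeout_ms;      /* timeout for the first attempt */
    uint32_t attempts;        /* total attempts, including the first */
    uint32_t backoff_mult;    /* multiply the timeout by this per retry */
    uint32_t timeout_cap_ms;  /* no single timeout grows beyond this */
};

/* Worst case: every attempt runs to its full timeout before failing. */
static uint64_t sk_worst_case_ms(const struct sk_io_policy *p)
{
    uint64_t total = 0;
    uint64_t t = p->timeout_ms;

    assert(p->backoff_mult >= 1);
    for (uint32_t i = 0; i < p->attempts; i++) {
        if (t > p->timeout_cap_ms)
            t = p->timeout_cap_ms;
        total += t;
        t *= p->backoff_mult;  /* backoff applied before the next attempt */
    }
    return total;
}
```

for example, a policy of { 1000, 4, 2, 5000 } gives timeouts of 1000,
2000, 4000 and 5000 ms, so an error surfaces after at most 12000 ms.
whether real drivers fit a shape like this is exactly what a survey of
their retry logic would have to show.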

-Chuck

