tech-kern archive


Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr



FWIW, I have had a problem with my server getting stuck in "tstile". I could not reproduce the problem easily, but I saw it in production often enough that it was a headache.  The Intel port (as opposed to PPC) seems not to have the problem.

If this loop has no timeout and is supposed to get stuck only on HW errors, I have my doubts that HW errors are the only trigger. The machine with the hangs does not show any other symptoms of HW errors. HOWEVER, I have a persistent suspicion that the PPC port drops interrupts on occasion. Just sayin.

If this hang happens, I think a panic is far better than a hang. What I would see is the machine lock up hard, with zillions of processes "stuck" in tstile, and no new procs could start. If I caught this early, I could get a couple of ps outputs done. Otherwise, I could get into the kernel debugger - sometimes.

Just my opinion.

-dgl-

On Jun 20, 2015, at 3:27 PM, Christos Zoulas <christos%zoulas.com@localhost> wrote:

> On Jun 20, 10:09pm, manu%netbsd.org@localhost (Emmanuel Dreyfus) wrote:
> -- Subject: Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr
> 
> | Christos Zoulas <christos%astron.com@localhost> wrote:
> | 
> | > Ok, what is it supposed to do? Does it fail? Give up? Get interrupted
> | > and keep looping?
> | 
> | The process stuck in tstile waiting for vnode lock is a consequence of
> | the initial problem: ioflush stuck in cv_wait().
> | 
> | What about this: we introduce a mnt_timeo in struct mount, and use
> | cv_timedwait() instead of cv_wait() in genfs_do_putpages(). If timeout
> | expires, we get a failure: the page was not put to storage. 
> | 
> | mnt_timeo should have a sane default (which one?) for all filesystems,
> | not only NFS: that way we fix processes stuck in tstile because of hardware
> | failure (I already saw that).  For NFS we use the NFS timeout.
> | 
> | That way ioflush never holds a vnode lock forever, and umount -f should
> | work.
> | 
> 
> This is not that simple. There is at least one more place where
> it does while (vp->v_numoutput != 0) cv_wait(). And I am not
> sure what happens if you make VOP_PUTPAGES time out.
> 
> christos


