tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: vnode locking oddities, again



On Thu, Oct 29, 2009 at 05:46:08AM +0000, David Holland wrote:
> On Thu, Oct 22, 2009 at 06:38:21PM +0200, Manuel Bouyer wrote:
>  > Now the problems.
>  > First is in getcleanvnode(). After selecting a victim from a free list,
>  > v_usecount is incremented, vp is cleaned and then we check that we're
>  > the only user of the vp. Shouldn't vclean() be done only in the
>  > (vp->v_usecount == 1) case ?
> 
> That sounds like a good bet, yes.
> 
>  > Second is vrelel(). Because of locks being dropped and reaquired at various
>  > places (one notable point is VOP_INACTIVE()), 
> 
> Randomly dropping and acquiring locks is not correct and inevitably
> leads to random problems. This should be cleaned up.

Sure. It seems there are places in the kernel which relies on
getting the interlock to work on the vnode, without bumping the
usecount (and so without using vget). These won't notice if they aquire
the lock while another thread is in the middle of vclean(). But I'm not
sure they properly check that the vnode has not been vclean'ed once they
hold the interlock either :(

I added some code to try to detect places using vnodes without
proper reference counting (v_data is set to NULL when the vnode is placed
on the freelist and restored when it's removed from the free list).

One suspicious point is uvm:
genfs_do_putpages() at netbsd:genfs_do_putpages+0xb51
genfs_putpages() at netbsd:genfs_putpages+0x1b
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x30
uvn_put() at netbsd:uvn_put+0x41
uvm_pageout() at netbsd:uvm_pageout+0x420

uvm_pageout() (uvmpd_scan_queue() really, but it's inlined) gets the interlock
(though uvmpd_trylockowner I guess) but doesn't bump the reference count.
It could get the interlock while the vnode is being vcleaned.
Also I didn't see where we check that the vnode isn't clean once we have the
lock.
More genereally uvm will manipulate v_uobj which contains v_interlock and
v_usecount; I guess it'll do the equivalent of vget() but without
checking the vnode flags (and especially without detecting if a vnode is
clean or is being cleaned).

A second suspicious place is bwrite/bdwrite: it gets a struct buf *, from
which it use b_vp; my diag code shows that this vnode can be on a free list
(I guess the hold list) in which case it could get recycled while bwrite
uses it.
Doing vget/vrele in bwrite/bdwrite seems to mostly work; I got a
KASSERT in WAPBL code once, because vrele ends up calling vn_lock()
and WAPBL doens't like it.

> 
> I have been meaning to look into this, but you know what my available
> time plot looks like. :(
> 
>  > Shouldn't vget check for VI_INACTNOW in addition to (VI_XLOCK |
>  > VI_FREEING) ?
> 
> My offhand guess is that VI_FREEING should be abolished entirely. The
> locking should be set up to not expose vnodes in an intermediate state
> at all. Getting to that point might be a problem though.

Yes. I guess as a temoprary measure to improve things it's acceptable ?

> 
> Relatedly to all this, is there any reason vget returns vnodes locked?

What do you mean with "vget returns vnodes locked" ?
Actually you can vget() without taking vn_lock, just pass 0 for flags.

> It should serve no purpose if the refcounting is working properly, and
> creates a problem for rename.
> 
>  > Also, what's the point of VI_INACTREDO ? It's set of cleared on multiple
>  > places but never tested ?
> 
> It's tested in one place in the midle of vrelel().

I don't know why I didn't see it. thanks !

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index