Source-Changes-D archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: src/sys/ufs/ufs



On Tue, Sep 22, 2009 at 09:51:13PM +0300, Antti Kantee wrote:
> > >From what I understand the vnode is not on the free list yet so it can't
> > be reused for something else.
> 
> For which one? ;)
> 
> No, when the reference count (and hold count) goes to zero, the vnode
> goes onto the freelist.  However, before use, it must be cleaned.

But v_usecount is not 0. vget() did increment it before sleeping.

> 
> > > If there is a race in vfs and XLOCK is not used properly, I think that
> > > should be investigated and fixed instead of patching file systems here
> > > and there.
> > 
> > I don't know how XLOCK is supposed to be used (and I even less know how
> > to change things without creating deadlocks). As I see it XLOCK is set
> > while the vnode is being cleaned, not before or after cleaning it, so
> > XLOCK is not going to avoid this race anyway.
> 
> XLOCK is supposed to be set when the vnode is being cleaned.  vget()
> can then detect this early on, wait for the vnode to actually be cleaned
> (i.e. VOP_RECLAIM() has done its thing so a new vnode can be crafted
> without problems) and return an error.

This is what it does, yes. But what happens here is that XLOCK was not
set, so vget() tried to aquire the vn_lock. It sleeps here, and the
vnode is being VOP_RECLAIM'ed while it sleeps.
Before it sleeps XLOCK is not set, when it's woken up XLOCK is not
set any more and the vnode is clean.

> 
> I am *guessing* that the selection for a vnode cleanup is racy and
> the vnode is selected before the reference count is checked (i.e.
> the situation before vget() upped the refcount for the duration of
> its operation).  But I might be completely wrong too, as that doesn't
> really explain what the relationship with fhtovnode is.  Maybe it just
> more readily triggers the race?

that's not an issue with the reference count. It's an issue with vclean()
calling VOP_RECLAIM() even if the refcount is greater than 1, and
vrelel() calling vclean() even if the refcount is greater than 1,
when the file has been unlink(2)ed. There's a comment about this in
vrelel(), look for variable "recycle".

> 
> However, I'm still very far from convinced the current fix is appropriate.

I'm not either. But at last it makes the NFS server useable, so I think
it's OK as a stopgap measure for netbsd-5.

> You should at least file a big allcaps PR about the issue.

there is one: kern/41147 (this is about the consequences, not the cause
though).

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index