tech-kern archive


Re: Reclaiming vnodes



On Tue Sep 15 2009 at 13:28:58 -0400, Thor Lancelot Simon wrote:
> On Tue, Sep 15, 2009 at 06:21:36PM +0100, David Laight wrote:
> > 
> > You still have a potential deadlock - or at least a failure if
> > getnewvnode() cannot actually return a new vnode because none are
> > free at the exact moment of the allocate request.
> > 
> > You can't sleep awaiting vnodes because that will deadlock against
> > the reclaim thread, so a system that is cycling vnodes very quickly
> > will fail during the allocate.
> 
> This is what I was trying to get at when we discussed this privately
> a few weeks ago -- to me it seems like we're just moving the problem
> around, allowing two threads to deadlock against each other instead
> of allowing one thread to sleep where it shouldn't.

... hmmm... ah, vrele_thread() calls getcleanvnode, not getnewvnode.
No wonder the patch and the discussion didn't make any sense ;)

Anyway, I don't think this patch has been tested well enough until the
deadlock mentioned above manifests itself.  And after that, well, back to the drawing
board.  The good news is that you can design one test to expose exactly
this problem.  And it's good src/tests ammo for the future.
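
As a rough illustration of the kind of test meant here, the following is a
userspace toy model, not NetBSD kernel code.  It assumes a fixed-size pool,
an allocator that fails instead of sleeping, and a reclaimer that runs only
every few requests.  All names (VnodePool, alloc_nosleep, reclaim_one,
stress) are hypothetical and chosen for this sketch only.

```python
from collections import deque


class VnodePool:
    """Toy model of a fixed-size vnode pool (hypothetical, not kernel code)."""

    def __init__(self, size):
        self.free = deque(range(size))  # vnodes ready for reuse
        self.pending = deque()          # released, waiting for the reclaimer

    def alloc_nosleep(self):
        # Models an allocator that must not sleep: if nothing is free at
        # this exact moment, the request fails.
        return self.free.popleft() if self.free else None

    def release(self, vn):
        # A released vnode is not reusable until the reclaimer has run.
        self.pending.append(vn)

    def reclaim_one(self):
        # One step of the reclaim thread: move one pending vnode to free.
        if self.pending:
            self.free.append(self.pending.popleft())
            return True
        return False


def stress(pool, rounds, reclaim_every):
    """Cycle vnodes quickly; return how many allocations failed."""
    failures = 0
    for i in range(rounds):
        vn = pool.alloc_nosleep()
        if vn is None:
            failures += 1
        else:
            pool.release(vn)
        # The reclaimer only gets to run every `reclaim_every` requests.
        if i % reclaim_every == 0:
            pool.reclaim_one()
    return failures
```

With the reclaimer keeping pace (reclaim_every=1) the model never fails an
allocation; once requests outrun it (for example reclaim_every=4 on a pool
of 4), allocations start failing.  A real test would have to drive the
actual kernel paths, which this sketch does not attempt.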

I guess the big lesson here is how to test this kind of change.  It took
me months of real world use to get similar deadlocks identified and ironed
out from the puffs kernel code, so forgive me if I just outright laugh at
"it booted in qemu" testing.  In fact, in addition to specific tests,
e.g. ones pointed out by Thor, I'd like to know the code was run for a
few weeks on a real world server or desktop before it was committed.

