tech-kern archive


Re: Reclaiming vnodes



On Wed, Sep 23, 2009 at 06:18:04PM +0200, Adam Hamsik wrote:
> Hi,
> On Tuesday, Sep 22 2009, at 11:55 PM, Bill Stouder-Studenmund wrote:
> 
> >>I do not have such a system yet. What this patch does is use
> >>another thread to reclaim vnodes, so that vnodes are reclaimed in a
> >>different context than the one in which they are allocated.
> >
> >You REALLY need to test such a system. If you can't test one, you  
> >should
> >not commit the change. Find a system and run it hard. Something like  
> >a few
> >compiles going on and have a few "find" commands running in the
> >background (on file systems each with more files than you have max
> >vnodes).
> 
> I'm testing this patch on my test server with 4 cpus by building a
> release with -j8, a very low maxvnodes limit, and some other
> vnode-hungry tasks running in the background.
> 
> >
> >>Current:
> >>Vnodes are allocated only if there are no vnodes on a free_list. If
> >>there is a free vnode on the list, it will be recycled, which
> >>actually means that VOP_RECLAIM will be called on it.
> >>
> >>In zfs there is a problem with calling getnewvnode from zfs_zget: in
> >>some cases getnewvnode picks a vnode from a free list and calls
> >>VOP_RECLAIM. This can lead to deadlock, because VOP_RECLAIM can try
> >>to lock the same mutex already held by zfs_zget. This can't be
> >>easily fixed if we do not want to touch and change the whole zfs
> >>locking protocol.
> >>
> >>With Patch:
> >>
> >>Vnodes are only allocated and there is no vnode recycling. If the
> >>number of used vnodes in the system is greater than
> >>kern.vnodes_num_hiwat (a percentage of maxvnodes), the vrele thread
> >>will be woken up and will start releasing free vnodes until the
> >>number of used vnodes drops below kern.vnodes_num_lowat.
> >
> >The problem with this is you're going to reduce caching just because  
> >you
> >can't figure out how to fix the locking.
> >
> >Also, it strikes me that your design doesn't scale well. Right now,  
> >each
> >thread that wants a vnode becomes a vnode cleaning thread. So if we  
> >need
> >to reclaim a lot of vnodes at once, all the waiting threads pitch in  
> >to do
> >work. Without an easy way to dynamically scale the number of  
> >reclaiming
> >threads, you introduce a resource allocation issue. Pick too few  
> >threads,
> >and you can choke a busy system. Pick too many, and you waste space  
> >usable
> >by other subsystems.
> >
> 
> But vnodes don't need to be reclaimed as often as they are allocated.
> I think that allocation speed is more critical than reclaim speed.

Huh? After we reach the max # of vnodes, every allocated vnode comes from 
a vnode being cleaned.
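
Roughly, my understanding of the current path is something like this (a
simplified sketch, not the actual vfs_subr.c code; the helper names and
the empty-list handling are made up):

/*
 * Sketch of the current allocation path.  The point is that once
 * numvnodes >= maxvnodes, the *allocating* thread recycles a vnode
 * itself, so VOP_RECLAIM runs in that thread's context -- which is
 * exactly where zfs_zget gets into trouble with its own locks.
 */
struct vnode *
sketch_getnewvnode(void)
{
	struct vnode *vp;

	if (numvnodes < maxvnodes) {
		/* Below the limit: just allocate a fresh vnode. */
		return alloc_fresh_vnode();
	}

	/* At the limit: steal a vnode from the free list...  (the real
	 * code also copes with the list being empty) */
	vp = TAILQ_FIRST(&vnode_free_list);
	TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);

	/*
	 * ...and clean it out in the caller's context.  If the caller
	 * (say, zfs_zget) already holds a lock that VOP_RECLAIM needs,
	 * we deadlock right here.
	 */
	VOP_RECLAIM(vp);

	return vp;
}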

Also, since you have one thread, what happens when one file system is slow 
about reclaiming a vnode? Say an nfs mount that's having problems. Or a 
disk array that is having a lot of i/o errors that take a while but which 
it's recovering from. Your one thread now stalls, and so you can starve 
the whole OS of vnodes.
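
As I understand the patch, the one-thread design boils down to
something like the following (a sketch based on my reading of your
description, not your actual code; wait_until() and free_vnode() are
stand-ins):

/*
 * Single reclaim thread driven by the kern.vnodes_num_hiwat/lowat
 * thresholds mentioned above.  One slow VOP_RECLAIM -- a wedged NFS
 * server, a disk array riding out I/O errors -- blocks the only
 * thread doing this work, while allocators keep consuming vnodes.
 */
static void
sketch_vrele_thread(void *arg)
{
	struct vnode *vp;

	for (;;) {
		/* Sleep until an allocator pushes us over the high water mark. */
		wait_until(numvnodes > maxvnodes * vnodes_num_hiwat / 100);

		while (numvnodes > maxvnodes * vnodes_num_lowat / 100) {
			vp = TAILQ_FIRST(&vnode_free_list);
			if (vp == NULL)
				break;
			TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);

			/*
			 * If this blocks for a long time, no other vnode
			 * gets reclaimed anywhere in the system.
			 */
			VOP_RECLAIM(vp);
			free_vnode(vp);
		}
	}
}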

> >My suggestion on how to fix this is:
> >
> >1) Get a mutex variant that always records lock owner (curlwp()) and a
> >mutex_enter() variant that will report EDEADLK when you try to lock  
> >a lock
> >you have.
> >
> >2) Use said mutexes in zfs and use said mutex_enter() variant in the
> >places that VOP_RECLAIM would hit.
> >
> >3) Have VOP_RECLAIM report EDEADLK if you'd get a deadlock w/ the
> >operation (and clean up appropriately).
> >
> >4) have getvnode() try the next vnode if it gets EDEADLK.
> 
> This seems like a hack to me.

Well, it's less of a hack than what you're describing. :-) You have a 
locking issue that leads to deadlock. Rather than fix it (which I agree is 
probably hard) or work around it (what I describe above), you have decided 
to notably change the dynamics of reclaiming.
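
To make (1) and (2) a bit more concrete: for adaptive mutexes the owner
is already recorded, so the deadlock check can be layered on top of
what we have.  A rough sketch (mutex_enter_deadlk() is a name I'm
making up here, not an existing interface):

#include <sys/mutex.h>
#include <sys/errno.h>

/*
 * mutex_enter() variant that fails with EDEADLK instead of
 * deadlocking when the calling LWP already owns the lock.
 */
static int
mutex_enter_deadlk(kmutex_t *mtx)
{

	if (mutex_owned(mtx))
		return EDEADLK;		/* caller would deadlock against itself */
	mutex_enter(mtx);
	return 0;
}

zfs's VOP_RECLAIM would use this on the locks it has to take, hand
EDEADLK back up (3), and getnewvnode() would just move on to the next
vnode on the free list (4).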

What happens if we run out of vnodes?

Take care,

Bill



