On Wed, Sep 23, 2009 at 06:18:04PM +0200, Adam Hamsik wrote:
> Hi,
> On Sep, Tuesday 22 2009, at 11:55 PM, Bill Stouder-Studenmund wrote:
>
> >> I do not have such a system yet. What this patch does is that it uses
> >> another thread to reclaim vnodes; this way vnodes are reclaimed in a
> >> different context than the one they are allocated in.
> >
> > You REALLY need to test such a system. If you can't test one, you should
> > not commit the change. Find a system and run it hard. Something like a
> > few compiles going on and a few "find" commands running in the
> > background (on file systems each with more files than you have max
> > vnodes).
>
> I'm testing this patch on my test server with 4 cpus by building a
> release with -j8 and a very low maxvnodes limit, with some other
> vnode-hungry tasks running in the background.
>
> >> Current:
> >> Vnodes are allocated only if there are no vnodes on a free_list. If
> >> there is a free vnode on a list it will be recycled, which actually
> >> means that it will call VOP_RECLAIM.
> >>
> >> In zfs there is a problem with calling getnewvnode from zfs_zget: in
> >> some cases getnewvnode picks a vnode from a free list and calls
> >> VOP_RECLAIM. This can lead to deadlock because VOP_RECLAIM can try to
> >> lock the same mutex as is already held by zfs_zget. This can't be
> >> easily fixed if we do not want to touch and change the whole zfs
> >> locking protocol.
> >>
> >> With Patch:
> >>
> >> Vnodes are only allocated and there is no vnode recycling. If the
> >> number of used vnodes in the system is > kern.vnodes_num_hiwat (in
> >> percent of maxvnodes), the vrele thread will be woken up and it will
> >> start releasing free vnodes until the number of used vnodes is <
> >> kern.vnodes_num_lowat.
> >
> > The problem with this is you're going to reduce caching just because
> > you can't figure out how to fix the locking.
> >
> > Also, it strikes me that your design doesn't scale well. Right now,
> > each thread that wants a vnode becomes a vnode cleaning thread. So if
> > we need to reclaim a lot of vnodes at once, all the waiting threads
> > pitch in to do work. Without an easy way to dynamically scale the
> > number of reclaiming threads, you introduce a resource allocation
> > issue. Pick too few threads, and you can choke a busy system. Pick too
> > many, and you waste space usable by other subsystems.
>
> But vnodes don't need to be reclaimed as often as they are allocated. I
> think that allocation is more critical than the speed of reclaim.

Huh? After we reach the max # of vnodes, every allocated vnode comes from
a vnode being cleaned.

Also, since you have one thread, what happens when one file system is slow
about reclaiming a vnode? Say an nfs mount that's having problems. Or a
disk array that is having a lot of i/o errors that take a while but which
it's recovering from. Your one thread now stalls, and so you can starve
the whole OS of vnodes.

> > My suggestion on how to fix this is:
> >
> > 1) Get a mutex variant that always records the lock owner (curlwp())
> > and a mutex_enter() variant that will report EDEADLK when you try to
> > lock a lock you already hold.
> >
> > 2) Use said mutexes in zfs and use said mutex_enter() variant in the
> > places that VOP_RECLAIM would hit.
> >
> > 3) Have VOP_RECLAIM report EDEADLK if you'd get a deadlock w/ the
> > operation (and clean up appropriately).
> >
> > 4) Have getvnode() try the next vnode if it gets EDEADLK.
>
> This seems to me like a hack.

Well, it's less of a hack than what you're describing. :-) You have a
locking issue that leads to deadlock.
Rather than fix it (which I agree is probably hard) or work around it
(what I describe above), you have decided to notably change the dynamics
of reclaiming. What happens if we run out of vnodes?

Take care,

Bill
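For illustration, here is a minimal userland sketch of the approach in steps
1) through 4) above: a mutex wrapper that records its owner so a re-entrant
lock attempt returns EDEADLK instead of deadlocking, a VOP_RECLAIM stand-in
that propagates that error, and a getnewvnode()-style loop that simply skips
a free vnode whose reclaim would deadlock. All names here (owned_mutex,
vnode_reclaim, try_getnewvnode) are made up for the sketch; this is not the
NetBSD kernel API, only the shape of the idea.

/*
 * Compile with: cc -std=c99 -o sketch sketch.c -lpthread
 * Single-threaded demo; the owner bookkeeping is deliberately simplified
 * and is not MP-safe.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Step 1: a mutex that records its owner ... */
struct owned_mutex {
	pthread_mutex_t om_lock;
	pthread_t       om_owner;   /* valid only while om_held != 0 */
	int             om_held;
};

static void
owned_mutex_init(struct owned_mutex *om)
{
	pthread_mutex_init(&om->om_lock, NULL);
	om->om_held = 0;
}

/* ... and an enter variant that reports EDEADLK on a re-entrant attempt. */
static int
owned_mutex_enter(struct owned_mutex *om)
{
	if (om->om_held && pthread_equal(om->om_owner, pthread_self()))
		return EDEADLK;
	pthread_mutex_lock(&om->om_lock);
	om->om_owner = pthread_self();
	om->om_held = 1;
	return 0;
}

static void
owned_mutex_exit(struct owned_mutex *om)
{
	om->om_held = 0;
	pthread_mutex_unlock(&om->om_lock);
}

/* A free vnode whose reclaim needs some per-fs lock (think zfs). */
struct vnode {
	struct owned_mutex *v_fslock;
	int                 v_free;
};

/* Steps 2+3: VOP_RECLAIM stand-in fails with EDEADLK instead of hanging. */
static int
vnode_reclaim(struct vnode *vp)
{
	int error = owned_mutex_enter(vp->v_fslock);
	if (error != 0)
		return error;       /* caller already holds v_fslock */
	vp->v_free = 0;             /* fs-specific teardown would go here */
	owned_mutex_exit(vp->v_fslock);
	return 0;
}

/* Step 4: getnewvnode() stand-in tries the next free vnode on EDEADLK. */
static struct vnode *
try_getnewvnode(struct vnode *list, int n)
{
	for (int i = 0; i < n; i++) {
		if (list[i].v_free && vnode_reclaim(&list[i]) == 0)
			return &list[i];
		/* on EDEADLK just move on to the next candidate */
	}
	return NULL;
}

int
main(void)
{
	struct owned_mutex zfs_lock, ffs_lock;
	struct vnode freelist[2] = {
		{ &zfs_lock, 1 },   /* reclaim needs the lock we hold */
		{ &ffs_lock, 1 },   /* reclaimable */
	};

	owned_mutex_init(&zfs_lock);
	owned_mutex_init(&ffs_lock);

	/* Model zfs_zget(): it already holds the zfs lock when it asks for
	 * a fresh vnode, so the first free vnode must be skipped. */
	owned_mutex_enter(&zfs_lock);
	struct vnode *vp = try_getnewvnode(freelist, 2);
	printf("got vnode %ld (expect 1, the non-zfs one)\n",
	    vp ? (long)(vp - freelist) : -1L);
	owned_mutex_exit(&zfs_lock);
	return 0;
}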