tech-kern: Re: Redoing file system suspension API (update)

Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 06/30/2006 10:32:20

On Thu, Jun 29, 2006 at 08:54:54PM -0700, Bill Studenmund wrote:
> On Thu, Jun 29, 2006 at 09:52:43PM +0200, Juergen Hannken-Illjes wrote:
> > On Thu, Jun 29, 2006 at 11:04:08AM -0700, Jason Thorpe wrote:
> > > 
> > > On Jun 29, 2006, at 9:54 AM, Juergen Hannken-Illjes wrote:
> > > 
> > > >At least as a base.  Lockmgr locks lack the "do I have a shared
> > > >lock" query.  I need a lock where a thread already owning a shared
> > > >lock succeeds when it wants another shared lock.  Last time I looked
> > > >lockmgr locks had this only for exclusive locks.
> 
> Well, the other thing we'd need is that if we have the lock exclusive and 
> want it shared, we succeed.

Right

[lockmgr explanation removed]
> > And here we are.  I need shared lock recursion even if another thread wants
> > an exclusive lock.  The transaction lock will work like:
> > 
> > 	xxx_read()
> > 	{
> > 		vn_trans_lock()  <-- (1)
> > 		...
> > 		xxx_getpages()
> > 		...
> > 		vn_trans_unlock()
> > 	}
> > 
> > and
> > 
> > 	xxx_getpages()
> > 	{
> > 		vn_trans_lock()  <-- (2)
> > 		...
> > 		vn_trans_unlock()
> > 	}
> > 
> > When waiting at (2) the lock at (1) will never release.
> 
> Ok. That's a mess.
> 
> We should look at what Solaris does.

It does what we need :-)  When acquiring a shared lock it walks a (thread-owned)
list and if it finds a shared lock it returns "have a lock".

I can't think of a way to do this without direct or indirect thread-specific
information.  Either

    Add a list of "struct mount" on which we own shared locks to "struct lwp".
    Length will be at most the depth of mount points.

or

    Add a list of "struct lwp" that own a shared lock to our lock.  This list
    will become too long to be implemented as a simple list.  Would need
    a hash table or similar.
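
A minimal sketch of the first option, written as a single-threaded userland
simulation rather than kernel code.  Every name in it (trans_lock, trans_hold,
lwp_sim, trans_trylock_shared, trans_unlock_shared, MAX_HOLDS) is hypothetical
and exists only for illustration; a real version would hang the list off
"struct lwp", key it by "struct mount", and sleep where this returns EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_HOLDS 8	/* assumed bound: depth of nested mount points */

/* Hypothetical stand-in for the per-mount transaction lock. */
struct trans_lock {
	int shared_count;	/* threads currently holding it shared */
	int excl_wanted;	/* a suspender is waiting for exclusive */
};

/* Hypothetical per-thread record: one lock we hold shared, and how often. */
struct trans_hold {
	struct trans_lock *lock;
	int depth;		/* recursion depth on this lock */
};

/* Hypothetical stand-in for the thread-specific part of "struct lwp". */
struct lwp_sim {
	struct trans_hold holds[MAX_HOLDS];
	int nholds;
};

/* Walk the thread-owned list looking for an existing shared hold. */
static struct trans_hold *
trans_find(struct lwp_sim *l, struct trans_lock *tl)
{
	for (int i = 0; i < l->nholds; i++)
		if (l->holds[i].lock == tl)
			return &l->holds[i];
	return NULL;
}

/*
 * Try to take the lock shared.  Returns 0 on success, EBUSY where a real
 * implementation would sleep, ENOMEM if the per-thread table is full.
 */
static int
trans_trylock_shared(struct lwp_sim *l, struct trans_lock *tl)
{
	struct trans_hold *h = trans_find(l, tl);

	if (h != NULL) {
		/* Already an owner: recurse even if a writer is waiting. */
		h->depth++;
		return 0;
	}
	if (tl->excl_wanted)
		return EBUSY;	/* new readers queue behind the suspender */
	if (l->nholds == MAX_HOLDS)
		return ENOMEM;
	l->holds[l->nholds].lock = tl;
	l->holds[l->nholds].depth = 1;
	l->nholds++;
	tl->shared_count++;
	return 0;
}

static void
trans_unlock_shared(struct lwp_sim *l, struct trans_lock *tl)
{
	struct trans_hold *h = trans_find(l, tl);

	assert(h != NULL);	/* unlocking a lock we do not hold is a bug */
	if (--h->depth > 0)
		return;		/* still inside an outer transaction */
	*h = l->holds[--l->nholds];	/* drop the entry, keep array dense */
	tl->shared_count--;
}
```

With this, the nested vn_trans_lock() at (2) finds the entry made at (1) and
only bumps the depth, while a thread without an entry is held back once
excl_wanted is set.  The lock itself counts each owning thread once, so the
suspender only has to wait for shared_count to reach zero.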

> I was thinking of a fix, then realized it won't work. Then I thought of 
> another, and it too won't work.
> 
> To really make this work, we would probably need to retool some of how the 
> UBC code works. We would need to differentiate between a page fault due to 
> a program (or some random old kernel routine) accessing memory and a page 
> fault due to VOP_READ() or VOP_WRITE() accessing memory it just mapped in.
> 
> The difference is that the first isn't in a transaction, and so needs to 
> implicitly start one. At the uvm level, however, the latter already is in 
> a transaction. So starting one will have the deadlock that you describe.

My example was just one of many.  Our file systems use recursion all
over.  When ufs runs readlink it calls VOP_READ, when it runs readdir it
calls VOP_READ and so on.  And a page fault IS a transaction if it cannot be
resolved from memory.  As soon as it starts to BMAP/READ we have to cover it.

> I'm not sure if getpages is a bad example, as while we would LIKE to slow 
> down reads to let writes and a sync complete, that's more a matter of i/o 
> scheduling. We won't break a snapshot with it. I however expect there are 
> other examples lurking around that need addressing, and so your point is 
> quite valid.
> 
> 
> I think the thing to do is have internal and external interface calls. We 
> only do transaction games at the external calls, and the internal calls 
> assume the right barrier was taken at the exterior.

Assuming you meant VFS internal/external...  How would you get the transactions
when a file system calls another one?

> We however still need some way to mark that we've got a transaction lock 
> as the deadlock between starting a transaction (getting the trans lock) 
> and a VOP_ entry (and the external interface locking I mention above) 
> still remains.
> 
> I don't think lockmgr locks will do this well, as they don't track lock 
> ownership for shared holders.

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)