Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 06/30/2006 10:32:20
On Thu, Jun 29, 2006 at 08:54:54PM -0700, Bill Studenmund wrote:
> On Thu, Jun 29, 2006 at 09:52:43PM +0200, Juergen Hannken-Illjes wrote:
> > On Thu, Jun 29, 2006 at 11:04:08AM -0700, Jason Thorpe wrote:
> > >
> > > On Jun 29, 2006, at 9:54 AM, Juergen Hannken-Illjes wrote:
> > >
> > > >At least as a base. Lockmgr locks lack the "do I have a shared
> > > >lock" query. I need a lock where a thread already owning a shared
> > > >lock succeeds when it wants another shared lock. Last time I looked,
> > > >lockmgr locks had this only for exclusive locks.
>
> Well, the other thing we'd need is that if we have the lock exclusive and
> want it shared, we succeed.
Right
[lockmgr explanation removed]
> > And here we are. I need shared lock recursion even if another thread wants
> > an exclusive lock. The transaction lock will work like:
> >
> > 	xxx_read()
> > 	{
> > 		vn_trans_lock()		<-- (1)
> > 		...
> > 		xxx_getpages()
> > 		...
> > 		vn_trans_unlock()
> > 	}
> >
> > and
> >
> > 	xxx_getpages()
> > 	{
> > 		vn_trans_lock()		<-- (2)
> > 		...
> > 		vn_trans_unlock()
> > 	}
> >
> > When waiting at (2) the lock at (1) will never release.
>
> Ok. That's a mess.
>
> We should look at what Solaris does.
It does what we need :-) When acquiring a shared lock it walks a (thread-owned)
list and if it finds a shared lock it returns "have a lock".
I can't think of a way to do this without direct or indirect thread-specific
information. Either

  Add a list of "struct mount" on which we own shared locks to "struct lwp".
  Length will be at most the depth of mount points.

or

  Add a list of "struct lwp" who own a shared lock to our lock. This list
  will become too long to be implemented as a simple list. Would need
  a hash table or similar.
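
To illustrate the first variant, here is a rough userland sketch. All names
(mount_sketch, lwp_sketch, trans_lock_shared, MAX_MOUNT_DEPTH) are made up
for this example and are not existing kernel interfaces. There is no real
sleeping or interlocking; "false" marks the point where a real lock would
sleep. The point is only that a thread already on the list succeeds even
while someone wants the lock exclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MOUNT_DEPTH 8	/* assumed bound on mount point nesting */

/* Stand-in for the per-mount transaction lock state (hypothetical). */
struct mount_sketch {
	int  shared_holders;	/* threads holding the lock shared */
	bool excl_wanted;	/* a suspender waits for exclusive access */
};

/* One entry of the per-thread list: a mount we hold shared, and how deep. */
struct trans_ref {
	struct mount_sketch *mp;
	int depth;
};

/* Stand-in for the list that would be added to "struct lwp" (hypothetical). */
struct lwp_sketch {
	struct trans_ref refs[MAX_MOUNT_DEPTH];
	int nrefs;
};

static struct trans_ref *
trans_lookup(struct lwp_sketch *l, struct mount_sketch *mp)
{
	for (int i = 0; i < l->nrefs; i++)
		if (l->refs[i].mp == mp)
			return &l->refs[i];
	return NULL;
}

/* Returns false where a real implementation would have to sleep. */
static bool
trans_lock_shared(struct lwp_sketch *l, struct mount_sketch *mp)
{
	struct trans_ref *r = trans_lookup(l, mp);

	if (r != NULL) {
		/* Recursion: succeed even if an exclusive lock is wanted. */
		r->depth++;
		return true;
	}
	if (mp->excl_wanted || l->nrefs == MAX_MOUNT_DEPTH)
		return false;
	l->refs[l->nrefs].mp = mp;
	l->refs[l->nrefs].depth = 1;
	l->nrefs++;
	mp->shared_holders++;
	return true;
}

static void
trans_unlock_shared(struct lwp_sketch *l, struct mount_sketch *mp)
{
	struct trans_ref *r = trans_lookup(l, mp);

	assert(r != NULL && r->depth > 0);
	if (--r->depth > 0)
		return;
	/* Last reference: drop the entry by moving the final one into it. */
	*r = l->refs[--l->nrefs];
	mp->shared_holders--;
}
```

With this, the lock at (2) in the xxx_getpages() example finds the entry
made at (1) and just bumps the depth, so it never waits behind a pending
exclusive request.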
> I was thinking of a fix, then realized it won't work. Then I thought of
> another, and it too won't work.
>
> To really make this work, we would probably need to retool some of how the
> UBC code works. We would need to differentiate between a page fault due to
> a program (or some random old kernel routine) accessing memory and a page
> fault due to VOP_READ() or VOP_WRITE() accessing memory it just mapped in.
>
> The difference is that the first isn't in a transaction, and so needs to
> implicitly start one. At the uvm level, however, the latter already is in
> a transaction. So starting one will have the deadlock that you describe.
My example was just one of many. Our file systems use recursion all
over. When ufs runs readlink it calls VOP_READ, when it runs readdir it
calls VOP_READ and so on. And a page fault IS a transaction if it cannot be
resolved from memory. As soon as it starts to BMAP/READ we have to cover it.
> I'm not sure if getpages is a bad example, as while we would LIKE to slow
> down reads to let writes and a sync complete, that's more a matter of i/o
> scheduling. We won't break a snapshot with it. I however expect there are
> other examples lurking around that need addressing, and so your point is
> quite valid.
>
>
> I think the thing to do is have internal and external interface calls. We
> only do transaction games at the external calls, and the internal calls
> assume the right barrier was taken at the exterior.
Assuming you meant VFS internal/external... How would you get the transactions
when a file system calls another one?
> We however still need some way to mark that we've got a transaction lock
> as the deadlock between starting a transaction (getting the trans lock)
> and a VOP_ entry (and the external interface locking I mention above)
> still remains.
>
> I don't think lockmgr locks will do this well, as they don't track lock
> ownership for shared holders.
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)