Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <>
From: Bill Studenmund <>
List: tech-kern
Date: 06/30/2006 10:11:14

On Fri, Jun 30, 2006 at 10:32:20AM +0200, Juergen Hannken-Illjes wrote:
> On Thu, Jun 29, 2006 at 08:54:54PM -0700, Bill Studenmund wrote:
> > On Thu, Jun 29, 2006 at 09:52:43PM +0200, Juergen Hannken-Illjes wrote:
> >
> > Ok. That's a mess.
> >=20
> > We should look at what Solaris does.
> It does what we need :-)  When acquiring a shared lock it walks a per-thread
> list and if it finds a shared lock it returns "have a lock".
> I can't think of a way to do this without direct or indirect thread-specific
> information.  Either
>     Add a list of "struct mount" on which we own shared locks to "struct =
>     Length will be at most the depth of mount points.
> or
>     Add a list of "struct lwp" who own a shared lock to our lock.  This list
>     will become too long to be implemented as a simple list.  Would need
>     a hash table or so.

Let's just do what Solaris does. Add info to struct lwp so we remember the
shared locks we hold. Since a thread will hold far fewer shared locks than
the number of threads that will own a given lock, the list is shorter if we
associate it with the lwp rather than with struct lock.
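
A minimal user-space sketch of the per-thread bookkeeping proposed here, assuming each lwp keeps a singly linked list of the mounts it holds shared, with a recursion count per entry. The types `struct lwp_sketch` and `struct mount_sketch` and the functions `lwp_find_held()`, `lwp_note_shared()` and `lwp_drop_shared()` are hypothetical stand-ins, not NetBSD or Solaris interfaces.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; not the real NetBSD structures. */
struct mount_sketch {
    const char *name;
};

/* One entry per mount this thread holds shared. */
struct held_lock {
    struct mount_sketch *mp;
    unsigned count;              /* recursion depth on this mount */
    struct held_lock *next;
};

struct lwp_sketch {
    struct held_lock *held;      /* short: bounded by mount nesting depth */
};

/* Walk the thread's own list; only the owning thread touches it. */
static struct held_lock *
lwp_find_held(struct lwp_sketch *l, struct mount_sketch *mp)
{
    struct held_lock *h;

    for (h = l->held; h != NULL; h = h->next)
        if (h->mp == mp)
            return h;
    return NULL;
}

/* Returns 1 if already held ("have a lock"), 0 if new, -1 on ENOMEM. */
static int
lwp_note_shared(struct lwp_sketch *l, struct mount_sketch *mp)
{
    struct held_lock *h = lwp_find_held(l, mp);

    if (h != NULL) {
        h->count++;              /* recursive acquisition */
        return 1;
    }
    h = malloc(sizeof(*h));
    if (h == NULL)
        return -1;
    h->mp = mp;
    h->count = 1;
    h->next = l->held;
    l->held = h;
    return 0;
}

/* Drop one reference; unlink the entry when the count reaches zero. */
static void
lwp_drop_shared(struct lwp_sketch *l, struct mount_sketch *mp)
{
    struct held_lock **hp;

    for (hp = &l->held; *hp != NULL; hp = &(*hp)->next) {
        if ((*hp)->mp == mp) {
            struct held_lock *h = *hp;

            if (--h->count == 0) {
                *hp = h->next;
                free(h);
            }
            return;
        }
    }
}
```

Because the list hangs off the thread, the lookup needs no lock of its own, and its length stays at most the depth of nested mounts.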

> > The difference is that the first isn't in a transaction, and so needs to
> > implicitly start one. At the uvm level, however, the latter already is
> > a transaction. So starting one will have the deadlock that you describe.
> My example was just one of many.  Our file systems use recursion all
> over.  When ufs runs readlink it calls VOP_READ, when it runs readdir it
> calls VOP_READ, and so on.  And a page fault IS a transaction if it cannot be
> resolved from memory.  As soon as it starts to BMAP/READ we have to cover it.

You misunderstood the page fault case. It is not an independent
transaction if it is a result of the memmove() in ffs_read() or ffs_write().
But that's resolvable by having the file system and uvm coordinate more
directly and pre-read the pages before the memmove().
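
A toy model of the pre-read idea, assuming a per-file page cache with a resident flag per page: the read path brings in every page the copy will touch while it is still inside its own transaction, so faults taken during the copy are resolved from memory and never need BMAP/READ. `struct file_sk`, `page_in()`, `touch()` and `read_preread()` are hypothetical and do not reflect UVM's real interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES 8
#define PAGESZ 16

/* Hypothetical toy page cache for one file; not UVM's data structures. */
struct file_sk {
    char data[NPAGES * PAGESZ];   /* backing "disk" contents */
    char cache[NPAGES * PAGESZ];  /* in-memory page copies */
    bool resident[NPAGES];
    int io_in_trans;              /* page-ins done before the copy */
    int io_in_copy;               /* page-ins a fault needed mid-copy */
    bool copying;
};

/* Bring one page in from "disk"; stands in for BMAP + READ. */
static void
page_in(struct file_sk *f, size_t pg)
{
    memcpy(f->cache + pg * PAGESZ, f->data + pg * PAGESZ, PAGESZ);
    f->resident[pg] = true;
    if (f->copying)
        f->io_in_copy++;
    else
        f->io_in_trans++;
}

/* Fault path: only does I/O when the page is not resident. */
static char
touch(struct file_sk *f, size_t off)
{
    size_t pg = off / PAGESZ;

    if (!f->resident[pg])
        page_in(f, pg);
    return f->cache[off];
}

/* Assumes len > 0 and off + len <= NPAGES * PAGESZ. */
static void
read_preread(struct file_sk *f, size_t off, char *dst, size_t len)
{
    size_t pg, i;

    /* Still inside the file system's transaction: pre-read the pages. */
    for (pg = off / PAGESZ; pg <= (off + len - 1) / PAGESZ; pg++)
        if (!f->resident[pg])
            page_in(f, pg);

    /* The copy (the memmove() step) now finds every page resident. */
    f->copying = true;
    for (i = 0; i < len; i++)
        dst[i] = touch(f, off + i);
    f->copying = false;
}
```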

As for the recursion, recursive shared locks, which we are granted even when
we would not get the lock if we didn't already hold it, sound like a must-have.
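
A sketch of that acquisition rule, assuming a per-mount `suspending` flag and a small fixed-size per-thread table (both hypothetical and simplified for illustration): a new shared request is refused while a suspension is pending, but a thread that already holds the lock is always granted it again. Returning `EWOULDBLOCK` stands in for sleeping until the suspension ends.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state; not NetBSD's real structures. */
struct mount_sk {
    bool suspending;            /* a suspender is waiting or active */
    unsigned shared_count;      /* distinct threads holding it shared */
};

struct lwp_sk {
    struct mount_sk *held[8];   /* mounts this thread holds shared */
    unsigned depth[8];          /* recursion depth per entry */
    size_t nheld;
};

static int
find_slot(const struct lwp_sk *l, const struct mount_sk *mp)
{
    for (size_t i = 0; i < l->nheld; i++)
        if (l->held[i] == mp)
            return (int)i;
    return -1;
}

/*
 * 0 on success, EWOULDBLOCK if the caller would have to wait.
 * A holder always succeeds: making it wait would deadlock, because
 * the suspender is itself waiting for that holder to let go.
 */
static int
shared_enter(struct lwp_sk *l, struct mount_sk *mp)
{
    int i = find_slot(l, mp);

    if (i >= 0) {
        l->depth[i]++;          /* recursive: granted even if suspending */
        return 0;
    }
    if (mp->suspending)
        return EWOULDBLOCK;     /* new holders must wait */
    if (l->nheld == sizeof(l->held) / sizeof(l->held[0]))
        return ENOMEM;          /* toy fixed-size table is full */
    l->held[l->nheld] = mp;
    l->depth[l->nheld] = 1;
    l->nheld++;
    mp->shared_count++;
    return 0;
}

static void
shared_exit(struct lwp_sk *l, struct mount_sk *mp)
{
    int i = find_slot(l, mp);

    if (i < 0 || --l->depth[i] > 0)
        return;
    /* Last reference from this thread: remove the entry. */
    l->nheld--;
    l->held[i] = l->held[l->nheld];
    l->depth[i] = l->depth[l->nheld];
    mp->shared_count--;
}
```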

> > I think the thing to do is to have internal and external interface calls.
> > Only do transaction games at the external calls, and have the internal calls
> > assume the right barrier was taken at the exterior.
> Assuming you meant VFS internal/external...  How would you get the transaction
> when a file system calls another one?

I'm not sure. Well, you grab the lock on the other file system. What isn't
clear to me is whether these cases really represent a transaction in the first
file system.
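
The internal/external split from the quoted proposal might look like the following toy, in which only the external entry points `fs_read()` and `fs_readlink()` take the barrier, and the `_internal` variants assert that it is already held. Every name here (`struct fs_sk`, `trans_begin()`, `trans_end()` and so on) is hypothetical; the sketch models a single file system and does not settle the cross-file-system question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy "file system"; names are illustrative only. */
struct fs_sk {
    int trans_depth;            /* > 0 while a transaction is open */
    int trans_starts;           /* times the barrier was taken */
    const char *link_target;
};

static void trans_begin(struct fs_sk *fs) { fs->trans_depth++; fs->trans_starts++; }
static void trans_end(struct fs_sk *fs)   { fs->trans_depth--; }

/* Internal calls: assume the caller already took the barrier. */
static size_t
fs_read_internal(struct fs_sk *fs, char *buf, size_t len)
{
    size_t n = strlen(fs->link_target);

    assert(fs->trans_depth > 0);
    if (n > len)
        n = len;
    memcpy(buf, fs->link_target, n);
    return n;
}

static size_t
fs_readlink_internal(struct fs_sk *fs, char *buf, size_t len)
{
    assert(fs->trans_depth > 0);
    /* Recursion into read stays on the internal path: no new transaction. */
    return fs_read_internal(fs, buf, len);
}

/* External entry points: the only places that start a transaction. */
static size_t
fs_read(struct fs_sk *fs, char *buf, size_t len)
{
    size_t n;

    trans_begin(fs);
    n = fs_read_internal(fs, buf, len);
    trans_end(fs);
    return n;
}

static size_t
fs_readlink(struct fs_sk *fs, char *buf, size_t len)
{
    size_t n;

    trans_begin(fs);
    n = fs_readlink_internal(fs, buf, len);
    trans_end(fs);
    return n;
}
```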

Take care,
