Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/30/2006 10:11:14

On Fri, Jun 30, 2006 at 10:32:20AM +0200, Juergen Hannken-Illjes wrote:
> On Thu, Jun 29, 2006 at 08:54:54PM -0700, Bill Studenmund wrote:
> > On Thu, Jun 29, 2006 at 09:52:43PM +0200, Juergen Hannken-Illjes wrote:
> >
> > Ok. That's a mess.
> >
> > We should look at what Solaris does.
>
> It does what we need :-)  When acquiring a shared lock it walks a (thread-owned)
> list and if it finds the lock there it returns "have a lock".
>
> I can't think of a way to do this without direct or indirect thread-specific
> information.  Either
>
>     Add a list of "struct mount" on which we own shared locks to "struct lwp".
>     Length will be at most the depth of mount points.
>
> or
>
>     Add a list of "struct lwp" who own a shared lock to our lock.  This list
>     will become too long to be implemented as a simple list.  Would need
>     a hash table or similar.

Let's just do what Solaris does. Add info to struct lwp so we remember the
shared locks we hold. Since a thread will hold far fewer shared locks than
there are threads holding a given lock, the list is shorter if we associate
it with the lwp rather than with the struct lock.
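A minimal userspace sketch of this per-thread bookkeeping, assuming a single-threaded model in which refusing a request stands in for sleeping. Every name here (struct susp_lock, struct thread_state, shared_enter(), shared_exit()) is hypothetical and is not the NetBSD or Solaris API. The point it shows: a thread that finds the lock already on its own list is granted it again even while a suspension is pending, so a nested call such as readlink calling VOP_READ does not deadlock against itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-mount suspension lock. */
struct susp_lock {
	int  shared_count;   /* number of threads holding this lock shared */
	bool suspending;     /* a suspender is waiting for exclusive access */
};

/* One entry per lock this thread holds shared (hypothetical). */
struct held_shared {
	struct susp_lock   *lock;
	int                 depth;   /* recursion count for this thread */
	struct held_shared *next;
};

/* Hypothetical per-thread state, standing in for fields added to struct lwp. */
struct thread_state {
	struct held_shared *held;
};

static struct held_shared *
find_held(struct thread_state *t, struct susp_lock *l)
{
	for (struct held_shared *h = t->held; h != NULL; h = h->next)
		if (h->lock == l)
			return h;
	return NULL;
}

/*
 * Try to take l shared.  Recursion always succeeds; a new holder is
 * refused while a suspension is pending.  Returns false where a real
 * kernel would sleep.
 */
static bool
shared_enter(struct thread_state *t, struct susp_lock *l)
{
	struct held_shared *h = find_held(t, l);

	if (h != NULL) {
		h->depth++;
		return true;
	}
	if (l->suspending)
		return false;
	h = malloc(sizeof(*h));
	if (h == NULL)
		return false;
	h->lock = l;
	h->depth = 1;
	h->next = t->held;
	t->held = h;
	l->shared_count++;
	return true;
}

/* Drop one level of recursion; release the lock when depth reaches 0. */
static void
shared_exit(struct thread_state *t, struct susp_lock *l)
{
	struct held_shared **pp = &t->held;

	while (*pp != NULL && (*pp)->lock != l)
		pp = &(*pp)->next;
	assert(*pp != NULL);   /* caller must hold l shared */
	if (--(*pp)->depth == 0) {
		struct held_shared *dead = *pp;
		*pp = dead->next;
		free(dead);
		l->shared_count--;
	}
}
```

The list walk is linear, which matches the argument above: its length is bounded by how many mounts one thread is inside at once, not by how many threads use a mount.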

> > The difference is that the first isn't in a transaction, and so needs to
> > implicitly start one. At the uvm level, however, the latter already is in
> > a transaction. So starting one will have the deadlock that you describe.
>
> My example was just one of many.  Our file systems use recursion all
> over.  When ufs runs readlink it calls VOP_READ, when it runs readdir it
> calls VOP_READ and so on.  And a page fault IS a transaction if it cannot be
> resolved from memory.  As soon as it starts to BMAP/READ we have to cover it.

You misunderstood me about the page fault. It is not an independent
transaction if it is a result of the memmove() in ffs_read() or ffs_write().
But that's resolvable by having the file system and uvm coordinate more
directly and pre-read pages before the memmove().

As for the recursion, recursive shared locks, granted to a thread that
already holds the lock even when a new shared request would be refused,
sound like a must-have.

> > I think the thing to do is have internal and external interface calls. We
> > only do transaction games at the external calls, and the internal calls
> > assume the right barrier was taken at the exterior.
>
> Assuming you meant VFS internal/external...  How would you get the transactions
> when a file system calls another one?

I'm not sure. Well, you grab the lock on the other file system. What isn't
clear to me is whether these cases really represent a transaction in the
first file system.
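The internal/external split can be sketched as below, again with hypothetical names (struct fs, fs_readlink(), fs_readlink_internal(), fs_read_internal()) rather than real VFS entry points. Only the external call enters a transaction; internal calls assert that one is already active and never start their own. Returning -1 while suspended is a simplification; a real implementation would wait for the suspension to end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mount transaction state. */
struct fs {
	int  active_transactions;   /* transactions currently in progress */
	bool suspended;             /* file system is suspended */
};

/* Internal read: assumes the caller already entered a transaction. */
static int
fs_read_internal(struct fs *fs, int blk)
{
	assert(fs->active_transactions > 0);
	return blk * 2;   /* placeholder for the real block read */
}

/* Internal readlink reuses the internal read; no new transaction. */
static int
fs_readlink_internal(struct fs *fs)
{
	return fs_read_internal(fs, 1);
}

/* External entry point: the only place a transaction is started. */
static int
fs_readlink(struct fs *fs, int *out)
{
	if (fs->suspended)
		return -1;   /* simplification: a real kernel would wait */
	fs->active_transactions++;
	*out = fs_readlink_internal(fs);
	fs->active_transactions--;
	return 0;
}
```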

Take care,

Bill
