Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/28/2006 08:17:43

On Tue, Jun 27, 2006 at 12:52:57PM +0200, Juergen Hannken-Illjes wrote:
> On Mon, Jun 26, 2006 at 02:31:44PM -0700, Bill Studenmund wrote:
> > On Mon, Jun 26, 2006 at 08:30:20PM +0200, Juergen Hannken-Illjes wrote:
> > > On Mon, Jun 26, 2006 at 09:43:59AM -0700, Bill Studenmund wrote:
> > > > I'm sorry, but this is an important point. I have the feeling it was
> > > > missed.
> > >
> > > Not sure I get it right: you mean taking the transaction lock for
> > > read/write/ioctl in every file system while taking it for other operations
> > > outside?
> > >
> > > Looks difficult to maintain.
> >=20
> > How is it difficult to maintain?
>=20
> We have to do it for all operations of all file systems.  And we need
> thread-recursive locks as file systems call operations on other file systems.

I'm sorry. I do not understand the causality implied in this sentence. The
fact that a file system may call operations on other file systems (only
unionfs does this AFAIK) does not mean we need recursion.

We also don't need recursion in general. All we need is for the lock
routine to return "success", "failure", and "You already have the lock."
If we get a "failure" return, we exit whatever we're doing. If we get
success, we later release the lock. If we get "You already have the lock",
then we just skip the unlock later on.
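
As a rough sketch of that three-way return (every name here is made up for
illustration, this is not an existing kernel interface, and the per-thread
"held" flag assumes a single lock for brevity):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return values of the lock routine. */
enum txn_status { TXN_SUCCESS, TXN_FAILURE, TXN_ALREADY_HELD };

/* Hypothetical transaction lock: a count of active transactions plus a
 * flag that a suspender sets to refuse new ones. */
struct txn_lock {
	atomic_int  active;
	atomic_bool suspended;
};

/* Assumption: only one txn_lock exists, so one per-thread flag suffices. */
static _Thread_local bool txn_held_by_me;

static enum txn_status
txn_begin(struct txn_lock *tl)
{
	if (txn_held_by_me)
		return TXN_ALREADY_HELD;	/* caller must skip the unlock */
	atomic_fetch_add(&tl->active, 1);
	if (atomic_load(&tl->suspended)) {
		atomic_fetch_sub(&tl->active, 1);
		return TXN_FAILURE;		/* caller bails out */
	}
	txn_held_by_me = true;
	return TXN_SUCCESS;			/* caller unlocks later */
}

static void
txn_end(struct txn_lock *tl)
{
	atomic_fetch_sub(&tl->active, 1);
	txn_held_by_me = false;
}

/* An operation that may be called with or without the lock held. */
static int
inner_op(struct txn_lock *tl)
{
	enum txn_status st = txn_begin(tl);

	if (st == TXN_FAILURE)
		return -1;
	/* ... do the work ... */
	if (st == TXN_SUCCESS)		/* skip on TXN_ALREADY_HELD */
		txn_end(tl);
	return 0;
}

/* An operation that calls another one while inside its transaction. */
static int
outer_op(struct txn_lock *tl)
{
	enum txn_status st = txn_begin(tl);
	int error;

	if (st == TXN_FAILURE)
		return -1;
	error = inner_op(tl);
	if (st == TXN_SUCCESS)
		txn_end(tl);
	return error;
}
```

Only the outermost caller sees "success", so only it unlocks. No recursion
count is kept anywhere.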

> Once an operation has the lock we cannot deny the lock to other operations
> called from here.  Take unionfs's `copy-up' as an example.

I don't understand what you mean by "[denying] the lock". ?? If a file
system decides it wants to perform a transaction, it starts then ends the
transaction.

Note also that while you're right that we have to add this logic to
specific file systems (and the implicit assessment that we may have more
file systems than entry points that make certain transactions), we really
only have to add this functionality to file systems that handle snapshots.

So only ffs needs the logic for now.

> And I'm not sure if it can be free of deadlocks doing it (with locked vnodes)
> inside the file system.

Yes, deadlocks are an issue. However we can work around them. We put the
transaction lock at a certain point in the locking hierarchy, and if we
need to grab a lock that's further up the chain, we release our current
locks, grab the one we need, then re-grab the others.
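
A minimal sketch of that back-off, showing only the ordering logic. The
ranked_lock type and its helpers are placeholders I made up to stand in for
real lock primitives, and the rank numbers are an assumption about how the
hierarchy would be encoded (smaller rank is taken first):

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder lock: records its place in the hierarchy and whether it
 * is held. A real lock would block; this one only checks usage. */
struct ranked_lock {
	int  rank;	/* smaller rank = further up the chain */
	bool held;
};

static void
rl_lock(struct ranked_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void
rl_unlock(struct ranked_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Acquire `want` while holding held[0..n-1], which is sorted by
 * ascending rank. Any held lock ranked after `want` is released first
 * and re-grabbed afterwards. Returns true if something was dropped, in
 * which case the caller must re-check whatever those locks protected.
 */
static bool
lock_with_backoff(struct ranked_lock *held[], int n, struct ranked_lock *want)
{
	int first = n;	/* index of the first held lock ranked after `want` */
	bool dropped;

	while (first > 0 && held[first - 1]->rank > want->rank)
		first--;
	dropped = first < n;

	for (int i = n - 1; i >= first; i--)	/* release in reverse order */
		rl_unlock(held[i]);
	rl_lock(want);
	for (int i = first; i < n; i++)		/* re-grab in hierarchy order */
		rl_lock(held[i]);
	return dropped;
}
```

The cost is that state can change while the locks are dropped, hence the
return value telling the caller to revalidate.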

> > The idea is that we only use transaction locks above the file system if we
> > have a real transaction.
> >
> > vn_lock() isn't a transaction lock, we use it as an atomicity lock. So as
> > long as you don't unlock/relock in the middle of your atomic operation.
> > i.e. you unlock/lock either before or (if you're weird) after the "read"
> > or the "write", you're fine. I'm assuming POSIX atomicity here.
>
> Do we really need more than this kind of atomicity?

No, but:

1) a transaction lock isn't an atomicity lock. :-) Close, but not quite.
So let's be explicit about where we're doing transactions.

2) we do want vn_lock() & such to eventually go away. We'll then only have
the transaction locks left.

> > Then we need buckets of them. If I understood your earlier discussions, we
> > then need transaction locking around every caller into the VFS/VOP layer.
> > That seems messier to maintain.
>
> Buckets is not the right measure.  I suppose the number of lock pairs is
> roughly the same for both ways.  Getting too much above VFS means we need
> more vn_xxx helper functions...
>
> > > > What do other OSs do?
> > >
> > > No OS I know of has something like this.  Do you have a special OS in mind?
> >
> > Yes, they don't have this. But other OSs handle snapshotting. How do they
> > handle the suspension? Do they bother? If so, how do they do it? If they
> > don't, why do we have a problem and they don't?
> >
> > We're adding a new locking hierarchy. I think we should look at prior art
> > before we go too far. If we need the new hierarchy, we will do it. But
> > let's make sure we didn't overlook a cool idea somewhere else first.
>
> FreeBSD has what we have now -- no surprise, we took ours from FreeBSD.  The
> difference is FreeBSD has snapshots only for ffs file systems.
>
> Solaris has snapshots for ufs file systems.  If I remember right they got
> file system locks before snapshots.  It has different lock levels:
>
> 	unlock / name lock / write lock / delete lock / hard lock / error lock
>
> From a quick tour through OpenSolaris it uses
> ufs_lockfs_begin()/ufs_lockfs_end() calls in most operations where
> every ufs_lockfs_begin() has a mask describing the lock levels it should
> wait on.  It takes care of recursive operations (state is stored in a linked
> list of the thread struct), waits or errors if needed.  Silence is acquired by
> an operations counter becoming zero.
> Looks very close to your approach.  But Solaris has no vnode locks.

I think it's a better approach in the long run.

Take care,

Bill
