Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <>
From: Bill Studenmund <>
List: tech-kern
Date: 06/28/2006 08:17:43

On Tue, Jun 27, 2006 at 12:52:57PM +0200, Juergen Hannken-Illjes wrote:
> On Mon, Jun 26, 2006 at 02:31:44PM -0700, Bill Studenmund wrote:
> > On Mon, Jun 26, 2006 at 08:30:20PM +0200, Juergen Hannken-Illjes wrote:
> > > On Mon, Jun 26, 2006 at 09:43:59AM -0700, Bill Studenmund wrote:
> > > > I'm sorry, but this is an important point. I have the feeling it was
> > > > missed.
> > >
> > > Not sure I get it right: you mean taking the transaction lock for
> > > read/write/ioctl in every file system while taking it for other operations
> > > outside?
> > >
> > > Looks difficult to maintain.
> >
> > How is it difficult to maintain?
> We have to do it for all operations of all file systems.  And we need
> thread-recursive locks as file systems call operations on other file systems.

I'm sorry. I do not understand the causality implied in this sentence. The
fact that a file system may call operations on other file systems (only
unionfs does this AFAIK) does not mean we need recursion.

We also don't need recursion in general. All we need is for the lock
routine to return "success", "failure", and "You already have the lock."
If we get a "failure" return, we exit whatever we're doing. If we get
success, we later release the lock. If we get "You already have the lock",
then we just skip the unlock later on.
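
To make the three-way return concrete, here is a rough userland sketch. All
names (trans_enter, trans_exit, struct trans_lock, fs_operation) are made up
for illustration and are not an existing kernel interface. It also assumes a
single owner for brevity, whereas a real transaction lock would admit many
concurrent holders and would sleep rather than fail on contention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not an existing NetBSD interface. */
enum trans_result {
	TRANS_OK,		/* lock acquired; caller must release it */
	TRANS_FAIL,		/* could not get it; caller backs out */
	TRANS_ALREADY_HELD	/* caller already owns it; skip the unlock */
};

struct trans_lock {
	const void *owner;	/* thread holding the lock, or NULL */
	int suspended;		/* nonzero while the file system is suspended */
};

static enum trans_result
trans_enter(struct trans_lock *tl, const void *self)
{
	if (tl->owner == self)
		return TRANS_ALREADY_HELD;
	/* Assumption: fail instead of sleeping, to keep the sketch short. */
	if (tl->suspended || tl->owner != NULL)
		return TRANS_FAIL;
	tl->owner = self;
	return TRANS_OK;
}

static void
trans_exit(struct trans_lock *tl, const void *self)
{
	assert(tl->owner == self);
	tl->owner = NULL;
}

/* An operation that may be entered with or without the lock held. */
static int
fs_operation(struct trans_lock *tl, const void *self)
{
	enum trans_result r = trans_enter(tl, self);

	if (r == TRANS_FAIL)
		return -1;		/* exit whatever we were doing */
	/* ... the transaction's work would go here ... */
	if (r == TRANS_OK)
		trans_exit(tl, self);	/* only release what we took */
	return 0;
}
```

The point is that a nested call sees TRANS_ALREADY_HELD and leaves the lock
alone, so the outermost caller is the only one that releases it. No recursion
count is kept anywhere.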

> Once an operation has the lock we cannot deny the lock to other operations
> called from here.  Take unionfs's `copy-up' as an example.

I don't understand what you mean by "[denying] the lock". ?? If a file
system decides it wants to perform a transaction, it starts then ends the

Note also that while you're right that we have to add this logic to
specific file systems (and the implicit assessment that we may have more
file systems than entry points that make certain transactions), we really
only have to add this functionality to file systems that handle snapshots.

So only ffs needs the logic for now.

> And I'm not sure if it can be free of deadlocks doing it (with locked vnodes)
> inside the file system.

Yes, deadlocks are an issue. However we can work around them. We put the
transaction lock at a certain point in the locking hierarchy, and if we
need to grab a lock that's further up the chain, we release our current
locks, grab the one we need, then re-grab others.
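
A rough sketch of that back-off, using pthread mutexes as stand-ins. The
struct and function names are hypothetical, and the two-level hierarchy
("upper" before "trans") is an assumption made only for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical two-level hierarchy: "upper" is taken before "trans". */
struct lock_pair {
	pthread_mutex_t upper;	/* the lock further up the chain */
	pthread_mutex_t trans;	/* stands in for the transaction lock */
};

/*
 * Called with only lp->trans held.  Returns with both locks held,
 * acquired in hierarchy order.  Because trans is dropped on the way,
 * the caller must recheck any state it examined before the call.
 */
static void
lock_upper_from_trans(struct lock_pair *lp)
{
	int error;

	error = pthread_mutex_unlock(&lp->trans);	/* release what we hold */
	assert(error == 0);
	error = pthread_mutex_lock(&lp->upper);		/* grab the one we need */
	assert(error == 0);
	error = pthread_mutex_lock(&lp->trans);		/* then re-grab the rest */
	assert(error == 0);
	(void)error;
}
```

The cost is the window in which nothing is held, so anything decided under
the old lock has to be revalidated afterwards.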

> > The idea is that we only use transaction locks above the file system if we
> > have a real transaction.
> >
> > vn_lock() isn't a transaction lock, we use it as an atomicity lock. So
> > long as you don't unlock/relock in the middle of your atomic operation,
> > i.e. you unlock/lock either before or (if you're weird) after the "read"
> > or the "write", you're fine. I'm assuming POSIX atomicity here.
> Do we really need more than this kind of atomicity?

No, but:

1) a transaction lock isn't an atomicity lock. :-) Close, but not quite.
So let's be explicit about where we're doing transactions.

2) we do want vn_lock() & such to eventually go away. We'll then only have
the transaction locks left.

> > Then we need buckets of them. If I understood your earlier discussions, we
> > then need transaction locking around every caller into the VFS/VOP layer.
> > That seems messier to maintain.
> Buckets is not the right measure.  I suppose the number of lock pairs is
> roughly the same for both ways.  Getting too much above VFS means we need
> more vn_xxx helper functions...
> > > > What do other OSs do?
> > >
> > > No OS I know of has something like this.  Do you have a special OS in mind?
> >
> > Yes, they don't have this. But other OSs handle snapshotting. How do they
> > handle the suspension? Do they bother? If so, how do they do it? If they
> > don't, why do we have a problem and they don't?
> >
> > We're adding a new locking hierarchy. I think we should look at prior art
> > before we go too far. If we need the new hierarchy, we will do it. But
> > let's make sure we didn't overlook a cool idea somewhere else first.
> FreeBSD has what we have now -- no surprise, we took ours from FreeBSD.  The
> difference is FreeBSD has snapshots only for ffs file systems.
> Solaris has snapshots for ufs file systems.  If I remember right they got
> file system locks before snapshots.  It has different lock levels:
> 	unlock / name lock / write lock / delete lock / hard lock / error lock
> From a quick tour through OpenSolaris it uses
> ufs_lockfs_begin()/ufs_lockfs_end() calls in most operations where
> every ufs_lockfs_begin() has a mask describing the lock levels it should
> wait on.  It takes care of recursive operations (state is stored in a linked
> list of the thread struct), waits or errors if needed.  Silence is acquired by
> an operations counter becoming zero.
> Looks very close to your approach.  But Solaris has no vnode locks.

I think it's a better approach in the long run.
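
For reference, the counter-based scheme as I read the description above could
be sketched like this. This is not the OpenSolaris code: the names, the two
lock levels shown, and the fail-instead-of-wait behaviour are all assumptions
for illustration, and the per-thread recursion state is left out entirely.

```c
#include <assert.h>

/* Hypothetical lock levels; the real set is larger and named differently. */
#define LEVEL_WRITE	0x1u
#define LEVEL_DELETE	0x2u

struct fs_gate {
	unsigned locked_levels;	/* levels the file system is locked at */
	unsigned active_ops;	/* operations between begin and end */
};

/*
 * Start an operation.  "mask" names the lock levels this operation must
 * not run under.  Returns 0 and counts the operation, or -1 where a real
 * implementation would wait or return an error.
 */
static int
gate_begin(struct fs_gate *g, unsigned mask)
{
	if (g->locked_levels & mask)
		return -1;
	g->active_ops++;
	return 0;
}

static void
gate_end(struct fs_gate *g)
{
	assert(g->active_ops > 0);
	g->active_ops--;
}

/* Silence is reached once a lock level is set and the counter drains. */
static int
gate_is_silent(const struct fs_gate *g)
{
	return g->locked_levels != 0 && g->active_ops == 0;
}
```

Setting the level first stops new operations from entering, and the ones
already inside drain out through gate_end() until the count reaches zero.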

Take care,

