Subject: RE: CVS commit: src
To: Bill Studenmund
From: Gordon Waidhofer
List: tech-kern
Date: 06/22/2004 11:31:10
> > > > ... it as a requirement that file 
> > > > systems do real, hard-core locking. And given the state of things when I 
> > > > started, that was a very good thing.
> > > 
> > > Why do you think that exposing a lock is a requirement?
> > 
> > I'd like to ask the same question differently.
> > 
> > Suppose a file system's VOP_LOCK() and VOP_UNLOCK()
> > are no-ops, and the file system can be trusted to
> > do the right thing (not really that hard) for the
> > primary VOPs (LOOKUP, READ, WRITE, etc). What
> > semantics would break?
> Whatever accesses callers of the file system expected to be serialized that 
> now aren't. That is, a case where a caller called VOP_LOCK() and expected 
> that an exclusive lock is now in place. Especially if it expected that lock 
> to be held across a call to ltsleep().
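A toy model may make that expectation concrete. This is not NetBSD code: `struct toy_vnode`, `toy_lock()`, `toy_unlock()`, and `run_two_increments()` are hypothetical, and the ltsleep() is simulated by letting a second caller run between caller A's read and its write-back. With an exclusive lock, the second caller is kept out until A finishes and both increments land; with a no-op lock, one update is lost.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vnode: one piece of state plus a lock
 * flag. Not the real struct vnode or VOP_LOCK() interface. */
struct toy_vnode {
	int  value;   /* stand-in for file state */
	bool locked;  /* exclusive lock, ignored when lock_is_noop is set */
};

/* Returns true if the caller got in; false means it would block. */
static bool toy_lock(struct toy_vnode *vp, bool lock_is_noop)
{
	if (lock_is_noop)
		return true;          /* "succeeds" without excluding anyone */
	if (vp->locked)
		return false;
	vp->locked = true;
	return true;
}

static void toy_unlock(struct toy_vnode *vp, bool lock_is_noop)
{
	if (!lock_is_noop)
		vp->locked = false;
}

/* Two callers each increment value. Caller A sleeps between its read
 * and its write; caller B tries to run during that sleep. Returns the
 * final value: 2 if no update was lost, 1 if one was. */
static int run_two_increments(bool lock_is_noop)
{
	struct toy_vnode vn = { 0, false };

	/* A locks, reads, then "sleeps" while believing it holds the lock. */
	toy_lock(&vn, lock_is_noop);
	int a_snapshot = vn.value;

	/* B runs inside A's sleep, if the lock lets it in. */
	bool b_ran = toy_lock(&vn, lock_is_noop);
	if (b_ran) {
		vn.value = vn.value + 1;
		toy_unlock(&vn, lock_is_noop);
	}

	/* A wakes and writes back what it computed before sleeping. */
	vn.value = a_snapshot + 1;
	toy_unlock(&vn, lock_is_noop);

	/* If B was kept out, it gets its turn now. */
	if (!b_ran && toy_lock(&vn, lock_is_noop)) {
		vn.value = vn.value + 1;
		toy_unlock(&vn, lock_is_noop);
	}
	return vn.value;
}
```

Calling `run_two_increments(false)` models the exclusive lock; `run_two_increments(true)` models the no-op VOP_LOCK() from the question above.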

It's the VOP callers' expectations that are vague.

I sense there are subtleties to the file model,
but I'd be hard-pressed to point to them. About
the only thing I can readily point to is VOP_LEASE().
So, I believe you. I would never try a no-op VOP_LOCK().
Still, before the file model evolves organically
any further, it might be good to pin down the file
model that is already there.
> Also, for things like delete and rename, would it be so easy? Or file 
> creation?

Yup. It is easy. Consider NFS. The VOP_LOCK() held on
the client is not honored on the server. But it all
works out just fine.

This touches on the discussion a couple of months ago about
whether to VOP_LOOKUP() the last component of a pathname
passed to something like remove(). For me, mount points are a
bona fide part of the file model and so the last
VOP_LOOKUP() is required to support that aspect of
the file model. Other file models that don't use mount
points -- like prefix paths (QNX) -- wouldn't need to do
the last VOP_LOOKUP(). I'm not suggesting a change to the
file model. Oh, no. Just that there are other ideas, and
that there are concrete aspects of the NetBSD file model
driving this particular answer.
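As a rough illustration of why mount points force a lookup of the last component, here is a toy rmdir over a single directory. Everything in it is hypothetical (`struct toy_dirent`, its `mounted_on` flag, `toy_lookup()`, `toy_rmdir()`); it only shows that the removal cannot refuse a mounted-on directory without first resolving the final name to something that knows it is covered.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name plus a flag saying whether
 * another file system is mounted on it. Not the real struct vnode. */
struct toy_dirent {
	const char *name;
	bool        mounted_on;
	bool        present;
};

/* The "last VOP_LOOKUP()": resolve the final component of the path. */
static struct toy_dirent *toy_lookup(struct toy_dirent *dir, size_t n,
                                     const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (dir[i].present && strcmp(dir[i].name, name) == 0)
			return &dir[i];
	return NULL;
}

/* Remove an entry, refusing if a file system is mounted on it. */
static int toy_rmdir(struct toy_dirent *dir, size_t n, const char *name)
{
	struct toy_dirent *de = toy_lookup(dir, n, name);

	if (de == NULL)
		return ENOENT;
	if (de->mounted_on)
		return EBUSY;   /* rmdir(2) reports EBUSY for a mount point */
	de->present = false;
	return 0;
}
```

A prefix-path model of the kind mentioned above would route the name by prefix before any directory is consulted, so this final check would live elsewhere.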

I believe there are aspects of the NetBSD file model
that drive VOP_LOCK() having exclusion semantics. But
the file system should not be blamed. It's just that
those aspects don't feel as bona fide because I can't
point to them. "Caller's expectations" isn't particularly
pointed. I sense the impetus of this email thread
was an attempt to better understand such aspects of
the file model.

> > From VOP_LOCK(9):
> >     VOP_LOCK() is used to serialise access to the
> >     file system such as to prevent two writes to
> >     the same file from happening at the same time.
> > 
> > Why? Is this a semantic of the file model? Or is
> > this a context to make things "easier" on the
> > underlying file system?
> It is file system semantics. A call to write(2), barring errors, is 
> supposed to be atomic. Thus if you have two write calls that overlap, the 
> overlapping data are to have come from one call or the other, not some mix 
> of both.
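A deterministic toy model of the overlapping-write case may help. It is not kernel code: `struct toy_write`, `step()`, `run()`, and `overlap_is_atomic()` are hypothetical, each write(2) is modeled as a series of 2-byte copies, and preemption is simulated by running write B to completion in the middle of write A. Without serialization the overlapping bytes end up as a mix of both calls; serializing whole calls (here, running A to completion before B starts) leaves the overlap entirely from one call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FILE_SIZE 12
#define CHUNK     2

/* A pending write: fill [off, off + len) with byte `fill`, CHUNK bytes
 * at a time, so two writes can be interleaved mid-call. */
struct toy_write {
	size_t off, len, done;
	char   fill;
};

/* Copy one chunk of w into file; return true while w has more to do. */
static bool step(char *file, struct toy_write *w)
{
	size_t n = w->len - w->done;

	if (n == 0)
		return false;
	if (n > CHUNK)
		n = CHUNK;
	memset(file + w->off + w->done, w->fill, n);
	w->done += n;
	return w->done < w->len;
}

/* Write A covers [0, 8), write B covers [4, 12); they overlap in [4, 8). */
static void run(char *file, bool serialized)
{
	struct toy_write a = { 0, 8, 0, 'A' };
	struct toy_write b = { 4, 8, 0, 'B' };

	memset(file, '.', FILE_SIZE);
	if (!serialized) {
		/* A is preempted after 3 chunks, midway through the overlap. */
		for (int i = 0; i < 3; i++)
			step(file, &a);
		while (step(file, &b))
			;
	}
	while (step(file, &a))   /* A finishes, or runs whole if serialized */
		;
	while (step(file, &b))   /* B runs whole if serialized, else no-op */
		;
}

/* The overlap [4, 8) must come entirely from one write or the other. */
static bool overlap_is_atomic(const char *file)
{
	for (size_t i = 5; i < 8; i++)
		if (file[i] != file[4])
			return false;
	return true;
}
```

The interleaved run leaves `AAAABBAABBBB`, so the overlap mixes both calls; the serialized run leaves `AAAABBBBBBBB`. Running B first would be equally correct: serialization decides that the overlap comes from one call, not which call wins.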

So atomicity of write(2) is the issue. Yes? No? Maybe?
Atomicity of persistent storage isn't the issue...

If two threads race into the kernel, there is a race.
Serializing above the VOP layer is an arbitrary, and
reasonable, thing to do. But it doesn't really resolve
the race; it only picks an order.

If the underlying file system guarantees atomicity
of persistent storage updates, is the file model satisfied?

One would be tempted to say "yes." But, probably not.
The issue isn't the atomicity of the writes. It's
those VOP callers' expectations, meaning vague aspects
of the file model.

It's all perfectly reasonable. It's also squarely
in the way of a lot of performance opportunities.
And I wouldn't advocate a change. Indeed, the idea
scares me.
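To make the "performance opportunities" point concrete, one thing a file system could do on its own, if callers did not already hold an exclusive vnode lock for the whole call, is track byte ranges so that only overlapping writes exclude each other. The sketch below is a single-threaded model with hypothetical names (`struct range_table`, `range_trylock()`, `range_unlock()`), not an existing kernel interface; a real version would need its own mutex around the table and would sleep rather than return -1.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGES 8

/* Hypothetical per-file table of byte ranges currently being written. */
struct range_table {
	size_t off[MAX_RANGES];
	size_t len[MAX_RANGES];
	bool   used[MAX_RANGES];
};

/* True if [a_off, a_off + a_len) and [b_off, b_off + b_len) intersect. */
static bool overlaps(size_t a_off, size_t a_len, size_t b_off, size_t b_len)
{
	return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Try to claim [off, off + len). Returns a slot index, or -1 if an
 * overlapping write is in progress or the table is full. */
static int range_trylock(struct range_table *t, size_t off, size_t len)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_RANGES; i++) {
		if (!t->used[i]) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (overlaps(off, len, t->off[i], t->len[i]))
			return -1;
	}
	if (free_slot >= 0) {
		t->off[free_slot] = off;
		t->len[free_slot] = len;
		t->used[free_slot] = true;
	}
	return free_slot;
}

static void range_unlock(struct range_table *t, int slot)
{
	t->used[slot] = false;
}
```

With this scheme, writes to [0, 100) and [100, 200) can proceed together while a write to [50, 150) waits; a whole-vnode exclusive lock serializes all three.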

But let's be honest about the file model and not blame
the file system, nor claim that the VOP callers
are doing the file system a favor.

Here's a way to evaluate and explain the file model.
VOP_LOCK() isn't honored by the NFS server. Now, make
a case for VOP_LOCK().

> Take care,
> Bill

Thanx for the reply and the interesting discussion.