Re: struct file at VFS level (round 1.5)

To: Emmanuel Dreyfus <manu%netbsd.org@localhost>
Subject: Re: struct file at VFS level (round 1.5)
From: Christoph Badura <bad%bsd.de@localhost>
Date: Mon, 2 May 2016 16:52:38 +0200

On Mon, May 02, 2016 at 07:15:12AM +0000, Emmanuel Dreyfus wrote:
> NetBSD filesystems implement advisory locks, where the only place the
> filesystem actually checks locks is VOP_ADVLOCK. Any other operation
> performed on a file region locked by someone else succeeds: the locks
> are only advisory, and it is the application's duty to enforce them.

> The GlusterFS people are working to implement mandatory locks, where
> the filesystem actually enforces locks on any operation: if you write
> to a region locked by someone else, you get a failure. 

For clarity's sake and so that we all know what we are talking about, I
assume you're talking about fcntl(2) style advisory record locking and
SVR3/SVR4 style mandatory record locking.

If you are talking about something else, then you need to provide us with
the detailed specifications.
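
To make "advisory" concrete, here is a minimal userland sketch (not
taken from any patch in this thread).  The function name is made up for
illustration, and it assumes a POSIX system where mandatory locking is
not enabled for the file.  The parent takes an fcntl(2) write lock on the
whole file; a child, which is a different lock owner, then writes into
the locked range without asking for a lock.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: the parent write-locks the whole file, then a
 * child process writes one byte without requesting any lock.
 * Returns 1 if the child's write succeeded, 0 if it failed,
 * -1 on a setup error.
 */
static int unlocked_write_succeeds(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct flock fl = { 0 };
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fcntl locks are not inherited, so we own no lock here. */
        int cfd = open(path, O_WRONLY);
        _exit(cfd != -1 && write(cfd, "x", 1) == 1 ? 0 : 1);
    }

    int status;
    int ok = waitpid(pid, &status, 0) != -1 &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0;
    close(fd);                  /* also drops the parent's lock */
    return ok;
}
```

Called on an existing, writable file this should return 1 under purely
advisory locking; under SVR4-style mandatory locking the child's write
would be refused or would block instead.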

> Mandatory locks must be implemented at the filesystem level, because
> in the distributed filesystem case, the local kernel knows nothing 
> about locks set from other machines. It requires the filesystem to
> distinguish operations on different struct file for a given vnode, 
> because a lock may be acquired for a struct file and not for another 
> one. Currently NetBSD does not pass this information to filesystems.

I'm really not sure how to put this.  The above paragraph is full of
misunderstandings and misconceptions.  It starts with the assertion that
file systems need to distinguish operations on different struct files,
continues with the false claim that "currently NetBSD does not pass this
information to filesystems" when, in fact, it does through VOP_ADVLOCK,
and ends with the idea that the kernel "needs to know about locks set
from other machines" at a level above the VFS layer, beyond what
VOP_ADVLOCK already provides.

1. fcntl(2) record locks operate at the vnode level. Period. That's
how they are defined to work.  It does not matter what struct file
references the vnode.  E.g. process A opens a file read-write and locks
the entire file, process B opens the same file read-only.  Process A's
lock must be visible to process B but the two processes do not use the
same struct file.

2. fcntl(2) record locks are owned by individual processes.  If the
process goes away (or calls close(2) on a file descriptor for the file),
the locks that it owns are released.

3. Advisory and mandatory record locks operate on the same lock model.
The only difference is that mandatory locks must additionally be checked
by every open(2), read(2), write(2) and equivalent system call.
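
Points 1 and 2 can be observed from userland.  The following is a minimal
sketch under stated assumptions: all function names are made up for
illustration, and it relies only on POSIX fcntl(2) semantics.  The parent
opens the file twice (two struct files, one vnode) and locks it through
the first descriptor.  A child that opens the file read-only sees the
parent as the lock holder.  Closing the second descriptor, which was
never used for locking, then releases the lock.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write-lock the whole file through fd; returns 0 on success. */
static int lock_whole_file(int fd)
{
    struct flock fl = { 0 };
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Ask who would block a write lock on the whole file.  Returns the
 * blocking owner's pid, 0 if nothing blocks, -1 on error.
 */
static pid_t lock_holder(int fd)
{
    struct flock fl = { 0 };
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/* Fork a child that opens path read-only; 1 if it sees `expected`. */
static int child_sees_holder(const char *path, pid_t expected)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDONLY);
        _exit(fd != -1 && lock_holder(fd) == expected ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Returns 1 if both properties hold, 0 if not, -1 on setup error. */
static int record_locks_are_per_vnode(const char *path)
{
    assert(path != NULL);

    int fd1 = open(path, O_RDWR);
    int fd2 = open(path, O_RDONLY);     /* second, independent struct file */
    if (fd1 == -1 || fd2 == -1 || lock_whole_file(fd1) == -1)
        return -1;

    /* Point 1: the lock is visible through an unrelated open. */
    int visible = child_sees_holder(path, getpid());

    /* Point 2: closing any descriptor for the file drops our locks. */
    close(fd2);
    int released = child_sees_holder(path, 0);

    close(fd1);
    return visible == 1 && released == 1;
}
```

Calling record_locks_are_per_vnode() on an existing, writable file should
return 1 on a system with POSIX record locking.  Nothing in the result
depends on which descriptor, and therefore which struct file, was used to
take the lock.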

At the system call level the kernel doesn't need to know whether a lock is
known only on the local machine or through a distributed file system.

For local file systems all this can be implemented above the file system
level.  For distributed file systems, the kernel needs to interact with
the file system to query and publish lock details.  That's what
VOP_ADVLOCK was invented for (originally to deal with NFS locks[2]).

Mandatory file locking is simply a SMOP: the kernel adds a flag to a
vnode that indicates whether mandatory locking is in effect, and adds the
necessary checks and interaction with file systems to open(2), read(2),
write(2) etc. at the syscall level.  VOP_ADVLOCK is the right place to
handle the necessary interaction with the file systems.[3]
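
The check itself is small.  Below is a hedged userland model of it, not
NetBSD kernel code: struct rec_lock and mandatory_check() are
hypothetical names, and the real lock list lives in vfs_lockf.c with a
different layout.  It assumes the usual reader/writer rules: another
owner's exclusive lock blocks reads and writes, another owner's shared
lock blocks only writes, and a process is never blocked by its own locks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of one record lock; not NetBSD's struct lockf. */
struct rec_lock {
    off_t start;        /* first byte covered */
    off_t end;          /* last byte covered, inclusive */
    pid_t owner;        /* owning process */
    int   exclusive;    /* 1 = write lock, 0 = read (shared) lock */
};

/*
 * Decide whether `owner` may do I/O on [off, off + len) given the locks
 * currently set on the file.  Returns 0 if the I/O may proceed, or
 * EAGAIN if a lock held by another owner conflicts with it.
 */
static int mandatory_check(const struct rec_lock *locks, size_t n,
    pid_t owner, off_t off, off_t len, int is_write)
{
    assert(len > 0);
    off_t last = off + len - 1;

    for (size_t i = 0; i < n; i++) {
        const struct rec_lock *l = &locks[i];

        if (l->owner == owner)
            continue;           /* own locks never block */
        if (l->end < off || l->start > last)
            continue;           /* ranges do not overlap */
        if (is_write || l->exclusive)
            return EAGAIN;      /* write vs. any lock, read vs. write lock */
    }
    return 0;
}
```

A read(2) or write(2) path would run such a check only when the vnode's
mandatory-locking flag is set, and for a distributed file system would
have to ask the file system about the range as well.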

Everything else that the kernel needs above the file system level is
provided in vfs_lockf.c.  And you'd be well advised to interact with
that code as the generic and file system independent interface for
record locking and extend it where necessary to handle locks not owned
by a local process.  That could be as simple as the owner being the
userland PUFFS/FUSE process, if you are lucky.

You did read and understand vfs_lockf.c, didn't you?

Of course, the interaction with the distributed lock manager in the
PUFFS/FUSE server process is much more complicated, e.g. when machines
that own locks reboot.  That particular hell is reserved as an
exercise for the file system implementors.

[1] Reasonable explanations of how the record locking is implemented are
in "The Design and Implementation of the 4.4 BSD Operating System" and
"Advanced Programming in the UNIX Environment".

[2] Note that our NFS implementation "short circuits" locks to be non-
distributed.  That's more or less because a reliably distributed lock
manager wasn't available but we wanted to have fcntl(2) advisory record
locking working for the local locks only case.

[3] People may bikeshed about the name, but that is irrelevant.

> I proposed a patch that embeds the required information in struct cred:
> https://ftp.espci.fr/shadow/manu/filecred1.patch

You haven't shown any necessity for putting the required information into
struct cred.  In fact, it seems based on the misconception that fcntl(2)
record locks are somehow related to struct file.

> Some expressed dissatisfaction with the approach, mostly because struct
> cred would not be the right place. Doing it another way means modifying
> the VFS interface to add a reference to struct file where needed (that
> is: anything that touches file content). Would that approach be better?

No, that approach would not be better.  In fact, the kernel already
stores the necessary information it cares about (cf. vfs_lockf.c).  The
file system specific stuff can be stored off v_data.

--chris
