tech-kern: Re: fsctl(2) [was: Re: Interface to change NFS exports]

Subject: Re: fsctl(2) [was: Re: Interface to change NFS exports]
To: None <jmmv84@gmail.com>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 09/13/2005 19:23:06

On Sep 12, 2005, at 2:05 AM, Julio M. Merino Vidal wrote:

> We all agree that a new system call is needed.  Some also want this
> new interface to not only manage NFS exports but also to allow
> changing other settings from a mount point.  I think this is a good
> idea too.
>
> Given these comments, I've started the implementation of a fsctl(2)
> function call, with the following signature:
>
>     int fsctl(const char *path, enum fsctl_command command, void *data);

I haven't read the HP-UX fsctl(2) manual page, but I'll point out
that OS X 10.4 also has an fsctl(2) system call (although I don't see
a manual page for it).

The 10.4 fsctl(2) basically has ioctl(2) semantics (including the  
size field and direction bits in the command argument), and the  
signature looks like this:

int	fsctl(const char *path, u_long cmd, void *data, int options);

"options" is a flags word that currently has one option --  
FSOPT_NOFOLLOW, which means "don't follow symbolic links".  That flag  
is used in several VFS syscalls in 10.4.
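
To illustrate what such a flags word does, here is a small user-space
analogue built on stat(2) and lstat(2).  It is only a sketch: the
EX_FSOPT_NOFOLLOW name and its value, and the path_stat() helper, are
made up for this example and are not taken from the 10.4 headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Assumed name and value, for illustration only. */
#define EX_FSOPT_NOFOLLOW	0x1

/*
 * Hypothetical helper: stat a path, following a trailing symbolic
 * link unless EX_FSOPT_NOFOLLOW is set in "options".
 */
static int
path_stat(const char *path, struct stat *sb, int options)
{
	assert(path != NULL && sb != NULL);

	/* Reject option bits we do not understand. */
	if (options & ~EX_FSOPT_NOFOLLOW) {
		errno = EINVAL;
		return -1;
	}
	if (options & EX_FSOPT_NOFOLLOW)
		return lstat(path, sb);	/* describe the link itself */
	return stat(path, sb);		/* describe the link target */
}
```

Rejecting unknown bits up front keeps the remaining bits of the flags
word free for options added later.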

In 10.4, all fsctl(2) commands are currently file system-specific,  
but that doesn't mean we can't have generic ones that either all file  
systems implement or that are handled at the VFS layer (in general, I  
would like to see us move a LOT more stuff out of individual file  
systems and into the VFS layer).

> At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
> FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
> based on the given path.  (Minor question: can an enum be used as a
> system call argument, or should I use an integer instead?  If not,
> why?)

Use an ioctl-style command argument :-)  It has the nice property of  
handling versioning for you, if the size of the argument were to  
change for some reason.
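
As a rough sketch of why the size field gives you versioning: the
encoding below is modeled on the layout the ioctl macros in
<sys/ioccom.h> use (direction bits, argument size, group, number).
The FSCTL_CMD macros, the _V1/_V2 command names and the export_args
structures are all hypothetical, invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: dir in bits 31..29, size in 28..16,
 * group in 15..8, command number in 7..0. */
#define FSCTL_SIZE_MASK	0x1fffu
#define FSCTL_DIR_OUT	0x40000000u	/* kernel copies data out */
#define FSCTL_DIR_IN	0x80000000u	/* kernel copies data in */

#define FSCTL_CMD(dir, group, num, len)				\
	((uint32_t)(dir) |					\
	 (((uint32_t)(len) & FSCTL_SIZE_MASK) << 16) |		\
	 ((uint32_t)(uint8_t)(group) << 8) |			\
	 (uint32_t)(uint8_t)(num))
#define FSCTL_CMD_LEN(cmd)	(((cmd) >> 16) & FSCTL_SIZE_MASK)

/* Two made-up revisions of the same argument structure. */
struct export_args_v1 { uint32_t flags; };
struct export_args_v2 { uint32_t flags; uint32_t anon_uid; };

static_assert(sizeof(struct export_args_v1) !=
    sizeof(struct export_args_v2), "revisions must differ in size");

/* Same group and number; only the encoded size differs. */
#define FSCTL_EXPORT_NFS_SET_V1 \
	FSCTL_CMD(FSCTL_DIR_IN, 'E', 1, sizeof(struct export_args_v1))
#define FSCTL_EXPORT_NFS_SET_V2 \
	FSCTL_CMD(FSCTL_DIR_IN, 'E', 1, sizeof(struct export_args_v2))

/* Return the argument revision a command carries, 0 if unknown. */
static int
fsctl_export_version(uint32_t cmd)
{
	switch (cmd) {
	case FSCTL_EXPORT_NFS_SET_V1:
		return 1;
	case FSCTL_EXPORT_NFS_SET_V2:
		return 2;
	default:
		return 0;
	}
}
```

Because the structure size is part of the command value, an old
binary passing the v1 structure and a new one passing v2 issue
different commands, and the kernel can keep handling both.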

> The problem with this interface is that it doesn't let you change
> multiple mount points atomically, as some others have suggested.
> I also agree that having this feature could be nice.

I don't see the value of changing multiple mount points atomically.
What matters most is that an individual mount point's export list is
updated atomically.

> In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
> to this new system call, as well as other stuff like the quota
> management.

I don't see anything wrong with keeping MNT_UPDATE as-is.  Its
semantics are "update the mount", i.e. change from r/w to r/o or
whatever.  As for MNT_GETARGS, I have other opinions on that as
well: I would prefer string-based mount arguments to the binary
blobs we have now.
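
To make "string-based" concrete, here is a minimal sketch of looking
up one option in a comma-separated argument string.  The
mount_opt_get() helper and the "key=value" syntax are assumptions
made for this example, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key" in a string such as
 * "ro,nosuid,wsize=8192".  Returns 1 if the key is present and
 * copies its value (empty for a bare flag) into val, else 0.
 */
static int
mount_opt_get(const char *opts, const char *key, char *val, size_t vallen)
{
	size_t keylen;

	assert(opts != NULL && key != NULL && val != NULL);
	keylen = strlen(key);

	while (*opts != '\0') {
		size_t toklen = strcspn(opts, ",");
		size_t namelen = 0;

		/* The option name ends at '=' or at the end of the token. */
		while (namelen < toklen && opts[namelen] != '=')
			namelen++;

		if (namelen == keylen && strncmp(opts, key, keylen) == 0) {
			const char *v = opts + namelen;
			size_t vlen = toklen - namelen;

			if (vlen > 0) {		/* skip the '=' */
				v++;
				vlen--;
			}
			if (vallen > 0) {
				if (vlen >= vallen)
					vlen = vallen - 1;	/* truncate */
				memcpy(val, v, vlen);
				val[vlen] = '\0';
			}
			return 1;
		}
		opts += toklen;
		if (*opts == ',')
			opts++;
	}
	return 0;
}
```

Unlike a binary blob, such a string has no layout that changes when
an option is added, and unknown options can be skipped or rejected
by name.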

> Do you think this is correct and flexible enough for the current and
> future purposes?

I think I would like to have an fsctl(2), sure.  But going back to  
the original discussion about NFS exports, I think that we should  
switch to a model where the export list is not maintained by the  
kernel, but rather ONLY by mountd(8).  I believe someone else  
mentioned this as what is done by Solaris...

In this model, the kernel would make an upcall to mountd(8), which
would either approve or deny the request, and the kernel would cache
the result.  Updating the export list then becomes a matter of simply
flushing the kernel's "export cache".
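
A toy model of that flow, written as ordinary user-space C, might
look like the following.  Everything in it is an assumption made for
illustration: the export_cache structure, the direct-mapped slot
scheme, and demo_policy(), which stands in for the upcall to
mountd(8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXPORT_CACHE_SLOTS	64

/* Stand-in for the upcall to mountd(8): true means "approve". */
typedef bool (*export_upcall_fn)(uint32_t client_addr);

struct export_cache_entry {
	bool valid;
	uint32_t client;
	bool allowed;
};

struct export_cache {
	struct export_cache_entry slot[EXPORT_CACHE_SLOTS];
	export_upcall_fn upcall;
	unsigned int upcalls;	/* upcalls made so far, for inspection */
};

/* Forget every cached decision; the next check asks again. */
static void
export_cache_flush(struct export_cache *c)
{
	assert(c != NULL);
	for (size_t i = 0; i < EXPORT_CACHE_SLOTS; i++)
		c->slot[i].valid = false;
}

static void
export_cache_init(struct export_cache *c, export_upcall_fn fn)
{
	assert(c != NULL && fn != NULL);
	c->upcall = fn;
	c->upcalls = 0;
	export_cache_flush(c);
}

/* Answer from the cache when possible, otherwise make the upcall. */
static bool
export_check(struct export_cache *c, uint32_t client)
{
	struct export_cache_entry *e;

	assert(c != NULL);
	e = &c->slot[client % EXPORT_CACHE_SLOTS];
	if (e->valid && e->client == client)
		return e->allowed;

	e->client = client;
	e->allowed = c->upcall(client);
	e->valid = true;
	c->upcalls++;
	return e->allowed;
}

/* Toy policy: approve 10.0.0.0/8 unless everything is denied. */
static bool demo_deny_all = false;

static bool
demo_policy(uint32_t client)
{
	return !demo_deny_all && (client >> 24) == 10;
}
```

In this sketch a policy change only takes effect after
export_cache_flush() is called; until then export_check() keeps
returning the cached answer without another upcall.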

-- thorpej