Subject: Re: fsctl(2) [was: Re: Interface to change NFS exports]
To: None <jmmv84@gmail.com>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 09/13/2005 19:23:06
On Sep 12, 2005, at 2:05 AM, Julio M. Merino Vidal wrote:
> We all agree that a new system call is needed. Some also want this
> new interface to not only manage NFS exports but also to allow changing
> other settings from a mount point. I think this is a good idea too.
>
> Given these comments, I've started the implementation of a fsctl(2)
> function call, with the following signature:
>
> int fsctl(const char *path, enum fsctl_command command, void *data);
I haven't read the HP-UX fsctl(2) manual page, but I'll point out
that OS X 10.4 also has an fsctl(2) system call (although I don't see
a manual page for it).
The 10.4 fsctl(2) basically has ioctl(2) semantics (including the
size field and direction bits in the command argument), and the
signature looks like this:
    int fsctl(const char *path, u_long cmd, void *data, int options);
"options" is a flags word that currently has one option --
FSOPT_NOFOLLOW, which means "don't follow symbolic links". That flag
is used in several VFS syscalls in 10.4.
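To make "ioctl(2) semantics" concrete, here is a minimal sketch of how a command word can carry the direction bits and the argument size. The bit layout, the SK_ names, and struct sk_args are all made up for illustration; they are not the actual OS X or BSD values.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 *   bits 31-30  direction (in / out)
 *   bits 29-16  size of the argument structure
 *   bits 15-8   command group
 *   bits  7-0   command number within the group
 */
#define SK_DIR_IN     0x80000000UL   /* caller passes data in */
#define SK_DIR_OUT    0x40000000UL   /* data is copied back out */
#define SK_SIZE_MASK  0x3fffUL

#define SK_CMD(dir, group, num, type)                             \
    ((dir) | (((unsigned long)sizeof(type) & SK_SIZE_MASK) << 16) | \
     ((unsigned long)(group) << 8) | (unsigned long)(num))

/* Hypothetical argument structure. */
struct sk_args {
    uint32_t flags;
    uint32_t value;
};

/* The size must fit in the 14-bit field or it would be truncated. */
static_assert(sizeof(struct sk_args) <= SK_SIZE_MASK,
              "argument too large for the size field");

#define SK_GET_ARGS  SK_CMD(SK_DIR_OUT, 'S', 1, struct sk_args)
#define SK_SET_ARGS  SK_CMD(SK_DIR_IN,  'S', 2, struct sk_args)

/* What a generic layer can learn from the command word alone. */
size_t cmd_size(unsigned long cmd)       { return (cmd >> 16) & SK_SIZE_MASK; }
int    cmd_copies_in(unsigned long cmd)  { return (cmd & SK_DIR_IN) != 0; }
int    cmd_copies_out(unsigned long cmd) { return (cmd & SK_DIR_OUT) != 0; }
```

The point is that a generic layer can copy the argument in and out by looking only at the command word, without knowing anything about the individual command.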
In 10.4, all fsctl(2) commands are currently file system-specific,
but that doesn't mean we can't have generic ones that either all file
systems implement or that are handled at the VFS layer (in general, I
would like to see us move a LOT more stuff out of individual file
systems and into the VFS layer).
> At the moment, command can be one of FSCTL_EXPORT_NFS_GET or
> FSCTL_EXPORT_NFS_SET, to query or set NFS export lists respectively
> based on the given path. (Minor question: can an enum be used as a
> system call argument, or should I use an integer instead? If an enum
> is unsuitable, why?)
Use an ioctl-style command argument :-) It has the nice property of
handling versioning for you, if the size of the argument were to
change for some reason.
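A rough sketch of the versioning property, assuming a made-up encoding where the argument size is part of the command word. The struct layouts, the SK_ macros, the _V1 name, and the default anon uid are hypothetical; only the FSCTL_EXPORT_NFS_SET name comes from the proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: bit 31 = data flows into the kernel,
 * bits 16-29 = sizeof(argument), bits 8-15 = group, bits 0-7 = number. */
#define SK_IN 0x80000000UL
#define SK_CMD_IN(group, num, type)                   \
    (SK_IN | ((unsigned long)sizeof(type) << 16) |    \
     ((unsigned long)(group) << 8) | (unsigned long)(num))

/* Hypothetical argument layouts; the second one grew a field. */
struct export_v1 { uint32_t flags; };
struct export_v2 { uint32_t flags; uint32_t anon_uid; };

#define SK_DEFAULT_ANON 65534u   /* assumed default for old callers */

/* Same group and number, different size, so different command values. */
#define FSCTL_EXPORT_NFS_SET_V1 SK_CMD_IN('E', 2, struct export_v1)
#define FSCTL_EXPORT_NFS_SET    SK_CMD_IN('E', 2, struct export_v2)

/* Convert whichever version the caller used into the current layout.
 * Returns 0 on success, -1 if the command is not recognized. */
int export_set_decode(unsigned long cmd, const void *data,
                      struct export_v2 *out)
{
    assert(data != NULL && out != NULL);

    if (cmd == FSCTL_EXPORT_NFS_SET) {
        memcpy(out, data, sizeof(*out));
        return 0;
    }
    if (cmd == FSCTL_EXPORT_NFS_SET_V1) {
        struct export_v1 old;
        memcpy(&old, data, sizeof(old));
        out->flags = old.flags;
        out->anon_uid = SK_DEFAULT_ANON;   /* field the old ABI lacked */
        return 0;
    }
    return -1;
}
```

An old binary built against the smaller structure keeps passing the old command value, so it can be told apart from a new one without an explicit version field.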
> The problem with this interface is that it doesn't let you change
> multiple mount points atomically, as some others have suggested.
> I also agree that having this feature could be nice.
I don't see the value of changing multiple mount points atomically...
what matters most is that an individual mount point's export list is
updated atomically.
> In the (near) future, we could migrate MNT_GETARGS and MNT_UPDATE
> to this new system call, as well as other stuff like the quota
> management.
I don't see anything wrong with keeping MNT_UPDATE as-is. Its
semantics are "update the mount", i.e. change from r/w to r/o or
whatever. MNT_GETARGS ... well, I have other opinions on that as
well... I would prefer string-based mount arguments to the binary
blobs we have now.
> Do you think this is correct and flexible enough for the current and
> future purposes?
I think I would like to have an fsctl(2), sure. But going back to
the original discussion about NFS exports, I think that we should
switch to a model where the export list is not maintained by the
kernel, but rather ONLY by mountd(8). I believe someone else
mentioned this as what is done by Solaris...
In this model, the kernel would make an upcall to mountd(8), which
would either approve or deny the request, and the kernel would cache
the result. Updating the export list then becomes a matter of simply
flushing the kernel's "export cache".
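A user-space toy of that model, just to show the shape of it. Everything here is hypothetical: mountd_upcall() is a stand-in for the real upcall (its "allow 10.x clients" policy is invented), and the fixed-size table is only the simplest cache that demonstrates the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS 8
#define CLIENT_MAX  64

struct export_cache_entry {
    bool valid;
    bool allowed;
    char client[CLIENT_MAX];
};

static struct export_cache_entry cache[CACHE_SLOTS];
static int upcall_count;   /* how many times the daemon was consulted */

/* Stand-in for the upcall to mountd(8). The policy is invented for the
 * example: allow clients whose address starts with "10.". */
bool mountd_upcall(const char *client)
{
    upcall_count++;
    return strncmp(client, "10.", 3) == 0;
}

/* Answer from the cache if possible; otherwise ask the daemon and
 * remember what it said. */
bool export_check(const char *client)
{
    assert(client != NULL);

    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && strcmp(cache[i].client, client) == 0)
            return cache[i].allowed;

    bool allowed = mountd_upcall(client);

    /* Cache the verdict if the name fits and a slot is free. */
    if (strlen(client) < CLIENT_MAX) {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (!cache[i].valid) {
                cache[i].valid = true;
                cache[i].allowed = allowed;
                strcpy(cache[i].client, client);
                break;
            }
        }
    }
    return allowed;
}

/* "Updating the export list": drop every cached verdict, so the next
 * check goes back to the daemon. */
void export_cache_flush(void)
{
    memset(cache, 0, sizeof(cache));
}
```

After mountd(8) changes its own list, a single flush is all the kernel side needs; the next request from each client repopulates the cache through the upcall.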
-- thorpej