tech-kern archive

fs-independent quotas

So, a few months back we got a new improved quota format for FFS.
Unfortunately, one of the side effects of this was to sprinkle
specific knowledge of the new format through all the userlevel quota
tools and quota support logic. To be fair, this was alongside the
existing specific knowledge of the old quota format; nonetheless, it's
messy and unscalable.

We may want to add more quota formats (e.g. the different and
incompatible new quota format FreeBSD added last year) or add quota
support to other filesystems (tmpfs, perhaps v7fs) or even add other
filesystems that have or may have their own native quota handling
(zfs, Hammer, you name it). Also, my planned lfs-renovation is
currently hung up on the VFS-level quota interface, because I don't
want to rip out the existing maybe-partial support for quotas but
can't plug new code into the existing framework.

For these and other reasons I've been intending to rework the quota
system. As some people know I've also been avoiding doing it, because
it looks like work; however, it really ought to go in before netbsd-6
is branched and it's getting to the point where I feel like I'm
starting to hold that up, and that's just not on.

So, here's what I'm planning to do.

It seems to me that quotas are fundamentally a special-purpose
key/value store; that is, you look up quota information for a
particular thing (the key) and get back the quota settings and current
usage information (the value). This means, to me at least, that the
quota system can and should be accessed like a key/value store; this
means that a dump of the quota information can be tabular rather than
hierarchically structured and there's therefore no need to involve

I am going to change the exposed schema slightly from the traditional
quota system. In the traditional quota system, there are two quota
tables (which manifest physically as files), one for user quotas and
one for group quotas. Within each table the id (uid or gid) is the
key, and the value contains quota and usage information for both block
and file quotas. This is simple and effective, but not very flexible;
also, it's no longer the 80s and it's not necessary to link the
physical representation closely to the logical representation.

Therefore, I'll do a bit of relational normalization and provide the
following logical representation:

   - the quota key is:
        the quota *class*
        the id
        the quota *type*

   - the quota value is:
        the configured hard limit
        the configured soft limit
        the configured grace period
        the current usage
        the current grace expiry time (if any)

The quota *class* is the thing the quota is imposed on; this is
currently either "user" or "group". There is no likely prospect of
additional quota classes appearing.

The quota *type* is the thing the quota is about; this is currently
either "blocks" or "files". There is, however, prior art out there
(not in NetBSD though) that provides quotas for additional types.

Ideally as much of the code as possible should be written to be able
to transparently handle additional quota classes and types; this may
end up being more of a long-term goal than a direct consequence of the
current work, though.

In this schema the quotas pertaining to a user "joe", with uid 101 and
gid 100, might appear as:

   class  id    type     hard    soft    usage   grace   expire
   user   101   blocks   11000   10000   5072    7d      -
   user   101   files    1100    1000    280     7d      -
   group  100   blocks   22000   20000   11543   14d     -
   group  100   files    2200    2000    5072    14d     -

In the traditional quota implementation these four rows in the
(filesystem-independent) logical schema fit into two struct dqblks,
one holding the group data and one holding the user data. I believe
the quota2 physical representation is similar.
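
To make the key/value view concrete, here is a minimal C sketch that
stores the four rows for "joe" in a flat table and looks one up by its
(class, id, type) key. All names in it (struct quota_row, find_row,
the QC_ and QT_ constants) are made up for illustration and are not
part of any existing interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum qclass { QC_USER, QC_GROUP };
enum qtype  { QT_BLOCKS, QT_FILES };

struct quota_row {
        /* key */
        enum qclass cls;
        uint32_t id;
        enum qtype type;
        /* value */
        uint64_t hard;
        uint64_t soft;
        uint64_t usage;
        unsigned grace_days;
};

/* The four rows for "joe" (uid 101, gid 100) from the table above. */
const struct quota_row joe_rows[] = {
        { QC_USER,  101, QT_BLOCKS, 11000, 10000,  5072,  7 },
        { QC_USER,  101, QT_FILES,   1100,  1000,   280,  7 },
        { QC_GROUP, 100, QT_BLOCKS, 22000, 20000, 11543, 14 },
        { QC_GROUP, 100, QT_FILES,   2200,  2000,  5072, 14 },
};

/* Linear lookup by full key; returns NULL if there is no such row. */
const struct quota_row *
find_row(const struct quota_row *rows, size_t n,
         enum qclass cls, uint32_t id, enum qtype type)
{
        size_t i;

        for (i = 0; i < n; i++) {
                if (rows[i].cls == cls && rows[i].id == id &&
                    rows[i].type == type)
                        return &rows[i];
        }
        return NULL;
}
```

A real implementation would not scan linearly, of course; the point is
only that every lookup is a single key producing a single value.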

(I have no plans to change any of the physical representations or the
code that manages them.)

The current structural plan is for this logical schema to be exposed
by each file system at the VFS layer. That is, each filesystem will be
responsible for translating between its internal, on-disk format and
the filesystem-independent logical schema. The VFS- and syscall-level
kernel code should not need to do much of anything but hand off to the
filesystem; this is more or less how things currently are and always
have been.
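
As a rough illustration of what that translation might look like, the
sketch below splits one combined old-style record into two logical
values, one for blocks and one for files. The record layout here
(struct old_rec) is a simplified stand-in and not the real struct
dqblk; the names old_rec_split, limit_to_logical and LOGICAL_NOLIMIT
are likewise hypothetical. It assumes the on-disk format uses 0 to
mean "no limit" while the logical schema uses an all-ones sentinel.

```c
#include <stdint.h>

/* Hypothetical sentinel for "no limit" in the logical schema. */
#define LOGICAL_NOLIMIT UINT64_MAX

/* Simplified stand-in for an old-style record; NOT the real struct dqblk. */
struct old_rec {
        uint32_t bhard, bsoft, curblocks;   /* block limits and usage */
        uint32_t ihard, isoft, curinodes;   /* file limits and usage */
};

/* One logical value: limits and usage for a single (class, id, type). */
struct logical_val {
        uint64_t hard;
        uint64_t soft;
        uint64_t usage;
};

/* Assumption: the on-disk format stores 0 to mean "no limit". */
uint64_t
limit_to_logical(uint32_t ondisk)
{
        return ondisk == 0 ? LOGICAL_NOLIMIT : (uint64_t)ondisk;
}

/* Split one combined record into its "blocks" and "files" values. */
void
old_rec_split(const struct old_rec *r,
              struct logical_val *blocks, struct logical_val *files)
{
        blocks->hard = limit_to_logical(r->bhard);
        blocks->soft = limit_to_logical(r->bsoft);
        blocks->usage = r->curblocks;

        files->hard = limit_to_logical(r->ihard);
        files->soft = limit_to_logical(r->isoft);
        files->usage = r->curinodes;
}
```

Grace periods and expiry times are left out to keep the sketch short;
they would need the same kind of per-format mapping.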

The userlevel quota library is going to be completely rewritten to
provide a key/value access API to the logical schema described above.
This will be converted to quotactl calls to the kernel... and also
some other actions, such as contacting rquotad on NFS servers. There
are also some cases with the old-style quotas where the tools access
the quota files directly; some of these cases may go away, but I'm not
sure they all can. This logic and the FS-specific knowledge it
requires can and should be contained inside libquota.

I'm also going to crib from FreeBSD's quota library and add libquota
calls for things like turning quotas on and off. This should make the
userlevel tools simpler, and should make life easier for any
third-party tools that want to manipulate quotas. (There aren't many,
but a few do exist.) Unfortunately, direct compat with FreeBSD's quota
library isn't feasible as theirs is not FS-independent.

I expect the following tools to become FS-independent:


I'm also intending to add quotadump(8) and quotarestore(8) tools to
allow backing up quota settings easily. With the traditional quota
system you can just back up the quota files (and since they're exposed
in the filesystem, this happens by default unless you explicitly
exclude them) but with in-FS quotas that no longer works and a
dump/restore method is needed. I think quotadump and quotarestore will
probably end up as hard links to edquota, but that's not entirely
clear yet.

I'm going to remove the current quotactl(8) as it seems to be entirely
specific to the current proplib-based interface.

Note that quotacheck(8) is specific to the old-style FFS quotas and is
not FS-independent; this will not (and cannot) change.

One remaining thing: I'm intending to systematize the current mess of
quotas enabled/disabled/on/off/vanilla/chocolate/strawberry as
follows:

1. A file system type can have or not have support for quotas. If
there is no support for quotas, nothing else works.

2. Any given filesystem volume may have or not have quota data on it.
This is the filesystem code's problem and irrelevant to the
FS-independent logic.

3. Any given filesystem volume may be mounted with or without quotas
enabled. If quotas are not enabled, quota information is not available
and the quota utilities will not be able to do anything.

4. Once mounted, quotas can be either on or off. As far as the
FS-independent code is concerned, quotas being off means only that
they aren't enforced; that is, with quotas off operations that
increase usage do not fail with EDQUOT. When quotas are off, quota
information can still be inspected or updated.
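
The four levels above can be restated as a pair of predicates. This is
only a sketch of the intended rules; struct quota_state and both
function names are invented for the example. Level 2 (whether the
volume has quota data on it) is deliberately absent, since it is the
filesystem code's problem.

```c
#include <stdbool.h>

/* Hypothetical state as seen by FS-independent code. */
struct quota_state {
        bool fs_supports;     /* 1: the file system type supports quotas */
        bool mount_enabled;   /* 3: this mount has quotas enabled */
        bool turned_on;       /* 4: quotas are currently on */
};

/* Quota information can be inspected or updated, whether on or off. */
bool
quota_info_available(const struct quota_state *s)
{
        return s->fs_supports && s->mount_enabled;
}

/* Operations that increase usage may fail with EDQUOT. */
bool
quota_enforced(const struct quota_state *s)
{
        return quota_info_available(s) && s->turned_on;
}
```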

I am not intending to change the specific semantics that turning
quotas on has in the traditional quota system. Those semantics are
required for quotacheck to be able to do its thing properly. However,
knowledge of this behavior should be limited to the code in FFS (and
probably some in libquota) that needs to know the gory details.

Currently there are, as far as I can tell, multiple ways to enable
quotas for a filesystem in /etc/fstab, and the quota utilities check
fstab in various (and I think not always consistent) ways to try to
figure out what's going on. My intent is to nuke all that: only mount
should care what's in /etc/fstab, because otherwise the tools won't
work properly on temporary mounts. The quota library (and thus the
tools) should detect whether a mounted filesystem has quotas enabled
by calling quotactl; if quotactl fails, quotas are not enabled. (In
the long run there should be a FS-independent mount flag to indicate
this; however, I'm not sure we're ready for that just yet.)

It is not specified whether a filesystem mounted with quotas enabled
comes up with quotas turned on or off. The traditional system requires
"off", of course, but I think the default for new code that doesn't
require quotacheck should be "on".

The following is a sketch of the intended libquota API:

   #include <quota.h>

   struct quotavolume; /* Opaque. */
   struct quotacursor; /* Opaque. */

   struct quotakey {
           unsigned qk_class;
           id_t qk_id;
           unsigned qk_type;
   };

   struct quotaval {
           uint64_t qv_hardlimit;
           uint64_t qv_softlimit;
           uint64_t qv_usage;
           time_t qv_grace;
           time_t qv_expire;
   };

   #define QUOTA_CLASS_USER     0
   #define QUOTA_CLASS_GROUP    1

   #define QUOTA_TYPE_BLOCKS    0
   #define QUOTA_TYPE_FILES     1

   #define QUOTA_DEFAULTID      ((id_t)-1)

   #define QUOTA_NOLIMIT        ((uint64_t)0xffffffffffffffff)

   #define QUOTA_INFINITEGRACE  ((time_t)-1)
   #define QUOTA_NOGRACE        ((time_t)0)

   struct quotavolume *quota_open(const char *volume_path);
   void quota_close(struct quotavolume *);

   const char *quota_getschemaname(struct quotavolume *);

   unsigned quota_getnumclasses(struct quotavolume *);
   const char *quota_getclassname(struct quotavolume *, unsigned class);

   unsigned quota_getnumtypes(struct quotavolume *);
   const char *quota_gettypename(struct quotavolume *, unsigned type);

   int quota_on(struct quotavolume *);
   int quota_off(struct quotavolume *);

   int quota_get(struct quotavolume *, const struct quotakey *key,
                 struct quotaval *val_ret);

   int quota_put(struct quotavolume *, const struct quotakey *key,
                 const struct quotaval *val);

   int quota_delete(struct quotavolume *, const struct quotakey *key);

   struct quotacursor *quota_opencursor(struct quotavolume *);
   void quotacursor_close(struct quotacursor *);

   int quotacursor_get(struct quotacursor *qc, struct quotakey *key_ret,
                       struct quotaval *val_ret);

   int quotacursor_getn(struct quotacursor *qc, struct quotakey *keys_ret,
                        struct quotaval *vals_ret, int maxnum);

   bool quotacursor_atend(struct quotacursor *);
   int quotacursor_rewind(struct quotacursor *);

This should all, I hope, be fairly self-explanatory, with the possible
exception of getschemaname; the idea of that is to fetch a
human-readable string that reflects the underlying implementation in
use so it's possible to tell what that is. Programs aren't supposed to
interpret it.
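
As an example of how a caller might interpret a value once it has been
fetched, the sketch below classifies a struct quotaval against its
limits, treating QUOTA_NOLIMIT as "never exceeded". The struct and the
QUOTA_NOLIMIT definition are copied from the draft above so the
fragment stands alone; enum quota_status and quotaval_status() are
hypothetical helpers, not proposed API. Treating usage equal to the
hard limit as "at hard" is an assumption made for the example.

```c
#include <stdint.h>
#include <time.h>

/* Copied from the draft API so this fragment is self-contained. */
#define QUOTA_NOLIMIT ((uint64_t)0xffffffffffffffff)

struct quotaval {
        uint64_t qv_hardlimit;
        uint64_t qv_softlimit;
        uint64_t qv_usage;
        time_t qv_grace;
        time_t qv_expire;
};

/* Hypothetical classification, not part of the draft. */
enum quota_status { QUOTA_UNDER, QUOTA_OVER_SOFT, QUOTA_AT_HARD };

enum quota_status
quotaval_status(const struct quotaval *qv)
{
        /* Assumption: usage equal to the hard limit counts as "at hard". */
        if (qv->qv_hardlimit != QUOTA_NOLIMIT &&
            qv->qv_usage >= qv->qv_hardlimit)
                return QUOTA_AT_HARD;

        if (qv->qv_softlimit != QUOTA_NOLIMIT &&
            qv->qv_usage > qv->qv_softlimit)
                return QUOTA_OVER_SOFT;

        return QUOTA_UNDER;
}
```

Whether an over-soft-limit entry is actually refused depends on the
grace expiry time, which this sketch does not look at.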

Other minor notes:

 - qv_grace probably doesn't need to be 64 bits wide but there's no
particular harm in it;

 - some of the reserved values are different from the corresponding
reserved values in the physical implementations; this is more or less

 - with some implementations, some of the possible values that can be
looked up are hardwired; e.g. with the old-style quota format it is
impossible to set quotas for uid 0 or gid 0. While the API allows
attempting to change these hardwired values, such attempts will fail.

 - some implementations may not be able to distinguish a blank entry
from no entry, in which case applying a blank entry with quota_put is
the same as deleting the entry with quota_delete. Others, however,

 - The choice of "class" for types-of-ID and "type" for types-of-thing
is somewhat arbitrary. One might argue that it would make more sense
the other way around. I could be persuaded to switch it (or to change
to other terms) but speak up fast. I do think being clear about these
as different kinds of things is a good idea; the current code and docs
are not particularly clear about it, and I'm planning to fix that as I go.

This API is a draft, because I'm not anything like done converting the
userlevel tools yet, let alone tackling the kernel side. Similarly, I
don't have patches to offer.

However, as this mail is already almost five pages long (or ten if
double-spaced) I think there's plenty to discuss here already, and
it's probably time to get any necessary bikeshedding started.

(Lemon yellow with pale sky blue trim, I think. Since I'm being loud.)

David A. Holland
