Re: fs-independent quotas

To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Subject: Re: fs-independent quotas
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Thu, 20 Oct 2011 17:35:16 +0000

On Thu, Oct 20, 2011 at 12:56:17PM +0200, Manuel Bouyer wrote:
 > > > > So, a few months back we got a new improved quota format for FFS.
 > > > > Unfortunately, one of the side effects of this was to sprinkle
 > > > > specific knowledge of the new format through all the userlevel quota
 > > > > tools and quota support logic. To be fair, this was alongside the
 > > > > existing specific knowledge of the old quota format; nonetheless, it's
 > > > > messy and unscalable.
 > > > 
 > > > of course there have been changes to the tools, as there's a new format.
 > > 
 > > The tools ought to be format-independent.
 > 
 > I can't parse this, can you explain? The tools need to be aware of the
 > format to do something useful with the data, don't they?

The tools can and should work with a filesystem-independent abstract
schema. This should be independent of any filesystem's on-disk quota
format, just as the <dirent.h> structures are independent of any
filesystem's on-disk directory layout.

 > > > You'll have to explain this. lfs is some variant of ffs, I see no
 > > > reason why it couldn't use the new format.
 > > 
 > > It could use whatever format it wants. To the extent it currently
 > > supports quotas, I think it's limited to the old-style quotas, that
 > > is, quota1. But there's no way to plug it in without taking the
 > > fs-dependent code currently in all the tools and access pathway and
 > > making a third or perhaps a third and fourth copy of all the logic.
 > 
 > that's plain wrong. If it's quota1 you can use the quota1 code in
 > sys/ufs/ufs (just as it would have done before quota2).

No, it is not wrong. It cannot use the quota1 code in ufs; the whole
premise of the proposed lfs renovation is to unhook lfs from ufs. The
ufs code is a big blob, not a library of components; you can't just
use parts of it, or at least not easily.

I can copy the ufs quota1 structures and some of the ufs quota code,
yes; but then I have struct lfs_dqblk, and I need to interface it to
the rest of the system, and as things currently stand that forces me
to clone all the ffs-quota1-specific quota code all over everywhere.

The lfs/ufs split would have been committed ages ago if the quota
system hadn't gotten in the way. This is why, last spring, when you
were designing quota2, I was asking you to fix things above the FS to
be FS-independent. But you didn't; instead it got worse. I tried at
the time to explain the situation and the premises, and why the quota
system should be FS-independent at and above the VFS level, but I got
ignored and then sucked away by real life.

Now I'm trying to fix it.

 > > Likewise, if I were to go add quota support to v7fs, or try to hook up
 > > whatever quota support zfs has, or commit Hammer and try to get
 > > whatever quota support *it* has working, or add ext2 quota support, or
 > > write a new fs with quota support, or whatever, I'd have to make still
 > > more copies of the logic to cope with all the different formats and
 > > layouts.
 > 
 > Of course if you have a new on-disk format you need to do some conversion,
 > whatever "filesystem independent" format you use.
 > But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the
 > conversion from plist to some binary representation.

I could cut and paste it, maybe. That's not particularly desirable.

 > > This is not a good idea, not scalable, and not sensible, especially
 > > when a filesystem-independent (read "format-independent" if you like)
 > > interface is both perfectly possible and simpler.
 > 
 > I strongly believe the plist representation is format-independent.
 > It has exactly the same information as what you propose.

Right now, I'm not sure if it is or not. I'm only sure that it's
highly complicated (unnecessarily so) and underdocumented. Meanwhile,
you've also been arguing that the quota2 on-disk structures are
format-independent, so forgive me if I take this all with a grain of
salt.

 > >  > This is exactly the format described in quotactl(2).
 > > 
 > > No, what's described in quotactl(2) is something about commands and
 > > arguments... and while there is a substructure that looks something
 > > like this, the fact remains that it's a *sub*structure
 > 
 > Yes, but you still need a way to pass commands. You didn't talk about this.

No, because I had something like the old quotactl(2) in mind - an
ordinary call passing a filesystem identifier, a command code, and an
argument.
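
A minimal sketch of that call shape, to make it concrete. Everything
named here (fsq_ctl, the FSQ_* command codes, struct fsq_arg, the toy
in-memory store, and the "/home" mount point) is hypothetical and is
not the existing quotactl(2) interface; it only shows one entry point
taking a filesystem identifier, a command code, and an argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command codes and argument type, for illustration only. */
enum fsq_cmd { FSQ_GET, FSQ_PUT };

struct fsq_arg {
	uint32_t id;		/* in: user or group id */
	uint64_t limit;		/* in for FSQ_PUT, out for FSQ_GET */
};

/* Toy stand-in for the quota state of a single mounted filesystem. */
#define FSQ_NSLOTS 8
struct fsq_slot { int used; uint32_t id; uint64_t limit; };
static struct fsq_slot fsq_store[FSQ_NSLOTS];

/*
 * One entry point: filesystem identifier, command code, argument.
 * A real implementation would route on `fs`; this sketch only
 * accepts "/home". Returns 0 or an errno value.
 */
static int
fsq_ctl(const char *fs, enum fsq_cmd cmd, struct fsq_arg *arg)
{
	struct fsq_slot *slot = NULL, *spare = NULL;

	assert(fs != NULL && arg != NULL);
	if (strcmp(fs, "/home") != 0)
		return ENOENT;

	/* Find the entry for this id, and remember a free slot. */
	for (size_t i = 0; i < FSQ_NSLOTS; i++) {
		if (fsq_store[i].used && fsq_store[i].id == arg->id)
			slot = &fsq_store[i];
		else if (!fsq_store[i].used && spare == NULL)
			spare = &fsq_store[i];
	}

	switch (cmd) {
	case FSQ_GET:
		if (slot == NULL)
			return ESRCH;
		arg->limit = slot->limit;
		return 0;
	case FSQ_PUT:
		if (slot == NULL)
			slot = spare;
		if (slot == NULL)
			return ENOSPC;
		slot->used = 1;
		slot->id = arg->id;
		slot->limit = arg->limit;
		return 0;
	default:
		return EINVAL;
	}
}
```

The point of the shape is that the command set and the argument type
are visible to the compiler, so nothing has to be encoded or decoded
on the way through.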

 > > and the schema
 > > is not tabular.
 > 
 > I don't understand what you mean here. There's a set of values associated
 > with an id, I can't see the difference with what you're proposing.

There's a complicated hierarchical structure of arrays and
maps/dictionaries, as opposed to a single flat table with columns.
Or, put another way, the schema I proposed is (I think) in third
normal form, and yours isn't.

Another way to put it is that your schema requires proplib to manage
it, with all the attendant complexity, whereas mine works perfectly
well as an array of C structs.
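
As a rough illustration of what "an array of C structs" could look
like: the field names, the enums, and the qt_find helper below are
assumptions made up for this sketch, not the schema as proposed
earlier in the thread. Each row is keyed by (id class, id, object
type) and carries only plain scalar values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these come from the tree. */
enum qt_idclass { QT_USER, QT_GROUP };		/* what the quota is imposed on */
enum qt_object { QT_BLOCKS, QT_FILES };	/* what is being counted */

/* One row of the flat table. The first three fields form the key. */
struct qt_row {
	enum qt_idclass idclass;
	uint32_t id;
	enum qt_object object;
	uint64_t softlimit;
	uint64_t hardlimit;
	uint64_t usage;
};

/* Linear lookup by full key; returns NULL when no row matches. */
static const struct qt_row *
qt_find(const struct qt_row *tbl, size_t n, enum qt_idclass idclass,
    uint32_t id, enum qt_object object)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].idclass == idclass && tbl[i].id == id &&
		    tbl[i].object == object)
			return &tbl[i];
	}
	return NULL;
}
```

No nesting, no dictionaries: a consumer walks the array and reads
fields, and a wrong field type is a compile error rather than a
runtime surprise.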

 > >  > > The quota *class* is the thing the quota is imposed on; this is
 > >  > > currently either "user" or "group". There is no likely prospect of
 > >  > > additional quota classes appearing.
 > >  > 
 > >  > I don't think we should limit ourselves to these classes. I could see
 > >  > per-host or per-hostgroup quotas for networked filesystems for example.
 > > 
 > > I'm not limiting it to anything, but I'll believe in more quota
 > > classes when I see them. Per-host quotas (even if they make sense,
 > > which I question) aren't going to work very well with a 32-bit id, for
 > > example.
 > 
 > right, that's where a plist is a win.

...no, not really, you'll still have to rewrite all the existing code
that assumes the ID field it's getting out of the proplib bundle is an
integer, and you'll still need to do compat versioning on the system
and library calls. You just lose the ability to have the compiler find
the code that needs to be changed.

Dynamic typing isn't a panacea.

 > > Whereas, as I pointed out before, there are filesystems in the field
 > > with more than two quota types.
 > 
 > The current format has no limitations in this area.

But most or all of the current code does.

 > > > This is what we have now: the logical schema is a proplib-based table;
 > > > and each filesystem translates it to its own format.
 > > > We can provide some helper functions to assist with the transforms,
 > > > this is what I started to do in quota2_subr.c. It looks ffs-specific but
 > > > is really close to what you're proposing here.
 > > 
 > > All the current code that I've seen in the userlevel tools uses
 > > ffs-specific data structures, either the new ones or the old ones
 > > depending on which format is in use. Describing that as really close
 > > to what I'm proposing is a pretty big stretch.
 > 
 > You probably didn't look closely.

No, I've looked very closely. I've been working on the userlevel tools
to fix these problems, remember?

 > Yes, the userland code does a plist to binary conversion to a
 > structure which is identical to the quota2 structure, but that
 > doesn't make it ffs-specific.

So then why does it fall back to the quota1 structure when quota1 is
in use?

 > > > > The userlevel quota library is going to be completely rewritten to
 > > > > provide a key/value access API to the logical schema described above.
 > > > > This will be converted to quotactl calls to the kernel... and also
 > > > > some other actions, such as contacting rquotad on NFS servers. There
 > > > > are also some cases with the old-style quotas where the tools access
 > > > > the quota files directly; some of these cases may go away, but I'm not
 > > > > sure they all can.
 > > > 
 > > > They can't if you want to keep some level of backward-compat.
 > > 
 > > I'm still not sure of that.
 > 
 > For example if you want repquota to be able to dump quotas from
 > a quota1 file of an unmounted filesystem (this is part of the
 > quota1 -> quota2 migration).

I don't see that you can do anything with an unmounted filesystem in
repquota. Unless the quota files for the filesystem are on a different
(and mounted) volume, it won't be able to read them, and it doesn't
have any code to mount the filesystem temporarily to do that.

So I really don't know what you're talking about.

I also see no merit whatsoever to working with quota information on
unmounted filesystems and I don't think this should be implemented or
supported.

 > > No. The userlevel tools, including repquota, should be able to read
 > > and write quota information using a uniform filesystem-independent
 > > interface. To the extent that special per-filesystem logic is needed
 > > above the kernel, it should be encapsulated inside libquota and not
 > > spread around everywhere indiscriminately.
 > 
 > It's not everywhere, it's in: repquota (for the conversion to
 > quota2 I mentioned above, and because it was working this way before),
 > quotacheck and quotaon (because they have to, they're ffs quota1 specific),
 > and edquota (because it was working this way before).

That's pretty close to everywhere. And again, everything should be
able to read and write quota information using a uniform filesystem-
independent interface. There is no need to spread special-case code
throughout the system.

 > And again, this is independent of the representation format actually used.

How? It's representation-specific code.

 > > As I explained, the filesystem-independent semantics for
 > > quotaon/quotaoff are only that quota enforcement is enabled or
 > > disabled. This is a useful thing to be able to do. We could get rid of
 > > it; but I see no reason to.
 > 
 > So it's different from what quotaon/quotaoff actually do (right now,
 > for ffs quota1, when quotas are off, they're not enforced any more,
 > but also not updated any more. This is not allowed for quota2).
 >
 > I'm not against the new semantics but then we need something to do
 > what quotaon/quotaoff actually do for ffs quota1 (you can't start
 > using/updating the quota data at mount time because quotacheck has not run
 > yet so data may be stale. And you can't run quotacheck before mount because
 > the quota file may be on the filesystem itself).

No, as I said, I'm not intending to change the special semantics
required by the old quota implementation. I'm also not intending to
guarantee that anything else supports them.

If you think there is never any reason to disable quotas temporarily
without unmounting, then perhaps the on/off feature is not needed in
the FS-independent interface and can be removed. However, when I've
suggested this elsewhere I've been told that it should stay.

 > >  > > I expect the following tools to become FS-independent:
 > >  > > 
 > >  > >    quota(1)
 > >  > >    quot(8)
 > >  > >    edquota(8)
 > >  > 
 > >  > they already are.
 > > 
 > > Not at all. Believe me, I've been hacking on edquota all day.
 > 
 > OK, so:
 > quota(1) is not using any on-disk structure any more. So please explain in
 > which way it's not FS-independent.

Let's see; just to begin with, it assumes that the only quota types are
for blocks and files. Otherwise, perhaps not; while there's code in
src/usr.bin/quota that accesses quota1 files by name, that code is not
actually used in quota(1) and is only used by other quota tools via
.PATH. (gross...)

 > quot(8) is by nature ffs-specific (and quota-independent as it doesn't care
 > if quota is enabled or not, or even compiled into the kernel) as it collects
 > data from the raw device. It could be changed to get information from the
 > kernel quota system, but then it's not quot(8) anymore, it's a clone of
 > repquota(8). This is a major feature change.

Hrm. ok, I sit corrected, I made the mistake of reading the man page
rather than the code.

 > edquota(8): it can edit ffs quota1 data from an unmounted filesystem, yes
 > (this is a feature I chose to keep - for now). The quota2 part (which is
 > used for all mounted filesystems, even those using quota1) is
 > fs-independent.

As I have been saying, all the quota1 code that cannot live in the
kernel should live in the quota library.

 > > ...which seems to work using some kind of xml-based procedure call
 > > interface, which isn't what a sysadmin wants to deal with when they're
 > > trying to run a backup or migrate to new disks.
 > 
 > you'll have to explain this. xml has its issues, but it's easily parseable
 > (which is why I chose it over some binary representation. Having written
 > scripts to manage quotas, I know how bad our old text-based tools are).
 > For a migration I'm not sure the admin cares at all about the format
 > of the file, it would as well be a binary blob. But if he needs to look
 > at it, a text-based format (even if it's xml) is certainly easier
 > to manage.

The proper text-based format that is easy to manage with scripts and
script tools is a columnar file delimited by whitespace; this can be
fed to awk, sed, cut(1), etc., whereas XML is a huge hassle by
comparison.
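
For concreteness, here is a small sketch of emitting one quota record
as a whitespace-delimited line. The function name and the column
order (class, id, object, soft limit, hard limit, usage) are
assumptions made for this example only; no existing tool is claimed
to produce this layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one quota record as a single whitespace-delimited line:
 *   class id object soft hard usage
 * The column order is an assumption made for this sketch.
 * Returns what snprintf returns: the length the full line needs,
 * not counting the terminating NUL.
 */
static int
quota_format_line(char *buf, size_t len, const char *idclass, uint32_t id,
    const char *object, uint64_t soft, uint64_t hard, uint64_t usage)
{
	assert(buf != NULL && idclass != NULL && object != NULL);
	return snprintf(buf, len,
	    "%s %" PRIu32 " %s %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
	    idclass, id, object, soft, hard, usage);
}
```

With lines of that shape, a filter such as
awk '$1 == "user" && $6 > $4' picks out every user over their soft
limit, with no parser needed.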

Meanwhile, quotactl(8) appears to use not just XML data but also some
form of XMLRPC-type encoding of quota access commands into XML. The
format of these does not appear to be documented, or if it is, I
haven't found where yet.

Some time ago there was already a lengthy argument (on this list and
elsewhere) about whether encoding system call operations and arguments
in XML was a good idea, and the consensus was negative.

 > > What sorts of actions from scripts are you thinking of? For backups,
 > > that's what quotadump and quotarestore are for. For most other usages,
 > > including stuff like mass-editing 10,000 student quotas at the start of
 > > a semester or whatnot, edquota serves nicely.
 > 
 > NO. Really not. This may be OK for a one-shot run, but when you want to
 > write a tool that needs to read *all* quotas, do some computation on it
 > and change some of them what we had before quota2 is really not convenient.

Please be specific...

 > > the other distinctions, I think they're more or less self-explanatory.
 > > If you want to know the purpose of drawing these distinctions
 > > carefully at all, it's because currently the semantics are unclear and
 > > poorly documented.
 > 
 > poorly documented, I agree. But they're not unclear for me.

Unfortunately, you aren't the only user.

 > Also, in the above I think you should make it clear that when quotas
 > are off, the filesystem will still update quota usage, even if not
 > enforcing the limits.

That's filesystem-specific.

 > > quota1 support isn't going to be removed.
 > 
 > That's a change in my plans then. Why do you think it should stay?
 > This kind of quota system is not going to work for modern filesystem sizes
 > (quotacheck takes ages).

Because it's an on-disk format. We still read and write ancient
versions of FFS; I don't see that ancient versions of FFS quotas
should be treated any differently, even if they're obsolete from a
technical perspective.

 > > Anyhow, as I wrote above, the knowledge of whether quotas exist should
 > > be maintained and provided by the kernel, so it works reliably and
 > > with mounts that aren't listed in fstab. All file systems that support
 > > quotas can and should do this.
 > 
 > this is what quota2 does. quota1 is different here, and I think I explained
 > why. We can choose to change it, but then it is what I would
 > call a major behavior change and I think there should be a transition
 > period.

quota1 is not different here, or should not be, there's just a pile of
legacy code that should have been cleaned up ages ago.

I don't see that there's any behavior change involved here that isn't
a plain and simple bug fix.

 > > Also, you're once again wrong about what's using this logic. In
 > > addition to quotacheck and quotaon, quota, edquota, and repquota are
 > > all checking fstab.
 > 
 > no, quota is not. edquota and repquota are, I already explained why.

Yes, sorry, I was misled by the code in quota's source directory that
it doesn't use.

None of these programs should be checking fstab.

 > > An encoded form of the API I already described, with get/put/delete
 > > and cursors.
 > 
 > So we lose the clear command. I guess it's implemented as part of put.

Do we? Maybe not, I didn't say the API was finalized.

But, pray tell, where is this clear command you mention documented?

 > > No, because (among other things) the schema I'm implementing is not
 > > the same. The proplib schema is hierarchical, for example,
 > > rather than being normalized;
 > 
 > I see this as an advantage, not an inconvenience. You're flattening something
 > that is naturally hierarchical.

No, it's a bug. You're adding bogus hierarchical structure to
something that's naturally tabular.

Furthermore, as I alluded to above, tabular data is much easier to
handle with shell tools.

 > What I understand is that you mostly want an enhanced API for userland
 > tools. It can be implemented without changes to quotactl(2) or the kernel
 > interface.

I would also like a VFS-level kernel interface that new filesystems
can be plugged into sanely.

-- 
David A. Holland
dholland%netbsd.org@localhost