tech-kern archive

Re: fs-independent quotas

On Thu, Oct 20, 2011 at 05:35:16PM +0000, David Holland wrote:
>  > I can't parse this, can you explain? The tools need to be aware of the
>  > format to do something useful with the data, don't they?
> The tools can and should work with a filesystem-independent abstract
> schema. This should be independent of any filesystem's on-disk quota
> format, just as the <dirent.h> structures are independent of any
> filesystem's on-disk directory layout.

The current proplib-based schema is independent of the on-disk format
(as it's just another representation of the same set of data that you 

>  > that's plain wrong. If it's quota1 you can use the quota1 code in
>  > sys/ufs/ufs (just as it would have done before quota2).
> No, it is not wrong. It cannot use the quota1 code in ufs; the whole
> premise of the proposed lfs renovation is to unhook lfs from ufs. The
> ufs code is a big blob, not a library of components; you can't just
> use parts of it, or at least not easily.
> I can copy the ufs quota1 structures and some of the ufs quota code,
> yes; but then I have struct lfs_dqblk, and I need to interface it to
> the rest of the system, and as things currently stand that forces me
> to clone all the ffs-quota1-specific quota code all over everywhere.

So, if I understand you properly, your lfs code won't use the
quota1 on-disk format but some new format based on an lfs_dqblk structure.
Then it's a brand new disk format, and the right thing to do is to use the
conversion functions from common/lib/libquota/ (as the ufs/quota1 and
ufs/quota2 code already do) and convert from here to your on-disk

You can't claim a data representation isn't filesystem-independent because
it doesn't correspond to your on-disk representation. As it's
filesystem-independent it has (by definition) to be converted to every
on-disk representation.

> The lfs/ufs split would have been committed ages ago if the quota
> system hadn't gotten in the way. This is why, last spring, when you
> were designing quota2, I was asking you to fix things above the FS to
> be FS-independent. But you didn't; instead it got worse. I tried at
> the time to explain the situation and the premises, and why the quota
> system should be FS-independent at and above the VFS level, but I got
> ignored and then sucked away by real life.

Well, I don't remember the details of that time, but what I remember
is that you didn't like XML.
Now you're saying "I move lfs out of ufs and I can't use quota1 for lfs".
Yes, of course, as quota1 is tightly coupled to ufs, and my project was
not to make quota1 filesystem-independent - it was to add a new
on-disk quota format for ffs with some better properties. You can't blame
me for not making quota1 (or even quota2) reusable outside of ufs when
my goal was to get a new on-disk format for ffs. That's just not the
same work.

Now, I don't think the current quota1 code is that much tied to ufs.
If you want to use the same dqblk for your on-disk format (but then
it's an on-disk format, you can't claim it's fs-independent), the code can
certainly be reorganised to make it reusable outside of ufs. But that's
orthogonal to a filesystem-independent format representation.

> Now I'm trying to fix it.
>  > > Likewise, if I were to go add quota support to v7fs, or try to hook up
>  > > whatever quota support zfs has, or commit Hammer and try to get
>  > > whatever quota support *it* has working, or add ext2 quota support, or
>  > > write a new fs with quota support, or whatever, I'd have to make still
>  > > more copies of the logic to cope with all the different formats and
>  > > layouts.
>  > 
>  > Of course if you have a new on-disk format you need to do some conversion,
>  > whatever "filesystem independent" format you use.
>  > But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the
>  > conversion from plist to some binary representation.
> I could cut and paste it, maybe. That's not particularly desirable.

Now that I understand where you want to go, it's not the right thing
to do. Use the code in common/lib/libquota and write conversion routines
for your filesystem. You can call it a 'cut-n-paste' from quota2_subr.c,
but as quota2_subr.c is about converting the filesystem-independent
data to the quota2 on-disk format, and you use a different on-disk
format, you can't blame it for not fitting your needs.

>  > > This is not a good idea, not scalable, and not sensible, especially
>  > > when a filesystem-independent (read "format-independent" if you like)
>  > > interface is both perfectly possible and simpler.
>  > 
>  > I strongly believe the plist representation is format-independent.
>  > It has exactly the same information as what you propose.
> Right now, I'm not sure if it is or not. I'm only sure that it's
> highly complicated

It's not more complicated than the table representation you proposed
(besides being XML-based, but that's all we have now).

> (unnecessarily so) and underdocumented. Meanwhile,

Documentation can always be improved. The plist format is described
in quotactl(2); you can comment on what you think is missing.

> you've also been arguing that the quota2 on-disk structures are
> format-independent, so forgive me if I take this all with a grain of
> salt.

No, I certainly never wrote that (or I didn't mean to - I tend to use quota2
for both the new kernel/userland interface and associated functions, and the
new ffs on-disk format; I should probably use different names. From now
on I'll try to use ffs-quota2 for the latter, and quotactl2 for the former).
Nothing outside of sys/ufs/ uses ffs-quota2 on-disk structures; they use
ufs_quota_entry from common/include/quota/quotaprop.h (you'll claim it's
ufs-specific but it's not: it's the format you described. The ufs here
means "ufs-like semantics"). The sys/ufs/ufs/*quota2* files are about
converting ufs_quota_entry from/to the ffs-quota2 on-disk format, and
doing something useful with that based on the command received.
If you say sys/ufs/ufs/*quota2* are ufs-specific, I agree.
But this is by definition filesystem-specific code and it can hardly be

>  > >  > This is exactly the format described in quotactl(2).
>  > > 
>  > > No, what's described in quotactl(2) is something about commands and
>  > > arguments... and while there is a substructure that looks something
>  > > like this, the fact remains that it's a *sub*structure
>  > 
>  > Yes, but you still need a way to pass commands. You didn't talk about this.
> No, because I had something like the old quotactl(2) in mind - an
> ordinary call passing a filesystem identifier, a command code, and an
> argument.

This caused issues for puffs. This is another place where plist is a win.

>  > > and the schema
>  > > is not tabular.
>  > 
>  > I don't understand what you mean here. There's a set of values associated
>  > with an id, I can't see the difference with what you're proposing.
> There's a complicated hierarchical structure of arrays and
> maps/dictionaries, as opposed to a single flat table with columns.
> Or, put another way, the schema I proposed is (I think) in third
> normal form, and yours isn't.

Yes, you've flattened something which is hierarchical.
Also, you seem to think there won't ever be more than block and inode
quotas; I'm not sure about that. One advantage of a hierarchical structure
to represent hierarchical data is that it can adapt to changes in hierarchy.

> Another way to put it is that your schema requires proplib to manage
> it, with all the attendant complexity, whereas mine works perfectly
> well as an array of C structs.

The quotactl2 data could be represented as an array of C structs as well.
But a text-based representation is more flexible (there are no problems
of system call versioning each time you want to change something in the
format, for example).

>  > > I'm not limiting it to anything, but I'll believe in more quota
>  > > classes when I see them. Per-host quotas (even if they make sense,
>  > > which I question) aren't going to work very well with a 32-bit id, for
>  > > example.
>  > 
>  > right, that's where a plist is a win.
> Not really; you'll still have to rewrite all the existing code
> that assumes the ID field it's getting out of the proplib bundle is an
> integer, and you'll still need to do compat versioning on the system
> and library calls. You just lose the ability to have the compiler find
> the code that needs to be changed.

No, you don't need to change that much: the usual code will keep parsing
it as an integer, whereas this new networked filesystem will expect something
else (a 64-bit hash, an IP address, whatever). So only the networked filesystem
code needs to know about it (and tools that can deal with this quota
format - your "filesystem-independent" tools certainly won't).

And no, there's no compat versioning needed here.

> Dynamic typing isn't a panacea.
>  > > Whereas, as I pointed out before, there are filesystems in the field
>  > > with more than two quota types.
>  > 
>  > The current format has no limitations in this area.
> But most or all of the current code does.

Yes, mostly because they come from quota1. But adapting them wouldn't be
much work (the most problematic would be to find a sane way to present
extra data to the user), and older tools would just ignore the unknown

With a binary format you'll have ABI issues between kernel and userland.
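
To illustrate the ABI point, here is a minimal Python sketch. The record
layouts and key names are invented for this example and are not any real
kernel structure or quotactl(2) key; it only shows why a reader built for a
fixed binary layout breaks when a field is appended, while a keyed reader
can skip what it doesn't know.

```python
import struct

# Invented layouts for illustration; not any real kernel structure.
V1 = struct.Struct("<IQQ")    # id, block limit, block usage
V2 = struct.Struct("<IQQQ")   # same fields, plus one appended later

def old_binary_reader(buf):
    """A reader built against V1; any other record size is rejected."""
    try:
        return V1.unpack(buf)
    except struct.error:
        return None

def old_keyed_reader(record):
    """A reader that picks the keys it knows and ignores the rest."""
    return (record["id"], record["limit"], record["usage"])

new_binary = V2.pack(1000, 5000, 120, 42)
new_keyed = {"id": 1000, "limit": 5000, "usage": 120, "newfield": 42}

print(old_binary_reader(new_binary))   # the old reader cannot decode this
print(old_keyed_reader(new_keyed))     # the old reader still works
```

A binary interface can of course cope with this through versioned calls;
the sketch only shows the extra work that implies.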

>  > > All the current code that I've seen in the userlevel tools uses
>  > > ffs-specific data structures, either the new ones or the old ones
>  > > depending on which format is in use. Describing that as really close
>  > > to what I'm proposing is a pretty big stretch.
>  > 
>  > You probably didn't look closely.
> No, I've looked very closely. I've been working on the userlevel tools
> to fix these problems, remember?

So I think you didn't spot the right problems. The quotactl2 part is
filesystem-independent, and they're all using quotactl2, except when
the filesystem is not mounted and they have to read the quota file
by themselves.

>  > Yes, the userland code does a plist to binary conversion to a
>  > structure which is identical to the quota2 structure, but that
>  > doesn't make it ffs-specific.
> So then why does it fall back to the quota1 structure when quota1 is
> in use?

Where did you see that ? AFAIK it falls back to quota1 structure when
it has to read the quota file by itself because the filesystem is not
mounted (or quotaon has not run yet).

>  > For example if you want repquota to be able to dump quotas from
>  > a quota1 file of an unmounted filesystem (this is part of the
>  > quota1 -> quota2 migration).
> I don't see that you can do anything with an unmounted filesystem in
> repquota. Unless the quota files for the filesystem are on a different
> (and mounted) volume, it won't be able to read them, and it doesn't
> have any code to mount the filesystem temporarily to do that.

Hum, you're right, it seems I broke this. I'll have a look at fixing
it, it's a bug.

> So I really don't know what you're talking about.
> I also see no merit whatsoever to working with quota information on
> unmounted filesystems and I don't think this should be implemented or
> supported.

As I said, it helps with converting from ffs-quota1 to ffs-quota2, as you
can't have both enabled for a given filesystem at the same time.

>  > > No. The userlevel tools, including repquota, should be able to read
>  > > and write quota information using a uniform filesystem-independent
>  > > interface. To the extent that special per-filesystem logic is needed
>  > > above the kernel, it should be encapsulated inside libquota and not
>  > > spread around everywhere indiscriminately.
>  > 
>  > It's not everywhere, it's in: repquota (for the conversion to
>  > quota2 I mentioned above, and because it was working this way before),
>  > quotacheck and quotaon (because they have to, they're ffs quota1 specific),
>  > and edquota (because it was working this way before).
> That's pretty close to everywhere. And again, everything should be
> able to read and write quota information using a uniform filesystem-
> independent interface. There is no need to spread special-case code
> throughout the system.

You can move this code to libquota if you want, I don't mind
(especially if you're planning to keep ffs-quota1 around).

>  > And again, this is independent of the representation format actually
>  > used.
> How? It's representation-specific code.

Yes; what I meant is that it has nothing to do with the quotactl2 format. It's
internal to some tools that have some filesystem-specific code (to read/write
a filesystem-specific format directly).

>  > > As I explained, the filesystem-independent semantics for
>  > > quotaon/quotaoff are only that quota enforcement is enabled or
>  > > disabled. This is a useful thing to be able to do. We could get rid of
>  > > it; but I see no reason to.
>  > 
>  > So it's different from what quotaon/quotaoff actually do (right now,
>  > for ffs quota1, when quota are off, they're not enforced any more,
>  > but also not updated any more. This is not allowed for quota2).
>  >
>  > I'm not against the new semantic but then we need something to do
>  > what quotaon/quotaoff actually do for ffs quota1 (you can't start
>  > using/updating the quota data at mount time because quotacheck has not run
>  > yet so data may be stale. And you can't run quotacheck before mount because
>  > the quota file may be on the filesystem itself).
> No, as I said, I'm not intending to change the special semantics
> required by the old quota implementation. I'm also not intending to
> guarantee that anything else supports them.
> If you think there is never any reason to disable quotas temporarily
> without unmounting, then perhaps the on/off feature is not needed in
> the FS-independent interface and can be removed. However, when I've
> suggested this elsewhere I've been told that it should stay.

What do you mean by "disabling"? If it's stopping enforcing
the limits then I don't mind (but I can't see a use for it).
If you mean "stopping enforcing the limits and stopping updating the
usage values" then it's a problem, because this would be a filesystem
corruption for ffs-quota2 (and for other modern in-filesystem formats
I've looked at).

>  > >  > > I expect the following tools to become FS-independent:
>  > >  > > 
>  > >  > >    quota(1)
>  > >  > >    quot(8)
>  > >  > >    edquota(8)
>  > >  > 
>  > >  > they already are.
>  > > 
>  > > Not at all. Believe me, I've been hacking on edquota all day.
>  > 
>  > OK, so:
>  > quota(1) is not using any on-disk structure any more. So please explain in
>  > which way it's not FS-independent.
> Let's see; just to begin with it assumes that the only quota types are
> for blocks and files.

Yes. But that's mostly because 1) we don't have anything more than that yet
and 2) I don't have in mind a sane presentation to users which would
allow displaying an arbitrary number of quota types within the
current (and historic) display.
Too much change in the way things are displayed here could break existing
third-party tools.

> Otherwise, perhaps not; while there's code in
> src/usr.bin/quota that accesses quota1 files by name, that code is not
> actually used in quota(1) and only used by other quota tools via
> .PATH. (gross...)

Well, there was already a file shared by quota tools here, and I didn't feel
like adding another file just for this small function at some other place.
So yes, there are some functions in src/usr.bin/quota which are not used
by quota(1) itself.

>  > quot(8) is by nature ffs-specific (and quota-independent as it doesn't care
>  > if quota is enabled or not, or even compiled in kernel) as it collects data
>  > from the raw device. It could be changed to get information from the
>  > kernel quota system, but then it's not quot(8) anymore, it's a clone of
>  > repquota(8). This is a major feature change.
> Hrm. ok, I sit corrected, I made the mistake of reading the man page
> rather than the code.
>  > edquota(8): it can edit ffs quota1 data from an unmounted filesystem, yes
>  > (this is a feature I chose to keep - for now). The quota2 part (which is
>  > used for all mounted filesystems, even those using quota1) is
>  > fs-independent.
> As I have been saying, all the quota1 code that cannot live in the
> kernel should live in the quota library.

OK, fine by me.

>  > > ...which seems to work using some kind of xml-based procedure call
>  > > interface, which isn't what a sysadmin wants to deal with when they're
>  > > trying to run a backup or migrate to new disks.
>  > 
>  > you'll have to explain this. xml has its issues, but it's easily parseable
>  > (which is why I chose it over some binary representation. Having written
>  > scripts to manage quotas, I know how bad our old text-based tools are).
>  > For a migration I'm not sure the admin cares at all about the format
>  > of the file, it would as well be a binary blob. But if he needs to look
>  > at it, a text-based format (even if it's xml) is certainly easier
>  > to manage.
> The proper text-based format that is easy to manage with scripts and
> script tools is a columnar file delimited by whitespace; this can be
> fed to awk, sed, cut(1), etc., whereas XML is a huge hassle by
> comparison.

I disagree. A columnar file delimited by whitespace may be fine
for shell scripts (if you don't have fields that can be empty or be a
single whitespace, because then you have problems), but it's not for e.g.
perl or python.
I've been parsing the output of repquota from a perl script, and it's
really not nice.
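
A minimal Python sketch of the empty-field problem. The three-column report
below (name, soft limit, grace period) is made up for illustration and is
not the real repquota output format.

```python
# Hypothetical whitespace-delimited report: name, soft limit, grace period.
# The layout is invented for illustration; it is not repquota's real output.
rows = [
    "alice 1000 7days",
    "bob   2000      ",   # grace field is empty
    "carol      7days",   # soft limit field is empty
]

def parse(line):
    """Split on runs of whitespace, as awk or a naive script would."""
    return line.split()

parsed = [parse(r) for r in rows]

# bob and carol both yield two fields; nothing in the split result says
# which of the three columns is the missing one.
for fields in parsed:
    print(len(fields), fields)
```

Telling the two short rows apart needs either fixed column offsets or
several regexp variants, one per combination of empty fields.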

> Meanwhile, quotactl(8) appears to use not just XML data but also some
> form of XMLRPC-type encoding of quota access commands into XML.

Yes, it does.

> The format of these does not appear to be documented, or if it is, I
> haven't found where yet.

There's an example in quotactl(8), and a more complete
description of commands in quotactl(2).

> Some time ago there was already a lengthy argument (on this list and
> elsewhere) about whether encoding system call operations and arguments
> in XML was a good idea, and the consensus was negative.

I don't remember this (any pointers?). It's certainly bad for
performance-sensitive syscalls, but for low-usage syscalls or ioctls
it's fine. It's architecture-neutral and makes kdump's life easier,
among other things.

>  > > What sorts of actions from scripts are you thinking of? For backups,
>  > > that's what quotadump and quotarestore are for. For most other usages,
>  > > including stuff like massediting 10,000 student quotas at the start of
>  > > a semester or whatnot, edquota serves nicely.
>  > 
>  > NO. Really not. This may be OK for a one-shot run, but when you want to
>  > write a tool that needs to read *all* quotas, do some computation on it
>  > and change some of them what we had before quota2 is really not convenient.
> Please be specific...

At work, user quotas are stored in a database (with other user account
information). I have scripts which make sure the quotas on the servers
match what's in the database, and that uids/gids not in the database
can't write to the filesystems.

With the current tools the first issue is to find all the existing
quotas, as repquota doesn't list them if usage is 0 (I agree, that's not
a format issue). Then I have to parse repquota output, and that's
not easy because some fields can be empty (I have 4 variants of the
regexp for this). A quotactl(8) with a getall command fixes this.

Then I have to call edquota for each change, and this is slow.
Building an XML string with the needed commands and calling quotactl with it
is faster.

>  > > the other distinctions, I think they're more or less self-explanatory.
>  > > If you want to know the purpose of drawing these distinctions
>  > > carefully at all, it's because currently the semantics are unclear and
>  > > poorly documented.
>  > 
>  > poorly documented, I agree. But they're not unclear for me.
> Unfortunately, you aren't the only user.

Documentation can always be improved. That's not a reason to come up
with something new, which would also need to be documented.

>  > Also, in the above I think you should make it clear that when quotas
>  > are off, the filesystem will still update quota usage, even if not
>  > enforcing the limits.
> That's filesystem-specific.

Then it should be said.

>  > > quota1 support isn't going to be removed.
>  > 
>  > That's a change in my plans then.  Why do you think it should stay ?
>  > This kind of quota system is not going to work for modern filesystem sizes
>  > (quotacheck takes ages).
> Because it's an on-disk format. We still read and write ancient
> versions of FFS; I don't see that ancient versions of FFS quotas
> should be treated any differently, even if they're obsolete from a
> technical perspective.

OK, that's reasonable.

>  > > Anyhow, as I wrote above, the knowledge of whether quotas exist should
>  > > be maintained and provided by the kernel, so it works reliably and
>  > > with mounts that aren't listed in fstab. All file systems that support
>  > > quotas can and should do this.
>  > 
>  > this is what quota2 does. quota1 is different here, and I think I explained
>  > why. We can choose to change it, but then it is what I would
>  > call a major behavior change and I think there should be a transition
>  > period.
> quota1 is not different here, or should not be, there's just a pile of
> legacy code that should have been cleaned up ages ago.

From the perspective of keeping ffs-quota1, I agree.

> I don't see that there's any behavior change involved here that isn't
> a plain and simple bug fix.

repquota(8) and edquota(8) not being able to work on ffs-quota1 with
the filesystem unmounted is not a simple bug fix for me, as I've been
using this in the past. But I could probably live without it.

>  > > Also, you're once again wrong about what's using this logic. In
>  > > addition to quotacheck and quotaon, quota, edquota, and repquota are
>  > > all checking fstab.
>  > 
>  > no, quota is not. edquota and repquota are, I already explained why.
> Yes, sorry, I was misled by the code in quota's source directory that
> it doesn't use.
> None of these programs should be checking fstab.

I explained why I think they should (at least for one more major
release), but that's not a strong requirement for me.

>  > > An encoded form of the API I already described, with get/put/delete
>  > > and cursors.
>  > 
>  > So we lose the clear command. I guess it's implemented as part of put.
> Do we? Maybe not, I didn't say the API was finalized.
> But, pray tell, where is this clear command you mention documented?

quotactl(2), but you'll say it's filesystem-specific, which is
partly true (ffs-quota1 will reject it, but we could choose to
have it revert to defaults instead, I guess).

>  > > No, because (among other things) the schema I'm implementing is not
>  > > the same. The proplib schema is hierarchical, for example,
>  > > rather than being normalized;
>  > 
>  > I see this as an advantage, not an inconvenience. You're flattening
>  > something that is naturally hierarchical.
> No, it's a bug. You're adding bogus hierarchical structure to
> something that's naturally tabular.

OK, so let's call it hierarchical tabular if you want.
You have different quota classes (currently user, group);
for each class, there are ids (either uid or gid);
for each id, there are quota types (block or file currently);
these quota types have values (limits, usage, times).
This could be represented in a multidimensional array, but a table is not
a complete representation of this.
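
As a rough Python sketch of the two views under discussion: the nested
form follows the class / id / type / values levels listed above, and
flatten() produces the table form, where each row has to carry the
(class, id, type) key explicitly. The key names and numbers are
illustrative only; they are not the quotactl(2) plist keys.

```python
# Nested form: class -> id -> quota type -> values.
# Key names and numbers are illustrative, not the quotactl(2) plist keys.
nested = {
    "user": {
        1000: {
            "block": {"soft": 900, "hard": 1000, "usage": 120},
            "file":  {"soft": 90,  "hard": 100,  "usage": 12},
        },
    },
    "group": {
        100: {
            "block": {"soft": 5000, "hard": 6000, "usage": 300},
        },
    },
}

def flatten(tree):
    """Turn the nested form into flat rows keyed by (class, id, type)."""
    rows = []
    for qclass, ids in tree.items():
        for qid, qtypes in ids.items():
            for qtype, values in qtypes.items():
                rows.append((qclass, qid, qtype,
                             values["soft"], values["hard"], values["usage"]))
    return rows

rows = flatten(nested)
for row in rows:
    print(row)
```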

> Furthermore, as I alluded to above, tabular data is much easier to
> handle with shell tools.

But not with other languages.

>  > What I understand is that you mostly want an enhanced API for userland
>  > tools. It can be implemented without changes to quotactl(2) or the kernel
>  > interface.
> I would also like a VFS-level kernel interface that new filesystems
> can be plugged into sanely.

You didn't talk much about this.
Right now, you're getting from VFS a proplist describing a set of
commands and arguments. Then you have some library functions
to parse this proplist, and construct a new proplist for the
I'm not sure what more a VFS-level kernel interface would do,
except providing a binary representation of commands and values instead of
calling the libquota functions to get it yourself.
Maybe some more code could be abstracted from the ufs code (to decode the
commands and run callbacks for example - this just gave me an
idea that I'll try to implement this week-end), but I think the proplist
should be exposed to filesystems, if one ever needs some filesystem-specific

Manuel Bouyer <>
     NetBSD: 26 years of experience will always make the difference