tech-kern archive


Re: radix tree implementation for quota ?



    Date:        Sun, 28 Nov 2010 11:11:37 -0600
    From:        Ted Lemon <mellon%fugue.com@localhost>
    Message-ID:  <AE6B95BF-EAB2-433F-807B-5EBB5BD84087%fugue.com@localhost>

  | On Nov 28, 2010, at 11:08 AM, Ignatios Souvatzis wrote:
  | > Why? It's a sparse file unless you copy it using cp.
  | 
  | True enough, but using sparse files this way tends to

No, that's not the issue; it is that each active uid causes a block
to get allocated, and if the uids are spread widely enough, there ends
up being a lot of wasted space.  That, plus for large uids the file
appears to be absurdly large - that's just cosmetic really, but when
people see a 20TB file on a 1TB filesystem, they start to wonder.
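
To make that concrete, here's a rough sketch of the classic uid-indexed
layout (the record layout and size are illustrative, not the actual
on-disk structures): the record offset is just the uid times the record
size, so every active uid forces a block of the sparse file to be
allocated, and the file's apparent size follows the largest uid in use.

	/*
	 * Sketch of a uid-indexed quota file (illustrative record, 64
	 * bytes here).  Each id's record lives at a fixed offset, so
	 * the apparent file size is driven by the largest uid.
	 */
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/types.h>

	struct dqrec {			/* one per-uid quota record */
		uint64_t curblocks, curinodes;
		uint64_t bsoftlimit, bhardlimit;
		uint64_t isoftlimit, ihardlimit;
		int64_t  btime, itime;
	};				/* 64 bytes */

	int
	main(void)
	{
		uid_t uid = 4000000000u;	/* one large uid in use */
		off_t off = (off_t)uid * sizeof(struct dqrec);

		/* roughly 256 GB apparent file size for this one uid */
		printf("record for uid %u lives at offset %lld\n",
		    uid, (long long)off);
		return 0;
	}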

If you're going to force the quota info to be in the filesystem itself,
then you should probably consider splitting the two types of info that
are stored there - there's the current usage info (etc) that can certainly
be associated (very closely) with the filesystem, and there are the
administrator-set limits, which really shouldn't be.
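
Roughly, the split might look something like this (hypothetical struct
names, fields modelled on the traditional dqblk):

	#include <stdint.h>

	struct dq_usage {	/* belongs with the filesystem */
		uint64_t curblocks;	/* blocks currently used by this id */
		uint64_t curinodes;	/* inodes currently used by this id */
	};

	struct dq_limits {	/* administrator policy, could live elsewhere */
		uint64_t bsoftlimit, bhardlimit;	/* block limits */
		uint64_t isoftlimit, ihardlimit;	/* inode limits */
		int64_t  btime, itime;			/* grace expiries */
	};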

The problem is that we want to be able to have limits set for transient
filesystems (tmpfs, mfs, etc); they shouldn't just be lost when the
filesystem evaporates.

When it was originally done, that was enough for me to allow the quota
data to be anywhere (ie: not require it to live in the filesystem being
monitored) - splitting it was just too much trouble.

bouyer%antioche.eu.org@localhost said:
  | For a flat text file, lookup time is probably prohibitive (remember we
  | need to get access to the quota information for every file create or delete,
Not really; it is all supposed to be cached.  The file is touched only
when a new user's file is first touched (ie, opened) or when the last
ref to some user's file is dropped (or on a sync operation, but those
aren't frequent - most likely they never happen in practice, except as
part of unmount.)   You want the accesses to be quite quick, but they
don't have to be zero cost.
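
A minimal sketch of that caching pattern (hypothetical names and stubbed
file I/O, not the actual ufs quota code): dqget() reads the backing file
only on a cache miss, and dqrele() writes it back only when the last
reference goes away.

	#include <stdint.h>
	#include <stdlib.h>

	#define DQHASHSZ 64

	struct dquot {
		struct dquot	*dq_next;	/* hash chain, keyed by id */
		uint32_t	 dq_id;
		uint32_t	 dq_refcnt;
		int		 dq_dirty;
		uint64_t	 dq_curblocks;	/* usage, updated in core */
	};

	static struct dquot *dqhash[DQHASHSZ];

	/* Placeholders for the real I/O against the quota file. */
	static void dq_read_from_file(struct dquot *dq) { dq->dq_curblocks = 0; }
	static void dq_write_to_file(struct dquot *dq) { dq->dq_dirty = 0; }

	struct dquot *
	dqget(uint32_t id)
	{
		struct dquot **head = &dqhash[id % DQHASHSZ], *dq;

		for (dq = *head; dq != NULL; dq = dq->dq_next)
			if (dq->dq_id == id)
				break;
		if (dq == NULL) {
			dq = calloc(1, sizeof(*dq));
			dq->dq_id = id;
			dq_read_from_file(dq);	/* only read: first touch */
			dq->dq_next = *head;
			*head = dq;
		}
		dq->dq_refcnt++;
		return dq;
	}

	void
	dqrele(struct dquot *dq)
	{
		if (--dq->dq_refcnt == 0 && dq->dq_dirty)
			dq_write_to_file(dq);	/* only write: last ref/sync */
	}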

mellon%fugue.com@localhost said:
  | No, wouldn't the kernel just deliver UIDs and usages?   Why would you want
  | more than that in the kernel?

Well, first, and obviously, the kernel has to actually enforce the limits,
so at the very least it needs to know the limits.   Having the file ops
in user space was the way the Sydney version of this stuff worked, and that
had enough problems that it was what I deliberately set out to be different
from; communicating between the kernel and some user-space daemon whenever a
file for a new user is touched is just plain hard to get right.   The Sydney
stuff didn't even try: they told the kernel the limits when a user logged in
(as part of login, and I think, su), which meant that I/O to files belonging
to users who were not logged in was totally untracked (and it is pretty easy
to see how to use that to totally defeat the system ...)

bouyer%antioche.eu.org@localhost said:
  | You're suggesting I should include Berkeley DB in the kernel, right ? 

When I saw your first message, I planned on suggesting just that.
Well, not quite that, as the full-blown db stuff isn't needed - just
the hash access method, or a simplified version of it.   That's pretty
much what we do now actually, with a trivial hash function (blockno =
id * quota_block_size / disk_block_size) and then direct access to the
quota info in the block.   A more intelligent hash function, with a scan
when the block is read to find the appropriate entry, and dbm-style block
splitting on overflow, ought to work reasonably well, and not be either
large or costly.
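
Something like the sketch below, next to the trivial mapping used today
(sizes, names and the hash function are illustrative; the dbm-style
bucket splitting on overflow is not shown):

	#include <stddef.h>
	#include <stdint.h>

	#define DEV_BSIZE	512
	#define QUOTA_RECSIZE	64	/* assumed per-id record size */
	#define NBUCKETS	128	/* current number of bucket blocks */

	struct qrec {
		uint32_t q_id;
		uint8_t  q_data[QUOTA_RECSIZE - sizeof(uint32_t)];
	};

	/* Current scheme: the id alone determines the block number. */
	static uint32_t
	blockno_direct(uint32_t id)
	{
		return id * QUOTA_RECSIZE / DEV_BSIZE;
	}

	/* Hashed scheme: pick a bucket block, then scan it for the id. */
	static uint32_t
	blockno_hashed(uint32_t id)
	{
		return (id * 2654435761u) % NBUCKETS;	/* multiplicative hash */
	}

	static struct qrec *
	bucket_lookup(struct qrec *blk, size_t nrec, uint32_t id)
	{
		for (size_t i = 0; i < nrec; i++)
			if (blk[i].q_id == id)
				return &blk[i];
		return NULL;	/* on insert overflow, split the bucket */
	}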

kre


