Subject: Re: bin/23725: possible quotacheck enhancement
To: None <kre@munnari.OZ.AU>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 12/17/2003 17:37:23
[ On Friday, December 12, 2003 at 19:11:53 (+0700), kre@munnari.OZ.AU wrote: ]
> Subject: bin/23725: possible quotacheck enhancement
>
> 
> 	quotacheck can run very slowly (take a long time, an extremely
> 	long time) if run on a filesystem that happens to have a file
> 	allocated to a uid (or gid if using group quotas) that is large.

... unless there is a uid (or gid if using group quotas) with a value of
ULONG_MAX, in which case it'll just run forever....   :-)

> 	I believe it has one bug - bad things are likely if uid 2^32-1
> 	actually owns files (or exists in the password file).  (same for
> 	groups) (assuming long is 32 bits

Yes, it still goes into an endless loop.  I tried fixing this but couldn't
quite get the logic right: whenever I managed to get the loop to exit, I
ended up with a zero-byte quota file somehow.  Instead I just added the
following hard-nosed checks to the beginning of addid().  Note that on my
systems both UID_MAX and GID_MAX are defined as (~(uid_t)0).

	u_long maxid;

	switch (type) {
	case GRPQUOTA:
		maxid = GID_MAX;
		break;
	default:
	case USRQUOTA:
		maxid = UID_MAX;
		break;
	}
	if (id > maxid)			/* only possible if sizeof(u_long) > 4 */
		errx(1, "encountered impossible %s ID value: %lu", qfextension[type], id);
	if (id > (maxid - 1))		/* -1 makes us loopy! */
		errx(1, "encountered invalid %s ID value: %lu", qfextension[type], id);
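
A minimal sketch of why an ID equal to the maximum value of its type can
make such a loop run forever, assuming the loop has the common
"for (id = 0; id <= lastid; id++)" shape (an assumption; the actual
quotacheck loop is not quoted here).  The names id8_t, ID8_MAX,
visit_naive(), and visit_safe() are hypothetical, and an 8-bit type stands
in for u_long so the whole range can be walked quickly:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the ID type: 8 bits wide so the whole
 * range can be enumerated.  ID8_MAX plays the role of ULONG_MAX.
 */
typedef uint8_t id8_t;
#define ID8_MAX ((id8_t)~(id8_t)0)

/*
 * Naive form.  When lastid == ID8_MAX the test "id <= lastid" can
 * never be false, because id wraps back to 0 after ID8_MAX.  The
 * limit argument exists only so that this demonstration terminates.
 * Returns the number of iterations executed, capped at limit.
 */
unsigned long
visit_naive(id8_t lastid, unsigned long limit)
{
	unsigned long n = 0;
	id8_t id;

	assert(limit > 0);
	for (id = 0; id <= lastid; id++) {
		if (++n >= limit)
			break;		/* would otherwise spin forever */
	}
	return n;
}

/*
 * Safe form: test for the last ID before incrementing, so the
 * counter never has to step past lastid.  Returns the number of
 * IDs visited, which is always lastid + 1.
 */
unsigned long
visit_safe(id8_t lastid)
{
	unsigned long n = 0;
	id8_t id = 0;

	for (;;) {
		n++;			/* per-ID work would go here */
		if (id == lastid)
			break;
		id++;
	}
	return n;
}
```

With these definitions, visit_naive(ID8_MAX, 1000) only stops because it
hits the artificial limit, while visit_safe(ID8_MAX) visits each of the 256
IDs exactly once.  Whether restructuring the real loop this way also avoids
the zero-byte quota file is untested.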

> 	If someone decides to test this, please let me know how you get on.

Other than not allowing a UID/GID of ULONG_MAX, it seems to work on the
test i386 machine under netbsd-1-6 (patches applied by hand, as UFS2
support and other -current frobbing have munged things beyond patch's
ability to do it automatically).

(I think repquota could also use a similar fix, though it does
eventually finish running.)

The new '-q' flag does speed things up significantly (I still have a
"-2" user and group):

	# time quotacheck -v -u /mfbd
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	   86.20s real    21.48s user    27.90s system

	# time quotacheck -v -q -u /mfbd
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	   32.26s real     1.04s user     1.71s system

There is another bug somewhere, perhaps in the checkfstab() subroutine
borrowed from fsck, or in the way it's used.  It seems that when I have two
filesystems with quotas enabled, one of them gets checked twice:

	# fgrep quota /etc/fstab
	/dev/raid0a /build ffs rw,nodev,nosuid,softdep,userquota 1 2
	/dev/raid1a /mfbd ffs rw,nodev,nosuid,softdep,userquota 1 2
	# quotacheck -v -q -a
	*** Checking user quotas for /dev/rraid0a (/build)
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	/mfbd: root     fixed:  inodes 28 -> 3  blocks 5784 -> 192
	/mfbd: woods    fixed:  inodes 251693 -> 364    blocks 33448720 -> 10442488
	/mfbd: root     fixed:  inodes 3 -> 28  blocks 192 -> 5784
	/mfbd: woods    fixed:  inodes 364 -> 251693    blocks 10442488 -> 33448720
	#

If I remove the quota files and run it again it still seems to do the
second FS twice, but of course multiprocessing randomness results in a
different ordering of the operations:

	# quotaoff -a
	# rm /*/quota.user
	# quotacheck -v -q -a
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	*** Checking user quotas for /dev/rraid0a (/build)
	quotacheck: creating quota file /build/quota.user
	/build: root     fixed: inodes 0 -> 25  blocks 0 -> 5592
	/build: woods    fixed: inodes 0 -> 251329      blocks 0 -> 23006232
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	quotacheck: creating quota file /mfbd/quota.user
	/mfbd: root     fixed:  inodes 0 -> 3   blocks 0 -> 192
	/mfbd: woods    fixed:  inodes 0 -> 364 blocks 0 -> 10442488
	/mfbd: root     fixed:  inodes 3 -> 27  blocks 192 -> 5624
	/mfbd: woods    fixed:  inodes 364 -> 251693    blocks 10442488 -> 33448720
	#

The result is a bogus quota file.  The first values (e.g. 364 inodes,
10442488 blocks for my uid) are correct.

Little more is revealed to my eyes by the '-d' flag:

	# quotacheck -d -v -q -a
	pass 1, name /dev/rraid0a
	pass 1, name /dev/rraid1a
	pass 2, name /dev/rraid0a
	pass 2, name /dev/rraid1a
	disk /dev/rraid0: /dev/rraid0a 
	disk /dev/rraid1: /dev/rraid1a 
	*** Checking user quotas for /dev/rraid0a (/build)
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	*** Checking user quotas for /dev/rraid1a (/mfbd)
	done ffs: /dev/rraid1a (/mfbd) = 0x0
	/mfbd: root     fixed:  inodes 3 -> 28  blocks 192 -> 5784
	/mfbd: woods    fixed:  inodes 364 -> 251693    blocks 10442488 -> 33448720
	done ffs: /dev/rraid1a (/mfbd) = 0x0
	done ffs: /dev/rraid0a (/build) = 0x0

At one point I also ended up with a quota recorded for a nonexistent
user (not listed in the passwd db) who doesn't seem to actually own any
files, at least none that can be found:

	# repquota -v  -a
	*** Report for user quotas on /build (/dev/raid0a)
	                        Block limits               File limits
	User            used    soft    hard  grace      used    soft    hard  grace
	root      --    2796       0       0               25       0       0       
	woods     --11503116       0       0           251329       0       0       
	
	*** Report for user quotas on /mfbd (/dev/raid1a)
	                        Block limits               File limits
	User            used    soft    hard  grace      used    soft    hard  grace
	root      --    2892       0       0               28       0       0       
	woods     --16724360       0       0           251693       0       0       
	1001      --      48       0       0                3       0       0       

	# find /mfbd -user 1001 -print
	#

This may also have been caused by the dueling quotacheck processes,
though.  Turning off quotas, removing the quota.user files, and
re-running quotacheck "fixed" it (even with '-q'), provided I did one
filesystem at a time by hand.

I haven't noticed any problems with "fsck -p" running more than once on
any filesystem so perhaps this is only a problem in how quotacheck uses
checkfstab()?  I've not yet tried fiddling with the fs_passno field to
see if making it different on each entry will help or not.
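
Purely as a defensive sketch, and not a diagnosis of what checkfstab() or
quotacheck actually does, a caller could remember which devices it has
already queued and refuse to queue the same one twice.  The names
struct seen_devs, seen_add(), and MAXSEEN are hypothetical:

```c
#include <string.h>

#define MAXSEEN 64		/* arbitrary cap for this sketch */

/* Device names already queued for checking. */
struct seen_devs {
	const char *names[MAXSEEN];
	int count;
};

/*
 * Record dev if it has not been seen before.
 * Returns 1 if dev was newly recorded, 0 if it was already present
 * (the caller should skip it), or -1 if the table is full.
 * The table stores the pointer, so dev must outlive the table.
 */
int
seen_add(struct seen_devs *sd, const char *dev)
{
	int i;

	for (i = 0; i < sd->count; i++)
		if (strcmp(sd->names[i], dev) == 0)
			return 0;
	if (sd->count >= MAXSEEN)
		return -1;
	sd->names[sd->count++] = dev;
	return 1;
}
```

A guard like this would only hide a duplicate; it would not explain why
the second filesystem is being handed over twice in the first place.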

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>