Subject: msdos filesystem consistency?
To: None <current-users@netbsd.org>
From: Brook Milligan <brook@biology.nmsu.edu>
List: current-users
Date: 08/20/1999 15:29:51

I am trying to create some msdos filesystem images for floppies and
have noticed some odd inconsistencies that suggest a bug in either
the msdos FAT handling code or the vnode code (I guess).  The basic
problem is that the second FAT is always correct, but the first FAT
often differs from it.  The filesystem can be repaired with a modified
fsck_msdos (see patch below), though.

Commands to trigger the bugs:

	dd if=/dev/zero of=/tmp/msdos.fs count=1440 bs=1k
	newfs_msdos -f 1440 /tmp/msdos.fs
	vnconfig -t floppy -v -c /dev/vnd0d /tmp/msdos.fs
	mount -t msdos /dev/vnd0a /mnt
	for f in 0 1 2 3 4 5 6 7; do cp /etc/motd /mnt/motd.$f; done
	umount /mnt
	fsck_msdos.new -y /dev/vnd0d || true
	vnconfig -u /dev/vnd0a
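
To look at the two FAT copies in /tmp/msdos.fs directly, their byte
offsets can be computed from the boot sector.  The following is only a
minimal sketch, assuming the standard FAT12/16 BPB field offsets; the
names fat_layout, fat_layout_parse and fat_offset are hypothetical and
are not taken from the NetBSD sources:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The boot sector (BPB) fields needed to locate the FAT copies. */
struct fat_layout {
	uint16_t bytes_per_sector;
	uint16_t reserved_sectors;
	uint8_t  num_fats;
	uint16_t sectors_per_fat;
};

/* Read a little-endian 16-bit value. */
static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Parse the BPB at its standard offsets: bytes/sector at 11,
 * reserved sectors at 14, number of FATs at 16, sectors/FAT at 22.
 * Returns 0 on success, -1 if the buffer is too short or implausible.
 */
int fat_layout_parse(const uint8_t *boot, size_t len, struct fat_layout *out)
{
	assert(boot != NULL && out != NULL);
	if (len < 24)
		return -1;
	out->bytes_per_sector = le16(boot + 11);
	out->reserved_sectors = le16(boot + 14);
	out->num_fats = boot[16];
	out->sectors_per_fat = le16(boot + 22);
	if (out->bytes_per_sector == 0 || out->num_fats == 0 ||
	    out->sectors_per_fat == 0)
		return -1;
	return 0;
}

/* Byte offset of FAT copy n (0-based) from the start of the image. */
long fat_offset(const struct fat_layout *l, unsigned n)
{
	assert(l != NULL && n < l->num_fats);
	return ((long)l->reserved_sectors + (long)n * l->sectors_per_fat) *
	    l->bytes_per_sector;
}
```

If the image has 512-byte sectors, 1 reserved sector and 9 sectors per
FAT, this puts FAT 0 at byte 512 and FAT 1 at byte 5120.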

Patch to create fsck_msdos.new used above (switches the order of
inquiry to give the second FAT precedence over the first):

	--- src/sbin/fsck_msdos/fat.c.orig	Thu Jan 22 11:48:44 1998
	+++ src/sbin/fsck_msdos/fat.c	Fri Aug 20 14:56:47 1999
	@@ -242,12 +242,12 @@
				}
				pwarn("Cluster %u is marked %s in FAT 0, %s in FAT %d\n",
				      cl, rsrvdcltype(*cp1), rsrvdcltype(*cp2), fatnum);
	-			if (ask(0, "use FAT 0's entry")) {
	-				*cp2 = *cp1;
	-				return FSFATMOD;
	-			}
				if (ask(0, "use FAT %d's entry", fatnum)) {
					*cp1 = *cp2;
	+				return FSFATMOD;
	+			}
	+			if (ask(0, "use FAT 0's entry")) {
	+				*cp2 = *cp1;
					return FSFATMOD;
				}
				return FSFATAL;

The bugs show up in the corrections that fsck_msdos.new has to make:

  - clusters marked EOF in FAT1 (correct) are marked free in FAT0
    (incorrect)
  - different clusters are involved each time the above commands are
    run, so the problem is not deterministic
  - sometimes both FATs agree and no corrections are needed

The patch above is probably not optimal as a general solution; it is
only meant to demonstrate the problem and provide a workaround.
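
The mismatch described above (EOF in one FAT, free in the other) can
also be checked without fsck by decoding the 12-bit entries of both
copies.  This is a rough sketch, assuming a FAT12 image with both FATs
already read into memory; fat12_entry and fat12_count_free_vs_eof are
hypothetical helper names, not functions from fsck_msdos:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the 12-bit FAT12 entry for cluster cl from a raw FAT buffer. */
unsigned fat12_entry(const uint8_t *fat, unsigned cl)
{
	size_t off = (size_t)cl + cl / 2;	/* 1.5 bytes per entry */
	unsigned v;

	assert(fat != NULL);
	v = fat[off] | ((unsigned)fat[off + 1] << 8);
	/* Odd clusters use the high 12 bits, even ones the low 12 bits. */
	return (cl & 1) ? (v >> 4) : (v & 0x0fff);
}

/*
 * Count clusters that are free (0x000) in fat0 but marked end-of-file
 * (0xff8..0xfff) in fat1.  nentries is the total number of FAT entries,
 * including the two reserved entries 0 and 1, which are skipped.
 * Each buffer must hold at least (nentries * 3 + 1) / 2 bytes.
 */
unsigned fat12_count_free_vs_eof(const uint8_t *fat0, const uint8_t *fat1,
    unsigned nentries)
{
	unsigned cl, count = 0;

	for (cl = 2; cl < nentries; cl++) {
		unsigned e0 = fat12_entry(fat0, cl);
		unsigned e1 = fat12_entry(fat1, cl);

		if (e0 == 0x000 && e1 >= 0xff8)
			count++;
	}
	return count;
}
```

A nonzero count corresponds to the "EOF in FAT1, free in FAT0" case
listed above.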

Am I doing something stupid here?

Is this a known problem?

Should I be looking for the problem in the FAT code or the vnode code
or somewhere else?

Any suggestions for a fix?

Should I report this with send-pr?

Thanks a lot for your help.

Cheers,
Brook