Subject: bin/13114: fsck_ffs deals poorly with bad blocks
To: None <gnats-bugs@gnats.netbsd.org>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 06/04/2001 23:34:31
>Number:         13114
>Category:       bin
>Synopsis:       fsck_ffs deals poorly with bad blocks
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 04 20:34:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     John Hawkinson
>Release:        1.5
>Organization:
MIT
>Environment:
	
System: NetBSD zorkmid.mit.edu 1.5U NetBSD 1.5U (ZORKMID-$Revision: 1.10 $) #103: Mon May 21 21:10:47 EDT 2001 jhawk@zorkmid.mit.edu:/usr/local/netbsd-current/src/sys/arch/i386/compile/ZORKMID i386


>Description:
	fsck_ffs deals extremely poorly with bad blocks. My IDE drive seems to have
acquired some more, and when I run fsck, I get a report of the bad block
but no information about how it relates to the filesystem.

CANNOT READ: BLK 5505312
CONTINUE? yes

>How-To-Repeat:
	fsck /dev/ccd1b

CANNOT READ: BLK 5505312
CONTINUE? yes

>Fix:
	Well, fsck should print out as much contextual information as possible.
Presumably this means the current inode and the previous one, assuming the
latter is known. I guess this is happening in pass1(), before we have followed
any inode chains, so it may be difficult to figure this out.

	Perhaps we should set some state for "bad" inodes such that when they
are linked to in later passes they are pointed out for all to see?
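
	A rough sketch of what such state might look like, as a fixed-size table
of inode ranges whose buffers could not be read. Nothing here exists in
fsck_ffs; the names badino_record(), badino_check() and MAXBADRANGES are made
up for illustration, and the range is taken to be half-open, [first, last),
by analogy with the "i < lastinum" loop in getnextinode():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of fsck_ffs. */
#define MAXBADRANGES 64

struct badrange {
	uint32_t first;	/* first inode in the unreadable buffer */
	uint32_t last;	/* one past the last inode, like lastinum */
};

static struct badrange badranges[MAXBADRANGES];
static int nbadranges;

/*
 * Remember that inodes [first, last) came from a buffer that failed
 * to read.  Returns 0 on success, -1 if the range is empty or the
 * table is full.
 */
static int
badino_record(uint32_t first, uint32_t last)
{
	if (first >= last || nbadranges >= MAXBADRANGES)
		return -1;
	badranges[nbadranges].first = first;
	badranges[nbadranges].last = last;
	nbadranges++;
	assert(nbadranges <= MAXBADRANGES);
	return 0;
}

/* Returns 1 if ino lies in any recorded range, 0 otherwise. */
static int
badino_check(uint32_t ino)
{
	int i;

	for (i = 0; i < nbadranges; i++)
		if (ino >= badranges[i].first && ino < badranges[i].last)
			return 1;
	return 0;
}
```

	The idea would be to call badino_record(inumber, lastinum) where bread()
fails, and have a later pass call badino_check() on each inode a directory
entry points at, so the affected names can be printed.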

	The following at least allows me to get a clue about what's going on:

Index: inode.c
===================================================================
RCS file: /cvsroot/basesrc/sbin/fsck_ffs/inode.c,v
retrieving revision 1.33
diff -u -r1.33 inode.c
--- inode.c	1999/12/12 23:53:26	1.33
+++ inode.c	2001/06/05 03:31:11
@@ -344,7 +344,11 @@
 			size = inobufsize;
 			lastinum += fullcnt;
 		}
-		(void)bread(fsreadfd, (char *)inodebuf, dblk, size); /* ??? */
+		if (bread(fsreadfd, (char *)inodebuf, dblk, size)) {
+			if (debug)
+				printf("...while on inode %u; lastinum %u\n",
+				    inumber, lastinum);
+		}
 		if (doswap) {
 			int i, j;
 			for (i = inumber, dp  = inodebuf; i < lastinum; i++, dp++) {

So now I get something like:

CANNOT READ: BLK 5505312
CONTINUE? yes

THE FOLLOWING DISK SECTORS COULD NOT BE READ: 5505409,
...while on inode 667520; lastinum 667968

which is about as useful as setting a breakpoint on 'rwerror':

Breakpoint 1, rwerror (mesg=0x808f1c5 "READ", blk=5505312)
    at /usr/src/sbin/fsck_ffs/utilities.c:267
267             if (preen == 0)
(gdb) where
#0  rwerror (mesg=0x808f1c5 "READ", blk=5505312)
    at /usr/src/sbin/fsck_ffs/utilities.c:267
#1  0x805a24c in bread (fd=6, buf=0x8641000 "\201\001", blk=5505312, 
    size=57344) at /usr/src/sbin/fsck_ffs/utilities.c:348
#2  0x804c56b in getnextinode (inumber=667520)
    at /usr/src/sbin/fsck_ffs/inode.c:347
#3  0x804e770 in checkinode (inumber=667520, idesc=0xbfbfd454)
    at /usr/src/sbin/fsck_ffs/pass1.c:120
#4  0x804e72d in pass1 () at /usr/src/sbin/fsck_ffs/pass1.c:101
#5  0x804dc3f in checkfilesys (filesys=0x8096320 "/dev/rccd1b", mntpt=0x0, 
    auxdata=0, child=0) at /usr/src/sbin/fsck_ffs/main.c:237
#6  0x804da98 in main (argc=0, argv=0xbfbfd6c0)
    at /usr/src/sbin/fsck_ffs/main.c:167
#7  0x80481c5 in ___start ()
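
The numbers above do allow narrowing things down a little. The buffer covers
667968 - 667520 = 448 inodes in size=57344 bytes, i.e. 128 bytes per inode.
Assuming 512-byte sectors, and assuming the "BLK" and "DISK SECTORS" numbers
are in the same units (neither is confirmed by the output above), the failing
sector 5505409 is 97 sectors into the buffer and would hold four inodes.
A minimal sketch of that arithmetic, with sector_to_inodes() being a made-up
name rather than anything in fsck_ffs:

```c
#include <assert.h>
#include <stdint.h>

/* Half-open inode range [first, last); first == last means "unknown". */
struct inospan {
	uint32_t first;
	uint32_t last;
};

/*
 * Hypothetical helper: given the inode range [inumber, lastinum) held
 * in a buffer of `size` bytes starting at sector `blk`, return the
 * inodes stored in sector `badsec`.  secsize is an assumption supplied
 * by the caller (512 in the example).
 */
static struct inospan
sector_to_inodes(uint32_t inumber, uint32_t lastinum, uint64_t blk,
    uint64_t size, uint64_t badsec, uint64_t secsize)
{
	struct inospan r = { 0, 0 };
	uint64_t count, inosize, inopersec, off;

	if (lastinum <= inumber || badsec < blk || secsize == 0)
		return r;
	count = lastinum - inumber;
	if (size % count != 0)
		return r;		/* buffer is not a whole number of inodes */
	inosize = size / count;
	if (inosize == 0 || secsize < inosize)
		return r;
	inopersec = secsize / inosize;
	off = badsec - blk;		/* sectors into the buffer */
	if (off * secsize >= size)
		return r;		/* sector lies outside this buffer */
	r.first = inumber + (uint32_t)(off * inopersec);
	r.last = r.first + (uint32_t)inopersec;
	if (r.last > lastinum)
		r.last = lastinum;
	assert(r.first < lastinum);
	return r;
}
```

Called as sector_to_inodes(667520, 667968, 5505312, 57344, 5505409, 512),
this gives the range [667908, 667912), so under those assumptions inodes
667908 through 667911 would be the ones affected.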

I'm not sure where to go from here, though...
>Release-Note:
>Audit-Trail:
>Unformatted: