Subject: kern/32701: [dM] Indirect blocks break on big filesystems
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 02/02/2006 17:25:00
>Number:         32701
>Category:       kern
>Synopsis:       [dM] Indirect blocks break on big filesystems
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 02 17:25:00 +0000 2006
>Originator:     der Mouse
>Release:        NetBSD 2.0
>Organization:
	Dis-
>Environment:
	The hardware corresponds to
System: NetBSD backup.openface.ca 3.0 NetBSD 3.0 (GENERIC) #0: Mon Dec 19 01:04:02 UTC 2005 builds@works.netbsd.org:/home/builds/ab/netbsd-3-0-RELEASE/i386/200512182024Z-obj/home/builds/ab/netbsd-3-0-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
	but this was observed under a 2.0 kernel, not the 3.0 kernel
	corresponding to the above uname output.  (Tests are underway
	under 3.0; I'll append to this ticket as I get results.)
>Description:
	The machine has a 3ware RAID card in it with 12 disks attached,
	set up as a RAID5 and a RAID0:

	twe0 at pci3 dev 1 function 0: 3ware Escalade
	twe0: interrupting at irq 10
	twe0: 12 ports, Firmware FE7S 1.05.00.065, BIOS BE7X 1.08.00.048
	twe0: Monitor ME7X 1.01.00.038, PCB Rev5    , Achip 3.20    , Pchip 1.30-66 
	twe0: port 0: ST3300831AS                              286168 MB
	twe0: port 1: ST3300831AS                              286168 MB
	twe0: port 2: ST3300831AS                              286168 MB
	twe0: port 3: ST3300831AS                              286168 MB
	twe0: port 4: ST3300831AS                              286168 MB
	twe0: port 5: ST3300831AS                              286168 MB
	twe0: port 6: ST3300831AS                              286168 MB
	twe0: port 7: ST3300831AS                              286168 MB
	twe0: port 8: ST3300831AS                              286168 MB
	twe0: port 9: ST3300831AS                              286168 MB
	twe0: port 10: ST3300831AS                              286168 MB
	twe0: port 11: ST3300831AS                              286168 MB
	ld0 at twe0 unit 0: 64K stripe RAID5, status: Normal
	ld0: 1956 GB, 255368 cyl, 255 head, 63 sec, 512 bytes/sect x 4102491904 sectors
	ld1 at twe0 unit 8: 1024K stripe RAID0, status: Normal
	ld1: 838 GB, 109443 cyl, 255 head, 63 sec, 512 bytes/sect x 1758210048 sectors

	I labeled ld0 as

	4 partitions:
	#        size    offset     fstype [fsize bsize cpg/sgs]
	 a: 4102491904         0     4.2BSD   1024  8192 56528  # (Cyl.      0 - 255368*)
	 d: 4102491904         0     unused      0     0        # (Cyl.      0 - 255368*)

	I created an FFSv1 filesystem in ld0a with fsize=1024 bsize=8192
	(as indicated by the values in the label).  Then I created 418
	files of exactly 4G each, split into five directories:
	00/0001-00/0099, 01/0100-01/0199, ..., 04/0400-04/0418.  (418
	is not special; I just had it keep creating until df reported
	at least 90% full, and that happened to be after 418 files.)
	Each file has distinctive content; given a disk block belonging
	to any of them, I could tell which file it belonged to and
	where in that file it belonged.
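
	As an illustration of the "distinctive content" idea (not the
	generator actually used for these tests), here is a minimal
	sketch in which every 512-byte sector carries a stamp naming
	its file number and its byte offset within that file.  The
	marker string, the stamp layout and the function names are
	all hypothetical.

```python
import struct

SECTOR = 512                      # assumed stamp granularity (one disk sector)
MAGIC = b"dMtestpt"               # hypothetical 8-byte marker
STAMP = struct.Struct("<8sIQ")    # marker, file number, byte offset in file

def make_sector(file_no, offset):
    """Return one 512-byte sector that names its file and its offset."""
    head = STAMP.pack(MAGIC, file_no, offset)
    return head + bytes(SECTOR - len(head))   # zero-pad to a full sector

def identify(sector):
    """Return (file_no, offset) for a stamped sector, or None otherwise."""
    if len(sector) != SECTOR:
        return None
    magic, file_no, offset = STAMP.unpack_from(sector)
    return (file_no, offset) if magic == MAGIC else None
```

	With content like this, any sector read straight off the raw
	device can be mapped back to a file and offset without going
	through the filesystem's own metadata.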

	Then I unmounted the filesystem and ran fsck.  fsck found many
	problems, mostly "INCORRECT BLOCK COUNT" and a lot of BAD or
	DUP BLKS.  Looking at what's actually on the disk with other
	tools, it appears that all the fsck-reported problems
	(certainly all the ones I spot-checked) are due to indirect
	blocks getting trashed.  (An iblock full of 0s gives INCORRECT
	BLOCK COUNT; the BAD and DUP blocks are from iblocks full of
	nonzero trash.)
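
	The two failure modes can be told apart mechanically.  The
	following is only a rough sketch, assuming an FFSv1 indirect
	block is an array of little-endian signed 32-bit fragment
	addresses (as on i386) and using the sizes from the label
	above; classify_iblock is a hypothetical helper, not one of
	the tools used here, and it does not attempt to detect DUPs.

```python
import struct

BSIZE = 8192                           # block size from the label
FSIZE = 1024                           # fragment size from the label
SECTORS = 4102491904                   # size of ld0a in 512-byte sectors
NFRAGS = SECTORS * 512 // FSIZE        # fragments in the partition
NINDIR = BSIZE // 4                    # 32-bit pointers per indirect block

def classify_iblock(block):
    """Roughly classify one raw indirect block read from the disk."""
    if len(block) != BSIZE:
        raise ValueError("expected exactly one filesystem block")
    ptrs = struct.unpack("<%di" % NINDIR, block)   # little-endian int32
    if not any(ptrs):
        return "zeroed"        # would show up as INCORRECT BLOCK COUNT
    if any(p < 0 or p >= NFRAGS for p in ptrs):
        return "out-of-range"  # fsck would call such pointers BAD
    return "plausible"         # in range; says nothing about DUPs
```

	A result of "plausible" only means every pointer is within
	the partition; it does not prove the block is intact.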

	I haven't checked thoroughly yet, but preliminary indications
	are that corruption strikes whenever an indirect block falls
	above the 1T point on the disk (2^31 512-byte sectors),
	leading me to suspect a signed-32-bit bug somewhere in an
	indirect-block code path.  Reinforcing this theory is that
	ld1a, which lies entirely below 1T, does not suffer from this
	issue as far as I've been able to tell, even though it has an
	8k/64k filesystem (as ld0a did earlier; I'd heard conjectures
	that the blocksize was the problem).
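
	The arithmetic behind that suspicion can be checked directly.
	This sketch uses the sector counts from the dmesg output above
	and shows what happens to a sector number past 1T if it is
	squeezed through a signed 32-bit value.  It illustrates the
	suspected class of bug only; it does not identify where (or
	whether) the kernel actually does this.  The helper names are
	hypothetical.

```python
SECTOR = 512
LD0_SECTORS = 4102491904      # from the ld0 dmesg line
LD1_SECTORS = 1758210048      # from the ld1 dmesg line
INT32_MAX = 2**31 - 1

def as_int32(n):
    """Keep the low 32 bits of n and reinterpret them as two's complement."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

def frag_to_sector(frag, fsize=1024):
    """Fragment number -> 512-byte sector number, with no truncation."""
    return frag * (fsize // SECTOR)

# The 1T boundary is exactly where a signed 32-bit sector number runs out.
first_sector_past_1t = 2**40 // SECTOR
ld0_crosses = LD0_SECTORS - 1 > INT32_MAX    # last sector of ld0
ld1_crosses = LD1_SECTORS - 1 > INT32_MAX    # last sector of ld1

# A fragment just past the middle of the 32-bit fragment range still fits
# in an int32, but its sector number (twice as large for fsize=1024) does not.
frag = 2**30 + 5
sector = frag_to_sector(frag)
truncated = as_int32(sector)

print(first_sector_past_1t == 2**31, ld0_crosses, ld1_crosses)
print(frag, sector, truncated)
```

	Note that with fsize=1024 every fragment number on ld0a fits
	in a signed 32-bit value; only the derived sector numbers
	exceed it, which is consistent with trouble starting at 1T
	rather than at some other boundary.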

>How-To-Repeat:
	See description above.

>Fix:
	Unknown.  "Don't go above 1T" seems to work, but is hardly a
	real fix.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B