Subject: kern/31544: The ffs softdep code appears to fail to write dirty bits to disk
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org>
From: None <he@uninett.no>
List: netbsd-bugs
Date: 10/10/2005 13:09:00
>Number:         31544
>Category:       kern
>Synopsis:       The ffs softdep code appears to fail to write dirty bits to disk
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Oct 10 13:09:00 +0000 2005
>Originator:     Havard Eidnes
>Release:        NetBSD 3.99.9 19 Sep 2005
>Organization:
	UNINETT AS
>Environment:
System: NetBSD bean.urc.uninett.no 3.99.9 NetBSD 3.99.9 (GENERIC) #64: Mon Sep 19 13:49:02 CEST 2005  he@quattro.urc.uninett.no:/u/build/HEAD/obj/i386/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

	Around the 20th of last month I installed NetBSD 3.99.9 as
	described above on an IBM x336 system with a 3.2GHz Xeon, 1GB
	of memory and a single SCSI disk drive.  I partitioned the drive
	with one large /usr ffs file system covering most of the disk,
	mounted softdep, and the same day I checked out src, pkgsrc
	and xsrc from a CVS mirror.

	For various reasons I did not get around to doing what I had
	planned to do with the machine, so it was sitting there mostly
	idle, basically only running its periodic cron jobs.

	Two days ago we had an unexpected power outage which of course
	took out the power to this machine since it is not connected to
	a UPS.  One would have thought or hoped that all the dirty bits
	from the CVS checkout operation would have hit the platters on
	the disk drive after nearly two weeks of idle time, but no,
	that did not appear to be the case.

	When the machine came back up again, fsck found a multitude of
	problems in the /usr file system, some examples:

	BAD TYPE VALUE  I=209247  OWNER=he MODE=100644
	SIZE=2472 MTIME=Sep 20 14:38 2005 
	FILE=/src/contrib

	UNALLOCATED  I=214598  OWNER=he MODE=0
	SIZE=0 MTIME=Sep 20 14:37 2005 
	NAME=/src/include/CVS/Entries

	I count a total of 5657 "UNALLOCATED" and 1335 "BAD TYPE VALUE"
	errors in my "fsck -y" output.
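
	A tally like the one above can be produced from saved fsck
	output with a short script.  The following is only a minimal
	sketch, not the method used for the counts in this report; the
	function name count_fsck_errors is hypothetical, and the sample
	text is just the two excerpts quoted above.

```python
from collections import Counter

# Error labels taken from the fsck excerpts quoted in this report.
LABELS = ("UNALLOCATED", "BAD TYPE VALUE")


def count_fsck_errors(text, labels=LABELS):
    """Count the lines of fsck output that begin with each label."""
    counts = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        for label in labels:
            if stripped.startswith(label):
                counts[label] += 1
                break
    return counts


# Sample input: the two excerpts from the report, not the full fsck log.
sample = """\
BAD TYPE VALUE  I=209247  OWNER=he MODE=100644
SIZE=2472 MTIME=Sep 20 14:38 2005
FILE=/src/contrib

UNALLOCATED  I=214598  OWNER=he MODE=0
SIZE=0 MTIME=Sep 20 14:37 2005
NAME=/src/include/CVS/Entries
"""

for label, n in count_fsck_errors(sample).items():
    print(f"{label}: {n}")
```

	Matching on the start of each line avoids counting a label
	that merely appears inside a file name.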

	Clearly, something which was supposed to happen ("trickle sync?")
	did not.

	
>How-To-Repeat:
	Mount a file system with softdep, make lots of updates, such
	as checking out the NetBSD source from CVS, leave the machine
	idle for a few days, and *then* yank its power.  When the
	machine is brought up again, watch near-unending lossage
	related to the bits you checked out from CVS more than 24
	hours earlier.

>Fix:
	Sorry, don't know.