Subject: kern/31544: The ffs softdep code appears to fail to write dirty bits to disk
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <he@uninett.no>
List: netbsd-bugs
Date: 10/10/2005 13:09:00
>Number: 31544
>Category: kern
>Synopsis: The ffs softdep code appears to fail to write dirty bits to disk
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Oct 10 13:09:00 +0000 2005
>Originator: Havard Eidnes
>Release: NetBSD 3.99.9 19 Sep 2005
>Organization:
UNINETT AS
>Environment:
System: NetBSD bean.urc.uninett.no 3.99.9 NetBSD 3.99.9 (GENERIC) #64: Mon Sep 19 13:49:02 CEST 2005 he@quattro.urc.uninett.no:/u/build/HEAD/obj/i386/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
Around the 20th of last month I installed NetBSD 3.99.9, as
described above, on an IBM x336 system with a 3.2GHz Xeon, 1GB of
memory and a single SCSI disk drive. I partitioned the drive
with one large /usr FFS file system covering most of the disk,
mounted it with softdep, and on the same day checked out src,
pkgsrc and xsrc from a CVS mirror.
For various reasons I did not get around to doing what I had
planned with the machine, so it sat mostly idle, running little
more than its periodic cron jobs.
Two days ago we had an unexpected power outage which, since this
machine is not connected to a UPS, of course took it down as well.
One would have thought, or at least hoped, that all the dirty bits
from the CVS checkout would have reached the platters of the disk
drive after nearly two weeks of idle time, but that did not appear
to be the case.
When the machine came back up again, fsck found a multitude of
problems in the /usr file system; some examples:
    BAD TYPE VALUE I=209247 OWNER=he MODE=100644
    SIZE=2472 MTIME=Sep 20 14:38 2005
    FILE=/src/contrib

    UNALLOCATED I=214598 OWNER=he MODE=0
    SIZE=0 MTIME=Sep 20 14:37 2005
    NAME=/src/include/CVS/Entries
I count a total of 5657 "UNALLOCATED" and 1335 "BAD TYPE VALUE"
errors in my "fsck -y" output.
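Tallies like these can be reproduced from a saved copy of the fsck
output. The following is a minimal Python sketch, assuming the
"fsck -y" output was captured to a text file and that each problem
report starts with its label, as in the examples quoted above;
count_fsck_errors is a hypothetical helper, not part of any NetBSD
tool.

```python
# Error labels quoted in this report; other fsck_ffs messages are ignored.
LABELS = ("UNALLOCATED", "BAD TYPE VALUE")


def count_fsck_errors(log_text):
    """Count lines of a saved "fsck -y" log that start with one of LABELS.

    Leading whitespace is ignored, so indented copies of the output
    (as pasted into a mail) are counted too.
    """
    counts = dict.fromkeys(LABELS, 0)
    for line in log_text.splitlines():
        stripped = line.lstrip()
        for label in LABELS:
            if stripped.startswith(label + " "):
                counts[label] += 1
    return counts
```

For example, with the output saved in a (hypothetical) file fsck.log,
reading it and passing the text to count_fsck_errors returns a
dictionary mapping each label to the number of times it occurs.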
Clearly, something that was supposed to happen (a "trickle sync"?)
did not.
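The expectation here is that an idle system eventually drains all
of its delayed writes. The sketch below is a deliberately simplified
Python model, loosely inspired by the syncer worklist "wheel" used in
the BSD kernels: each dirty item is queued a fixed number of ticks
ahead, and one slot is flushed per tick. The class and method names
are hypothetical, and this is not the NetBSD kernel code. In this
model, once the system has been idle for a full turn of the wheel,
nothing dirty remains, which is the end state the fsck results above
suggest was never reached.

```python
class SyncerWheel:
    """Toy model of a syncer worklist: a ring of slots, one slot
    written out per tick. Hypothetical simplification only."""

    def __init__(self, size):
        self.size = size
        self.slots = [set() for _ in range(size)]
        self.now = 0

    def mark_dirty(self, item, delay):
        # Queue the item to be written `delay` ticks from now.
        if not 0 < delay < self.size:
            raise ValueError("delay must be between 1 and size - 1")
        self.slots[(self.now + delay) % self.size].add(item)

    def tick(self):
        # Advance one tick and "write" everything queued in the new slot.
        self.now += 1
        slot = self.slots[self.now % self.size]
        flushed = set(slot)
        slot.clear()
        return flushed

    def dirty_count(self):
        # Number of items still waiting to be written.
        return sum(len(slot) for slot in self.slots)
```

The real code is considerably more involved (still-dirty vnodes are
re-queued, and softdep has its own deferred work), so the model only
captures the expected outcome on an idle machine, not the mechanism
that failed here.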
>How-To-Repeat:
Mount a file system with softdep, make lots of updates, such
as checking out the NetBSD source from CVS, leave the machine idle
for a few days, and *then* yank its power. When the machine is
brought up again, watch near-unending lossage related to the
files you checked out from CVS more than 24 hours earlier.
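The "lots of updates" step can also be scripted. Below is a minimal
Python sketch, assuming the target directory lives on the
softdep-mounted file system under test. It creates many small files
plus a per-directory CVS/Entries file; the layout only loosely
imitates a CVS checkout, and all names are hypothetical. It
deliberately never calls fsync(), so getting the data onto the disk
is left entirely to the kernel's delayed-write machinery.

```python
import os


def make_checkout_like_tree(root, ndirs, nfiles):
    """Create ndirs directories under root, each holding nfiles small
    files and a CVS/Entries file listing them. No fsync() is done.
    Returns the number of regular files created."""
    created = 0
    for d in range(ndirs):
        dirpath = os.path.join(root, f"dir{d:04d}")
        cvsdir = os.path.join(dirpath, "CVS")
        os.makedirs(cvsdir, exist_ok=True)
        entries = []
        for f in range(nfiles):
            name = f"file{f:04d}.c"
            with open(os.path.join(dirpath, name), "w") as fh:
                fh.write(f"/* {name} */\n")
            # Loose imitation of a CVS Entries line, not the exact format.
            entries.append(f"/{name}/1.1///\n")
            created += 1
        with open(os.path.join(cvsdir, "Entries"), "w") as fh:
            fh.writelines(entries)
        created += 1
    return created
```

For instance, calling make_checkout_like_tree on a directory such as
/usr/softdep-test (hypothetical) with a few hundred directories, then
leaving the machine idle for more than a day before cutting power,
should approximate the scenario described above.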
>Fix:
Sorry, don't know.