Subject: Hard drive corruption b/c of altq?
To: None <tech-net@netbsd.org>
From: Dobromir Montauk <dmontauk@rescomp.berkeley.edu>
List: tech-net
Date: 07/07/2003 16:53:12
We've deployed NetBSD 1.6 on half a dozen servers.  Half of them run a kernel
compiled with all the ALTQ & IPSEC options enabled; the other half don't
have ALTQ.  Now, we're having a strange problem with the ALTQ servers: they
seem to corrupt their own hard drives.  All of the ALTQ servers report hard
drive problems, and if we fsck them, we get this:

** /dev/rwd0g
BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST
ALTERNATE

LOOK FOR ALTERNATE SUPERBLOCKS? [yn]

Running fsck /dev/rwd0(a,e,f,g) or fsck_ffs -b 32 /dev/wd0(a,e,f,g) fixes
it for that session, but as soon as the machines are rebooted, boot fails
with the same error.

If we boot with the old, ALTQ-less kernel, it doesn't detect the errors at
all!  It boots & works fine.

Now, this is a pretty weird problem, so I'm not sure how to go about
debugging it.  It's especially strange that one kernel sees no errors while
the other does.

Any insight into this?

Thanks a lot,

Dobromir Montauk
System Administrator
Office of Residential Computing, UC Berkeley