tech-kern archive

fsck -x fabricating inconsistencies?



[I know fsck is a userland program, but it's heavily tied to the kernel and 
the problem may be with snapshots, not fsck.]

After an unidentified (black screen, completely unresponsive) crash of our file
server (losing 2300 days of uptime), which came back up with no apparent
issues, I decided to play it safe (or so I thought) and run
fsck -f -n -x /var/db on the NFS-exported (FFSv2, WAPBL, quota2) file systems.

The smaller ones showed no issues, but the two larger ones seemed to exhibit 
serious inconsistencies (unallocated/partly allocated inode, directory contains 
empty blocks, pass5: bad magic number, quota id in wrong hash list, link count 
0 should be -1, etc.) (add CAPS LOCK for the real fsck messages, of course); 
around 10k to 15k problems per fs.

I mentally prepared to newfs and restore from a backup, but running a "real"
fsck (in single user mode, after unmounting) showed ZERO issues on one fs and
two trivial ones on the other.

The only slightly unusual thing about those file systems is that they use
quota2 and are null mounted elsewhere (/emul/linux32 for running TSM), but
these properties are shared between the "good" and "bad" ones.

This looks like a bug in fss, but we have also been using snapshots for
backups (the null mounts described above are just a safety net against a
snapshotting failure ruining the backup) on a daily basis for years.

I have to admit that's on NetBSD-6, but I'm not aware of any fixes in the 
fss area.

