odd filesystem temporary corruption: dir appears empty (w/o . ..)

To: current-users%netbsd.org@localhost
Subject: odd filesystem temporary corruption: dir appears empty (w/o . ..)
From: Greg Troxel <gdt%ir.bbn.com@localhost>
Date: Wed, 12 Jun 2013 08:00:48 -0400

I have a system which is pretty normal:

  i386 (not amd64)
  NetBSD 6 from the last month or two
  intel DH67CL mobo, Core i5-2310 (4 cpu)
  8G RAM, of which 3569MB shows as available (since I haven't switched
    to amd64)

  DIAGNOSTIC, FFS_EI, -g, IPSEC/ESP/NAT, MROUTING (but not doing that)
  swwdog, ulpt commented out, using coda

  the system has not been running X lately

  OWC SSD for / (ffsv1, v2 superblock, no wapbl) /var /usr (ffsv2 wapbl)

  Seagate 1 TBish for /u0, 1 FS, FFSv2, WAPBL

The system has crashed a few times, almost always under heavy
checkout/build load.  I am unsure whether that's because the power
supply is being stressed (which it shouldn't be) or because of the
filesystem issue I'm writing about.  When it crashes, it panics in ffs
code and fails to dump.

I have NetBSD-current checked out (and -5 and -6 trees), and often do
release builds for multiple architectures.

I did a cvs update on a tree which had previously updated ok, and got a
complaint about not being able to remove a directory (gdb6 in this
case).  I found a CVS directory which seemed to have nothing in it: "ls
-la" showed no entries at all, not even '.' and '..'.  So to make
progress I moved it to /u0/lost+found, and then I was able to remove
the directories for the now-gone gdb bits.
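
One way to narrow down a symptom like this is to compare what a
directory listing reports with what name lookup finds. The following is
a minimal sketch, not something from the original report: the function
name listing_vs_lookup is hypothetical, and the file names passed in are
just the usual CVS administrative files. Note that Python's os.listdir
never returns '.' and '..', so this checks ordinary entries only.

```python
import os


def listing_vs_lookup(dirpath, expected_names):
    """Compare a directory listing with per-name lookups.

    Returns (listed, hidden): the sorted names the listing reports, and
    the expected names that resolve by lookup but are absent from the
    listing. A non-empty 'hidden' means the two views disagree.
    """
    # Listing view (readdir-style). "." and ".." are never included.
    listed = set(os.listdir(dirpath))
    hidden = []
    for name in expected_names:
        try:
            # Lookup view: resolve the name directly, without listing.
            os.lstat(os.path.join(dirpath, name))
        except OSError:
            # The name does not resolve either, so nothing to report.
            continue
        if name not in listed:
            hidden.append(name)
    return sorted(listed), hidden
```

Calling listing_vs_lookup("CVS", ["Entries", "Root", "Repository",
"Template"]) on a healthy directory should return an empty second
element. Names showing up there would suggest the listing, rather than
the on-disk contents, is what went wrong.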

Then I went to /u0/lost+found to look at CVS.  Now it had the usual
Entries/Root/Repository/Template, plus . and .., all quite normal.  I
removed the 4 files, and then rmdir'd CVS.


So I wonder what's going on.  It seems like the read from the disk of
the blocks for the directory somehow ended up with an in-core
representation of the directory as being empty, but the actual disk
contents were ok.  Moving the directory shouldn't have changed its
inode #, so I wonder why the data got reread (perhaps due to maxvnodes
of 104488 and doing a cvs update on -current).
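
The point about the inode number can be checked directly. This is a
minimal sketch under the assumption that source and destination are on
the same filesystem, as /u0 and /u0/lost+found are here; the function
name identity_across_rename is hypothetical. Across filesystems
os.rename fails rather than copying, so the comparison would not apply.

```python
import os


def identity_across_rename(src, dst):
    """Rename src to dst and return its (device, inode) before and after.

    Assumes src and dst are on the same filesystem, where rename only
    changes directory entries and leaves the inode alone.
    """
    before = os.stat(src)
    os.rename(src, dst)
    after = os.stat(dst)
    return (before.st_dev, before.st_ino), (after.st_dev, after.st_ino)
```

If the two pairs are equal, the move did not create a new inode, so a
directory that reads correctly afterwards was presumably read from the
same on-disk blocks as before.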

Might this be a locking bug surrounding vnode eviction?  Hardware
flakiness?  Or is the kernel using memory too close to the 4G limit,
more than it should, because something in the system is messing with
it?


Follow-Ups:
- Re: odd filesystem temporary corruption: dir appears empty (w/o . ..)
  - From: David Holland
- Re: odd filesystem temporary corruption: dir appears empty (w/o . ..)
  - From: Rhialto
