port-alpha: panic: bad dir

Subject: panic: bad dir
To: None <port-alpha@NetBSD.ORG>
From: Martin Grossman <grossman@BBN.COM>
List: port-alpha
Date: 05/27/1998 13:28:44
We are getting a lot of these panics on one system, and a few on other systems.

All systems are exactly the same except the user load!
All are DEC PC164 motherboards with 512MB of memory and an NCR SCSI
controller attached to a 10GB (WIDE) disk.

It seems to happen more often under high user load and heavy NFS (client)
traffic.

It has happened on both local and NFS directories.

Output on the console (also in /var/log/messages and in the kernel dumps):

1) First bad
2) /usr: bad dir ino 7772 at offset 0: mangled entry
3) panic: bad dir

#1 is coming from ufs_dirbadentry() in ufs_lookup.c, because ep->d_reclen
   is not a multiple of 4.
#2 is coming from ufs_dirbad() in ufs_lookup.c.
   a) I've seen "/", "/var", "/usr", and "/nfs/XXX/u1"   (first 3 are UFS)
   b) various inodes (7772 is 4 levels deep below /usr)
   c) it's always at offset 0
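
For reference, a minimal sketch of the kind of check described in #1. This
is not the kernel source: struct dirent_hdr and reclen_misaligned() are
hypothetical names, and the field widths (32-bit inode number, 16-bit
record length, 8-bit type, 8-bit name length) are an assumption about the
on-disk directory entry header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the header of an on-disk directory entry.
 * Hypothetical name and layout, assumed for illustration only. */
struct dirent_hdr {
    uint32_t d_ino;     /* inode number */
    uint16_t d_reclen;  /* length of this record, in bytes */
    uint8_t  d_type;    /* file type */
    uint8_t  d_namlen;  /* length of the name that follows */
};

/* Returns 1 if the record length is not a multiple of 4, which is the
 * condition reported above as the source of "First bad". */
int reclen_misaligned(const struct dirent_hdr *ep)
{
    assert(ep != NULL);
    return (ep->d_reclen & 0x3) != 0;
}
```

A record length of 12 passes this test, while any length with either of the
two low bits set (for example 258) fails it.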

From running gdb -k /netbsd.1 /netbsd.1.core

I've figured out this much so far.....

1) We are in ufs_lookup() from an access() call
(i.e. the backtrace is syscall,sys_access,namei,lookup,ufs_lookup,ufs_dirbad,panic)

2) 8 lines after the label searchloop:, in the call to
VOP_BLKATOFF(vdp,dp->i_offset,NULL,&bp)
   I do a print *vdp (vnode) and everything looks right
   dp->i_offset is zero (which is fine)
   I do a print *dp (inode) and everything looks right
   I do a print *bp (buf) and everything looks right
   I do a print *ep (dirent) (i.e. bp->b_data) and it's nothing like a
   directory entry!

	It should contain an inode #, reclen, type, namelen, and a name

			BUT

	it contains	0x464c457f
			0x00010102
			0x00000000
			0x00000000
			0x90260002
			0x00000001
			0x00230000
			0xfffffc00
			0x00000040
			0x00000000

This is the beginning of some ELF executable file!!!!!
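
To show how the words above read as an ELF header, here is a small sketch
that lays them out as little-endian bytes and picks out a few fields at
the standard ELF64 header offsets. The names elf_summary, words_to_bytes,
le_read and decode_elf are made up for this example, and the
reclen_if_dirent field assumes that a directory entry keeps its 16-bit
record length at byte offset 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The ten 32-bit words printed from bp->b_data, in order. */
static const uint32_t dump_words[10] = {
    0x464c457f, 0x00010102, 0x00000000, 0x00000000, 0x90260002,
    0x00000001, 0x00230000, 0xfffffc00, 0x00000040, 0x00000000
};

/* Hypothetical summary of the fields decoded below. */
struct elf_summary {
    int      has_magic;         /* 1 if bytes 0..3 are 0x7f 'E' 'L' 'F' */
    uint8_t  ei_class;          /* byte 4: 2 means a 64-bit object */
    uint8_t  ei_data;           /* byte 5: 1 means little-endian */
    uint16_t e_type;            /* offset 16: 2 means executable */
    uint16_t e_machine;         /* offset 18 */
    uint64_t e_entry;           /* offset 24 */
    uint64_t e_phoff;           /* offset 32 */
    uint16_t reclen_if_dirent;  /* bytes 4..5 read as a record length */
};

/* Store each word least significant byte first, as on the Alpha. */
void words_to_bytes(const uint32_t *w, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 4; b++)
            out[4 * i + b] = (uint8_t)(w[i] >> (8 * b));
}

/* Read an nbytes-wide little-endian integer starting at p. */
uint64_t le_read(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode the first 40 bytes of a buffer as an ELF64 header. */
struct elf_summary decode_elf(const uint8_t *b)
{
    struct elf_summary s;
    assert(b != NULL);
    s.has_magic = (memcmp(b, "\177ELF", 4) == 0);
    s.ei_class = b[4];
    s.ei_data = b[5];
    s.e_type = (uint16_t)le_read(b + 16, 2);
    s.e_machine = (uint16_t)le_read(b + 18, 2);
    s.e_entry = le_read(b + 24, 8);
    s.e_phoff = le_read(b + 32, 8);
    s.reclen_if_dirent = (uint16_t)le_read(b + 4, 2);
    return s;
}
```

Run over dump_words, this gives the ELF magic, class 2 (64-bit), data
encoding 1 (little-endian), type 2 (executable), machine 0x9026 (the value
commonly used for Alpha), entry point 0xfffffc0000230000 and program header
offset 0x40. Read as a directory entry instead, bytes 4..5 give a record
length of 0x0102 = 258, which is not a multiple of 4 and so is consistent
with the "First bad" message.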

Is there any known bug (fixed or not) in or around the disk buffer cache?

PS We are running NetBSD 1.2G (November 1997).