Subject: NetBSD 1.6 softdep crash
To: None <netbsd-users@netbsd.org>
From: Piotr Stolc <socrtp@soclab.eu.org>
List: netbsd-users
Date: 01/20/2004 18:07:33

Hi all,
I have NetBSD running on an i386 machine. In Sep 2003 I upgraded it to
NetBSD 1.6. It worked OK until last Friday, when the CPU fan failed and the
system started rebooting and crashing again each time it reached multiuser
mode. I replaced the fan, but that didn't help. Then I noticed that it
crashes after starting postfix, with the following error:
panic: softdep_fsync: pending ops
Only /usr and /home were mounted with softdep. I changed the kernel from
1.6.2_RC1 to RC3, but that didn't help. I then disabled softdep for these
filesystems, and the system is now working correctly.
So... what is going on? Did the overheated CPU damage my hardware? If so,
why does it affect only softdep? I currently don't have another i386 box on
which to test the system with softdep enabled.

Here is a backtrace from the kernel core dump:

$ gdb netbsd.26 --readnow
GNU gdb 5.0nb1
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...(no debugging symbols
found)...
(gdb) target kcore netbsd.26.core
panic: softdep_fsync: pending ops
#0  0x1 in ?? ()
(gdb) bt
#0  0x1 in ?? ()
#1  0xc01dc91b in cpu_reboot ()
#2  0xc014e72a in panic ()
#3  0xc011dfe5 in acquire_lock ()
#4  0xc01237c5 in softdep_fsync_mountdev ()
#5  0xc01282e4 in ffs_full_fsync ()
#6  0xc01280b8 in ffs_fsync ()
#7  0xc016fa2c in VOP_FSYNC ()
#8  0xc0126b0e in ffs_sync ()
#9  0xc016b51a in sys_sync ()
#10 0xc016a53a in vfs_shutdown ()
#11 0xc01dc8f3 in cpu_reboot ()
#12 0xc014e72a in panic ()
#13 0xc0123530 in softdep_fsync ()
#14 0xc016e2dd in sys_fsync ()
#15 0xc01e2b27 in syscall_plain ()
#16 0xc0100c30 in syscall1 ()
can not access 0xbfbfda84, invalid translation (invalid PDE)
can not access 0xbfbfda84, invalid translation (invalid PDE)
Cannot access memory at address 0xbfbfda84
(gdb)


-- 

s.