Subject: Re: Panics with softdep+nfsd
To: None <current-users@netbsd.org>
From: Paulo Alexandre Pinto Pires <p@ppires.org>
List: current-users
Date: 01/06/2001 17:46:50
On Wed, Jan 03, 2001 at 09:05:48PM -0700, Rick Kelly wrote:
> Paulo Alexandre Pinto Pires said:
> 
> >The system is 1.5P, with sources of 2000/12/24.  I systematically get
> >kernel panics (as shown below) triggered by nfsd.  Usually, this happens
> >when the NFS client is running Netscape Mail, during reindexing or
> >compacting of a message folder.
> 
> I did see something strange with 1.5O or 1.5P on i386.
> 
> No softdep or NFS involved.
> 
> Netscape started up okay, but if I went into Netscape mail/news,
> the window would never come up and the disks would thrash forever.
> 
> Whatever the problem was, it was fixed in a later update to 1.5P,
> and everything seems to work in 1.5Q.

Then, unfortunately, we are talking about different problems.  Even
though I upgraded the NFS server machine to 1.5Q (sources of
2001/01/06), it still panics, as I mentioned in my previous message,
with messages from softdep_write_inodeblock, triggered by some
operation in nfsd.  The only difference from the behaviour of 1.5P is
that I got an extra message a few minutes _before_ the panic:

	ffs_fsync: dirty: tag 1 type VREG, usecount 1, writecount 0, refcount 1,
		tag VT_UFS, ino 892802, on dev 0, 8 flags 0x0, effnlink 1, nlink 1
		mode 0100644, owner 1001, group 256, size 196608 lock type vnlock: EXCL (count 1) by pid 132

At that moment, I was running Netscape Messenger on a 1.5P (2000/12/24)
NFS client machine.  Then, after a few (two or three) folder switches,
I eventually got the panic message:

	panic: softdep_write_inodeblock: direct pointer #7 mismatch 0 != 3569544
	Stopped in pid 132 (nfsd) at    cpu_Debugger+0x4:       leave
	db> continue
	syncing disks... panic: lockmgr: locking against myself

After the reboot, I got this from gdb:

	pappires@domine:/var/crash [9]: gdb netbsd.4
	GNU gdb 4.17
	Copyright 1998 Free Software Foundation, Inc.
	GDB is free software, covered by the GNU General Public License, and you are
	welcome to change it and/or distribute copies of it under certain conditions.
	Type "show copying" to see the conditions.
	There is absolutely no warranty for GDB.  Type "show warranty" for details.
	This GDB was configured as "i386--netbsd"...(no debugging symbols found)...
	(gdb) target kcore netbsd.4.core
	panic: %s: direct pointer #%d mismatch %d != %d
	#0  0x104 in ?? ()
	(gdb) backtrace
	#0  0x104 in ?? ()
	#1  0xc01f149f in cpu_reboot ()
	#2  0xc0133671 in panic ()
	#3  0xc0124e3e in lockmgr ()
	#4  0xc01544ec in genfs_lock ()
	#5  0xc01521ef in VOP_LOCK ()
	#6  0xc0151a46 in vn_lock ()
	#7  0xc014bcb2 in vget ()
	#8  0xc01d3d10 in ffs_sync ()
	#9  0xc014df86 in sys_sync ()
	#10 0xc014cfd8 in vfs_shutdown ()
	#11 0xc01f1477 in cpu_reboot ()
	#12 0xc0133671 in panic ()
	#13 0xc01cf9db in initiate_write_inodeblock ()
	#14 0xc01cf677 in softdep_disk_io_initiation ()
	#15 0xc015a3ae in spec_strategy ()
	#16 0xc01526ec in VOP_STRATEGY ()
	#17 0xc0146c0c in bwrite ()
	#18 0xc01cb55c in ffs_update ()
	#19 0xc015255c in VOP_UPDATE ()
	#20 0xc01cbc86 in ffs_truncate ()
	#21 0xc015251a in VOP_TRUNCATE ()
	#22 0xc01d9221 in ufs_setattr ()
	#23 0xc0151cb0 in VOP_SETATTR ()
	#24 0xc0196e65 in nfsrv_setattr ()
	#25 0xc01b21df in nfssvc_nfsd ()
	#26 0xc01b195f in sys_nfssvc ()
	#27 0xc01f57dd in syscall_plain ()
	#28 0xc0100d66 in syscall1 ()
	can not access 0xbfbfdce8, invalid translation (invalid PDE)
	can not access 0xbfbfdce8, invalid translation (invalid PDE)
	Cannot access memory at address 0xbfbfdce8.

-- 
	Pappires

... Qui habet aurem audiat quid Spiritus dicat ecclesiis.