kern/41523: Dirty filesystem causes mutex error

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/41523: Dirty filesystem causes mutex error
From: martti.kuparinen%iki.fi@localhost
Date: Tue, 2 Jun 2009 09:45:00 +0000 (UTC)

>Number:         41523
>Category:       kern
>Synopsis:       Dirty filesystem causes mutex error
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 02 09:45:00 +0000 2009
>Originator:     Martti Kuparinen
>Release:        NetBSD 5.99.x
>Organization:
>Environment:
>Description:

I updated and rebuilt the whole i386 distribution and rebooted my NetBSD
domU with the new 5.99.13 kernel. This is the panic I'm getting:

NetBSD 5.99.13 (XEN3PAE_DOMU) #0: Tue Jun  2 10:37:27 EEST 2009
...
root file system type: ffs
init: copying out path `/sbin/init' 11
Tue Jun  2 11:05:05 EEST 2009
Starting root file system check:
/dev/rxbd0a: 206770 files, 2260464 used, 2822439 free (9631 frags, 351601 blocks, 0.2% fragmentation)
/dev/rxbd0a: MARKING FILE SYSTEM CLEAN
Mutex error: mutex_vector_enter: locking against myself

lock address : 0x00000000ca235e2c
current cpu  :                  0
current lwp  : 0x00000000c9e14060
owner field  : 0x00000000c9e14060 wait/spin:                0/0

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c0129e74 cs 9 eflags 246 cr2 805ecf8 ilevel 0
Stopped in pid 10.1 (fsck_ffs) at       netbsd:breakpoint+0x4:  popl    %ebp
db> bt
breakpoint(c0427e67,cad1e9d8,c045f080,c03215ba,c0427e67,1,0,0,cad1e9d8,c9e14060) at netbsd:breakpoint+0x4
panic(c0449139,c043704f,c04110a2,c04515a7,ca235e2c,0,c9e14060,c0166e2f,0,c9e14060) at netbsd:panic+0x1be
lockdebug_abort(ca235e2c,c04672fc,c04110a2,c04515a7,cad1ea20,cad1eb60,cad1ea4c,c020945d,ca235e2c,c04110a2) at netbsd:lockdebug_abort+0x2d
mutex_abort(ca235e2c,c04110a2,c04515a7,c03c1335,c9e1cc38,c9e14060,0,0,c9e1cc38,cad1eb60) at netbsd:mutex_abort+0x34
mutex_vector_enter(ca235e2c,0,cad1eb3c,0,0,0,38,0,26,ca235e2c) at netbsd:mutex_vector_enter+0x1ad
mountd_set_exports_list(cad1eb60,c9e14060,cad1eb8c,805ecf8,1,ca1b8bc0,cad1eb8c,c03b8fda,ca23552c,805ecf8) at netbsd:mountd_set_exports_list+0x111
nfs_export_update_30(ca23552c,805ecf8,ca1b8bbc,c9e1cc38,16,ca23552c,cad1ecac,c03c6d45,ca23552c,805ecf8) at netbsd:nfs_export_update_30+0x41
vfs_hooks_reexport(ca23552c,805ecf8,ca1b8bbc,cad1ecd0,55001,ca1b8bbc,cad1ebec,c01643dd,11e040c,ca1b8bbc) at netbsd:vfs_hooks_reexport+0x4a
do_sys_mount(c9e14060,0,805ecfa,805ecf8,0,bf7fe4d4,0,4,cad1ed28,c0465d3c) at netbsd:do_sys_mount+0x3e5
sys___mount50(c9e14060,cad1ed00,cad1ed28,bb73f000,c9e1ab98,c9e15594,19a,805ecfa,805ecf8,55001) at netbsd:sys___mount50+0x49
syscall(cad1ed48,1f,1f,1f,1f,1,1,bf7fe4e8,0,55001) at netbsd:syscall+0xce


Next I ran "xm destroy xxx" on my dom0 and started the domU again, as I
expected the filesystem to be clean by then (see the "MARKING FILE SYSTEM
CLEAN" line above). It was, and this time I was able to boot normally:


root file system type: ffs
init: copying out path `/sbin/init' 11
Tue Jun  2 11:07:50 EEST 2009
Starting root file system check:
/dev/rxbd0a: file system is clean; not checking
swapctl: adding /dev/xbd0b as swap device at priority 0
Starting file system checks:
/dev/rxbd0a: file system is mounted read-write on /; not checking
/dev/rxbd0e: file system is journaled; not checking
Setting tty flags.


My fstab looks like this:

/dev/xbd0a      /               ffs     rw              1 1
/dev/xbd0b      none            swap    sw              0 0
/dev/xbd0e      /home           ffs     rw,log          1 2
tmpfs           /tmp            tmpfs   rw              0 0
kernfs          /kern           kernfs  rw              0 0
ptyfs           /dev/pts        ptyfs   rw              0 0
procfs          /proc           procfs  rw              0 0
/dev/xbd1a      /cdrom          cd9660  ro,noauto       0 0


The problem was first noticed when running "halt -p", which causes

panic: kernel diagnostic assertion "!fd_isused(fdp, fd)" failed: file "/home/src/sys/kern/kern_descrip.c", line 175
Begin traceback...
uvm_fault(0xc9e1a350, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0136121 cs 9 eflags 10246 cr2 6 ilevel 0
panic: trap
Faulted in mid-traceback; aborting...
dumping to dev 142,1 offset 534615
dump device bad

which leaves the filesystem dirty, which causes the mutex panic above on the
next boot, which in turn makes it quite difficult to reboot a host, e.g.
after upgrades...

>How-To-Repeat:

Simply run "halt -p" to trigger the assert. Now you have a dirty filesystem.
Power off and start again and see the mutex error after fsck fixes the
filesystem.

>Fix:
