NetBSD-Bugs archive


kern/40361: WAPBL issues with ffs_fsync in -current



>Number:         40361
>Category:       kern
>Synopsis:       WAPBL locking panic in -current
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 10 20:55:00 +0000 2009
>Originator:     Greg Oster
>Release:        NetBSD 5.99.6
>Organization:
>Environment:
System: NetBSD cow 5.99.6 NetBSD 5.99.6 (XEN3_DOMU.RF) #0: Sat Jan 10 12:37:48 CST 2009 oster@quad:/u1/builds/build120/src/obj/amd64/u1/builds/build120/src/sys/arch/amd64/compile/XEN3_DOMU.RF amd64
and
System: NetBSD thog 5.99.6 NetBSD 5.99.6 (RAIDFRAME.ddb) #0: Fri Jan  9 21:25:28 CST 2009 oster@quad:/u1/devel/current/src/sys/arch/i386/compile/RAIDFRAME.ddb i386

Architecture: i386/amd64/xen
Machine: i386/amd64/xen
>Description:

With a DIAGNOSTIC kernel and a non-logging /, create and mount a 
WAPBL-enabled filesystem.  Touch a file on said filesystem, and 
call 'sync'.  Observe the following panic:

panic: kernel diagnostic assertion "rw_write_held(&wl->wl_rwlock)" failed: file "/u1/builds/build120/src/sys/kern/vfs_wapbl.c", line 1580
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80137d7d cs e030 rflags 246 cr2  7f7ffdb55430 cpl 0 rsp ffffa00046f2f670
Stopped in pid 406.1 (sync) at  netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
cpu_Debugger() at netbsd:cpu_Debugger+0x9
panic() at netbsd:panic+0x279
__kernassert() at netbsd:__kernassert+0x46
wapbl_jlock_assert() at netbsd:wapbl_jlock_assert+0x45
wapbl_add_buf() at netbsd:wapbl_add_buf+0x8a
bdwrite() at netbsd:bdwrite+0x18c
bwrite() at netbsd:bwrite+0x14d
vn_bwrite() at netbsd:vn_bwrite+0x21
VOP_BWRITE() at netbsd:VOP_BWRITE+0x60
bawrite() at netbsd:bawrite+0x61
ffs_full_fsync() at netbsd:ffs_full_fsync+0x4dc
ffs_fsync() at netbsd:ffs_fsync+0x92
VOP_FSYNC() at netbsd:VOP_FSYNC+0x7f
ffs_sync() at netbsd:ffs_sync+0x2c2
VFS_SYNC() at netbsd:VFS_SYNC+0x2c
sys_sync() at netbsd:sys_sync+0xab
sy_call() at 0xffffffff803fe659
syscall() at netbsd:syscall+0x1ba
ds          0xf650
es          0xdc08
fs          0x82
gs          0x1e84
rdi         0
rsi         0xd
rbp         0xffffa00046f2f670
rbx         0x500d50
rdx         0xffffffff80a25000
rcx         0xffffffff80a11000
rax         0x1
r8          0xffffffff806b3dc0  cpu_info_primary
r9          0x7f7fffffffe0
r10         0xffffa00046f2f5c0
r11         0xe033
r12         0x7f7fffffffe0
r13         0x7f7ffdffa000
r14         0xffffa00046f197c0
r15         0xffffa00046ebe010
rip         0xffffffff80137d7d  breakpoint+0x5
cs          0xe030
rflags      0x246
rsp         0xffffa00046f2f670
ss          0xe02b
netbsd:breakpoint+0x5:  leave
db> tr
breakpoint() at netbsd:breakpoint+0x5
cpu_Debugger() at netbsd:cpu_Debugger+0x9
panic() at netbsd:panic+0x279
__kernassert() at netbsd:__kernassert+0x46
wapbl_jlock_assert() at netbsd:wapbl_jlock_assert+0x45
wapbl_add_buf() at netbsd:wapbl_add_buf+0x8a
bdwrite() at netbsd:bdwrite+0x18c
bwrite() at netbsd:bwrite+0x14d
vn_bwrite() at netbsd:vn_bwrite+0x21
VOP_BWRITE() at netbsd:VOP_BWRITE+0x60
bawrite() at netbsd:bawrite+0x61
ffs_full_fsync() at netbsd:ffs_full_fsync+0x4dc
ffs_fsync() at netbsd:ffs_fsync+0x92
VOP_FSYNC() at netbsd:VOP_FSYNC+0x7f
ffs_sync() at netbsd:ffs_sync+0x2c2
VFS_SYNC() at netbsd:VFS_SYNC+0x2c
sys_sync() at netbsd:sys_sync+0xab
sy_call() at 0xffffffff803fe659
syscall() at netbsd:syscall+0x1ba
db> 

I see this on both a -current amd64 XEN3 DOMU and a -current native
i386 install.

In ffs_full_fsync() we take the "this isn't a filesystem with
logging" path, yet by the time we reach bdwrite() the kernel treats
the vp as belonging to a filesystem with logging.  bdwrite() then
calls wapbl_add_buf(), whose DIAGNOSTIC check (wapbl_jlock_assert())
finds that wl_rwlock is not write-held, and the kernel panics.
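
The mismatch described above can be modelled outside the kernel.  The
following is a minimal sketch in plain C, not kernel code.  The types
model_mount and model_vnode and the functions logging_by_owner(),
logging_by_device() and checks_disagree() are hypothetical names
invented for this illustration, and the idea that one of the two checks
follows the filesystem mounted from a device node is an assumption, not
a confirmed diagnosis.  It only shows how two "is this a logging
filesystem?" checks that consult different mounts could disagree when /
is non-logging and /mnt is logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the real NetBSD structures. */
struct model_mount {
	const char *name;	/* mount point, e.g. "/" or "/mnt" */
	bool logging;		/* true if mounted with -o log */
};

struct model_vnode {
	/* Filesystem the vnode itself lives on. */
	const struct model_mount *owner;
	/* For a device node: filesystem mounted from it; NULL otherwise. */
	const struct model_mount *mounted_on;
};

/* Check A (assumed): decide from the filesystem the vnode lives on. */
static bool
logging_by_owner(const struct model_vnode *vp)
{
	assert(vp != NULL);
	return vp->owner != NULL && vp->owner->logging;
}

/*
 * Check B (assumed): for a device node, decide from the filesystem
 * mounted from it; otherwise fall back to check A.
 */
static bool
logging_by_device(const struct model_vnode *vp)
{
	assert(vp != NULL);
	if (vp->mounted_on != NULL)
		return vp->mounted_on->logging;
	return logging_by_owner(vp);
}

/* True when the two checks give different answers for the same vnode. */
static bool
checks_disagree(const struct model_vnode *vp)
{
	return logging_by_owner(vp) != logging_by_device(vp);
}
```

In this model, a device node that lives on a non-logging / but backs a
logging /mnt makes checks_disagree() return true.  If / is logging as
well, both checks agree, which is at least consistent with the panic
not occurring in that configuration.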

>How-To-Repeat:

 1) Have / be a non-logging filesystem.
 2) Have /mnt be a logging filesystem  (e.g. 'mount -o log /dev/wd1g /mnt')
 3) As root: cd /mnt ; touch foo ; sync
 4) *boom*

Note that if / is also a logging filesystem, the panic does not
occur on /mnt.

>Fix:

Unknown.  I note that 5.0_BETA does not have this issue.
Additional details available on request, as I can easily replicate
this panic.



