NetBSD-Bugs archive


Re: kern/38673: race condition in block device handling on w/o fast softints



The following reply was made to PR kern/38673; it has been noted by GNATS.

From: Andrew Doran <ad%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/38673: race condition in block device handling on w/o fast softints
Date: Sat, 17 May 2008 13:39:50 +0100

 On Fri, May 16, 2008 at 08:55:00AM +0000, martin%duskware.de@localhost wrote:
 
 > >Synopsis:       race condition in block device handling on w/o fast softints
 
 I think it's not related to the soft interrupt.
 
 > panic: sdstart(): dequeued wrong buf
 > Stopped in pid 0.4 (system) at  netbsd:cpu_Debugger+0x4:        nop
 > db{0}> bt
 > sdstart(3a44800, 3a09200, 5bf, 11f04a0, 1814000, 3a89e00) at netbsd:sdstart+0x374
 > scsipi_put_xs(3a44800, 0, 14396d0, 1198ae0, 7, ff898000) at netbsd:scsipi_put_xs+0xe0
 > scsipi_complete(3a09200, 0, 46, 3a07ea0, c1, b) at netbsd:scsipi_complete+0x1c4
 > scsipi_done(3a40890, 3a07b30, ff888000, ff800000, ffffe000, 900040007ace8012) at netbsd:scsipi_done+0x1a4
 > ncr53c9x_done(3a40800, 3a07b30, 44, 0, 10000, 1) at netbsd:ncr53c9x_done+0xf8
 > ncr53c9x_intr(3a07b30, 0, e0017ed0, 10000, 1037600, 101) at netbsd:ncr53c9x_intr+0x9a4
 > sparc_interrupt(0, 7, 0, 0, 1814000, 3fff) at netbsd:sparc_interrupt+0x23c
 > _kernel_lock(10575, 0, fffffff, 146f800, d04f2c0, 1) at netbsd:_kernel_lock+0x114
 > biodone2(1, 11f02a0, 5fd, 11f0490, 1814000, d04fa90) at netbsd:biodone2+0x6c
 > biointr(0, d047ec0, 3, 6, 1814000, 180c000) at netbsd:biointr+0xa4
 > softint_thread(d68e008, d04f2c0, 11e9800, 11e9800, 11e9400, 11d6800) at netbsd:softint_thread+0xd0
 > lwp_trampoline(f005eaf0, 111400, fffb1e28, 110418, fffb1df8, 1) at netbsd:lwp_trampoline+0x8
 > db{0}> mach cpu 1
 > db{1}> bt
 > physio(0, 0, 1108, 100000, f, eb37bf0) at netbsd:physio+0x2f4
 > cdev_read(6, eb37bf0, 0, 11ec000, de98000, 4030dbf0) at netbsd:cdev_read+0x60
 > spec_read(eb37a48, 1166bf0, 11d0400, ea36fa0, de98000, 1) at netbsd:spec_read+0x1e0
 > nfsspec_read(eb37a48, 10001, badcafe, 146f800, ea36fa0, 1) at netbsd:nfsspec_read+0x38
 > VOP_READ(e86e720, eb37bf0, 0, d04bd40, badcafe, badcafe) at netbsd:VOP_READ+0x40
 > vn_read(e7c8180, e7c8180, eb37bf0, d04bd40, 1, 11ea000) at netbsd:vn_read+0x88
 > dofileread(16, e7c8180, 40a00000, 100000, 3, 1) at netbsd:dofileread+0x60
 > sys_read(3, eb37dc0, eb37e00, badcafe, badcafe, badcafe) at netbsd:sys_read+0x60
 > syscall_plain(eb37ed0, 3, 4073c5a4, 166, 4073c5a4, 800) at netbsd:syscall_plain+0x11c
 > ?(3, 40a00000, 100000, 20, 0, 4030dbf0) at 0x10092fc
 
 It seems that cpu1 should be holding kernel_lock because it's working on an
 NFS vnode. You can verify that by digging 'vp' out of the arguments to
 VOP_READ or vn_read, and then running 'show vnode' on it. If VV_MPSAFE is
 clear, kernel_lock will have been taken. Or you can do 'show lock
 kernel_lock' if this is a LOCKDEBUG kernel.
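 
 As a rough sketch of that check at the ddb prompt: the address below is
 only an assumption, taken from the first argument printed for VOP_READ in
 the cpu1 trace above, and argument values shown in a backtrace are not
 guaranteed to be accurate, so confirm that the output looks like a vnode
 before trusting it. The second command is only expected to be useful on a
 LOCKDEBUG kernel.
 
 ```
 db{1}> show vnode 0xe86e720
 db{1}> show lock kernel_lock
 ```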
 
 It also looks like cpu0 was waiting to acquire kernel_lock when the
 interrupt occurred. kernel_lock should be acquired for ncr53c9x_intr(), but
 I don't see intr_biglock_wrapper() in the backtrace. It would be useful to
 verify which CPU holds the lock, and to verify that ncr53c9x_intr() is
 actually occurring at IPL_VM.
 
 It's difficult to tell what is going on, because the backtraces from all
 CPUs but the one that panicked are always from some point after the event.
 
 Andrew
 

