NetBSD-Bugs archive
Re: kern/38673: race condition in block device handling on w/o fast softints
The following reply was made to PR kern/38673; it has been noted by GNATS.
From: Andrew Doran <ad%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/38673: race condition in block device handling on w/o fast softints
Date: Sat, 17 May 2008 13:39:50 +0100
On Fri, May 16, 2008 at 08:55:00AM +0000, martin%duskware.de@localhost wrote:
> >Synopsis: race condition in block device handling on w/o fast softints
I think it's not related to the soft interrupt.
> panic: sdstart(): dequeued wrong buf
> Stopped in pid 0.4 (system) at netbsd:cpu_Debugger+0x4: nop
> db{0}> bt
> sdstart(3a44800, 3a09200, 5bf, 11f04a0, 1814000, 3a89e00) at netbsd:sdstart+0x374
> scsipi_put_xs(3a44800, 0, 14396d0, 1198ae0, 7, ff898000) at netbsd:scsipi_put_xs+0xe0
> scsipi_complete(3a09200, 0, 46, 3a07ea0, c1, b) at netbsd:scsipi_complete+0x1c4
> scsipi_done(3a40890, 3a07b30, ff888000, ff800000, ffffe000, 900040007ace8012) at netbsd:scsipi_done+0x1a4
> ncr53c9x_done(3a40800, 3a07b30, 44, 0, 10000, 1) at netbsd:ncr53c9x_done+0xf8
> ncr53c9x_intr(3a07b30, 0, e0017ed0, 10000, 1037600, 101) at netbsd:ncr53c9x_intr+0x9a4
> sparc_interrupt(0, 7, 0, 0, 1814000, 3fff) at netbsd:sparc_interrupt+0x23c
> _kernel_lock(10575, 0, fffffff, 146f800, d04f2c0, 1) at netbsd:_kernel_lock+0x114
> biodone2(1, 11f02a0, 5fd, 11f0490, 1814000, d04fa90) at netbsd:biodone2+0x6c
> biointr(0, d047ec0, 3, 6, 1814000, 180c000) at netbsd:biointr+0xa4
> softint_thread(d68e008, d04f2c0, 11e9800, 11e9800, 11e9400, 11d6800) at netbsd:softint_thread+0xd0
> lwp_trampoline(f005eaf0, 111400, fffb1e28, 110418, fffb1df8, 1) at netbsd:lwp_trampoline+0x8
> db{0}> mach cpu 1
> db{1}> bt
> physio(0, 0, 1108, 100000, f, eb37bf0) at netbsd:physio+0x2f4
> cdev_read(6, eb37bf0, 0, 11ec000, de98000, 4030dbf0) at netbsd:cdev_read+0x60
> spec_read(eb37a48, 1166bf0, 11d0400, ea36fa0, de98000, 1) at netbsd:spec_read+0x1e0
> nfsspec_read(eb37a48, 10001, badcafe, 146f800, ea36fa0, 1) at netbsd:nfsspec_read+0x38
> VOP_READ(e86e720, eb37bf0, 0, d04bd40, badcafe, badcafe) at netbsd:VOP_READ+0x40
> vn_read(e7c8180, e7c8180, eb37bf0, d04bd40, 1, 11ea000) at netbsd:vn_read+0x88
> dofileread(16, e7c8180, 40a00000, 100000, 3, 1) at netbsd:dofileread+0x60
> sys_read(3, eb37dc0, eb37e00, badcafe, badcafe, badcafe) at netbsd:sys_read+0x60
> syscall_plain(eb37ed0, 3, 4073c5a4, 166, 4073c5a4, 800) at netbsd:syscall_plain+0x11c
> ?(3, 40a00000, 100000, 20, 0, 4030dbf0) at 0x10092fc
It seems that cpu1 should be holding kernel_lock because it's working on an
NFS vnode. You can verify that by digging 'vp' out of the arguments to
VOP_READ or vn_read, and then running 'show vnode' on it. If VV_MPSAFE is
clear, kernel_lock will have been taken. Alternatively, run 'show lock
kernel_lock' if this is a LOCKDEBUG kernel.
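To make the VV_MPSAFE rule concrete, here is a minimal user-space sketch of
the decision, not the kernel's actual code. The struct, the function name and
the flag's bit value are hypothetical stand-ins; the real definition lives in
the kernel's sys/vnode.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit value, for illustration only; the real VV_MPSAFE
 * is defined in the kernel headers.
 */
#define VV_MPSAFE_MODEL 0x00000010u

/* Stand-in for the one vnode field this check cares about. */
struct vnode_model {
	uint32_t v_vflag;
};

/*
 * Returns true when the modelled VOP wrapper would take kernel_lock
 * before calling into the file system, i.e. when the MP-safe flag
 * is clear on the vnode.
 */
static bool
vop_takes_kernel_lock(const struct vnode_model *vp)
{
	return (vp->v_vflag & VV_MPSAFE_MODEL) == 0;
}
```

Under this model, a vnode whose flags word lacks the MP-safe bit (as expected
for the NFS vnode here) implies that the reading CPU holds kernel_lock.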
It also looks like cpu0 was waiting to acquire kernel_lock when the
interrupt occurred. kernel_lock should be acquired for ncr53c9x_intr(), but
I don't see intr_biglock_wrapper() in the backtrace. It would be useful to
verify which CPU holds the lock, and to verify that ncr53c9x_intr() is
actually occurring at IPL_VM.
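For reference, the shape of a big-lock interrupt wrapper can be sketched in
user space as below. This is only a model under stated assumptions: the
counter stands in for kernel_lock, and the struct, field and function names
are illustrative rather than copied from the sparc64 sources.

```c
/* Stand-in for kernel_lock: counts how many times it is held. */
static int kernel_lock_depth_model;

/* Illustrative handler record: the real handler and its argument. */
struct intrhand_model {
	int (*ih_realfun)(void *);
	void *ih_realarg;
};

/*
 * Modelled wrapper: take the big lock, run the real handler,
 * drop the lock, and pass the handler's return value through.
 */
static int
biglock_wrapper_model(void *v)
{
	struct intrhand_model *ih = v;
	int ret;

	kernel_lock_depth_model++;	/* models acquiring kernel_lock */
	ret = ih->ih_realfun(ih->ih_realarg);
	kernel_lock_depth_model--;	/* models releasing kernel_lock */
	return ret;
}

/* Sample handler: records the lock depth it observed while running. */
static int
sample_handler(void *arg)
{
	*(int *)arg = kernel_lock_depth_model;
	return 1;
}
```

If a driver's handler is registered through such a wrapper, the wrapper's
frame should appear between the interrupt dispatcher and the handler in a
backtrace, which is why its absence above is worth checking.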
It's difficult to tell what is going on, because the backtraces from all
CPUs but the one that panicked are always from some point after the event.
Andrew