tech-kern archive
getiobuf(x, false) can sleep ?
Hi,
A server with a large wedge device started panicking under high I/O load
(and, I guess, memory pressure) on a KASSERT(!ISSET(bp->b_oflags, BO_DONE))
in biodone(). The stack trace was always:
db{0}> tr
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x24d
__kernassert() at netbsd:__kernassert+0x2d
biodone() at netbsd:biodone+0xc4
dkiodone() at netbsd:dkiodone+0xa3
biodone2() at netbsd:biodone2+0x95
biointr() at netbsd:biointr+0x3c
so I started suspecting buffer list corruption at the dkwedge level.
I added instrumentation to check:
- that buffer queue manipulations are done at splbio() with the
  kernel_lock held in dkstart() and dkstrategy():
	KASSERT(curcpu()->ci_biglock_count > 0);
	KASSERT(curcpu()->ci_ilevel >= IPL_BIO);
	KASSERT(curlwp->l_blcnt > 0);
- that the buffer doesn't change under us in dkstart():
	KASSERT(BUFQ_GET(sc->sc_bufq) == bp);
and this last KASSERT fired:
db{7}> tr
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x24d
__kernassert() at netbsd:__kernassert+0x2d
dkstart() at netbsd:dkstart+0x2f2
dkstrategy() at netbsd:dkstrategy+0xd2
bdev_strategy() at netbsd:bdev_strategy+0x50
spec_strategy() at netbsd:spec_strategy+0x5e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x65
bwrite() at netbsd:bwrite+0x192
VOP_BWRITE() at netbsd:VOP_BWRITE+0x6e
ffs_full_fsync() at netbsd:ffs_full_fsync+0x292
ffs_fsync() at netbsd:ffs_fsync+0x5d
VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
sched_sync() at netbsd:sched_sync+0x15d
(FWIW, CPU 0 was doing:
db{7}> mach cpu 0
using CPU 0
db{7}> tr
_kernel_lock() at netbsd:_kernel_lock+0x12d
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
Xintr_ioapic_level2() at netbsd:Xintr_ioapic_level2+0xf7
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
ubc_release() at netbsd:ubc_release+0x87
ubc_uiomove() at netbsd:ubc_uiomove+0xe4
ffs_write() at netbsd:ffs_write+0x667
VOP_WRITE() at netbsd:VOP_WRITE+0x66
vn_write() at netbsd:vn_write+0xce
dofilewrite() at netbsd:dofilewrite+0x81
sys_write() at netbsd:sys_write+0x72
syscall() at netbsd:syscall+0xb6
other CPUs were in the idle loop).
Now, given that the other KASSERTs didn't fire, I guess the only way this can
happen is that the thread slept between the BUFQ_PEEK() and the
BUFQ_GET(). The only candidate is getiobuf(sc->sc_parent->dk_rawvp, false).
When called this way, getiobuf() does pool_cache_get(bufio_cache, PR_NOWAIT).
Does anyone see a path where this can sleep despite the PR_NOWAIT?
Maybe in some low-level UVM or pmap operation?
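
To make the suspected window concrete, here is a minimal userland sketch of
the peek / allocate / get sequence. It is not the dkwedge code: toyq, its
helpers, peek_alloc_get() and other_consumer() are all hypothetical names
invented for illustration, and the "allocation" step is just a callback that
stands for whatever might run if the allocation gave up the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy FIFO; a stand-in for a buffer queue, not the kernel bufq API. */
#define QCAP 8

struct toyq {
	int items[QCAP];
	size_t head;	/* index of the oldest element */
	size_t count;	/* number of queued elements */
};

static void
toyq_put(struct toyq *q, int v)
{
	assert(q->count < QCAP);
	q->items[(q->head + q->count) % QCAP] = v;
	q->count++;
}

/* Look at the oldest element without removing it; -1 if the queue is empty. */
static int
toyq_peek(const struct toyq *q)
{
	return q->count ? q->items[q->head] : -1;
}

/* Remove and return the oldest element; -1 if the queue is empty. */
static int
toyq_get(struct toyq *q)
{
	int v;

	if (q->count == 0)
		return -1;
	v = q->items[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	return v;
}

/* Hook modelling anything that touches the queue while we are "asleep". */
typedef void (*interleave_fn)(struct toyq *);

/*
 * Peek, run the step under suspicion, then get.
 * Returns 1 if get handed back the element that peek saw, 0 otherwise.
 */
static int
peek_alloc_get(struct toyq *q, interleave_fn during_alloc)
{
	int peeked = toyq_peek(q);
	int got;

	if (during_alloc != NULL)
		during_alloc(q);	/* assumed window: allocation yields here */
	got = toyq_get(q);
	return peeked == got;
}

/* A second consumer that takes the head of the queue in that window. */
static void
other_consumer(struct toyq *q)
{
	(void)toyq_get(q);
}
```

With two elements queued, peek_alloc_get(&q, NULL) returns 1, while
peek_alloc_get(&q, other_consumer) returns 0, which is the same mismatch the
last KASSERT reports. The sketch only shows that something ran in the window;
it says nothing about whether getiobuf() is really what opened it.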
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference