tech-kern archive
Re: getiobuf(x, false) can sleep ?
On Fri, Apr 02, 2010 at 11:01:10AM +0200, Manuel Bouyer wrote:
> Hi,
> A server with a large wedge device started panicking under high I/O load
> (and I guess memory pressure) on a KASSERT(!ISSET(bp->b_oflags, BO_DONE))
> in biodone(). The stack trace was always:
> db{0}> tr
> breakpoint() at netbsd:breakpoint+0x5
> panic() at netbsd:panic+0x24d
> __kernassert() at netbsd:__kernassert+0x2d
> biodone() at netbsd:biodone+0xc4
> dkiodone() at netbsd:dkiodone+0xa3
> biodone2() at netbsd:biodone2+0x95
> biointr() at netbsd:biointr+0x3c
>
> so I started suspecting buffer list corruption at the dkwedge level.
> I added instrumentation to check
> - that buffer queue manipulations are done at splbio() with the
> kernel_lock held in dkstart() and dkstrategy():
> KASSERT(curcpu()->ci_biglock_count > 0);
> KASSERT(curcpu()->ci_ilevel >= IPL_BIO);
> KASSERT(curlwp->l_blcnt > 0);
> - that the buffer doesn't change under us in dkstart():
> KASSERT(BUFQ_GET(sc->sc_bufq) == bp);
Should this not be bufq_peek()?
Hmm, it shouldn't be doing dkiodone -> dkstart -> VOP_STRATEGY.
VOP_STRATEGY should be called from process context (kthread, user).
Anyhow, changing that is unlikely to fix your problem.
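
To make the peek/get concern concrete, here is a rough userland sketch of
how a "peek, maybe block, then get" sequence can hand back a different
buffer when another context (for instance a dkiodone -> dkstart path)
dequeues in between. Everything named toy_* is hypothetical and invented
for illustration; it is not the bufq(9) API and not the actual dkstart()
code, and the "interleave" flag merely stands in for whatever might run
while the first caller is blocked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; NOT NetBSD's struct buf or bufq(9). */
struct toy_buf {
	int id;
	struct toy_buf *next;
};

struct toy_bufq {
	struct toy_buf *head;
	struct toy_buf *tail;
};

/* Append a buffer at the tail of the FIFO. */
static void
toy_put(struct toy_bufq *q, struct toy_buf *b)
{
	assert(q != NULL && b != NULL);
	b->next = NULL;
	if (q->tail != NULL)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
}

/* Return the head without removing it. */
static struct toy_buf *
toy_peek(struct toy_bufq *q)
{
	return q->head;
}

/* Remove and return the head, or NULL if the queue is empty. */
static struct toy_buf *
toy_get(struct toy_bufq *q)
{
	struct toy_buf *b = q->head;

	if (b != NULL) {
		q->head = b->next;
		if (q->head == NULL)
			q->tail = NULL;
	}
	return b;
}

/*
 * Model of the peek ... get window. If 'interleave' is nonzero, some
 * other context dequeues one buffer while we are "blocked" between the
 * peek and the get. Returns 1 if the get matched the peek, 0 otherwise.
 */
static int
toy_start(struct toy_bufq *q, int interleave)
{
	struct toy_buf *bp = toy_peek(q);

	if (bp == NULL)
		return 1;
	if (interleave)
		(void)toy_get(q);	/* another context took the head */
	return toy_get(q) == bp;
}
```

With three buffers queued, toy_start(&q, 0) returns 1, while a following
toy_start(&q, 1) returns 0, which is the situation the
KASSERT(BUFQ_GET(sc->sc_bufq) == bp) check is meant to catch.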
>
> and this last KASSERT fired:
> db{7}> tr
> breakpoint() at netbsd:breakpoint+0x5
> panic() at netbsd:panic+0x24d
> __kernassert() at netbsd:__kernassert+0x2d
> dkstart() at netbsd:dkstart+0x2f2
> dkstrategy() at netbsd:dkstrategy+0xd2
> bdev_strategy() at netbsd:bdev_strategy+0x50
> spec_strategy() at netbsd:spec_strategy+0x5e
> VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x65
> bwrite() at netbsd:bwrite+0x192
> VOP_BWRITE() at netbsd:VOP_BWRITE+0x6e
> ffs_full_fsync() at netbsd:ffs_full_fsync+0x292
> ffs_fsync() at netbsd:ffs_fsync+0x5d
> VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
> sched_sync() at netbsd:sched_sync+0x15d
>
> (FWIW, CPU 0 was doing:
> db{7}> mach cpu 0
> using CPU 0
> db{7}> tr
> _kernel_lock() at netbsd:_kernel_lock+0x12d
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
> Xintr_ioapic_level2() at netbsd:Xintr_ioapic_level2+0xf7
> --- interrupt ---
> Xspllower() at netbsd:Xspllower+0xe
> ubc_release() at netbsd:ubc_release+0x87
> ubc_uiomove() at netbsd:ubc_uiomove+0xe4
> ffs_write() at netbsd:ffs_write+0x667
> VOP_WRITE() at netbsd:VOP_WRITE+0x66
> vn_write() at netbsd:vn_write+0xce
> dofilewrite() at netbsd:dofilewrite+0x81
> sys_write() at netbsd:sys_write+0x72
> syscall() at netbsd:syscall+0xb6
> other CPUs were in the idle loop).
>
>
>
> Now, given that the other KASSERTs didn't fire, I guess the only way this can
> happen is that the thread slept between the BUFQ_PEEK() and the
> BUFQ_GET(). The only candidate is getiobuf(sc->sc_parent->dk_rawvp, false).
>
> When called this way getiobuf() will do pool_cache_get(bufio_cache,
> PR_NOWAIT).
> Can anyone see whether this can sleep somewhere despite the PR_NOWAIT?
> Maybe in some low-level UVM or pmap operation?
>
> --
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> NetBSD: 26 years of experience will always make the difference
> --