Current-Users archive


Re: NetBSD-current on amd64 with Dell PERC 4e/Di hangs under load



Hi, all.

On 2015/01/28 22:51, Christos Zoulas wrote:
> On Jan 28,  7:52am, tih%hamartun.priv.no@localhost (Tom Ivar Helbekkmo) wrote:
> -- Subject: Re: NetBSD-current on amd64 with Dell PERC 4e/Di hangs under load
> 
> | Christos Zoulas <christos%zoulas.com@localhost> writes:
> | 
> | > If LOCKDEBUG hangs, you have problems.... I'd try to get more
> | > information on that first.
> | 
> | It's the same hang; it just occurred a bit earlier with LOCKDEBUG,
> | possibly because the lowered performance of the kernel with that option
> | on meant that it got pushed over the edge by the /etc/rc startup.
> 
> You also have DEBUG and DIAGNOSTIC on?
> 
> | Tried that this morning, but it made no difference.
> 
> I kind of expected that. Can you boot with a single processor?
> Let's try to simplify the workload.
> 
> | I've also tried adding some of the more recent bug fixes and
> | improvements from the FreeBSD amr driver to ours, but they didn't make
> | any difference either.  The one I didn't apply was the one where they
> | monitor for repeated EAGAIN failures on the same operation, to detect a
> | hung controller.  Instead, I made sure that every timeout, and every
> | EAGAIN due to a busy controller, will complain to the console.  There's
> | been no complaining so far, so I'm starting to think the amr driver
> | isn't the actual problem, after all - unless it's getting stuck in a way
> | that keeps the kernel printf() from writing to the console.
> | 
> | Below is latest hang (provoked after applying your patch).  As usual,
> | there are processors working on disk I/O and networking, and there's at
> | least one processor in the _kernel_lock() function.  I notice a CPU
> | handling an uhci interrupt, too.  That's a USB attached serial port, and
> | things got significantly worse when I added that (it's NUT monitoring my
> | UPS, chatting more or less continuously with it over a 1200bps link).
> | 
> | CPU 0:
> | 
> | bus_space_read_2()
> | uhci_intr()
> | Xintr_ioapic_level4()
> | ---interrupt---
> | Xspllower()
> | kpreempt_enable()
> | pmap_extract()
> | _bus_dmamap_load_buffer.isra.12()
> | bus_dmamap_load()
> | amr_ccb_map()
> | ld_amr_dobio()
> | ldstart()
> | lddone()
> | amr_intr()
> | intr_biglock_wrapper()
> | Xintr_ioapic_level1()
> | ---interrupt---
> | Xspllower()
> | spec_strategy()
> | VOP_STRATEGY()
> | bwrite()
> | VOP_BWRITE()
> | wapbl_flush()
> | ffs_sync()
> | VFS_SYNC()
> | sync_fsync()
> | VOP_FSYNC()
> | sched_sync()
> | 
> | CPU 1:
> | 
> | x86_pause()
> | sleepq_block()
> | cv_timedwait()
> | ipmi_thread()
> | 
> | CPU 2:
> | 
> | x86_pause()
> | frag6_fasttimo()
> | pffasttimo()
> | callout_softclock()
> | softint_dispatch()
> | Xsoftintr()
> | 
> | CPU 3:
> | 
> | _kernel_lock()
> | bdev_strategy()
> | spec_strategy()
> | VOP_STRATEGY()
> | ufs_strategy()
> | VOP_STRATEGY()
> | bio_doread.isra.4()
> | bread()
> | ffs_read()
> | VOP_READ()
> | ufs_readdir()
> | VOP_READDIR()
> | vn_readdir()
> | sys___getdents30()
> | syscall()
> | ---syscall 390---
> 
> My guess is that something is getting hung in amr?
> 
> christos

The original report from Tom Ivar Helbekkmo is not related to ixgbe,
because the dmesg in his mail does not contain any ixg(4) lines.

PR#49328, reported by Uwe Toenjes, is a problem in ixg(4) itself. The
following mail showed that it is a lock-related problem, and the PR was
then submitted:

	http://mail-index.netbsd.org/current-users/2014/10/11/msg025932.html

This problem is technically important, so I'll post about it
to tech-kern@.


-- 
-----------------------------------------------
                SAITOH Masanobu (msaitoh%execsw.org@localhost
                                 msaitoh%netbsd.org@localhost)

