Current-Users archive


Re: NetBSD-current on amd64 with Dell PERC 4e/Di hangs under load



On Jan 28,  7:52am, tih%hamartun.priv.no@localhost (Tom Ivar Helbekkmo) wrote:
-- Subject: Re: NetBSD-current on amd64 with Dell PERC 4e/Di hangs under load

| Christos Zoulas <christos%zoulas.com@localhost> writes:
| 
| > If LOCKDEBUG hangs, you have problems.... I'd try to get more
| > information on that first.
| 
| It's the same hang; it just occurred a bit earlier with LOCKDEBUG,
| possibly because the lowered performance of the kernel with that option
| on meant that it got pushed over the edge by the /etc/rc startup.

You also have DEBUG and DIAGNOSTIC on?

| Tried that this morning, but it made no difference.

I kind of expected that. Can you boot with a single processor?
Let's try to simplify the workload.

| I've also tried adding some of the more recent bug fixes and
| improvements from the FreeBSD amr driver to ours, but they didn't make
| any difference either.  The one I didn't apply was the one where they
| monitor for repeated EAGAIN failures on the same operation, to detect a
| hung controller.  Instead, I made sure that every timeout, and every
| EAGAIN due to a busy controller, will complain to the console.  There's
| been no complaining so far, so I'm starting to think the amr driver
| isn't the actual problem, after all - unless it's getting stuck in a way
| that keeps the kernel printf() from writing to the console.
| 
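
For reference, the check described above (watching for repeated EAGAIN
failures on the same operation) could be sketched roughly as follows.
This is only an illustration, not the FreeBSD amr code: the struct, the
function name and the threshold are all hypothetical, and a real driver
would hook this into its own command submission path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; not taken from either driver. */
#define HUNG_EAGAIN_LIMIT 100

/* Hypothetical bookkeeping, one instance per controller. */
struct eagain_monitor {
	const void *last_op;	/* operation that last failed with EAGAIN */
	unsigned    repeats;	/* consecutive EAGAIN results for last_op */
};

/*
 * Record the result of submitting `op`.  Returns true once the same
 * operation has failed with EAGAIN HUNG_EAGAIN_LIMIT times in a row,
 * which the caller may treat as a sign of a hung controller.
 */
static bool
eagain_monitor_note(struct eagain_monitor *m, const void *op, int error)
{
	if (error != EAGAIN) {
		/* Any other result breaks the streak. */
		m->last_op = NULL;
		m->repeats = 0;
		return false;
	}
	if (m->last_op != op) {
		/* A different operation is busy now; start counting afresh. */
		m->last_op = op;
		m->repeats = 0;
	}
	m->repeats++;
	return m->repeats >= HUNG_EAGAIN_LIMIT;
}
```

The counter resets whenever a different operation fails or any
submission returns something other than EAGAIN, so a merely busy
controller should not trip it.
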
| Below is the latest hang (provoked after applying your patch).  As usual,
| there are processors working on disk I/O and networking, and there's at
| least one processor in the _kernel_lock() function.  I notice a CPU
| handling an uhci interrupt, too.  That's a USB attached serial port, and
| things got significantly worse when I added that (it's NUT monitoring my
| UPS, chatting more or less continuously with it over a 1200bps link).
| 
| CPU 0:
| 
| bus_space_read_2()
| uhci_intr()
| Xintr_ioapic_level4()
| ---interrupt---
| Xspllower()
| kpreempt_enable()
| pmap_extract()
| _bus_dmamap_load_buffer.isra.12()
| bus_dmamap_load()
| amr_ccb_map()
| ld_amr_dobio()
| ldstart()
| lddone()
| amr_intr()
| intr_biglock_wrapper()
| Xintr_ioapic_level1()
| ---interrupt---
| Xspllower()
| spec_strategy()
| VOP_STRATEGY()
| bwrite()
| VOP_BWRITE()
| wapbl_flush()
| ffs_sync()
| VFS_SYNC()
| sync_fsync()
| VOP_FSYNC()
| sched_sync()
| 
| CPU 1:
| 
| x86_pause()
| sleepq_block()
| cv_timedwait()
| ipmi_thread()
| 
| CPU 2:
| 
| x86_pause()
| frag6_fasttimo()
| pffasttimo()
| callout_softclock()
| softint_dispatch()
| Xsoftintr()
| 
| CPU 3:
| 
| _kernel_lock()
| bdev_strategy()
| spec_strategy()
| VOP_STRATEGY()
| ufs_strategy()
| VOP_STRATEGY()
| bio_doread.isra.4()
| bread()
| ffs_read()
| VOP_READ()
| ufs_readdir()
| VOP_READDIR()
| vn_readdir()
| sys___getdents30()
| syscall()
| ---syscall 390---

My guess is that something is getting hung in amr?

christos

