NetBSD-Bugs archive
Re: port-amd64/40159: can't boot with multiple cpus anymore
The following reply was made to PR port-amd64/40159; it has been noted by GNATS.
From: Andrew Doran <ad%netbsd.org@localhost>
To: Martin Husemann <martin%duskware.de@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-amd64/40159: can't boot with multiple cpus anymore
Date: Sun, 14 Dec 2008 11:01:10 +0000
On Sun, Dec 14, 2008 at 11:04:08AM +0100, Martin Husemann wrote:
> Deadlock on kernel lock?
Doesn't look like it to me, at least, not a simple one.
> db{0}> bt
> breakpoint() at netbsd:breakpoint+0x5
> comintr() at netbsd:comintr+0x53a
> Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xef
> --- interrupt ---
> bus_space_read_1() at netbsd:bus_space_read_1+0xe
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
> Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xef
> --- interrupt ---
> ttstart() at netbsd:ttstart+0x7
> DDB lost frame for netbsd:Xsoftintr+0x50, trying 0xffff80004bac3d70
> Xsoftintr() at netbsd:Xsoftintr+0x50
> --- interrupt ---
> 0:
> db{0}> mach cpu 1
> using CPU 1
> db{0}> bt
> _kernel_lock() at netbsd:_kernel_lock+0x11e
> sleepq_block() at netbsd:sleepq_block+0x1b3
> usb_delay_ms() at netbsd:usb_delay_ms+0x75
> ohci_root_ctrl_start() at netbsd:ohci_root_ctrl_start+0xb23
> usbd_transfer() at netbsd:usbd_transfer+0xaf
> usbd_do_request_flags_pipe() at netbsd:usbd_do_request_flags_pipe+0xce
> usbd_do_request_flags() at netbsd:usbd_do_request_flags+0x25
> usbd_reset_port() at netbsd:usbd_reset_port+0x54
> uhub_explore() at netbsd:uhub_explore+0x2b8
> usb_discover() at netbsd:usb_discover+0x42
> usb_event_thread() at netbsd:usb_event_thread+0xcf
..
> > 6 7 204 ffff80004b2acbc0 softser/0
> 5 7 204 ffff80004b2aa000 softclk/0
> 4 7 204 ffff80004b2aa3e0 softbio/0
> 2 7 205 ffff80004b2aaba0 idle/0
There are four threads on cpu0, and three of them are soft interrupts. The
bottom thread is idle/0, a proper kthread. Above it is a stack of softints
executing "over" it:
idle/0
softclk/0 (interrupted idle)
softbio/0 (interrupted softclk)
softser/0 (interrupted softbio)
softser/0 is at the top of the stack because it has higher interrupt
priority than all the others. One of these threads holds kernel_lock. It's
not the idle thread, which never takes it. It could be a callout running
from softclk/0, or it could be whatever is calling ttstart() in softser/0.
It could also be softbio/0 or softclk/0, with the softser interrupt being
just incidental. If you always see softser running, it's likely the code
running in softser that's the problem.
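As a rough illustration of the nesting described above, here is a toy model
in Python. It is not NetBSD code: the function names interrupt_stack() and
lock_holder_candidates() are made up for this sketch, and the only inputs
are the thread names from the ps output and the fact that the idle thread
never takes kernel_lock.

```python
# Toy model of the cpu0 interrupt stack; hypothetical helpers, not kernel code.

def interrupt_stack(base, softints):
    """Build (thread, interrupted_thread) pairs, bottom of the stack first.

    Each soft interrupt in `softints` is assumed to have preempted
    whatever was running directly below it.
    """
    stack = [(base, None)]
    for name in softints:
        stack.append((name, stack[-1][0]))
    return stack


def lock_holder_candidates(stack, never_takes_lock):
    """Return the threads on the stack that could be holding kernel_lock."""
    return [name for name, _ in stack if name not in never_takes_lock]


# Nesting as described in the reply: idle/0 at the bottom, softser/0 on top.
cpu0 = interrupt_stack("idle/0", ["softclk/0", "softbio/0", "softser/0"])

# The idle thread never takes kernel_lock, so it is ruled out.
candidates = lock_holder_candidates(cpu0, {"idle/0"})

for name, interrupted in cpu0:
    note = f"(interrupted {interrupted})" if interrupted else "(bottom of stack)"
    print(name, note)
print("possible kernel_lock holders:", ", ".join(candidates))
```

The point of the model is only that the trace from the top of the stack
(softser/0) does not by itself say which of the three softints took the
lock. The interrupted ones have to be examined as well.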
So cpu1 is waiting on the kernel_lock and a soft interrupt on cpu0 is
looping while something on cpu0 holds kernel_lock.
bt gives you the trace from softser. You _should_ be able to get a trace
from the interrupted threads on cpu0 with 't/a $lwpaddress', because x86
does a partial save of their state into the PCB on soft interrupt. I have
not verified this.
With lockdebug, 'show lock _kernel_lock' is unlikely to be helpful as it
could have been acquired by any of the threads and likely in a generic
spot, like softint_dispatch().
Thanks,
Andrew