Re: port-amd64/40159: can't boot with multiple cpus anymore

To: port-amd64-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,martin%duskware.de@localhost
Subject: Re: port-amd64/40159: can't boot with multiple cpus anymore
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Sun, 14 Dec 2008 11:05:04 +0000 (UTC)

The following reply was made to PR port-amd64/40159; it has been noted by GNATS.

From: Andrew Doran <ad%netbsd.org@localhost>
To: Martin Husemann <martin%duskware.de@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-amd64/40159: can't boot with multiple cpus anymore
Date: Sun, 14 Dec 2008 11:01:10 +0000

 On Sun, Dec 14, 2008 at 11:04:08AM +0100, Martin Husemann wrote:

 > Deadlock on kernel lock?

 Doesn't look like it to me, at least, not a simple one.

 > db{0}> bt
 > breakpoint() at netbsd:breakpoint+0x5
 > comintr() at netbsd:comintr+0x53a
 > Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xef
 > --- interrupt ---
 > bus_space_read_1() at netbsd:bus_space_read_1+0xe
 > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 > Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xef
 > --- interrupt ---
 > ttstart() at netbsd:ttstart+0x7
 > DDB lost frame for netbsd:Xsoftintr+0x50, trying 0xffff80004bac3d70
 > Xsoftintr() at netbsd:Xsoftintr+0x50
 > --- interrupt ---
 > 0:
 > db{0}> mach cpu 1
 > using CPU 1
 > db{0}> bt
 > _kernel_lock() at netbsd:_kernel_lock+0x11e
 > sleepq_block() at netbsd:sleepq_block+0x1b3
 > usb_delay_ms() at netbsd:usb_delay_ms+0x75
 > ohci_root_ctrl_start() at netbsd:ohci_root_ctrl_start+0xb23
 > usbd_transfer() at netbsd:usbd_transfer+0xaf
 > usbd_do_request_flags_pipe() at netbsd:usbd_do_request_flags_pipe+0xce
 > usbd_do_request_flags() at netbsd:usbd_do_request_flags+0x25
 > usbd_reset_port() at netbsd:usbd_reset_port+0x54
 > uhub_explore() at netbsd:uhub_explore+0x2b8
 > usb_discover() at netbsd:usb_discover+0x42
 > usb_event_thread() at netbsd:usb_event_thread+0xcf
 ..
 >            >   6 7       204   ffff80004b2acbc0          softser/0
 >                5 7       204   ffff80004b2aa000          softclk/0
 >                4 7       204   ffff80004b2aa3e0          softbio/0
 >                2 7       205   ffff80004b2aaba0             idle/0

 There are 4 threads running on cpu0; three of them are soft interrupts. The
 bottom thread is idle/0, a proper kthread. From there, there is a stack of
 softints executing "over" it:

        idle/0
                softclk/0 (interrupted idle)
                        softbio/0 (interrupted softclk)
                                softser/0 (interrupted softbio)

 softser/0 is at the top of the stack because it has higher interrupt
 priority than all the others. One of these threads holds kernel_lock. It's
 not the idle thread, which never takes it. It could be a callout running from
 softclk/0, or it could be whatever is calling ttstart() in softser/0. It could
 also be softbio/0 or softclk/0, with the softser interrupt being just
 incidental. If you always see softser running, it's likely the code running
 in softser that's the problem.
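
 As a rough illustration of the stacking described above, here is a toy model
 in C. It is not kernel code: the enum values, struct names and functions are
 all made up for this sketch, and the only things taken from the trace are the
 thread names and their relative priority order. It shows why only a
 higher-priority softint can land on top of the running one, and why every
 thread on the stack except idle remains a candidate for holding kernel_lock.

```c
#include <stddef.h>

/* Hypothetical priority levels, lowest to highest (illustration only,
 * not the kernel's own definitions). */
enum level { LVL_IDLE = 0, LVL_SOFTCLK, LVL_SOFTBIO, LVL_SOFTSER };

struct thread {
    const char *name;
    enum level  level;
};

#define MAX_DEPTH 8

/* Per-CPU stack of interrupted threads; the last entry is the one
 * currently executing, i.e. the one a plain "bt" would show. */
struct cpu {
    const struct thread *stack[MAX_DEPTH];
    size_t depth;
};

/* Try to run t "over" whatever is executing now. Succeeds (returns 1)
 * only if the CPU is empty or t has strictly higher priority than the
 * running thread; otherwise returns 0 and leaves the stack unchanged. */
static int cpu_interrupt(struct cpu *c, const struct thread *t)
{
    if (c->depth == MAX_DEPTH)
        return 0;
    if (c->depth > 0 && t->level <= c->stack[c->depth - 1]->level)
        return 0;
    c->stack[c->depth++] = t;
    return 1;
}

/* The thread at the top of the stack, or NULL if nothing is running. */
static const struct thread *cpu_running(const struct cpu *c)
{
    return c->depth ? c->stack[c->depth - 1] : NULL;
}

/* Copy into out[] every stacked thread that could hold the lock:
 * everything except the idle thread. Returns the number copied. */
static size_t lock_suspects(const struct cpu *c,
                            const struct thread **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < c->depth && n < max; i++)
        if (c->stack[i]->level != LVL_IDLE)
            out[n++] = c->stack[i];
    return n;
}
```

 Pushing idle/0, softclk/0, softbio/0 and softser/0 in that order reproduces
 the stack shown above, and lock_suspects() then returns the three softints,
 which is the set of threads worth tracing.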

 So cpu1 is waiting on the kernel_lock and a soft interrupt on cpu0 is
 looping while something on cpu0 holds kernel_lock.

 bt gives you the trace from softser. You _should_ be able to get a trace
 from the interrupted threads on cpu0 with 't/a $lwpaddress', because x86 does
 a partial save of their state into the PCB on soft interrupt. I have not
 verified this.

 With lockdebug, 'show lock _kernel_lock' is unlikely to be helpful as it
 could have been acquired by any of the threads and likely in a generic
 spot, like softint_dispatch().

 Thanks,
 Andrew
