NetBSD-Bugs archive
Re: port-amd64/38478 (panic on boot when attaching cpu17)
The following reply was made to PR port-amd64/38478; it has been noted by GNATS.
From: "Christoph Egger" <Christoph_Egger%gmx.de@localhost>
To: Andrew Doran <ad%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-amd64/38478 (panic on boot when attaching cpu17)
Date: Tue, 13 May 2008 18:21:14 +0200
> On Tue, May 13, 2008 at 03:45:42PM +0200, Christoph Egger wrote:
> >
> > > Synopsis: panic on boot when attaching cpu17
> > >
> > > State-Changed-From-To: open->feedback
> > > State-Changed-By: ad%NetBSD.org@localhost
> > > State-Changed-When: Sun, 11 May 2008 22:57:34 +0000
> > > State-Changed-Why:
> > > Can you confirm that this is fixed in -current?
> >
> > Yes, partially at least.
> > There's still a panic on boot related to many cpus:
> >
> > kernel double fault trap, code=0
> > Stopped in pid 0.12 (system) at netbsd:Xintr_lapic_ltimer+0x38: movq %rdi,0(%rsp)
> >
> > %rsp value is 0xffff80004677afd0
> >
> > The double fault comes from a stack corruption/overflow.
> > The interrupt stack fills up because so many lapic timer
> > handlers are running concurrently.
>
> At what point in the boot process does it occur?
Randomly, but always after all CPUs have been initialized.
The more drivers get initialized, the more interrupts occur.
In particular, "attimer1: attached to pcppi1" and
"wskbd0 at pckbd0" are heavily I/O-port driven, and interrupts
keep the CPU from processing them. This results in a
significant slowdown of the boot process.
attimer1 and pcppi1 both attach at acpi0.
> Can you pick a couple of other CPUs and see what they
> are doing?
The "machine cpu" ddb command does not show useful information
with many CPUs: its output does not pause after 24 lines with
"-- more --" as "dmesg" does.
> A backtrace shows them recursing?
> What other functions+offsets are involved in the recursion?
The backtrace is this (from another boot):
Stopped in pid 1.1 (init) at netbsd:Xresume_lapic_ltimer+0x38: movq 0x8(%rsp),%rsi
db{7}> bt
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x38
---interrupt---
Bad frame pointer: 0xffff800001efd8c0
0xffff8000046815108:
db{7}>
A new kernel with
src/sys/arch/amd64/amd64/vector.S r1.22
src/sys/arch/x86/x86/intr.c r1.53
seems to fix this.
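To illustrate the failure mode described above (this is not the NetBSD code, only a minimal sketch): if each lapic timer interrupt re-enters before the previous handler has returned, every nesting level consumes another frame on the interrupt stack until it overflows. The stack size, frame size, and the function names max_nesting and simulate_nested are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not taken from the NetBSD sources.
 * stack_size: bytes available on the interrupt stack (assumed).
 * frame_size: bytes one handler invocation consumes (assumed).
 */

/* How many nested handler entries fit before the stack overflows. */
static unsigned
max_nesting(size_t stack_size, size_t frame_size)
{
	return (unsigned)(stack_size / frame_size);
}

/*
 * Simulate `pending` timer interrupts that each arrive before the
 * previous handler has returned. Returns 1 if the stack would
 * overflow, 0 otherwise.
 */
static int
simulate_nested(unsigned pending, size_t stack_size, size_t frame_size)
{
	size_t used = 0;
	unsigned i;

	for (i = 0; i < pending; i++) {
		if (used + frame_size > stack_size)
			return 1;	/* next frame does not fit */
		used += frame_size;
	}
	return 0;
}
```

With an assumed 4096-byte stack and 256-byte frames, 16 nested entries fit and the 17th overflows. Once the stack is exhausted, the CPU cannot push the next exception frame, which would be consistent with the double fault seen here.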
> > I suggest to make the interrupt stack per-cpu.
>
> CPUs never share a stack.
Good.
--
Christoph