NetBSD-Bugs archive


Re: port-amd64/38478 (panic on boot when attaching cpu17)



The following reply was made to PR port-amd64/38478; it has been noted by GNATS.

From: "Christoph Egger" <Christoph_Egger%gmx.de@localhost>
To: Andrew Doran <ad%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-amd64/38478 (panic on boot when attaching cpu17)
Date: Tue, 13 May 2008 18:21:14 +0200

 > On Tue, May 13, 2008 at 03:45:42PM +0200, Christoph Egger wrote:
 > > 
 > > > Synopsis: panic on boot when attaching cpu17
 > > > 
 > > > State-Changed-From-To: open->feedback
 > > > State-Changed-By: ad%NetBSD.org@localhost
 > > > State-Changed-When: Sun, 11 May 2008 22:57:34 +0000
 > > > State-Changed-Why:
 > > > Can you confirm that this is fixed in -current?
 > > 
 > > Yes, partially at least.
 > > There's still a panic on boot related to many cpus:
 > > 
 > > kernel double fault trap, code=0
 > > Stopped in pid 0.12 (system) at netbsd:Xintr_lapic_ltimer+0x38: movq %rdi,0(%rsp)
 > > 
 > > %rsp value is 0xffff80004677afd0
 > > 
 > > The double fault comes from stack corruption/overflow.
 > > The interrupt stack is filled up by many lapic timer
 > > handlers running concurrently.
 > 
 > At what point in the boot process does it occur?
 
 Randomly. But always after all CPUs were initialized.
 The more drivers get initialized, the more interrupts occur.
 
 In particular "attimer1: attached to pcppi1" and
 "wskbd0 at pckbd0" are heavily io-port driven and interrupts
 keep the cpu away from processing them. This results in a
 significant slowdown in the boot process.
 
 attimer1 and pcppi1 both attach at acpi0.
 
 > Can you pick a couple of other CPUs and see what they
 > are doing?
 
 The "machine cpu" ddb command does not show useful information
 with many cpus. Its output is not paginated: it does not stop
 after 24 lines with "-- more --" like "dmesg" does.
 
 
 > A backtrace shows them recursing?
 > What other functions+offsets are involved in the recursion?
 
 The backtrace is this (from another boot):
 
 Stopped in pid 1.1 (init) at netbsd:Xresume_lapic_ltimer+0x38:  movq 0x8(%rsp),%rsi
 db{7}> bt
 Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x38
 ---interrupt---
 Bad frame pointer: 0xffff800001efd8c0
 0xffff8000046815108:
 db{7}>
 
 
 A new kernel with
 
 src/sys/arch/amd64/amd64/vector.S  r1.22
 src/sys/arch/x86/x86/intr.c r1.53
 
 seems to fix this.
 
  
 > > I suggest making the interrupt stack per-cpu.
 > 
 > CPUs never share a stack.
 
 good.
 
 -- 
 
 Christoph
 
 

