Port-xen archive


Re: fix dom0 crash on large machines




> On Mon, Nov 23, 2009 at 10:42:08PM +0100, Christoph Egger wrote:
> > 
> > Hi!
> > 
> > On large machines, the phycpu_info array is limited
> > to 32 entries and is indexed by APIC ID.
> > 
> > There are systems where the APIC IDs start at 16 for
> > the first CPU and extend beyond 32.
> > 
> > On such machines the Dom0 boot fails with a panic:
> > 
> >     cpu12 at mainbus0 apid 32panic: cpu at apic id 32 already attached?
> > 
> > (See PR port-xen/41755)
> > 
> > 
> > This patch eliminates the limitation by removing
> > X86_MAXPROCS.
> > As a side effect, the patch also reduces the diff
> > to sys/arch/x86/x86/cpu.c.
> > 
> > http://www.netbsd.org/~cegger/xen_phycpu.diff
> 
> Looks good. But dom0 doesn't manage the LAPIC itself at
> this time.
> If you're confident that the LAPIC will always be managed
> by the hypervisor, you could completely remove the code in
> #if NLAPIC > 0/#endif.

I committed it as is. I left the NLAPIC stuff in because
I have another patch in progress which merges xen/x86/cpu.c
and x86/x86/cpu.c.
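
For readers following the thread, here is a minimal sketch of why a
fixed-size table indexed by APIC ID breaks on such a machine, next to one
possible way to drop the limit. It is not taken from xen_phycpu.diff:
struct phycpu, OLD_MAXPROCS, old_attach(), list_lookup() and list_attach()
are hypothetical names for illustration only, and the linked list is an
assumed replacement, not necessarily what the patch does.

```c
#include <stddef.h>

#define OLD_MAXPROCS 32	/* stands in for the removed X86_MAXPROCS limit */

/* Hypothetical per-CPU record; not the real phycpu_info structure. */
struct phycpu {
	unsigned int	 apic_id;
	struct phycpu	*next;
};

/* Old scheme: a table indexed directly by APIC ID. */
static struct phycpu *old_table[OLD_MAXPROCS];

/* Returns 0 on success, -1 if the APIC ID does not fit or is taken. */
int
old_attach(struct phycpu *ci)
{
	if (ci->apic_id >= OLD_MAXPROCS)
		return -1;		/* APIC IDs 32 and up have no slot */
	if (old_table[ci->apic_id] != NULL)
		return -1;		/* slot already in use */
	old_table[ci->apic_id] = ci;
	return 0;
}

/* Assumed alternative: a singly linked list searched by APIC ID. */
static struct phycpu *cpu_list;

struct phycpu *
list_lookup(unsigned int apic_id)
{
	struct phycpu *ci;

	for (ci = cpu_list; ci != NULL; ci = ci->next)
		if (ci->apic_id == apic_id)
			return ci;
	return NULL;
}

/* Returns 0 on success, -1 if a CPU with this APIC ID is already known. */
int
list_attach(struct phycpu *ci)
{
	if (list_lookup(ci->apic_id) != NULL)
		return -1;
	ci->next = cpu_list;
	cpu_list = ci;
	return 0;
}
```

With APIC IDs 16, 32 and 75, old_attach() accepts only the first one,
while list_attach() accepts all three and still rejects a duplicate ID.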

> > I tested it on a 48 CPU machine where the APIC IDs
> > go from 16 to 75:
> 
> wow.

NetBSD Dom0 didn't boot through to the login, though.
The boot stopped with an endless loop of

probe(mpt0:0:1:0): command timeout
mpt0: recovered from command timeout

The mpt0 is a

mpt0 at pci2 dev 0 function 0: Symbios Logic SAS1064E
linkdev LN20 returned ACPI global irq 76, line 76
ioapic2: int20 0x1807b<vector=0x7b,delmode=0x0,level,masked,dest=0x0> 0x10000000<target=0x10>
mpt0: interrupting at ioapic2 pin 20, event channel 8
mpt0: Phy 0: Link Rate 3.0 Gbps
scsibus0 at mpt0: 112 targets, 8 luns per target

and disabling mpt0 in userconf ended with

nfs_boot: trying DHCP/BOOTP
bge0: watchdog timeout -- resetting
nfs_boot: timeout...
bge0: watchdog timeout -- resetting
nfs_boot: timeout...
bge0: watchdog timeout -- resetting
nfs_boot: timeout...

bge0 is

ppb0: unsupported PCI Express version
pci1 at ppb0 bus 1
bge0 at pci1 dev 0 function 0: Broadcom BCM5751 Gigabit Ethernet
linkdev LN04 returned ACPI global irq 60, line 60
ioapic2: int4 0x180ba<vector=0xba,delmode=0x0,level,masked,dest=0x0> 0x10000000<target=0x10>
bge0: interrupting at ioapic2 pin 4, event channel 7
bge0: ASIC BCM5750 C1 (0x4201), Ethernet address 00:10:18:30:a1:32
brgphy0 at bge0 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

Christoph

