Current-Users archive

-current amd64 does not boot on a huge machine (80 cores, 1 TB RAM)



Hi,

We just got a new toy at work: a Supermicro SuperServer
5086B-TRF[1] with 80 cores and 1 TB of RAM. Unfortunately, I
cannot boot -current amd64 on it.

Using a non-DIAGNOSTIC kernel does not help, except that, as
expected, the i82489_icr_wait panic no longer fires.
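Why the panic disappears without DIAGNOSTIC can be shown with a small model. This is a hypothetical Python sketch, not NetBSD's lapic.c: it assumes the wait loop polls the delivery-status bit (bit 12) of the local APIC ICR, and that only a DIAGNOSTIC kernel counts spins and panics when the count runs out. The names `icr_wait`, `fake_icr`, `StuckIcr` and the 100000-spin budget are made up for illustration.

```python
# Hypothetical model, not NetBSD source: the local APIC ICR reports
# "delivery pending" (bit 12) until the target CPU accepts the IPI.
ICR_DLSTAT_BUSY = 1 << 12
SPIN_BUDGET = 100_000  # hypothetical stand-in for a DIAGNOSTIC spin counter


class StuckIcr(Exception):
    """Raised where a DIAGNOSTIC kernel would panic("...: busy")."""


def icr_wait(read_icr_lo, diagnostic=True):
    """Spin until the delivery-status bit clears.

    With diagnostic=True the wait is bounded and raises StuckIcr once the
    budget runs out. With diagnostic=False there is no counter at all, so
    an IPI that is never accepted keeps the loop polling forever.
    """
    spins = SPIN_BUDGET
    while read_icr_lo() & ICR_DLSTAT_BUSY:
        if diagnostic:
            spins -= 1
            if spins == 0:
                raise StuckIcr("icr_wait: busy")


def fake_icr(busy_polls):
    """Return a simulated ICR reader: busy for `busy_polls` reads, then idle."""
    left = busy_polls

    def read():
        nonlocal left
        if left > 0:
            left -= 1
            return ICR_DLSTAT_BUSY
        return 0

    return read


if __name__ == "__main__":
    icr_wait(fake_icr(10))             # IPI accepted after 10 polls: returns
    try:
        icr_wait(fake_icr(10**6))      # not accepted within the spin budget
    except StuckIcr as exc:
        print("DIAGNOSTIC kernel would panic:", exc)
```

With the simulated register stuck busy, the DIAGNOSTIC variant raises `StuckIcr`, mirroring the `i82489_icr_wait: busy` panics in the logs; the non-DIAGNOSTIC variant simply keeps polling, which on real hardware looks like a hang rather than a panic.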

A normal boot (boot -x) hangs while probing cpu0, a boot with SMP
disabled (boot -1) fails with a KASSERT, and a boot with ACPI
disabled (boot -2) hangs while probing cpu1.

The corresponding dmesg buffers are attached.

Any idea where to look?
Thanks.

[1] http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm

-- 
Nicolas Joly

Projects and Developments in Bioinformatics
Institut Pasteur, Paris.

> boot -x
[...]
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.99.56 (GENERIC) #3: Fri Sep 30 17:28:16 CEST 2011
        njoly%lanfeust.sis.pasteur.fr@localhost:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/GENERIC
total memory = 1023 GB
avail memory = 994 GB
Prep module path=cd9660 len=628973 pa=fda000
No bootinfo commands at boot
SMBIOS rev. 2.6 @ 0x75b52018 (370 entries)
mainbus0 (root)
ACPI Warning: 32/64X FACS address mismatch in FADT - 0x79B95F40/0x0000000079B95F80, using 32 (20110623/tbfadt-517)
cpu0 at mainbus0 apid 0/local/src/NetBSD/src/sys/arch/x86/x86/mtrr_i686.c: FIXME: more than 8 MTRRs (10)
[...HANG HERE...]

> boot -1
[...]
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.99.56 (GENERIC) #3: Fri Sep 30 17:28:16 CEST 2011
        njoly%lanfeust.sis.pasteur.fr@localhost:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/GENERIC
total memory = 1023 GB
avail memory = 994 GB
mainbus0 (root)
ACPI Warning: 32/64X FACS address mismatch in FADT - 0x79B95F40/0x0000000079B95F80, using 32 (20110623/tbfadt-517)
cpu0 at mainbus0 apid 0: multiprocessor boot disabled
cpu1 at mainbus0 apid 2: multiprocessor boot disabled
cpu2 at mainbus0 apid 4: multiprocessor boot disabled
cpu3 at mainbus0 apid 16: multiprocessor boot disabled
cpu4 at mainbus0 apid 18: multiprocessor boot disabled
cpu5 at mainbus0 apid 32: multiprocessor boot disabled
cpu6 at mainbus0 apid 34: multiprocessor boot disabled
cpu7 at mainbus0 apid 36: multiprocessor boot disabled
cpu8 at mainbus0 apid 48: multiprocessor boot disabled
cpu9 at mainbus0 apid 50: multiprocessor boot disabled
cpu10 at mainbus0 apid 64: multiprocessor boot disabled
cpu11 at mainbus0 apid 66: multiprocessor boot disabled
cpu12 at mainbus0 apid 68: multiprocessor boot disabled
cpu13 at mainbus0 apid 80: multiprocessor boot disabled
cpu14 at mainbus0 apid 82: multiprocessor boot disabled
cpu15 at mainbus0 apid 96: multiprocessor boot disabled
cpu16 at mainbus0 apid 98: multiprocessor boot disabled
cpu17 at mainbus0 apid 100: multiprocessor boot disabled
cpu18 at mainbus0 apid 112: multiprocessor boot disabled
cpu19 at mainbus0 apid 114: multiprocessor boot disabled
cpu20 at mainbus0 apid 128: multiprocessor boot disabled
cpu21 at mainbus0 apid 130: multiprocessor boot disabled
cpu22 at mainbus0 apid 132: multiprocessor boot disabled
cpu23 at mainbus0 apid 144: multiprocessor boot disabled
cpu24 at mainbus0 apid 146: multiprocessor boot disabled
cpu25 at mainbus0 apid 160: multiprocessor boot disabled
cpu26 at mainbus0 apid 162: multiprocessor boot disabled
cpu27 at mainbus0 apid 164: multiprocessor boot disabled
cpu28 at mainbus0 apid 176: multiprocessor boot disabled
cpu29 at mainbus0 apid 178: multiprocessor boot disabled
cpu30 at mainbus0 apid 192: multiprocessor boot disabled
cpu31 at mainbus0 apid 194: multiprocessor boot disabled
cpu32 at mainbus0 apid 196: multiprocessor boot disabled
cpu33 at mainbus0 apid 208: multiprocessor boot disabled
cpu34 at mainbus0 apid 210: multiprocessor boot disabled
cpu35 at mainbus0 apid 224: multiprocessor boot disabled
cpu36 at mainbus0 apid 226: multiprocessor boot disabled
cpu37 at mainbus0 apid 228: multiprocessor boot disabled
cpu38 at mainbus0 apid 240: multiprocessor boot disabled
cpu39 at mainbus0 apid 242: multiprocessor boot disabled
cpu40 at mainbus0 apid 256: multiprocessor boot disabled
cpu41 at mainbus0 apid 258: multiprocessor boot disabled
cpu42 at mainbus0 apid 260: multiprocessor boot disabled
cpu43 at mainbus0 apid 272: multiprocessor boot disabled
cpu44 at mainbus0 apid 274: multiprocessor boot disabled
cpu45 at mainbus0 apid 288: multiprocessor boot disabled
cpu46 at mainbus0 apid 290: multiprocessor boot disabled
cpu47 at mainbus0 apid 292: multiprocessor boot disabled
cpu48 at mainbus0 apid 304: multiprocessor boot disabled
cpu49 at mainbus0 apid 306: multiprocessor boot disabled
cpu50 at mainbus0 apid 320: multiprocessor boot disabled
cpu51 at mainbus0 apid 322: multiprocessor boot disabled
cpu52 at mainbus0 apid 324: multiprocessor boot disabled
cpu53 at mainbus0 apid 336: multiprocessor boot disabled
cpu54 at mainbus0 apid 338: multiprocessor boot disabled
cpu55 at mainbus0 apid 352: multiprocessor boot disabled
cpu56 at mainbus0 apid 354: multiprocessor boot disabled
cpu57 at mainbus0 apid 356: multiprocessor boot disabled
cpu58 at mainbus0 apid 368: multiprocessor boot disabled
cpu59 at mainbus0 apid 370: multiprocessor boot disabled
cpu60 at mainbus0 apid 384: multiprocessor boot disabled
cpu61 at mainbus0 apid 386: multiprocessor boot disabled
cpu62 at mainbus0 apid 388: multiprocessor boot disabled
cpu63 at mainbus0 apid 400: multiprocessor boot disabled
cpu64 at mainbus0 apid 402: multiprocessor boot disabled
cpu65 at mainbus0 apid 416: multiprocessor boot disabled
cpu66 at mainbus0 apid 418: multiprocessor boot disabled
cpu67 at mainbus0 apid 420: multiprocessor boot disabled
cpu68 at mainbus0 apid 432: multiprocessor boot disabled
cpu69 at mainbus0 apid 434: multiprocessor boot disabled
cpu70 at mainbus0 apid 448: multiprocessor boot disabled
cpu71 at mainbus0 apid 450: multiprocessor boot disabled
cpu72 at mainbus0 apid 452: multiprocessor boot disabled
cpu73 at mainbus0 apid 464: multiprocessor boot disabled
cpu74 at mainbus0 apid 466: multiprocessor boot disabled
cpu75 at mainbus0 apid 480: multiprocessor boot disabled
cpu76 at mainbus0 apid 482: multiprocessor boot disabled
cpu77 at mainbus0 apid 484: multiprocessor boot disabled
cpu78 at mainbus0 apid 496: multiprocessor boot disabled
cpu79 at mainbus0 apid 498: multiprocessor boot disabled
ioapic0 at mainbus0 apid 0
ioapic1 at mainbus0 apid 2
ioapic2 at mainbus0 apid 3
acpi0 at mainbus0: Intel ACPICA 20110623
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
IOH (PNP0C01) at acpi0 not configured
SIO1 (PNP0C02) at acpi0 not configured
pckbc1 at acpi0 (PS2K, PNP0303) (kbd port): io 0x60,0x64 irq 1
pckbc2 at acpi0 (PS2M, PNP0F03) (aux port): irq 12
SIO2 (PNP0C02) at acpi0 not configured
UAR1 (PNP0501) at acpi0 not configured
UAR2 (PNP0501) at acpi0 not configured
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
RMSC (PNP0C02) at acpi0 not configured
SPMI (IPI0001) at acpi0 not configured
ICH9 (PNP0C01) at acpi0 not configured
acpibut0 at acpi0 (SLPB, PNP0C0E): ACPI Sleep Button
IOH1 (PNP0C01) at acpi0 not configured
acpibut1 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
RMEM (PNP0C01) at acpi0 not configured
OMSC (PNP0C02) at acpi0 not configured
attimer1: attached to pcppi1
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x3407 (rev. 0x22)
ppb0 at pci0 dev 1 function 0: vendor 0x8086 product 0x3408 (rev. 0x22)
ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci1 at ppb0 bus 1
ppb1 at pci0 dev 3 function 0: vendor 0x8086 product 0x340a (rev. 0x22)
ppb1: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci2 at ppb1 bus 2
ppb2 at pci0 dev 7 function 0: vendor 0x8086 product 0x340e (rev. 0x22)
ppb2: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci3 at ppb2 bus 3
ppb3 at pci3 dev 0 function 0: vendor 0x10b5 product 0x8648 (rev. 0xbb)
ppb3: PCI Express 2.0 <Upstream Port of PCI-E Switch>
pci4 at ppb3 bus 4
ppb4 at pci4 dev 4 function 0: vendor 0x10b5 product 0x8648 (rev. 0xbb)
ppb4: PCI Express 2.0 <Downstream Port of PCI-E Switch>
pci5 at ppb4 bus 5
ppb5 at pci4 dev 5 function 0: vendor 0x10b5 product 0x8648 (rev. 0xbb)
ppb5: PCI Express 2.0 <Downstream Port of PCI-E Switch>
pci6 at ppb5 bus 6
ppb6 at pci4 dev 8 function 0: vendor 0x10b5 product 0x8648 (rev. 0xbb)
ppb6: PCI Express 2.0 <Downstream Port of PCI-E Switch>
pci7 at ppb6 bus 7
ppb7 at pci4 dev 9 function 0: vendor 0x10b5 product 0x8648 (rev. 0xbb)
ppb7: PCI Express 2.0 <Downstream Port of PCI-E Switch>
pci8 at ppb7 bus 8
vendor 0x13c1 product 0x1010 (RAID mass storage, revision 0x05) at pci8 dev 0 function 0 not configured
vendor 0x8086 product 0x342d (interrupt system, interface 0x20, revision 0x22) at pci0 dev 19 function 0 not configured
vendor 0x8086 product 0x342e (interrupt system, revision 0x22) at pci0 dev 20 function 0 not configured
vendor 0x8086 product 0x3422 (interrupt system, revision 0x22) at pci0 dev 20 function 1 not configured
vendor 0x8086 product 0x3423 (interrupt system, revision 0x22) at pci0 dev 20 function 2 not configured
vendor 0x8086 product 0x3438 (interrupt system, revision 0x22) at pci0 dev 20 function 3 not configured
vendor 0x8086 product 0x342f (interrupt system, interface 0x20, revision 0x22) at pci0 dev 21 function 0 not configured
uhci0 at pci0 dev 26 function 0: vendor 0x8086 product 0x3a37 (rev. 0x00)
uhci0: interrupting at ioapic0 pin 16
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 26 function 1: vendor 0x8086 product 0x3a38 (rev. 0x00)
uhci1: interrupting at ioapic0 pin 21
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 26 function 2: vendor 0x8086 product 0x3a39 (rev. 0x00)
uhci2: interrupting at ioapic0 pin 18
usb2 at uhci2: USB revision 1.0
ehci0 at pci0 dev 26 function 7: vendor 0x8086 product 0x3a3c (rev. 0x00)
ehci0: interrupting at ioapic0 pin 18
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
uhci3 at pci0 dev 29 function 0: vendor 0x8086 product 0x3a34 (rev. 0x00)
uhci3: interrupting at ioapic0 pin 23
usb4 at uhci3: USB revision 1.0
uhci4 at pci0 dev 29 function 1: vendor 0x8086 product 0x3a35 (rev. 0x00)
uhci4: interrupting at ioapic0 pin 19
usb5 at uhci4: USB revision 1.0
uhci5 at pci0 dev 29 function 2: vendor 0x8086 product 0x3a36 (rev. 0x00)
uhci5: interrupting at ioapic0 pin 18
usb6 at uhci5: USB revision 1.0
ehci1 at pci0 dev 29 function 7: vendor 0x8086 product 0x3a3a (rev. 0x00)
ehci1: interrupting at ioapic0 pin 23
ehci1: companion controllers, 2 ports each: uhci3 uhci4 uhci5
usb7 at ehci1: USB revision 2.0
ppb8 at pci0 dev 30 function 0: vendor 0x8086 product 0x244e (rev. 0x90)
pci9 at ppb8 bus 9
vga0 at pci9 dev 1 function 0: vendor 0x102b product 0x0532 (rev. 0x0a)
wsdisplay0 at vga0 kbdmux 1
drm at vga0 not configured
pcib0 at pci0 dev 31 function 0: vendor 0x8086 product 0x3a16 (rev. 0x00)
piixide0 at pci0 dev 31 function 2: Intel 82801JI Serial ATA Controller (ICH10) (rev. 0x00)
piixide0: using ioapic0 pin 19 for native-PCI interrupt
atabus0 at piixide0 channel 0
atabus1 at piixide0 channel 1
ichsmb0 at pci0 dev 31 function 3: vendor 0x8086 product 0x3a30 (rev. 0x00)
ichsmb0: interrupting at ioapic0 pin 18
iic0 at ichsmb0: I2C bus
piixide1 at pci0 dev 31 function 5: Intel 82801JI Serial ATA Controller (ICH10) (rev. 0x00)
piixide1: using ioapic0 pin 19 for native-PCI interrupt
atabus2 at piixide1 channel 0
atabus3 at piixide1 channel 1
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pci10 at mainbus0 bus 128
ppb9 at pci10 dev 0 function 0: vendor 0x8086 product 0x3420 (rev. 0x22)
ppb9: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci11 at ppb9 bus 129
ppb10 at pci10 dev 1 function 0: vendor 0x8086 product 0x3408 (rev. 0x22)
ppb10: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci12 at ppb10 bus 130
wm0 at pci12 dev 0 function 0: 82576 1000BaseT Ethernet, rev. 1
wm0: interrupting at ioapic0 pin 16
wm0: Ethernet address 00:30:48:ff:7c:a8
igphy0 at wm0 phy 1: i82566 10/100/1000 media interface, rev. 1
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
wm1 at pci12 dev 0 function 1: 82576 1000BaseT Ethernet, rev. 1
wm1: interrupting at ioapic0 pin 17
wm1: Ethernet address 00:30:48:ff:7c:a9
igphy1 at wm1 phy 1: i82566 10/100/1000 media interface, rev. 1
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb11 at pci10 dev 3 function 0: vendor 0x8086 product 0x340a (rev. 0x22)
ppb11: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci13 at ppb11 bus 132
ppb12 at pci10 dev 7 function 0: vendor 0x8086 product 0x340e (rev. 0x22)
ppb12: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci14 at ppb12 bus 133
vendor 0x8086 product 0x342d (interrupt system, interface 0x20, revision 0x22) at pci10 dev 19 function 0 not configured
vendor 0x8086 product 0x342e (interrupt system, revision 0x22) at pci10 dev 20 function 0 not configured
vendor 0x8086 product 0x3422 (interrupt system, revision 0x22) at pci10 dev 20 function 1 not configured
vendor 0x8086 product 0x3423 (interrupt system, revision 0x22) at pci10 dev 20 function 2 not configured
vendor 0x8086 product 0x3438 (interrupt system, revision 0x22) at pci10 dev 20 function 3 not configured
panic: kernel diagnostic assertion "ci->ci_tlbstate != TLBSTATE_VALID" failed: file "/local/src/NetBSD/src/sys/arch/x86/x86/pmap.c", line 2496
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80253065 cs 8 rflags 246 cr2  0 cpl 0 rsp ffff8007ac75ab90
Skipping crash dump on recursive panic
panic: i82489_icr_wait: busy
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80253065 cs 8 rflags 246 cr2  0 cpl 8 rsp ffff8007ac75a7a0
Skipping crash dump on recursive panic
panic: i82489_icr_wait: busy
[...]

> boot -2
[...]
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.99.56 (GENERIC) #3: Fri Sep 30 17:28:16 CEST 2011
        njoly%lanfeust.sis.pasteur.fr@localhost:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/GENERIC
total memory = 1023 GB
avail memory = 994 GB
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (   A M I       ALASKA)
cpu0 at mainbus0 apid 0cpu0: unable to reset apic id
/local/src/NetBSD/src/sys/arch/x86/x86/mtrr_i686.c: FIXME: more than 8 MTRRs (10)
: Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz, id 0x206f2
cpu1 at mainbus0 apid 2panic: i82489_icr_wait: busy
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80253065 cs 8 rflags 246 cr2  0 cpl 8 rsp ffffffff81078b20
Skipping crash dump on recursive panic
panic: i82489_icr_wait: busy
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80253065 cs 8 rflags 246 cr2  0 cpl 8 rsp ffffffff81078730
Skipping crash dump on recursive panic
panic: i82489_icr_wait: busy
[...]

