Current-Users archive


Re: -current amd64 does not boot on huge machine (80 cores, RAM 1TB)



A while back I had a SuperMicro system (dual 12-core Opteron) with similar interrupt routing issues. The attached patch got that box working and did not break any of my other systems (I am still running the patch on my 5.99.55 systems).

On Thu, 6 Oct 2011, Christoph Egger wrote:

On 03.10.11 14:15, Nicolas Joly wrote:

Hi,

We just got, at work, a new toy ... This is a Supermicro SuperServer
5086B-TRF[1] machine, with 80 cores and 1TB of RAM. Unfortunately, I
cannot boot -current amd64 on it.

Using a non-DIAGNOSTIC kernel does not help, except that
i82489_icr_wait no longer fires, as expected.

A normal boot hangs when probing cpu0, an SMP-disabled boot fails with
a KASSERT, and an ACPI-disabled boot hangs when probing cpu1.

Attached corresponding dmesg buffers.

Any idea where to look?
Thanks.

[1] http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm


From the dmesg it looks like this machine has two PCI host controllers.

If this is the case, then the problem is in parsing the interrupt routing
from ACPI.
The parser does not deal with ACPI PCI segments. So when the PCI bus, device
and function numbers are the same, the interrupt routing from the
first host controller is overridden with the information from the second
PCI host controller.

This makes the interrupt handler wait for interrupts coming from the
second PCI host controller while they actually come from the first one
=> hang at boot.
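
The collision described above can be illustrated with a small standalone
sketch. This is not the NetBSD parser: the struct, the table sizes and every
function name below are hypothetical, and the table is keyed by bus number
only as a simplification. It only shows why a table indexed without the
segment lets a second host controller clobber the first one's entry, and how
adding the segment to the key keeps both.

```c
#include <assert.h>

#define MAX_SEG 4      /* hypothetical limit, chosen for the sketch */
#define MAX_BUS 256    /* PCI bus numbers are 8 bits wide */

/* Hypothetical record: which host controller supplied a bus's routing. */
struct route {
	int valid;     /* non-zero once routing has been recorded */
	int host;      /* id of the host controller that supplied it */
};

/* Table indexed by bus number only: segments are ignored. */
static struct route by_bus[MAX_BUS];

/* Table indexed by (segment, bus): same bus number, separate entries. */
static struct route by_seg_bus[MAX_SEG][MAX_BUS];

/* Record routing by bus only. Returns 1 if an earlier entry was clobbered. */
static int
add_route_bus_only(int bus, int host)
{
	int clobbered;

	assert(bus >= 0 && bus < MAX_BUS);
	clobbered = by_bus[bus].valid;
	by_bus[bus].valid = 1;
	by_bus[bus].host = host;
	return clobbered;
}

/* Record routing by (segment, bus). Returns 1 if an entry was clobbered. */
static int
add_route_seg_bus(int seg, int bus, int host)
{
	int clobbered;

	assert(seg >= 0 && seg < MAX_SEG);
	assert(bus >= 0 && bus < MAX_BUS);
	clobbered = by_seg_bus[seg][bus].valid;
	by_seg_bus[seg][bus].valid = 1;
	by_seg_bus[seg][bus].host = host;
	return clobbered;
}

/* Look up the owning host controller, or -1 if nothing was recorded. */
static int
lookup_bus_only(int bus)
{
	assert(bus >= 0 && bus < MAX_BUS);
	return by_bus[bus].valid ? by_bus[bus].host : -1;
}

static int
lookup_seg_bus(int seg, int bus)
{
	assert(seg >= 0 && seg < MAX_SEG);
	assert(bus >= 0 && bus < MAX_BUS);
	return by_seg_bus[seg][bus].valid ? by_seg_bus[seg][bus].host : -1;
}
```

If two host controllers both register bus 0, the bus-only table ends up
holding only the second one, so a lookup for a device behind the first
controller returns the wrong routing. With the segment in the key, each
controller keeps its own entry, assuming the firmware really does place them
in different segments.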

Christoph





-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------
Index: mpacpi.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/mpacpi.c,v
retrieving revision 1.87
diff -u -p -r1.87 mpacpi.c
--- mpacpi.c    27 Apr 2010 05:34:14 -0000      1.87
+++ mpacpi.c    8 Jul 2010 01:14:26 -0000
@@ -622,7 +622,9 @@ mpacpi_derive_bus(ACPI_HANDLE handle, st
                if (ACPI_FAILURE(rv))
                        goto out;
 
-               if (acpi_match_hid(devinfo, pciroot_hid)) {
+               if (acpi_match_hid(devinfo, pciroot_hid) &&
+                   ((devinfo->Valid & ACPI_VALID_STA) == 0 ||
+                   (devinfo->CurrentStatus & ACPI_STA_OK) == ACPI_STA_OK)) {
                        rv = mpacpi_get_bbn(acpi, parent, &bus);
                        if (ACPI_FAILURE(rv))
                                bus = 0;
@@ -783,6 +785,11 @@ mpacpi_pciroute(struct mpacpi_pcibus *mp
                    mpr->mpr_bus);
 
        mpb = &mp_busses[mpr->mpr_bus];
+
+       if (mpb->mb_name != NULL)
+               printf("mpacpi: PCI bus %d int routing already done!\n",
+                   mpr->mpr_bus);
+
        mpb->mb_intrs = NULL;
        mpb->mb_name = "pci";
        mpb->mb_idx = mpr->mpr_bus;

