tech-kern archive


Re: RFC: NUMA support



On Mon, Nov 10, 2008 at 05:11:37PM +0200, Christoph Egger wrote:
> 
> Hi!
> 
> I started to work on NUMA support. First step is to set up the
> topology.
> 
> It does this by scanning the ACPI SRAT table.
> If no ACPI SRAT table is present or if you boot w/ ACPI disabled,
> a one node NUMA system is faked.
> 
> The boot code also utilizes the ACPI MADT table to get more
> information as far as possible.
> 
> I showed rmind@ my patch so far. I share his opinion that
> it needs some more thoughts on the MI API side.
> Nonetheless, it's a start.
> 
> The next two items are to write a numactl(8) utility and
> to utilize the BIOS e820 memory map for more detailed
> and accurate information on the NUMA memory
> layout, the memory holes in particular.
> 
> The dmesg snippet on a four-socket machine looks like this:
> 
> NUMA: SRAT table found
> NUMA: SLIT table not found
> ioapic0 at mainbus0 apid 0
> ioapic1 at mainbus0 apid 1
> numa0 at mainbus0
> cpu0 at numa0 apic 4 (BP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu1 at numa0 apic 5 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu2 at numa0 apic 6 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu3 at numa0 apic 7 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> numa0: memory: 0x0 - 0xa0000 (0xa0000, physical, raw, raw)
> numa0: memory: 0x100000 - 0x40000000 (0x3ff00000, physical, raw, raw)
> numa1 at mainbus0
> cpu4 at numa1 apic 8 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu5 at numa1 apic 9 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu6 at numa1 apic 10 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu7 at numa1 apic 11 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> numa1: memory: 0x40000000 - 0x80000000 (0x40000000, physical, raw, raw)
> numa2 at mainbus0
> cpu8 at numa2 apic 12 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu9 at numa2 apic 13 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu10 at numa2 apic 14 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu11 at numa2 apic 15 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> numa2: memory: 0x80000000 - 0xc0000000 (0x40000000, physical, raw, raw)
> numa3 at mainbus0
> cpu12 at numa3 apic 16 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu13 at numa3 apic 17 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu14 at numa3 apic 18 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> cpu15 at numa3 apic 19 (AP): AMD 686-class, 2300 MHz, id 0x100f40
> numa3: memory: 0xc0000000 - 0xd8000000 (0x18000000, physical, raw, raw)
> numa3: memory: 0x100000000 - 0x128000000 (0x28000000, physical, raw, raw)
> 
> I can also suspend and resume a full node:
> 
> # cpuctl list
> Num  HwId Unbound LWPs Interrupts     Last change
> ---- ---- ------------ -------------- ----------------------------
> 0    0    online       intr           Wed Nov  5 01:55:47 2008
> 1    1    online       intr           Wed Nov  5 01:55:47 2008
> 2    2    online       intr           Wed Nov  5 01:55:47 2008
> 3    3    online       intr           Wed Nov  5 01:55:47 2008
> 4    4    online       intr           Wed Nov  5 01:55:47 2008
> 5    5    online       intr           Wed Nov  5 01:55:47 2008
> 6    6    online       intr           Wed Nov  5 01:55:47 2008
> 7    7    online       intr           Wed Nov  5 01:55:47 2008
> 8    8    online       intr           Wed Nov  5 01:55:47 2008
> 9    9    online       intr           Wed Nov  5 01:55:47 2008
> 10   a    online       intr           Wed Nov  5 01:55:47 2008
> 11   b    online       intr           Wed Nov  5 01:55:47 2008
> 12   c    online       intr           Wed Nov  5 01:55:47 2008
> 13   d    online       intr           Wed Nov  5 01:55:47 2008
> 14   e    online       intr           Wed Nov  5 01:55:47 2008
> 15   f    online       intr           Wed Nov  5 01:55:47 2008
> 
> # drvctl -l mainbus0
> mainbus0 ioapic0
> mainbus0 ioapic1
> mainbus0 numa0
> mainbus0 numa1
> mainbus0 numa2
> mainbus0 numa3
> mainbus0 acpi0
> mainbus0 pci0
> mainbus0 pci8
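The quoted message says the topology comes from scanning the ACPI SRAT, with a one-node system faked when there is no SRAT or ACPI is disabled. The thread does not show the actual NetBSD attach code, so what follows is only a minimal C sketch of that idea under stated assumptions. It walks the SRAT subtables using the offsets from the ACPI specification: subtables start after a 48-byte header; a Processor Local APIC Affinity entry (type 0) carries the APIC ID at byte 3 and splits the proximity domain across byte 2 and bytes 9-11; a Memory Affinity entry (type 1) carries a 32-bit domain at byte 2, base at byte 8, length at byte 16, and flags at byte 28. The names `srat_parse`, `struct topo`, the `physmem` parameter, and the fixed array limits are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SRAT layout constants, taken from the ACPI specification. */
#define SRAT_HDR_LEN		48	/* 36-byte ACPI header + 12 reserved bytes */
#define SRAT_CPU_AFFINITY	0	/* Processor Local APIC/SAPIC Affinity, 16 bytes */
#define SRAT_MEM_AFFINITY	1	/* Memory Affinity, 40 bytes */
#define SRAT_FLAG_ENABLED	0x1

#define MAX_CPUS	64	/* hypothetical limits for this sketch */
#define MAX_RANGES	64

/* Hypothetical types for this sketch; not NetBSD's structures. */
struct cpu_aff   { uint8_t apic_id; uint32_t node; };
struct mem_range { uint32_t node; uint64_t base, len; };

struct topo {
	uint64_t nnodes;		/* highest proximity domain seen + 1 */
	size_t ncpus, nranges;
	struct cpu_aff cpus[MAX_CPUS];
	struct mem_range ranges[MAX_RANGES];
};

/* ACPI tables are little-endian; read multi-byte fields byte by byte. */
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd64(const uint8_t *p)
{
	return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

static void note_node(struct topo *t, uint32_t node)
{
	if ((uint64_t)node + 1 > t->nnodes)
		t->nnodes = (uint64_t)node + 1;
}

/*
 * Walk the SRAT subtables.  If srat is NULL (no table, or ACPI disabled)
 * or it yields no enabled memory ranges, fake a single node 0 owning
 * [0, physmem); per-CPU affinity is dropped in that case.
 */
static void srat_parse(const uint8_t *srat, size_t srat_len,
    uint64_t physmem, struct topo *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));

	for (size_t off = SRAT_HDR_LEN;
	    srat != NULL && off + 2 <= srat_len; ) {
		const uint8_t *e = srat + off;
		uint8_t len = e[1];

		if (len < 2 || off + len > srat_len)
			break;		/* malformed entry: stop scanning */
		if (e[0] == SRAT_CPU_AFFINITY && len >= 16 &&
		    (rd32(e + 4) & SRAT_FLAG_ENABLED) && t->ncpus < MAX_CPUS) {
			/* Domain bits 7:0 at byte 2, bits 31:8 at bytes 9-11. */
			uint32_t node = e[2] | (uint32_t)e[9] << 8 |
			    (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
			t->cpus[t->ncpus++] = (struct cpu_aff){ e[3], node };
			note_node(t, node);
		} else if (e[0] == SRAT_MEM_AFFINITY && len >= 40 &&
		    (rd32(e + 28) & SRAT_FLAG_ENABLED) &&
		    t->nranges < MAX_RANGES) {
			struct mem_range *r = &t->ranges[t->nranges++];
			r->node = rd32(e + 2);
			r->base = rd64(e + 8);
			r->len = rd64(e + 16);
			note_node(t, r->node);
		}
		off += len;	/* disabled or unknown entries are skipped */
	}

	if (t->nranges == 0) {
		memset(t, 0, sizeof(*t));
		t->nnodes = 1;
		t->ranges[0] = (struct mem_range){ 0, 0, physmem };
		t->nranges = 1;
	}
}
```

Disabled entries are skipped rather than treated as errors, since firmware commonly lists entries for sockets or hot-plug slots that are not populated.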

Are pci0 and pci8 and the other peripheral buses more properly attached
to a NUMA node?

Currently, numa0..numaN are just aggregations of CPUs, at least as far
as pmf(9) is concerned.  AFAICT, a NUMA node is a real physical entity
with RAM attached.  Looking ahead, what will it mean for a NUMA node
to be suspended?  Will the system vacate that node's RAM and turn off
DRAM refresh?  What, for that matter, will it mean to detach a NUMA node?

Dave

-- 
David Young             OJC Technologies
dyoung%ojctech.com@localhost      Urbana, IL * (217) 278-3933 ext 24

