amd64: failed attempt at numa...

To: tech-kern%NetBSD.org@localhost
Subject: amd64: failed attempt at numa...
From: Maxime Villard <max%m00nbsd.net@localhost>
Date: Sat, 28 Jul 2018 19:53:03 +0200

... but I'm posting the code in case someone cares. The idea was to duplicate
the kernel .text and .rodata sections into each numa node. Also the page tree
itself was partly duplicated.

The goal was to speed up instruction fetches and some parts of the VA->PA
lookup: with the kernel text and part of the page tree local to each node,
they no longer required a remote snoop on node0.
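
A toy userland model of the duplication idea, for illustration only. This is
not the code from the patch: every name below (numa_model_init,
numa_model_lookup, KTEXT_SLOT, the sizes) is made up, and the "page tree" is
reduced to one array of pointers per node. It only shows the shape of the
scheme: each node gets a private copy of the read-only image and a private
top-level table whose kernel slot points at that copy, while all other slots
stay shared.

```c
#include <assert.h>
#include <string.h>

#define NNODES     4	/* hypothetical number of NUMA nodes */
#define NSLOTS     8	/* toy top-level table size */
#define KTEXT_SLOT 7	/* hypothetical slot that maps the kernel text */
#define TEXT_SIZE  64	/* toy size of the read-only image */

/* One private copy of the read-only "text" per node. */
static unsigned char text_copy[NNODES][TEXT_SIZE];

/* One top-level table per node; entries are plain pointers in this model. */
static const void *top_level[NNODES][NSLOTS];

/*
 * Copy the image into each node and build the per-node tables: all slots
 * are shared, except the kernel text slot which points at the local copy.
 */
static void
numa_model_init(const unsigned char *text, const void *shared[NSLOTS])
{
	for (int n = 0; n < NNODES; n++) {
		memcpy(text_copy[n], text, TEXT_SIZE);
		for (int s = 0; s < NSLOTS; s++)
			top_level[n][s] = shared[s];
		top_level[n][KTEXT_SLOT] = text_copy[n];
	}
}

/* What a CPU on 'node' would find in 'slot' of its own table. */
static const void *
numa_model_lookup(int node, int slot)
{
	assert(node >= 0 && node < NNODES);
	assert(slot >= 0 && slot < NSLOTS);
	return top_level[node][slot];
}
```

In this model a lookup of KTEXT_SLOT from two different nodes returns two
different addresses with identical contents, and every other slot resolves to
the same address from any node.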

In practice, though, my GENERIC_NUMA is at best equivalent to GENERIC, at
worst slower. My patch speeds up some memory accesses, but adds some overhead
in the context switch. In the end, the former only compensates for the latter
and there is no visible performance gain.

It was tested on a 64-core AMD numa machine by Manuel Bouyer (thanks).

The code is here [1]:

	GENERIC_NUMA -> src/sys/arch/amd64/conf/GENERIC_NUMA
	numa.c       -> src/sys/arch/x86/x86/numa.c
	numa.h       -> src/sys/arch/x86/include/numa.h
	numa.diff    -> to be applied in src/sys/arch/

We can probably recycle the x86_numa_pagealloc function, put it in UVM, and
start using it without the rest of my numa implementation.

For example we could allocate the cpu_info structure behind curcpu(), the GDT,
and other per-cpu areas, in the local node. This doesn't add any overhead, and
does reduce the access time on these areas (they are accessed virtually all
the time). Probably a good start.
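
A minimal sketch of what node-local page allocation means, as a userland toy.
It does not reproduce x86_numa_pagealloc, whose signature is not shown in this
mail: toy_numa_pagealloc, toy_pa_to_node, the two-node layout and the address
ranges are all assumptions made for the example. Each node owns a physical
range; the allocator tries the requested node first and falls back to the
other nodes only when the local one is exhausted.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NNODES    2		/* hypothetical two-node machine */

struct node_range {
	uint64_t start;		/* first physical address owned by the node */
	uint64_t end;		/* one past the last address */
	uint64_t next;		/* bump pointer: next free page */
};

/* Made-up layout: 16 pages per node. */
static struct node_range nodes[TOY_NNODES] = {
	{ 0x00000, 0x10000, 0x00000 },
	{ 0x10000, 0x20000, 0x10000 },
};

/*
 * Return the physical address of a free page, preferring 'node'.
 * Falls back to the other nodes in order; UINT64_MAX if all are full.
 */
static uint64_t
toy_numa_pagealloc(int node)
{
	assert(node >= 0 && node < TOY_NNODES);
	for (int i = 0; i < TOY_NNODES; i++) {
		struct node_range *r = &nodes[(node + i) % TOY_NNODES];
		if (r->end - r->next >= TOY_PAGE_SIZE) {
			uint64_t pa = r->next;
			r->next += TOY_PAGE_SIZE;
			return pa;
		}
	}
	return UINT64_MAX;
}

/* Map a physical address back to the node that owns it, or -1. */
static int
toy_pa_to_node(uint64_t pa)
{
	for (int i = 0; i < TOY_NNODES; i++)
		if (pa >= nodes[i].start && pa < nodes[i].end)
			return i;
	return -1;
}
```

A per-cpu area for a CPU sitting on node 1 would be backed by
toy_numa_pagealloc(1), so that toy_pa_to_node() of the result is 1 as long as
node 1 still has free pages.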

Maxime

[1] http://m00nbsd.net/garbage/numa/
