Source-Changes-D archive


Re: CVS commit: src/sys/uvm



Hi!

With the source tree as of January 2nd we ran some tests for robustness and timing with our database setup:

Machine:

Mainboard: S2600WFT

CPU: 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

machdep.spectre_v1.mitigated = 0
machdep.spectre_v2.hwmitigated = 1
machdep.spectre_v2.swmitigated = 1
machdep.spectre_v2.method = [GCC retpoline] + [Intel IBRS]
machdep.spectre_v4.mitigated = 0
machdep.spectre_v4.method = (none)
machdep.mds.mitigated = 0
machdep.mds.method = (none)
machdep.taa.mitigated = 0
machdep.taa.method = [MDS]

Memory:

hw.physmem64 = 549446447104
hw.usermem64 = 549438365696

This machine is/has been a challenge for NetBSD as it has 0.5 TB of memory and 32 cores.

The test case is restoring a 1 TB PostgreSQL 11 database with varying degrees of pg_restore parallelism.
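
As a quick cross-check of the half-terabyte figure, the sysctl values above can be converted to GiB. This is only a small Python sketch; the variable names are made up for illustration and the byte counts are copied from the sysctl output above.

```python
# Byte counts copied from the sysctl output quoted above.
physmem = 549446447104   # hw.physmem64
usermem = 549438365696   # hw.usermem64

GIB = 2 ** 30  # bytes per GiB

physmem_gib = physmem / GIB
usermem_gib = usermem / GIB

print(f"hw.physmem64 = {physmem_gib:.2f} GiB")
print(f"hw.usermem64 = {usermem_gib:.2f} GiB")
```

Both values come out just under 512 GiB.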

Why did we do the tests? The machine was installed with 8.99.24 as that supported the memory setup.

The machine was not able to reliably cope with many db/restore processes and large memory - see

 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting

for details.

With Andrew Doran's work on the vm system we restarted the tests.

The baseline is 8.99.24 from around  Sep  3 04:10:20 UTC 2018:
TEST 1
FRESH BOOT
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1826.599u 1752.878s 10:36:03.83 9.3%    0+0k 397+0io 1789pf+0w

Higher levels of restore parallelism lead to a higher probability of the system becoming catatonic.
Trouble starts around -j8 and gets worse at higher levels.

TEST 2
9.99.33 from around Fri Jan  3 16:14:02 CET 2020
FRESH BOOT
time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
2047.925u 1191.878s 14:24:15.23 6.2%    0+0k 0+0io 5784pf+0w

This survived a -j28 run that was not possible with 8.99.24 - this is a big step forward, but ~4h slower in real time.

TEST 3
FRESH BOOT
9.99.34 from around Mon Jan  6 14:43:01
time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
1816.348u 1792.530s 10:56:02.56 9.1%    0+0k 395+0io 5620pf+0w

-j28 run to compare to 9.99.33 - big improvement in real run time, though system time went up.

TEST 4
State after TEST 3 run to compare to 8.99.24
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1706.548u 1748.623s 11:26:38.87 8.3%    0+0k 0+0io 1420pf+0w

This ran faster than -j28 - probably due to less contention, but 50 min slower than 8.99.24 after a fresh boot.

TEST 5
re-run TEST 4 with fresh boot for 8.99.24 comparison
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1710.665u 1611.083s 9:14:56.86 9.9%     0+0k 398+0io 1504pf+0w

Better than 8.99.24 for real time.

There seems to be no big difference in system time between 8.99.24 and 9.99.34, but a big improvement in robustness. The lockups don't seem to happen any more, there are fewer short-term system freezes, and the system remains
responsive with 9.99.34.
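
The differences in real time mentioned above can be recomputed from the time output lines. The following is a minimal Python sketch, assuming the csh-style format shown in the tests (user seconds, system seconds, elapsed h:mm:ss). The helper names parse_time and fmt_delta and the run labels are illustrative only; the numbers are copied from the test output above.

```python
import re

# First three fields of each `time` line quoted in the tests above:
# user CPU seconds, system CPU seconds, elapsed real time (h:mm:ss.ss).
RUNS = {
    "TEST 1": "1826.599u 1752.878s 10:36:03.83",  # 8.99.24, -j5, fresh boot
    "TEST 2": "2047.925u 1191.878s 14:24:15.23",  # 9.99.33, -j28, fresh boot
    "TEST 3": "1816.348u 1792.530s 10:56:02.56",  # 9.99.34, -j28, fresh boot
    "TEST 4": "1706.548u 1748.623s 11:26:38.87",  # 9.99.34, -j5, after TEST 3
    "TEST 5": "1710.665u 1611.083s 9:14:56.86",   # 9.99.34, -j5, fresh boot
}

TIME_RE = re.compile(r"([\d.]+)u\s+([\d.]+)s\s+(\d+):(\d+):([\d.]+)")


def parse_time(line):
    """Return (user, system, elapsed) in seconds for one time line."""
    m = TIME_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    user = float(m.group(1))
    system = float(m.group(2))
    elapsed = int(m.group(3)) * 3600 + int(m.group(4)) * 60 + float(m.group(5))
    return user, system, elapsed


def fmt_delta(seconds):
    """Format a signed difference in seconds as +h:mm:ss or -h:mm:ss."""
    sign = "-" if seconds < 0 else "+"
    s = int(round(abs(seconds)))
    return f"{sign}{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


baseline = parse_time(RUNS["TEST 1"])[2]
for name, line in RUNS.items():
    user, system, elapsed = parse_time(line)
    cpu_pct = 100.0 * (user + system) / elapsed
    print(f"{name}: real {fmt_delta(elapsed - baseline)} vs TEST 1, "
          f"CPU {cpu_pct:.1f}%")
```

Relative to TEST 1 this gives roughly +3:48 for TEST 2, +0:50 for TEST 4 and -1:21 for TEST 5, which matches the "~4h" and "50 min" figures given with the tests.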

The big differences in real time are interesting, but the cause may not be easy to pinpoint. The database
runs on an NVMe device:
nvme0 at pci10 dev 0 function 0: Intel SSD DC P4500 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: for admin queue interrupting at msix4 vec 0
nvme0: INTEL SSDPE2KX040T8, firmware VDV10131, serial ...
nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
[...]
nvme0: for io queue 32 interrupting at msix4 vec 32 affinity to cpu31
ld0 at nvme0 nsid 1
ld0: 3726 GB, 486401 cyl, 255 head, 63 sec, 512 bytes/sect x 7814037168 sectors

We are seeing transfer rates of up to 300 MB/s and up to 80% busy on the complex I/O (load) and CPU (build index) workload.

So in summary we are a big step forward in robustness.

Thanks to Andrew for the big improvements here.

Frank


On 01/02/20 03:00, Andrew Doran wrote:
> Module Name:    src
> Committed By:    ad
> Date:        Thu Jan  2 02:00:35 UTC 2020
>
> Modified Files:
>     src/sys/uvm: uvm_amap.c uvm_amap.h
>
> Log Message:
> Back out the amap allocation changes from earlier today - have seen a panic
> with them.  Retain the lock changes.
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.113 -r1.114 src/sys/uvm/uvm_amap.c
> cvs rdiff -u -r1.38 -r1.39 src/sys/uvm/uvm_amap.h
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>


