Source-Changes-D archive
Re: CVS commit: src/sys/uvm
Hi !
With this state of January 2nd we ran some tests for robustness and 
timing with our database setup:
Machine:
Mainboard: S2600WFT
CPU: 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
machdep.spectre_v1.mitigated = 0
machdep.spectre_v2.hwmitigated = 1
machdep.spectre_v2.swmitigated = 1
machdep.spectre_v2.method = [GCC retpoline] + [Intel IBRS]
machdep.spectre_v4.mitigated = 0
machdep.spectre_v4.method = (none)
machdep.mds.mitigated = 0
machdep.mds.method = (none)
machdep.taa.mitigated = 0
machdep.taa.method = [MDS]
Memory:
hw.physmem64 = 549446447104
hw.usermem64 = 549438365696
This machine is/has been a challenge for NetBSD as it has 0.5 TB of memory 
and 32 cores.
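For reference, the sysctl byte counts above can be converted to binary units with a few lines of Python. This is only a sketch; the helper name `to_gib` is made up for illustration, and the two constants are copied from the sysctl output above.

```python
# Values copied from the sysctl output above (bytes).
PHYSMEM = 549446447104  # hw.physmem64
USERMEM = 549438365696  # hw.usermem64

GIB = 2 ** 30  # bytes per GiB


def to_gib(nbytes):
    """Convert a byte count to GiB (hypothetical helper)."""
    return nbytes / GIB


if __name__ == "__main__":
    print(f"hw.physmem64: {to_gib(PHYSMEM):.2f} GiB")
    print(f"hw.usermem64: {to_gib(USERMEM):.2f} GiB")
```

Both values come out slightly below 512 GiB, which matches the "0.5 TB" figure.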
The test case is restoring a 1 TB PostgreSQL 11 database with varying degrees 
of pg_restore parallelism.
Why did we do the tests? The machine was installed with 8.99.24 as that 
supported the memory setup.
The machine was not able to reliably cope with many db/restore processes 
and large memory - see
 PR kern/54209: NetBSD 8 large memory performance extremely low
 PR kern/54210: NetBSD-8 processes presumably not exiting
for details.
With Andrew Doran's work on the vm system we restarted the tests.
The baseline is 8.99.24 from around  Sep  3 04:10:20 UTC 2018:
TEST 1
FRESH BOOT
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1826.599u 1752.878s 10:36:03.83 9.3%    0+0k 397+0io 1789pf+0w
Higher levels of restore parallelism lead to a higher probability of a 
catatonic system.
Trouble starts around -j8 and gets worse at higher levels.
TEST 2
9.99.33 from around Fri Jan  3 16:14:02 CET 2020
FRESH BOOT
time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
2047.925u 1191.878s 14:24:15.23 6.2%    0+0k 0+0io 5784pf+0w
This survived a -j28 run that was not possible with 8.99.24 - this is a 
big step forward, but ~4h slower in real time.
TEST 3
FRESH BOOT
9.99.34 from around Mon Jan  6 14:43:01
time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
1816.348u 1792.530s 10:56:02.56 9.1%    0+0k 395+0io 5620pf+0w
-j28 run to compare to 9.99.33 - big improvement in real run time though 
system time went up.
TEST 4
State after TEST 3 run to compare to 8.99.24
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1706.548u 1748.623s 11:26:38.87 8.3%    0+0k 0+0io 1420pf+0w
This ran faster than -j28 - probably due to less contention, but 50 min 
slower than 8.99.24 after fresh boot.
TEST 5
re-run TEST 4 with fresh boot for 8.99.24 comparison
time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
1710.665u 1611.083s 9:14:56.86 9.9%     0+0k 398+0io 1504pf+0w
Better than 8.99.24 in real time.
There seems to be no big difference in system time between 8.99.24 and 
9.99.34, but a big improvement in robustness.
The lockups don't seem to happen any more, there are fewer short-term 
system freezes, and the system remains
responsive with 9.99.34.
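The timing lines in the tests above appear to be in csh-style `time` format (user seconds, system seconds, elapsed wall-clock time, CPU percentage). A minimal sketch for putting the five runs side by side follows, assuming that format. The names `parse_time_line`, `RUNS` and `report` are made up for illustration, and the input strings are the leading fields copied from the runs above.

```python
import re

# Leading fields of a csh-style `time` line:
# user seconds, system seconds, elapsed [h:]mm:ss.ss, CPU percentage.
TIME_RE = re.compile(
    r"(?P<user>\d+\.\d+)u\s+(?P<sys>\d+\.\d+)s\s+"
    r"(?P<elapsed>[\d:]+\.\d+)\s+(?P<cpu>\d+\.\d+)%"
)


def parse_time_line(line):
    """Return user/sys/elapsed (seconds) and CPU percent from one line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not a csh-style time line: {line!r}")
    elapsed = 0.0
    # Fold h:mm:ss.ss (or mm:ss.ss) into seconds, left to right.
    for part in m.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(m.group("user")),
        "sys": float(m.group("sys")),
        "elapsed": elapsed,
        "cpu": float(m.group("cpu")),
    }


# Timing lines copied from the test runs above.
RUNS = {
    "TEST 1": "1826.599u 1752.878s 10:36:03.83 9.3%",  # 8.99.24, -j5, fresh boot
    "TEST 2": "2047.925u 1191.878s 14:24:15.23 6.2%",  # 9.99.33, -j28, fresh boot
    "TEST 3": "1816.348u 1792.530s 10:56:02.56 9.1%",  # 9.99.34, -j28, fresh boot
    "TEST 4": "1706.548u 1748.623s 11:26:38.87 8.3%",  # 9.99.34, -j5, after TEST 3
    "TEST 5": "1710.665u 1611.083s 9:14:56.86 9.9%",   # 9.99.34, -j5, fresh boot
}


def report(runs):
    """Format one summary line per run."""
    lines = []
    for label, raw in runs.items():
        t = parse_time_line(raw)
        lines.append(
            f"{label}: real {t['elapsed'] / 3600:.2f} h, "
            f"user {t['user']:.0f} s, sys {t['sys']:.0f} s"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(RUNS)))
```

Subtracting the `elapsed` values of two runs gives the real-time differences quoted above, for example TEST 4 minus TEST 1 for the roughly 50 minute gap.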
The big differences in real time are interesting but the cause for that 
may not be easy to pinpoint. The database
runs on an nvme:
nvme0 at pci10 dev 0 function 0: Intel SSD DC P4500 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: for admin queue interrupting at msix4 vec 0
nvme0: INTEL SSDPE2KX040T8, firmware VDV10131, serial ...
nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
[...]
nvme0: for io queue 32 interrupting at msix4 vec 32 affinity to cpu31
ld0 at nvme0 nsid 1
ld0: 3726 GB, 486401 cyl, 255 head, 63 sec, 512 bytes/sect x 7814037168 sectors
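As a quick cross-check of the ld0 line, the capacity follows from sector count times sector size. The sketch below uses only the two numbers from the dmesg output; the function name `capacity_bytes` is made up for illustration.

```python
# Values copied from the ld0 dmesg line above.
SECTORS = 7814037168
SECTOR_SIZE = 512  # bytes per sector


def capacity_bytes(sectors, sector_size):
    """Total capacity in bytes (hypothetical helper)."""
    return sectors * sector_size


if __name__ == "__main__":
    total = capacity_bytes(SECTORS, SECTOR_SIZE)
    print(f"{total} bytes")
    print(f"{total // 2**30} GiB (binary units)")
    print(f"{total / 1e12:.2f} TB (decimal units)")
```

The "3726 GB" in the dmesg line is therefore in binary units; in decimal units the device is about 4 TB.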
And we are seeing transfer rates of up to 300 MB/s and up to 80% busy on the 
complex I/O (load) and CPU (build index) workload.
So in summary we have a big step forward in robustness.
Thanks to Andrew for the big improvements here.
Frank
On 01/02/20 03:00, Andrew Doran wrote:
> Module Name:    src
> Committed By:    ad
> Date:        Thu Jan  2 02:00:35 UTC 2020
>
> Modified Files:
>     src/sys/uvm: uvm_amap.c uvm_amap.h
>
> Log Message:
> Back out the amap allocation changes from earlier today - have seen a panic
> with them.  Retain the lock changes.
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.113 -r1.114 src/sys/uvm/uvm_amap.c
> cvs rdiff -u -r1.38 -r1.39 src/sys/uvm/uvm_amap.h
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>