Port-amd64 archive


~5 percent kernel performance loss in the last 3 weeks



After updating my -current kernel from 6.99.24 to 6.99.27 so I could
commit my ubsec(4) changes, I noticed that under 6.99.27 I get between
3 and 8 percent less throughput on accelerated crypto operations.

Note that I am using exactly the same ubsec(4) code[1] with both kernels,
so I think a problem in ubsec(4) itself is unlikely.
I did not change userland.

The old kernel is 6.99.24 from Oct 27th and the new one is 6.99.27 from
Nov 17th. The machine is an ML110 G6 with 32 GB RAM and a 4-core Intel
Xeon X3430 @ 2.4 GHz, running amd64.

There's nothing obvious to me in the dmesg diff that would give me a clue:

--- dmesg.6.99.24       2013-11-19 15:44:24.000000000 +0100
+++ dmesg.6.99.27       2013-11-19 15:39:29.000000000 +0100
@@ -4,7 +4,7 @@
 Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.
 
-NetBSD 6.99.24 (GENERIC) #4: Mon Oct 28 18:58:32 CET 2013
+NetBSD 6.99.27 (GENERIC) #9: Sun Nov 17 17:47:24 CET 2013
         bad@flexible-demeanour:/home/bad/work/nb/src/sys/arch/amd64/compile/GENERIC
 total memory = 32759 MB
 avail memory = 31792 MB
@@ -133,11 +133,14 @@
 acpicpu0: T5: FFH, lat   1 us, pow   285 mW,  38 %
 acpicpu0: T6: FFH, lat   1 us, pow   190 mW,  25 %
 acpicpu0: T7: FFH, lat   1 us, pow    95 mW,  13 %
+coretemp0 at cpu0: thermal sensor, 1 C resolution
 acpicpu1 at cpu1: ACPI CPU
+coretemp1 at cpu1: thermal sensor, 1 C resolution
 acpicpu2 at cpu2: ACPI CPU
+coretemp2 at cpu2: thermal sensor, 1 C resolution
 acpicpu3 at cpu3: ACPI CPU
+coretemp3 at cpu3: thermal sensor, 1 C resolution
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
-timecounter: Timecounter "TSC" frequency 2394079440 Hz quality 3000
 uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self powered
 uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
@@ -175,7 +178,9 @@
 audio0 at pad0: half duplex, playback, capture
 boot device: wd0
 root on wd0a dumps on wd0b
+/: replaying log to memory
 root file system type: ffs
+/: replaying log to disk
 ipmi0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
 wsdisplay0: screen 1 added (80x25, vt100 emulation)
 wsdisplay0: screen 2 added (80x25, vt100 emulation)

Results from "openssl speed -evp des-ede3-cbc -elapsed":

NetBSD 6.99.24/bcm5862

Doing des-ede3-cbc for 3s on 16 size blocks: 115774 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 116420 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 256 size blocks: 99863 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 1024 size blocks: 77760 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 8192 size blocks: 29364 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202) 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc       586.20k     2357.87k     8246.75k    25685.88k    79916.91k
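
As a cross-check of the table: each figure is the number of operations
times the block size, divided by the elapsed time (wall-clock time,
because of -elapsed), in units of 1000 bytes per second. A minimal
Python sketch of that arithmetic for the 6.99.24 run follows; the
function and variable names are only illustrative. Since openssl prints
the elapsed time rounded to two decimals, recomputed figures can differ
slightly in the last digits.

```python
# Recompute openssl speed's summary row from its "Doing ..." lines.
# Units are 1000s of bytes per second:
#   throughput_k = operations * block_size / elapsed_seconds / 1000


def throughput_k(operations: int, block_size: int, elapsed_s: float) -> float:
    """Return throughput in 1000s of bytes per second."""
    return operations * block_size / elapsed_s / 1000.0


# (block size, operations, elapsed seconds) copied from the 6.99.24 run;
# the elapsed times are the 2-decimal values openssl printed.
RUN_6_99_24 = [
    (16, 115774, 3.16),
    (64, 116420, 3.16),
    (256, 99863, 3.10),
    (1024, 77760, 3.10),
    (8192, 29364, 3.01),
]

if __name__ == "__main__":
    for size, ops, secs in RUN_6_99_24:
        print(f"{size:5d} bytes: {throughput_k(ops, size, secs):10.2f}k")
```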

NetBSD 6.99.27/bcm5862

Doing des-ede3-cbc for 3s on 16 size blocks: 111538 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 107742 des-ede3-cbc's in 3.15s
Doing des-ede3-cbc for 3s on 256 size blocks: 92502 des-ede3-cbc's in 3.09s
Doing des-ede3-cbc for 3s on 1024 size blocks: 73305 des-ede3-cbc's in 3.12s
Doing des-ede3-cbc for 3s on 8192 size blocks: 28729 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202) 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc       564.75k     2189.04k     7663.60k    24059.08k    78188.69k
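
To put a number on the difference per block size, here is a small
Python sketch that computes the relative throughput loss between the
two summary rows above. The values are copied from the openssl output;
the names are illustrative only, and it compares a single run per
kernel.

```python
# Relative throughput change between the two kernels, per block size,
# using the summary rows (1000s of bytes per second) printed by openssl.

SIZES = [16, 64, 256, 1024, 8192]
OLD_6_99_24 = [586.20, 2357.87, 8246.75, 25685.88, 79916.91]
NEW_6_99_27 = [564.75, 2189.04, 7663.60, 24059.08, 78188.69]


def percent_loss(old: float, new: float) -> float:
    """Throughput lost going from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


def losses() -> list:
    """Per-block-size loss in percent, in the same order as SIZES."""
    return [percent_loss(o, n) for o, n in zip(OLD_6_99_24, NEW_6_99_27)]


if __name__ == "__main__":
    per_size = losses()
    for size, loss in zip(SIZES, per_size):
        print(f"{size:5d} bytes: {loss:5.2f}% slower")
    print(f"mean: {sum(per_size) / len(per_size):5.2f}%")
```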

--chris

[1] The code contains not-yet-committed changes that avoid calling
bus_dmamap_destroy() from interrupt context; I'm waiting for them to be
reviewed before committing.

