tech-kern archive
Re: ~5 percent kernel performance loss in the last 3 weeks
(2013/11/20 0:02), Christoph Badura wrote:
After updating my -current kernel from 6.99.24 to 6.99.27 so I could
commit my ubsec(4) changes I noticed that under 6.99.27 I get between
3 and 8 percent less throughput on accelerated crypto ops.
Note that I am using the exact same ubsec(4) code[1] with both kernels, so
I think it is unlikely to be a problem with ubsec(4).
I did not change userland.
The old kernel is 6.99.24 from Oct 27th and the new one is 6.99.27 from Nov 17th.
The machine is an ML110 G6 w/32G RAM and Intel Xeon X3430 4 core @2.4 GHz
running amd64.
There's nothing obvious to me in the dmesg diff that would give me a clue:
--- dmesg.6.99.24 2013-11-19 15:44:24.000000000 +0100
+++ dmesg.6.99.27 2013-11-19 15:39:29.000000000 +0100
@@ -4,7 +4,7 @@
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
-NetBSD 6.99.24 (GENERIC) #4: Mon Oct 28 18:58:32 CET 2013
+NetBSD 6.99.27 (GENERIC) #9: Sun Nov 17 17:47:24 CET 2013
bad@flexible-demeanour:/home/bad/work/nb/src/sys/arch/amd64/compile/GENERIC
total memory = 32759 MB
avail memory = 31792 MB
@@ -133,11 +133,14 @@
acpicpu0: T5: FFH, lat 1 us, pow 285 mW, 38 %
acpicpu0: T6: FFH, lat 1 us, pow 190 mW, 25 %
acpicpu0: T7: FFH, lat 1 us, pow 95 mW, 13 %
+coretemp0 at cpu0: thermal sensor, 1 C resolution
acpicpu1 at cpu1: ACPI CPU
+coretemp1 at cpu1: thermal sensor, 1 C resolution
acpicpu2 at cpu2: ACPI CPU
+coretemp2 at cpu2: thermal sensor, 1 C resolution
acpicpu3 at cpu3: ACPI CPU
+coretemp3 at cpu3: thermal sensor, 1 C resolution
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
-timecounter: Timecounter "TSC" frequency 2394079440 Hz quality 3000
It's strange.
Could you revert the following change and test again?
Also, please show me the output of "cpuctl identify 0".
http://mail-index.netbsd.org/source-changes/2013/11/15/msg049232.html
Module Name: src
Committed By: msaitoh
Date: Fri Nov 15 08:47:55 UTC 2013
Modified Files:
src/sys/arch/x86/acpi: acpi_cpu_md.c
src/sys/arch/x86/include: specialreg.h
src/sys/arch/x86/pci: amdtemp.c
src/sys/arch/x86/x86: coretemp.c cpu.c cpu_topology.c cpu_ucode_amd.c
cpu_ucode_intel.c est.c identcpu.c intel_busclock.c lapic.c odcm.c
patch.c powernow.c tprof_amdpmi.c tprof_pmi.c tsc.c viac7temp.c
src/usr.sbin/cpuctl/arch: i386.c
Log Message:
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.
CPUID_TO_STEPPING(cpuid) (not changed)
CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)
Return the display family and the display model.
The macro names are the same as FreeBSD.
CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)
Only for the base field.
CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)
Only for the extended field.
See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
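
For readers unfamiliar with the distinction the log message draws between the base, extended and display fields, here is a minimal sketch of how they relate. It assumes the usual x86 layout of the CPUID leaf 1 EAX signature (stepping in bits 0-3, base model in bits 4-7, base family in bits 8-11, extended model in bits 16-19, extended family in bits 20-27). The `sig_*` helper names are hypothetical and these are not the NetBSD macro definitions from specialreg.h; the signatures mentioned afterwards are illustrative values, not taken from this thread.

```c
#include <stdint.h>

/* Hypothetical helpers; NOT the NetBSD CPUID_TO_* macros. */

/* Raw fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static inline uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when the base family is 0xf. */
static inline uint32_t sig_display_family(uint32_t sig)
{
    uint32_t base = sig_base_family(sig);

    return base == 0xf ? base + sig_ext_family(sig) : base;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 0x6 or 0xf (assumption: the Intel rule is applied to
 * all vendors here).
 */
static inline uint32_t sig_display_model(uint32_t sig)
{
    uint32_t base = sig_base_family(sig);
    uint32_t model = sig_base_model(sig);

    if (base == 0x6 || base == 0xf)
        model |= sig_ext_model(sig) << 4;
    return model;
}
```

With these helpers, a signature of 0x000106E5 decodes to display family 6, display model 0x1e, stepping 5, and 0x00100F42 decodes to display family 0x10, display model 4. Code that compares only the base fields would see model 0xe and family 0xf respectively, which is the kind of mismatch the display macros are meant to avoid.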
To generate a diff of this commit:
cvs rdiff -u -r1.72 -r1.73 src/sys/arch/x86/acpi/acpi_cpu_md.c
cvs rdiff -u -r1.71 -r1.72 src/sys/arch/x86/include/specialreg.h
cvs rdiff -u -r1.17 -r1.18 src/sys/arch/x86/pci/amdtemp.c
cvs rdiff -u -r1.30 -r1.31 src/sys/arch/x86/x86/coretemp.c
cvs rdiff -u -r1.105 -r1.106 src/sys/arch/x86/x86/cpu.c
cvs rdiff -u -r1.7 -r1.8 src/sys/arch/x86/x86/cpu_topology.c \
src/sys/arch/x86/x86/powernow.c
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x86/x86/cpu_ucode_amd.c \
src/sys/arch/x86/x86/viac7temp.c
cvs rdiff -u -r1.3 -r1.4 src/sys/arch/x86/x86/cpu_ucode_intel.c \
src/sys/arch/x86/x86/tprof_amdpmi.c
cvs rdiff -u -r1.27 -r1.28 src/sys/arch/x86/x86/est.c
cvs rdiff -u -r1.37 -r1.38 src/sys/arch/x86/x86/identcpu.c
cvs rdiff -u -r1.16 -r1.17 src/sys/arch/x86/x86/intel_busclock.c
cvs rdiff -u -r1.46 -r1.47 src/sys/arch/x86/x86/lapic.c
cvs rdiff -u -r1.2 -r1.3 src/sys/arch/x86/x86/odcm.c
cvs rdiff -u -r1.21 -r1.22 src/sys/arch/x86/x86/patch.c
cvs rdiff -u -r1.12 -r1.13 src/sys/arch/x86/x86/tprof_pmi.c
cvs rdiff -u -r1.32 -r1.33 src/sys/arch/x86/x86/tsc.c
cvs rdiff -u -r1.49 -r1.50 src/usr.sbin/cpuctl/arch/i386.c
uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
@@ -175,7 +178,9 @@
audio0 at pad0: half duplex, playback, capture
boot device: wd0
root on wd0a dumps on wd0b
+/: replaying log to memory
root file system type: ffs
+/: replaying log to disk
ipmi0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
Results from "openssl speed -evp des-ede3-cbc -elapsed":
NetBSD 6.99.24/bcm5862
Doing des-ede3-cbc for 3s on 16 size blocks: 115774 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 116420 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 256 size blocks: 99863 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 1024 size blocks: 77760 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 8192 size blocks: 29364 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial)
idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202)
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
des-ede3-cbc 586.20k 2357.87k 8246.75k 25685.88k 79916.91k
NetBSD 6.99.27/bcm5862
Doing des-ede3-cbc for 3s on 16 size blocks: 111538 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 107742 des-ede3-cbc's in 3.15s
Doing des-ede3-cbc for 3s on 256 size blocks: 92502 des-ede3-cbc's in 3.09s
Doing des-ede3-cbc for 3s on 1024 size blocks: 73305 des-ede3-cbc's in 3.12s
Doing des-ede3-cbc for 3s on 8192 size blocks: 28729 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial)
idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202)
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
des-ede3-cbc 564.75k 2189.04k 7663.60k 24059.08k 78188.69k
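
To make the two result tables easier to compare, here is a small sketch that computes the relative throughput loss per block size. The figures are copied from the two "1000s of bytes per second" rows above; the array and function names are hypothetical, and the calculation assumes both runs are otherwise comparable.

```c
#include <stdio.h>

#define NSIZES 5

/* Block sizes and throughput (1000s of bytes/s) from the two openssl runs. */
static const int block_size[NSIZES] = { 16, 64, 256, 1024, 8192 };
static const double kbps_6_99_24[NSIZES] = {
    586.20, 2357.87, 8246.75, 25685.88, 79916.91
};
static const double kbps_6_99_27[NSIZES] = {
    564.75, 2189.04, 7663.60, 24059.08, 78188.69
};

/* Relative loss of `after` compared with `before`, in percent. */
double loss_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Print one line per block size with the loss between the two kernels. */
void print_losses(void)
{
    for (int i = 0; i < NSIZES; i++)
        printf("%5d bytes: %.2f%% slower\n", block_size[i],
            loss_percent(kbps_6_99_24[i], kbps_6_99_27[i]));
}
```

Calling print_losses() from a main() shows the loss on these particular figures ranging from roughly 2 percent at 8192-byte blocks to roughly 7 percent at 64-byte blocks.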
--chris
[1] The code has as-yet-uncommitted changes to avoid calling
bus_dmamap_destroy() from interrupt context that I'm waiting to get
reviewed before committing.
--
-----------------------------------------------
SAITOH Masanobu (msaitoh%execsw.org@localhost
msaitoh%netbsd.org@localhost)