tech-kern archive


Re: ~5 percent kernel performance loss in the last 3 weeks



(2013/11/20 0:02), Christoph Badura wrote:
After updating my -current kernel from 6.99.24 to 6.99.27 so I could
commit my ubsec(4) changes, I noticed that under 6.99.27 I get between
3 and 8 percent less throughput on accelerated crypto ops.

Note that I am using the exact same ubsec(4) code[1] with both kernels, so
I think a problem in ubsec(4) itself is unlikely.
I did not change userland.

The old kernel is 6.99.24 from Oct 27th and the new one is 6.99.27 from Nov 17th.
The machine is an ML110 G6 with 32 GB RAM and a 4-core Intel Xeon X3430 @ 2.4 GHz,
running amd64.

There's nothing obvious to me in the dmesg diff that would give me a clue:

--- dmesg.6.99.24       2013-11-19 15:44:24.000000000 +0100
+++ dmesg.6.99.27       2013-11-19 15:39:29.000000000 +0100
@@ -4,7 +4,7 @@
  Copyright (c) 1982, 1986, 1989, 1991, 1993
      The Regents of the University of California.  All rights reserved.

-NetBSD 6.99.24 (GENERIC) #4: Mon Oct 28 18:58:32 CET 2013
+NetBSD 6.99.27 (GENERIC) #9: Sun Nov 17 17:47:24 CET 2013
        bad@flexible-demeanour:/home/bad/work/nb/src/sys/arch/amd64/compile/GENERIC
  total memory = 32759 MB
  avail memory = 31792 MB
@@ -133,11 +133,14 @@
  acpicpu0: T5: FFH, lat   1 us, pow   285 mW,  38 %
  acpicpu0: T6: FFH, lat   1 us, pow   190 mW,  25 %
  acpicpu0: T7: FFH, lat   1 us, pow    95 mW,  13 %
+coretemp0 at cpu0: thermal sensor, 1 C resolution
  acpicpu1 at cpu1: ACPI CPU
+coretemp1 at cpu1: thermal sensor, 1 C resolution
  acpicpu2 at cpu2: ACPI CPU
+coretemp2 at cpu2: thermal sensor, 1 C resolution
  acpicpu3 at cpu3: ACPI CPU
+coretemp3 at cpu3: thermal sensor, 1 C resolution
  timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
-timecounter: Timecounter "TSC" frequency 2394079440 Hz quality 3000

It's strange: the TSC timecounter is missing from the 6.99.27 dmesg.

Could you revert the following change and test again?

Also, please show me the output of "cpuctl identify 0".

http://mail-index.netbsd.org/source-changes/2013/11/15/msg049232.html
Module Name:    src
Committed By:   msaitoh
Date:           Fri Nov 15 08:47:55 UTC 2013

Modified Files:
        src/sys/arch/x86/acpi: acpi_cpu_md.c
        src/sys/arch/x86/include: specialreg.h
        src/sys/arch/x86/pci: amdtemp.c
        src/sys/arch/x86/x86: coretemp.c cpu.c cpu_topology.c cpu_ucode_amd.c
            cpu_ucode_intel.c est.c identcpu.c intel_busclock.c lapic.c odcm.c
            patch.c powernow.c tprof_amdpmi.c tprof_pmi.c tsc.c viac7temp.c
        src/usr.sbin/cpuctl/arch: i386.c

Log Message:
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid)        (not changed)

CPUID_TO_FAMILY(cpuid)          (new)
CPUID_TO_MODEL(cpuid)           (new)

        Return the display family and the display model.
        The macro names are the same as in FreeBSD.

CPUID_TO_BASEFAMILY(cpuid)      (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid)       (The old name was CPUID2MODEL)

        Return only the base field.

CPUID_TO_EXTFAMILY(cpuid)       (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid)        (The old name was CPUID2EXTMODEL)

        Return only the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html


To generate a diff of this commit:
cvs rdiff -u -r1.72 -r1.73 src/sys/arch/x86/acpi/acpi_cpu_md.c
cvs rdiff -u -r1.71 -r1.72 src/sys/arch/x86/include/specialreg.h
cvs rdiff -u -r1.17 -r1.18 src/sys/arch/x86/pci/amdtemp.c
cvs rdiff -u -r1.30 -r1.31 src/sys/arch/x86/x86/coretemp.c
cvs rdiff -u -r1.105 -r1.106 src/sys/arch/x86/x86/cpu.c
cvs rdiff -u -r1.7 -r1.8 src/sys/arch/x86/x86/cpu_topology.c \
    src/sys/arch/x86/x86/powernow.c
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x86/x86/cpu_ucode_amd.c \
    src/sys/arch/x86/x86/viac7temp.c
cvs rdiff -u -r1.3 -r1.4 src/sys/arch/x86/x86/cpu_ucode_intel.c \
    src/sys/arch/x86/x86/tprof_amdpmi.c
cvs rdiff -u -r1.27 -r1.28 src/sys/arch/x86/x86/est.c
cvs rdiff -u -r1.37 -r1.38 src/sys/arch/x86/x86/identcpu.c
cvs rdiff -u -r1.16 -r1.17 src/sys/arch/x86/x86/intel_busclock.c
cvs rdiff -u -r1.46 -r1.47 src/sys/arch/x86/x86/lapic.c
cvs rdiff -u -r1.2 -r1.3 src/sys/arch/x86/x86/odcm.c
cvs rdiff -u -r1.21 -r1.22 src/sys/arch/x86/x86/patch.c
cvs rdiff -u -r1.12 -r1.13 src/sys/arch/x86/x86/tprof_pmi.c
cvs rdiff -u -r1.32 -r1.33 src/sys/arch/x86/x86/tsc.c
cvs rdiff -u -r1.49 -r1.50 src/usr.sbin/cpuctl/arch/i386.c





  uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
  uhub0: 2 ports with 2 removable, self powered
  uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
@@ -175,7 +178,9 @@
  audio0 at pad0: half duplex, playback, capture
  boot device: wd0
  root on wd0a dumps on wd0b
+/: replaying log to memory
  root file system type: ffs
+/: replaying log to disk
  ipmi0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
  wsdisplay0: screen 1 added (80x25, vt100 emulation)
  wsdisplay0: screen 2 added (80x25, vt100 emulation)

Results from "openssl speed -evp des-ede3-cbc -elapsed":

NetBSD 6.99.24/bcm5862

Doing des-ede3-cbc for 3s on 16 size blocks: 115774 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 116420 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 256 size blocks: 99863 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 1024 size blocks: 77760 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 8192 size blocks: 29364 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc       586.20k     2357.87k     8246.75k    25685.88k    79916.91k

NetBSD 6.99.27/bcm5862

Doing des-ede3-cbc for 3s on 16 size blocks: 111538 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 107742 des-ede3-cbc's in 3.15s
Doing des-ede3-cbc for 3s on 256 size blocks: 92502 des-ede3-cbc's in 3.09s
Doing des-ede3-cbc for 3s on 1024 size blocks: 73305 des-ede3-cbc's in 3.12s
Doing des-ede3-cbc for 3s on 8192 size blocks: 28729 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc       564.75k     2189.04k     7663.60k    24059.08k    78188.69k

--chris

[1] The code has not-yet-committed changes that avoid calling
bus_dmamap_destroy() from interrupt context; I'm waiting for them to be
reviewed before committing.



--
-----------------------------------------------
                SAITOH Masanobu (msaitoh@execsw.org
                                 msaitoh@netbsd.org)

