Port-xen archive


Re: 50% slowdown on 2 processor XEN 4.1.3 vs 1 processor system



On Mon, 26 Nov 2012 11:39:28 +0100
Roger Pau Monné <roger.pau%citrix.com@localhost> wrote:

> Hello,
> 
> On 26/11/12 10:00, Harry Waddell wrote:
> > 
> > I just built a new server with 2 x E5-2630 processors and was comparing its 
> > performance to a nearly identical xen server with 1 x E5-1660 processor. I
> > found that on a per-core basis, instead of being about 50% faster (as one 
> > would expect from the clock speed, given that the architectures are nearly 
> > identical), the E5-1660 system is 300% faster, so I ran some benchmarks 
> > and started looking for a pattern. 
> > 
> > Both systems run NetBSD amd64 6.0-STABLE with xen 4.1.3. The 2-processor
> > system runs only about half as fast with the XEN3_DOM0 kernel as with
> > the GENERIC kernel, e.g. on simple single-threaded benchmarks like
> > dhrystone and whetstone (22624434.0 vs 10416667.0 dhrystones -- it's
> > also clearly slower when compiling, etc.). The 1-processor system works
> > just fine. 
> 
> I'm a little bit lost here: are you comparing the speed of the Dom0 vs a
> baremetal install?

Yes. A xen domU and a xen dom0 are both less than half the speed of a baremetal 
install with a GENERIC kernel. I have a very similar single-processor system 
that does not show this disparity.
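The dhrystone figures quoted above put the dom0 at just under half of bare
metal; working it out from the two scores:

```shell
# dom0 dhrystone score as a fraction of the bare-metal GENERIC score
# (figures from the run quoted above):
awk 'BEGIN { printf "%.0f%%\n", 10416667.0 / 22624434.0 * 100 }'   # prints 46%
```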


> 
> > So far, I've tried: 
> > 
> > 1. a netbsd 6.0 release XEN3_DOM0 kernel
> > 
> > 2. run the test in a netbsd domU
> > 
> > 3. disabled HT and NUMA
> > 
> > 4. used xl to create pools based on NUMA nodes and assigned/pinned dom0 to 
> > one
> > 
> > 5. compiled and installed xen 4.2.0rc4
> 
> Have you checked the output of xl info, to see if the number of CPUs,
> NUMA nodes and clock speed is consistent?
> 

Looks fine, at least to me:

release                : 6.0
version                : NetBSD 6.0 (XEN3_DOM0)
machine                : amd64
nr_cpus                : 24
max_cpu_id             : 23
nr_nodes               : 2
cores_per_socket       : 6
threads_per_core       : 2
cpu_mhz                : 2300
hw_caps                : bfebfbff:2c100800:00000000:00003f40:17bee3ff:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 65508
free_memory            : 60568
sharing_freed_memory   : 0
sharing_used_memory    : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 1
xen_extra              : .3
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=4096M dom0_max_vcpus=1
cc_compiler            : gcc (NetBSD nb2 20110806) 4.5.3
cc_compile_by          : root
cc_compile_domain      : 
cc_compile_date        : Mon Nov 26 08:09:33 UTC 2012
xend_config_format     : 4


> Also, I would recommend disabling any kind of energy savings in the BIOS
> and trying again.
> 

I had the cpu power controls set to "energy efficient" and "balanced 
performance" in the BIOS. I just disabled all the cpu power management, and it 
doesn't seem to have had any effect.
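For completeness: Xen has its own cpufreq governor independent of the BIOS
settings, and a quick way to check it from the dom0 is xenpm, which ships with
Xen 4.1. A minimal sketch (guarded so it is harmless to paste on a non-Xen
box):

```shell
# Query Xen's power-management view of the pCPUs from dom0. BIOS settings
# aside, Xen's own governor can also clock the cores down.
if command -v xenpm >/dev/null 2>&1; then
    xenpm get-cpufreq-para      # per-pCPU cpufreq governor and P-states
    xenpm get-cpuidle-states    # C-state residency per pCPU
else
    echo "xenpm not found (not a Xen dom0?)"
fi
```

If the governor reported is ondemand or powersave, `xenpm set-scaling-governor`
can force it to performance; the exact subcommands vary a bit by Xen version,
so check `xenpm help` first.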

> > 
> > and nothing seems to influence the disparity in the performance. 
> > 
> > Has anyone else seen similar behavior, or does anyone have any suggestions 
> > on how to proceed? Removing a cpu is kind of dangerous, so I'd like to 
> > avoid that, but if there is a good xen dom0 linux live CD, or something 
> > similar, I could try booting and testing under linux? It was pretty 
> > difficult getting netbsd to install and boot on my 6TB raid 5 that
> > I'd hate to blow that away for such a test, but I do have usb keys I could 
> > install into etc. I assume that if linux dom0's had this issue I would have 
> > found something about it during my searches, so I'm guessing this is a BSD 
> > issue, but that's still just an assumption, so if there's an easyish way to 
> > test that theory, I'll do it. 
> 
> I've checked some time ago the performance of Linux vs NetBSD as a PV
> guests on both NetBSD and Linux Dom0, and the difference was not that
> big: http://www.slideshare.net/xen_com_mgr/free-and-net-bsd-xen-roadmap
> (see the last part of the slides for the perf results)
> 
> 


Since the slides use sysbench, I thought I'd try sysbench's cpu test.

XEN3_DOM0 kernel:
-----------------

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          27.0904s
    total number of events:              10000
    total time taken by event execution: 27.0786
    per-request statistics:
         min:                                  2.70ms
         avg:                                  2.71ms
         max:                                  3.39ms
         approx.  95 percentile:               2.71ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   27.0786/0.00


GENERIC
-------

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          12.5222s
    total number of events:              10000
    total time taken by event execution: 12.5140
    per-request statistics:
         min:                                  1.25ms
         avg:                                  1.25ms
         max:                                  1.35ms
         approx.  95 percentile:               1.25ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   12.5140/0.00

---------
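To put a number on the gap between the two runs above (the sysbench command
line is reconstructed from the parameters echoed in the output, so treat the
exact flags as an assumption):

```shell
# Both runs used the same invocation (sysbench 0.4.12 syntax, reconstructed
# from the "Number of threads: 1" / "Maximum prime number checked: 10000"
# lines above):
#   sysbench --test=cpu --cpu-max-prime=10000 --num-threads=1 run
#
# XEN3_DOM0 total time vs GENERIC total time:
awk 'BEGIN { printf "%.2fx\n", 27.0904 / 12.5222 }'   # prints 2.16x
```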


It doesn't seem to matter how I measure it: the cpu performance is much lower 
running the XEN3_DOM0 kernel than GENERIC, but only on the two-physical-processor 
system. Here are the results for the dom0 of a very similar, but faster-clocked, 
single-processor system. Please note that this system is actually in use, 
running several domUs.

release                : 6.0_STABLE
version                : NetBSD 6.0_STABLE (XEN3_DOM0)
machine                : amd64
nr_cpus                : 12
nr_nodes               : 1
cores_per_socket       : 6
threads_per_core       : 2
cpu_mhz                : 3300
hw_caps                : bfebfbff:2c100800:00000000:00003f40:13bee3ff:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 65508
free_memory            : 54439
free_cpus              : 0
xen_major              : 4
xen_minor              : 1
xen_extra              : .3

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          9.9198s
    total number of events:              10000
    total time taken by event execution: 9.9155
    per-request statistics:
         min:                                  0.98ms
         avg:                                  0.99ms
         max:                                 40.91ms
         approx.  95 percentile:               0.99ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   9.9155/0.00
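Normalizing the two dom0 totals by clock speed suggests the gap is not just
the 2.3 vs 3.3 GHz difference:

```shell
# (total time x MHz) for each dom0 run; if per-cycle throughput were equal,
# the ratio would be ~1.0x:
awk 'BEGIN { printf "%.2fx\n", (27.0904 * 2300) / (9.9198 * 3300) }'   # prints 1.90x
```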


I suppose I could still have missed something, but I'm pretty sure this is a 
bug. What I don't know yet is whether it's a Xen bug or a NetBSD bug, so I'm 
not sure where to submit it.

Also, I booted the Debian-based Xen live CD (Xen 3.2), but the OS is too old 
to support the i350 ethernet on this new system, so without a network I 
couldn't fetch what I needed to run any benchmarks. I may be able to work 
around this with a USB ethernet device, but right now I'm doing all of this 
remotely over IPMI.

Thanks for looking into this. 

Harry Waddell 



