Port-xen archive


Re: 50% slowdown on 2 processor XEN 4.1.3 vs 1 processor system

On 27 November 2012 01:06, Harry Waddell 
<waddell%caravaninfotech.com@localhost> wrote:
> On Tue, 27 Nov 2012 00:09:32 -0600
> "Cherry G. Mathew" <cherry.g.mathew%gmail.com@localhost> wrote:
>> On 26 November 2012 03:00, Harry Waddell 
>> <waddell%caravaninfotech.com@localhost> wrote:
>> >
>> > I just built a new server with 2 x E5-2630 processors and was comparing the
>> > performance to a nearly identical xen server with 1 x E5-1660 processor and
>> > found that on a per core basis, instead of being about 50% faster ( as one 
>> > would expect from the clock speed given that the architecture is nearly 
>> > identical ), the E5-1660 system is 300% faster, so I ran some benchmarks 
>> > and started looking for a pattern.
>> >
>> > Both systems run NetBSD amd64 6.0-STABLE with xen 4.1.3. The 2
>> > processor system is running only 1/2 as fast with the XEN3_DOM0 kernel
>> I hope you realise that NetBSD dom0 is not MP capable ?
>> --
>> ~Cherry
> Not only do I know that, but I believe your name is next to that goal on the 
> xen roadmap. ;-) I did however setup a 24 vcpu domU just to see if it worked, 
> and I'm glad to say it did.
> Anyway, yes, all the performance measures I gathered were single threaded. 
> "top" showed that only one cpu core was engaged. The followup email I sent 
> earlier today with some sysbench results show that I'm only running single 
> threaded tests.
> Please let me know if you think of any info I can supply or any tests I can 
> run.

I'd be interested to know whether this is an MP issue. Could you set
vcpus = 1 on the domU and re-run the tests? I would guess it will give
the same results as the dom0, but it would be useful to know.
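For reference, the vcpu count is set in the domU's xl/xm config file; a minimal sketch (the domain name and values here are hypothetical, not from the original report):

```
# hypothetical domU config -- only vcpus matters for this test
name   = "testdomu"
vcpus  = 1          # pin the guest to a single vcpu for the comparison
memory = 1024
```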

There are several areas, specifically on 64-bit, where we lose
performance and haven't tuned:

i) Userland/kernel switches do full TLB flushes. We can optimise this
trivially by using global pages.
ii) The interrupt/spl path on 6.0 has an spl escalation issue which I
suspect makes i/o "batch up" during spllower. I've fixed this in
-current; I'd be interested in i/o numbers from -current.
iii) Our spin mutexes need optimising so that they don't just tight-loop.

I'm sure there are others that can be thought up.

