Port-xen archive


Re: xennet performance collapses with multiple vCPU



On Wed, Mar 25, 2020 at 07:06:58PM +0100, Jaromír Doleček wrote:
> On Wed, 25 Mar 2020 at 17:16, Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>
> wrote:
> 
> > On Tue, 24 Mar 2020 at 15:13, Stephen Borrill <netbsd%precedence.co.uk@localhost>
> > wrote:
> >
> >> All using NetBSD 9.99.51 (XEN3_DOMU) #0: Sun Mar 22 17:35:29 UTC 2020
> >>
> >> Note that increasing the vCPUs from 1 to 4 (2 gives same result as 4)
> >> drops the throughput to about 10%. No offloads configured in domU:
> >>
> >>
> > I found out the problem appeared between 2020-01-08 00:00 (that was still
> > good) and 2020-01-14 00:00 (that is already bad).
> >
> > There were some CPU scheduling changes by Andrew Doran around that time,
> > so it's likely the culprit.
> >
> > I'm working on bisecting this further. It's complicated, between 01-08 and
> > 01-14 there were some bugs which prevent the Xen DomU MP kernel from
> > booting.
> >
> 
> I've now tracked it down to this change:
> 
> Module Name:    src
> Committed By:   ad
> Date:           Mon Jan 13 20:30:08 UTC 2020
> 
> Modified Files:
>         src/sys/kern: subr_cpu.c
> 
> Log Message:
> Fix some more bugs in the topo stuff, that prevented it from working
> properly with fake topo info + MP.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.10 -r1.11 src/sys/kern/subr_cpu.c
> 
> After this change the DomU even boots visibly slower. Maybe this change
> makes MP system scheduler use all CPUs, but introduces too much switching
> between them? Andy, can you have a look?
> 
> I'll meanwhile check if there is anything obvious in the fake topology code.

I spent some time looking into this over the weekend.  It's easily
reproducible, and I don't see anything that looks strange on the various
systems involved.  I also don't see why it would be related to the
scheduler.

Native x86 with topology, fast softints and preemption disabled (by many
code hacks) doesn't exhibit this problem.  Setting CPUs offline in the domU,
in various combinations, doesn't have any effect either.
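
For anyone wanting to repeat the offline experiment, a minimal sketch using
cpuctl(8) might look like the following.  The CPU numbers, the helper function
names and the RUN variable are assumptions for illustration and are not taken
from this thread.  By default the script only prints the commands; setting
RUN=1 executes them, which needs root in the domU.

```shell
#!/bin/sh
# Sketch only: take a set of vCPUs offline with cpuctl(8), then restore them.
# Commands are printed by default; set RUN=1 to actually execute them.

cpu_cmds() {
    # $1 is the cpuctl action ("offline" or "online"); the rest are CPU numbers.
    action=$1
    shift
    for cpu in "$@"; do
        echo "cpuctl $action $cpu"
    done
}

apply_cmds() {
    # Echo each command read from stdin, and run it only when RUN=1.
    while read -r cmd; do
        echo "$cmd"
        if [ "${RUN:-0}" = 1 ]; then
            $cmd
        fi
    done
}

# Example: leave only cpu0 online on a 4-vCPU guest (hypothetical layout).
cpu_cmds offline 1 2 3 | apply_cmds
# ... run the network throughput test here ...
cpu_cmds online 1 2 3 | apply_cmds
```

Running "cpuctl list" before and after each step is a simple way to confirm
which CPUs are actually online when the measurement is taken.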

I'm going with the idea that it's something particular to interrupt dispatch
on Xen, although not necessarily in MD code.  I have not gotten dtrace to
work on Xen yet because it seems CTF is not available for the kernel.  I
will try upping HZ to 1000 and see if that affects it, which might
provide a useful clue.

Anyway I'll keep looking into it as time permits.

Andrew

