Port-xen archive

Re: xennet performance collapses with multiple vCPU



On Sun, Apr 05, 2020 at 08:57:45PM +0000, Andrew Doran wrote:
> > [...]
> > I've now tracked it down to this change:
> > 
> > Module Name:    src
> > Committed By:   ad
> > Date:           Mon Jan 13 20:30:08 UTC 2020
> > 
> > Modified Files:
> >         src/sys/kern: subr_cpu.c
> > 
> > Log Message:
> > Fix some more bugs in the topo stuff, that prevented it from working
> > properly with fake topo info + MP.
> > 
> > 
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.10 -r1.11 src/sys/kern/subr_cpu.c
> > 
> > After this change, the DomU even boots visibly more slowly. Maybe this
> > change makes the MP scheduler use all CPUs, but introduces too much
> > switching between them? Andy, can you have a look?
> > 
> > I'll meanwhile check if there is anything obvious in the fake topology code.
> 
> I spent some time looking into this over the weekend.  It's easily
> reproducible, and I don't see anything that looks strange on the various
> systems involved.  I also don't see why it would be related to the
> scheduler.

Hello,
here is some more data on this issue. With ping I see a consistent 10ms delay:
PING nephtys.lip6.fr (195.83.118.1): 56 data bytes
64 bytes from 195.83.118.1: icmp_seq=0 ttl=253 time=8.964715 ms
64 bytes from 195.83.118.1: icmp_seq=1 ttl=253 time=10.080450 ms
64 bytes from 195.83.118.1: icmp_seq=2 ttl=253 time=10.079291 ms
64 bytes from 195.83.118.1: icmp_seq=3 ttl=253 time=10.079525 ms
64 bytes from 195.83.118.1: icmp_seq=4 ttl=253 time=10.083389 ms
64 bytes from 195.83.118.1: icmp_seq=5 ttl=253 time=10.080444 ms
64 bytes from 195.83.118.1: icmp_seq=6 ttl=253 time=10.079615 ms
64 bytes from 195.83.118.1: icmp_seq=7 ttl=253 time=10.081661 ms

Sometimes it drops to 5ms and stays there.

With a single CPU, the RTT is less than one millisecond.
Keeping both CPUs busy with a while(1) loop doesn't help.
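
For reference, a minimal sketch of the kind of busy loop meant here (my
illustration, not necessarily the exact program used); run one copy per vCPU:

    /* spin.c - burn a CPU in userland; start one instance per vCPU,
     * e.g. "./spin & ./spin &".  Illustrative only. */
    int
    main(void)
    {
        volatile unsigned long n = 0;   /* volatile keeps the loop alive */

        while (1)
            n++;
        /* NOTREACHED */
    }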

It looks like something is being delayed until the next clock tick.
Note that the dom0 is idle and no other VMs are running.
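
If the extra latency really is a full tick, the numbers line up with the
stock HZ=100 clock (an assumption on my part; the running value can be
checked with "sysctl kern.clockrate"):

    1 tick = 1/HZ s = 1/100 s = 10 ms

which would match the 10ms plateau above, with 5ms being half a tick.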

I'm seeing the same behavior in the bouyer-xenpvh branch, where Xen
now has fast softints and kpreempt. Disabling the latter, or both, doesn't
change anything. I'm also seeing the same with a kernel from
bouyer-xenpvh-base, so it's not related to changes in the branch.

Any ideas welcome.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference