Port-xen archive


Re: Dom0 xvif mbuf issues



On Thu, 4 Oct 2018 17:01:21 +0200
Manuel Bouyer <bouyer%antioche.eu.org@localhost> wrote:

> On Mon, Oct 01, 2018 at 05:10:19PM -0700, Harry Waddell wrote:
> > > Looks like a temporary memory shortage in the dom0 (this is MGETHDR failing,
> > > not MCLGET, so the nmbclusters limit is not relevant).
> > > How many mbufs were allocated?
> > >   
> > At the time of the hang, I have no idea. 
> > 
> > It's around 512 whenever I check. 
> > 
> > [root@xen-09:conf]> netstat -m
> > 515 mbufs in use:
> > 	513 mbufs allocated to data
> > 	2 mbufs allocated to packet headers
> > 0 calls to protocol drain routines  
> 
> Looks like the receive buffers for the ethernet interface.
> 
> > > > It hung again, but with a new 
> > > > error scrolling on the console. "xennetback: got only 63 new mcl pages"    
> > > 
> > > This would point to a memory shortage in the hypervisor itself.
> > > Do you have enough free memory (xl info)?
> > >   
> > total_memory           : 131037
> > free_memory            : 26601
> > sharing_freed_memory   : 0
> > sharing_used_memory    : 0
> > outstanding_claims     : 0  
> 
> That should be plenty. No idea why xennetback couldn't get the 64 pages
> it asked for.
> 
> > > > This is a production system with about 30 guests. I just want it to work like it used to.     
> > > 
> > > How many vifs are there in the dom0?
> > >   
> > 
> > I expect this is not an ideal way to do this but ...
> > 
> > (for i in `xl list | awk '{print $1}'`;do xl network-list $i | grep vif ;done) | wc -l
> >       57
> > 
> > Several of the systems are part of a cluster where hosts are multihomed on 2 of 4 networks
> > to test a customer setup. Most of my systems have < 30, except for one other with 42. 
> > The others don't hang like this one does.   
> 
> I have more than 100 here.
> 
> Maybe you should try reverting
> kern.sbmax=1048576                   
> net.inet.tcp.recvbuf_max=1048576     
> net.inet.tcp.sendbuf_max=1048576     
> 
> to their default values.
> 
> -- 
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
>      NetBSD: 26 years of experience will always make the difference
> --
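
As an aside, since the backend interfaces show up in a NetBSD dom0 as xvif*
devices, the vif count from the one-liner quoted above can probably be taken
straight from ifconfig as well. A sketch, assuming xennetback's usual xvif
interface naming:

ifconfig -l | tr ' ' '\n' | grep -c xvif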

No doubt sage advice, but it feels like we're missing something. All my servers have the same 
values, or higher, e.g.

[root@xen-12:~]> netstat -m
2050 mbufs in use:
	2049 mbufs allocated to data
	1 mbufs allocated to packet headers
200 calls to protocol drain routines

[root@xen-12:~]> egrep -v  '^#' /etc/sysctl.conf
ddb.onpanic?=0
kern.sbmax=4194304
net.inet.tcp.sendbuf_max=1048576
net.inet.tcp.recvbuf_max=1048576

[root@xen-12:~]> xl info | grep mem
total_memory           : 262032
free_memory            : 116527
sharing_freed_memory   : 0
sharing_used_memory    : 0
xen_commandline        : dom0_mem=8192M,max:8192M sched=credit2 dom0_nodes=1 dom0_max_vcpus=1 dom0_vcpus_pin

and no similar problems on any of those. 

I brought up the azure-fuse system and pushed a bunch of data through it. At no point
did the mbuf use go above 700. I'm going to try to gather some more data during the
upcoming weekend's automated testing, to see whether it's exercising things in a weird
way, before I change anything else and return sbmax to the default, etc.
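
For reference, putting those back at run time would look something like the
following. The 262144 values are what I believe the stock NetBSD defaults to
be, so double-check them with sysctl on an untouched host first:

sysctl -w kern.sbmax=262144
sysctl -w net.inet.tcp.sendbuf_max=262144
sysctl -w net.inet.tcp.recvbuf_max=262144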

I put this crude command into crontab: 

logger -p local0.warn "$(netstat -m | tr '\n' '|')"

so it might be interesting to see what happens this weekend. 
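
If the hypervisor side matters too, the same idea can be extended to log xl's
free_memory alongside the mbuf stats. A sketch of a crontab entry sampling
every 5 minutes (cron's PATH is minimal, so full paths to netstat, tr, awk,
and xl may be needed; the awk field is an assumption based on xl info's
output format shown above):

*/5 * * * * logger -p local0.warn "$(netstat -m | tr '\n' '|') free_memory=$(xl info | awk '/^free_memory/ {print $3}')"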

Thanks again. 

HW


