Port-xen archive


Re: Dom0 xvif mbuf issues



On Thu, 4 Oct 2018 15:39:47 -0700
Harry Waddell <waddell%caravaninfotech.com@localhost> wrote:

> On Thu, 4 Oct 2018 17:01:21 +0200
> Manuel Bouyer <bouyer%antioche.eu.org@localhost> wrote:
> 
> > On Mon, Oct 01, 2018 at 05:10:19PM -0700, Harry Waddell wrote:  
> > > > Looks like a temporary memory shortage in the dom0 (this is MGETHDR failing,
> > > > not MCLGET, so the nmbclusters limit is not relevant).
> > > > How many mbufs were allocated ?
> > > >     
> > > At the time of the hang, I have no idea. 
> > > 
> > > It's around 512 whenever I check. 
> > > 
> > > [root@xen-09:conf]> netstat -m
> > > 515 mbufs in use:
> > > 	513 mbufs allocated to data
> > > 	2 mbufs allocated to packet headers
> > > 0 calls to protocol drain routines    
> > 
> > Looks like the receive buffers for the ethernet interface.
> >   
> > > > > It hung again, but with a new 
> > > > > error scrolling on the console. "xennetback: got only 63 new mcl pages"      
> > > > 
> > > > This would point to a memory shortage in the hypervisor itself.
> > > > Do you have enough free memory (xl info) ?
> > > >     
> > > total_memory           : 131037
> > > free_memory            : 26601
> > > sharing_freed_memory   : 0
> > > sharing_used_memory    : 0
> > > outstanding_claims     : 0    
> > 
> > that should be plenty. No idea why xennetback couldn't get the 64 pages
> > it asked for.
> >   
> > > > > This is a production system with about 30 guests. I just want it to work like it used to.       
> > > > 
> > > > how many vifs are there in the dom0?
> > > >     
> > > 
> > > I expect this is not an ideal way to do this but ...
> > > 
> > > (for i in `xl list | awk '{print $1}'`;do xl network-list $i | grep vif ;done) | wc -l
> > >       57
> > > 
> > > Several of the systems are part of a cluster where hosts are multihomed on 2 of 4 networks
> > > to test a customer setup. Most of my systems have < 30, except for one other with 42. 
> > > The others don't hang like this one does.     
> > 
> > I have more than 100 here.
> > 
> > Maybe you should try reverting
> > kern.sbmax=1048576                   
> > net.inet.tcp.recvbuf_max=1048576     
> > net.inet.tcp.sendbuf_max=1048576     
> > 
> > to their default values.
> > 
> > -- 
> > Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> >      NetBSD: 26 ans d'experience feront toujours la difference
> > --  
> 
> No doubt sage advice, but it feels like we're missing something. All my servers have the same 
> values, or higher, e.g.
> 
> [root@xen-12:~]> netstat -m
> 2050 mbufs in use:
> 	2049 mbufs allocated to data
> 	1 mbufs allocated to packet headers
> 200 calls to protocol drain routines
> 
> [root@xen-12:~]> egrep -v  '^#' /etc/sysctl.conf
> ddb.onpanic?=0
> kern.sbmax=4194304
> net.inet.tcp.sendbuf_max=1048576
> net.inet.tcp.recvbuf_max=1048576
> 
> [root@xen-12:~]> xl info | grep mem
> total_memory           : 262032
> free_memory            : 116527
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> xen_commandline        : dom0_mem=8192M,max:8192M sched=credit2
> dom0_nodes=1 dom0_max_vcpus=1 dom0_vcpus_pin
> 
> and no similar problems on any of those. 
> 
> I brought up the azure-fuse system and pushed a bunch of data through it. At no point 
> did the mbuf use go above 700. I'm going to try and gather some more data during the upcoming
> weekend automated testing to see if it's exercising things in a weird way before I change anything else
> and return sbmax to the default, etc... 
> 
> I put this crude command into crontab: 
> 
> logger -p local0.warn `netstat -m | tr '\n' '|'`
> 
> so it might be interesting to see what happens this weekend. 
> 
> Thanks again. 
> 
> HW
> 


Well, it crashed again, but I captured some data. It looks like the number of calls to
protocol drain routines, which is normally zero, started climbing before the crash.
There was also a spike in the number of mbufs allocated at the beginning of the period
of mbuf growth:

Oct  5 19:45:00 xen-09 root: 513 mbufs in use:| 512 mbufs allocated to data| 1 mbufs allocated to packet headers|2 calls to protocol drain routines|
Oct  5 19:46:00 xen-09 root: 867 mbufs in use:| 866 mbufs allocated to data| 1 mbufs allocated to packet headers|4 calls to protocol drain routines|
Oct  5 19:47:00 xen-09 root: 513 mbufs in use:| 512 mbufs allocated to data| 1 mbufs allocated to packet headers|8 calls to protocol drain routines|
...
Oct  5 20:10:00 xen-09 root: 515 mbufs in use:| 514 mbufs allocated to data| 1 mbufs allocated to packet headers|76 calls to protocol drain routines|
Oct  5 20:11:00 xen-09 root: 515 mbufs in use:| 514 mbufs allocated to data| 1 mbufs allocated to packet headers|76 calls to protocol drain routines|
Oct  5 20:12:00 xen-09 root: 529 mbufs in use:| 528 mbufs allocated to data| 1 mbufs allocated to packet headers|78 calls to protocol drain routines|
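
In case it's useful, here is a rough way to pull the timestamp, mbuf count and
drain-call count back out of those logger lines. This assumes syslogd sends
local0.warn to /var/log/messages; adjust the path to wherever /etc/syslog.conf
actually routes it on the dom0:

    # assumes local0.warn lands in /var/log/messages; check /etc/syslog.conf
    grep 'calls to protocol drain routines' /var/log/messages | \
        awk -F'|' '{ print $1 " / " $4 }'

That prints one line per sample, with the timestamp and mbuf total before the
slash and the drain-call count after it, which should make the climb over the
weekend easier to eyeball.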


HW



