Port-xen archive


Re: Dom0 xvif mbuf issues



On Thu, 27 Sep 2018 13:13:27 +0200
Manuel Bouyer <bouyer%antioche.eu.org@localhost> wrote:

> On Wed, Sep 26, 2018 at 01:14:40PM -0700, Harry Waddell wrote:
> > 
> > I have a server whose Dom0 started becoming unusable a few months ago,
> > after previously running for years with few issues. 
> > 
> > netbsd-7 branch, never more than a month behind. 
> > BRIDGE_IPF is enabled and these options set in sysctl.conf: 
> > 
> > kern.sbmax=1048576
> > net.inet.tcp.recvbuf_max=1048576
> > net.inet.tcp.sendbuf_max=1048576
> > kern.mbuf.nmbclusters=300000
> > kern.maxfiles=3000
> > 
> > Xen 4.8.3 similarly updated.
> > 
> > One of the xvif devices "could not allocate a new mbuf". I enabled MBUF debugging
> > and netstat didn't seem to point to a leak on any of the devices.  
> 
> Looks like temporary memory shortage in the dom0 (this is a MGETHDR failing,
> not MCLGET, so the nmbclusters limit is not relevant).
> How many mbufs were allocated ?
> 
At the time of the hang, I have no idea. 

It's around 512 whenever I check. 

[root@xen-09:conf]> netstat -m
515 mbufs in use:
	513 mbufs allocated to data
	2 mbufs allocated to packet headers
0 calls to protocol drain routines

> 
> > It hung again, but with a new 
> > error scrolling on the console. "xennetback: got only 63 new mcl pages"  
> 
> This would point to a memory shortage in the hypervisor itself.
> Do you have enough free memory (xl info) ?
> 
total_memory           : 131037
free_memory            : 26601
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0



> > 
> > My suspicion is that either one of the guests started doing a lot more nfs activity OR
> > that a VM I created which uses a fuse filesystem to move large dumpfile to azure blob
> > storage may be what pushed this previously working system off the edge. 
> > 
> > I'm moving the azure fuse system to another server and plan to disable ipf on the bridge.   
> 
> ipf shouldn't be a problem, I'm using it extensively on bridges here.
> 

Good. Just grasping at straws. 

> > 
> > Beyond that, any suggestions? Should I just upgrade to NetBSD 8 and/or Xen 4.11?
> > (even if just to make debugging easier, since that's where current work is taking place?)  
> 
> I'm not sure it would change anything
> 
> > 
> > This is a production system with about 30 guests. I just want it to work like it used to.   
> 
> how many vifs are there in the dom0?
> 

I expect this is not an ideal way to do this but ...

(for i in `xl list | awk '{print $1}'`;do xl network-list $i | grep vif ;done) | wc -l
      57

Several of the systems are part of a cluster where hosts are multihomed on 2 of 4 networks
to test a customer setup. Most of my systems have < 30, except for one other with 42. 
The others don't hang like this one does. 
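For what it's worth, the backend interfaces also show up directly in the dom0, so the count can be taken without parsing `xl` output. A sketch, assuming NetBSD's usual xvif naming for the xennetback interfaces (e.g. xvif1i0):

```shell
# Count xen backend interfaces directly in the dom0.
# "ifconfig -l" on NetBSD prints all interface names on one line;
# the xvif prefix for backend interfaces is an assumption here.
ifconfig -l | tr ' ' '\n' | grep -c '^xvif'
```

This avoids the header line that `xl list` feeds into the loop above, and counts only interfaces the dom0 kernel actually has instantiated.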

> -- 
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
>      NetBSD: 26 years of experience will always make the difference
> --


Thanks for the followup. Answers inline above. 

HW


