Port-xen archive


Re: Dom0 xvif mbuf issues



On Wed, 26 Sep 2018 at 21:14, Harry Waddell <waddell%caravaninfotech.com@localhost> wrote:
>
> I have a server where Dom0 started becoming unusable as of a few months ago
> where previously it ran for years with few issues.
>
> netbsd-7 branch, never more than a month behind.
> BRIDGE_IPF is enabled and these options set in sysctl.conf:
>
> kern.sbmax=1048576
> net.inet.tcp.recvbuf_max=1048576
> net.inet.tcp.sendbuf_max=1048576
> kern.mbuf.nmbclusters=300000
> kern.maxfiles=3000
>
> Xen 4.8.3 similarly updated.
>
> One of the xvif devices "could not allocate a new mbuf". I enabled MBUF debugging
> and netstat didn't seem to point to a leak on any of the devices. It hung again, but with a new
> error scrolling on the console. "xennetback: got only 63 new mcl pages"
>
> My suspicion is that either one of the guests started doing a lot more nfs activity OR
> that a VM I created which uses a fuse filesystem to move large dump files to azure blob
> storage may be what pushed this previously working system off the edge.
>
> I'm moving the azure fuse system to another server and plan to disable ipf on the bridge.
>
> Beyond that, any suggestions? Should I just upgrade to netbsd 8 and/or xen 4.11?
> ( even if it's just to make debugging easier since this is where current work is taking place? )
>
> This is a production system with about 30 guests. I just want it to work like it used to.

There have been some recent xen fixes in netbsd-8 - for me they fixed
a DOM0 hang-on-reboot issue, so I would definitely be tempted to try
getting to netbsd-8 and xen 4.11.

There is a caveat: NetBSD is moving away from the pretty much
unmaintained ipfilter to the multiprocessor-safe npf.

I'm always wary about suggesting someone mess with a production
system, but if possible I'd suggest backing up the OS and trying the
following in order (apologies if this is all obvious or you already
have a better plan), each time rebooting and running for a while to
check that there are no new issues:
- extract the latest DOM0 kernel (under a different file name) and
modules from http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-8/ (or
build your own), then reboot into it (trivial to switch back)
- back up your userland and packages, then extract the netbsd-8 userland
(switching back involves using /rescue to extract netbsd-7, then
deleting the new files)
- rename /usr/pkg and /var/db/pkg and extract netbsd-8 packages
(switching back is just a directory rename). If feeling cautious, do
this with xen 4.8 first, then 4.11
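
For what it's worth, the kernel step could look roughly like the sketch
below. Treat it as an illustration only: the dated build directory
(BUILD_DIR is a placeholder, pick a real one from the listing), the
amd64 architecture, the binary/kernel path under the mirror and the
target name /netbsd8-XEN3_DOM0.gz are all assumptions. As written it
only prints the command; nothing is downloaded until RUN is emptied.

```shell
#!/bin/sh
# Sketch: stage a netbsd-8 DOM0 kernel next to the running one.
# Everything below BASE is an assumption; verify it against the mirror.
BASE="http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-8"
BUILD="BUILD_DIR"   # placeholder for a dated build directory
ARCH="amd64"        # assumption: 64-bit x86 DOM0
RUN="echo"          # dry run; set RUN="" to actually execute

# Print the (assumed) URL of the XEN3_DOM0 kernel for the chosen build.
kernel_url() {
    printf '%s/%s/%s/binary/kernel/netbsd-XEN3_DOM0.gz\n' \
        "$BASE" "$BUILD" "$ARCH"
}

# Fetch under a new name so the existing netbsd-7 kernel stays bootable.
stage_kernel() {
    $RUN ftp -o /netbsd8-XEN3_DOM0.gz "$(kernel_url)"
}

stage_kernel
```

Switching back is then just a matter of pointing the boot entry at the
old kernel file again, since it was never overwritten.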

Best of luck - I've only ever had a xen box with up to ten guests, 30
is definitely a good number there :)

David

