Subject: Re: xen network issues
To: Johan Ihren <johani@johani.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 02/28/2006 19:23:36
On Sun, Feb 26, 2006 at 07:46:16AM +0100, Johan Ihren wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I have two physical servers each running about ten domUs. Everything
> (dom0s and domUs) runs NetBSD 3.0REL. To keep down the size of the
> fs images for the domUs I export /usr/share+/usr/X11R6+/usr/pkg via
> NFS from the dom0s to the (local) domUs. I then thread together all
> the domUs via a large number of VLANs.
>
> In testing this has worked like a charm. I've run parallel compiles
> on all the domUs, done all the various (mostly DNS related) stuff I
> need to (mixed v4/v6 transport, lots of internal topology, packet
> filters, parallel dhcp environments, etc, etc).
>
> However, when doing this for real, with live students, recently I had
> some trouble. The students sit on individual desktops and ssh into
> their "own" domUs. Typically the physical server bogged down
> entirely on occasion, the interrupt rate on the dom0 reached 100% and
> the network interface started to have device timeouts.
>
> From there on things went downhill, almost impossible to get shell
> access to respond at all. In the end I typically unplugged the
> physical network (to separate the physical servers) to try to
> recover. Usually it did recover, although it took several minutes.
>
> Because this was a training environment I unfortunately did not have
> much opportunity to debug this (students to take care of), so I just
> left it on its own and hoped for recovery. Therefore there's not much
> hard data other than:
>
> * 100% interrupt rate
> * oodles of "sip0: FIFO ring overrun" on one server
> * "fxp0: device timeout" on the other server
> * oodles of "nfs_timer: ignoring error 64" on all domUs
>
> What I did *not* find was any massive network traffic. I.e. no raging
> storms that I could see.
>
> One other thing I did notice was that on occasion the remote access
> to the domUs failed (i.e. the connection appeared to hang), while the
> actual machines were just fine. An "ifconfig fxp0 down / ifconfig
> fxp0 up" seemed to clear that usually if I was quick about it. If
> busy with other stuff I think that evolved into the general catatonic
> state.
>
> I'm really sorry I don't have more detailed information.
Hi,
how old are your dom0 and domU kernels? An issue has been fixed recently
(virtual interfaces not checking the Ethernet addresses in packets)
which could cause the kind of problem you're seeing.
The fix has been pulled up to the netbsd-3 and netbsd-3-0 branches.
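
For illustration only, here is a rough sketch of what "checking the
Ethernet addresses in packets" means for a receiving interface. This is
not the NetBSD code and not the actual fix; the function name
should_accept and its parameters are hypothetical, and it assumes a plain
Ethernet II frame whose first six bytes are the destination address.

```python
def should_accept(frame: bytes, own_mac: bytes, promiscuous: bool = False) -> bool:
    """Decide whether a received Ethernet frame should go up the stack.

    Hypothetical helper for illustration; not taken from any real driver.
    """
    # An Ethernet header is 14 bytes: 6 dst + 6 src + 2 ethertype.
    if len(frame) < 14:
        return False

    dst = frame[0:6]

    # In promiscuous mode everything is accepted.
    if promiscuous:
        return True

    # Unicast frame addressed to this interface.
    if dst == own_mac:
        return True

    # Group bit (lowest bit of the first octet) set: multicast or broadcast.
    if dst[0] & 0x01:
        return True

    # Unicast frame addressed to some other station: drop it.
    return False
```

An interface that skips the last step hands every unicast frame it sees
to its own network stack, including frames meant for other domUs on the
same bridge.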
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--