Subject: Re: NetBSD/xen network problems (need help)
To: Mike M. Volokhov <mishka@NetBSD.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 01/23/2006 12:41:54
On Mon, Jan 23, 2006 at 10:34:01AM +0200, Mike M. Volokhov wrote:
> Hello!
>
> I have a Xen 2.0.7 and NetBSD 3.0_STABLE (tested on 24.12.05 and
> 20.01.06 sources) setup with four domUs, all configured according to
> Ports/xen/howto. So far, so good. After we got yet another
> Internet connection, I wanted to set up one of the domUs as a
> router/ipf/nat/ipsec/altq ipv4-only box. And this is where I've run
> into a lot of PITA :-O
>
> Because the system has two interfaces (see details below), I've used
> the following scheme (plus another two domains attached to bridge0,
> not shown here):
>
>
> [LAN] === <bge0 ----- dom0 ---- bge1> ===== [WAN]
>             |                    |
>          bridge0              bridge1
>        |         |               |
>     xvif1.0   xvif2.0         xvif2.1
>        |         |               |
>     xennet0   xennet0         xennet1
>      dom1      dom2            dom2
OK, I have similar setups (one of my domUs has 6 interfaces, connected to
6 bridges).
>
>
> It worked. The bge1/WAN configuration was added recently (dual-NIC
> mobo), after bge0 had been working for weeks. But now, often (once
> every few minutes), all network interfaces just hang for a few
> minutes. I've also noticed that the hangs coincide with a lot of
> duplicated packets produced by all domU machines. For example, ping
> statistics show the following results:
>
> 4650 packets transmitted, 2949 packets received, +27752 duplicates, 36.6% packet loss
> round-trip min/avg/max/stddev = 16.536/43066.884/94286.876/29929.792 ms
From where to where is this ping? Also, it would be interesting
to run tcpdump on the other end, to see in which direction the
packets are duplicated.
>
> Here is the 'netstat -i | grep -e Name -e Link' output for dom0 and for
> dom3 (dom5 is actually just a restarted dom2; see the scheme above):
>
> Name  Mtu   Network Address             Ipkts  Ierrs    Opkts   Oerrs  Colls
> bge0  1500  <Link>  00:30:48:84:cf:98  172243    862   406381       0      0
> bge1  1500  <Link>  00:30:48:84:cf:99       0      0       20       0      0
> lo0   33192 <Link>                       1524      0     1524       0      0
> bridg 1500  <Link>                     625468      0  1175737  181978      0
> bridg 1500  <Link>                         17      0       27       0      0
> xvif1 1500  <Link>  aa:00:00:21:be:8b  121192  87493   188969       0      0
> xvif3 1500  <Link>  aa:00:00:05:e7:86  111157  88327   179549       0      0
> xvif4 1500  <Link>  aa:00:00:27:74:18   69277  87680   144414       0      0
> xvif5 1500  <Link>  aa:00:00:51:08:e4  166169  30719   252683       0      0
> xvif5 1500  <Link>  aa:00:00:51:08:e5      11      0        2       0      0
There are lots of input errors on the xvifs. Do you have any messages in
dmesg? In the driver, there are several places where a printf precedes
the increment of the input error counter.
However, one place where it's incremented silently is when the driver
can't get an mbuf (e.g. if you see "mclpool limit reached"). This would
also explain the network hangs.
>
> Name  Mtu   Network Address             Ipkts  Ierrs    Opkts   Oerrs  Colls
> lo0   33192 <Link>                         28      0       28       0      0
> xenne 1500  <Link>  aa:00:00:04:e7:86  179545      0   111152   88327      0
The output errors on this side are probably related to the input errors
on dom0.
> [...]
>
> Also, I've run into kernel panics on a domU machine (see below; btw,
> how do I save a core dump? "sync" isn't working - "dump device bad";
> /netbsd is a copy of the kernel that was actually booted).
Kernel core dumps don't work yet on Xen; it's on my todo list.
>
> Panic message:
>
> panic: m_makewritable: length changed
> Stopped at netbsd:cpu_Debugger+0x4: leave
> cpu_Debugger(c03f8d38,38,c03f8ce8,c03f8d38,0) at netbsd:cpu_Debugger+0x4
> panic(c0331900,1,0,0,0) at netbsd:panic+0x121
> m_makewritable(c03f8d38,0,3b9aca00,1,c0871500) at netbsd:m_makewritable+0x6b
> fr_check_wrapper(0,c03f8d38,c072d038,1,c0871800) at netbsd:fr_check_wrapper+0x1b
This is an internal diagnostic in m_makewritable(). Now I see that this
check doesn't look at the error code returned by m_copyback0(), so it's
possible it's triggered by a resource shortage in the mbuf pool.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--