Subject: NetBSD/xen network problems (need help)
To: None <port-xen@NetBSD.org>
From: Mike M. Volokhov <mishka@NetBSD.org>
List: port-xen
Date: 01/23/2006 10:34:01
Hello!
I have a Xen 2.0.7 and NetBSD 3.0_STABLE setup (tested with sources
from 24.12.05 and 20.01.06) with four domUs, all configured according
to Ports/xen/howto. So far, so good. Now that we have got another
Internet connection, I want to set up one of the domUs as an IPv4-only
router/ipf/nat/ipsec/altq box. And this is where I've run into a lot
of PITA :-O
Because the system has two interfaces (see details below), I've used
the following scheme (two more domains are attached to bridge0, but
they are not shown here):
[LAN] === <bge0 ----- dom0 ----- bge1> === [WAN]
            |                     |
         bridge0               bridge1
          |   |                   |
    xvif1.0   xvif2.0          xvif2.1
       |         |                |
    xennet0   xennet0          xennet1
     dom1      dom2              dom2
It worked. The bge1/WAN configuration was added recently (dual-NIC
mobo), while bge0 had been up and working for weeks. But now, often
(once every few minutes), all network interfaces just hang for a few
minutes. I've also noticed that the hangups somehow coincide with a
lot of duplicated packets produced by all domU machines. For example,
ping statistics show the following results:
4650 packets transmitted, 2949 packets received, +27752 duplicates, 36.6% packet loss
round-trip min/avg/max/stddev = 16.536/43066.884/94286.876/29929.792 ms
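As a rough cross-check of those numbers (a minimal sketch, not part of the original measurements; the helper name parse_ping_summary is made up for illustration), the summary line can be parsed to confirm the reported loss and to see how many duplicates arrive per reply. It works out to roughly 9.4 duplicates for every reply received, and the average round trip of 43066.884 ms is about 43 seconds.

```python
import re

# Summary line copied verbatim from the ping output above.
SUMMARY = ("4650 packets transmitted, 2949 packets received, "
           "+27752 duplicates, 36.6% packet loss")


def parse_ping_summary(line):
    """Return (transmitted, received, duplicates) from a ping summary line.

    Hypothetical helper: it only understands the exact wording shown above.
    """
    m = re.search(
        r"(\d+) packets transmitted, (\d+) packets received, \+(\d+) duplicates",
        line,
    )
    if m is None:
        raise ValueError("unrecognized ping summary: %r" % line)
    return tuple(int(g) for g in m.groups())


transmitted, received, duplicates = parse_ping_summary(SUMMARY)

# Loss as ping reports it: share of requests that never got a first reply.
loss_pct = 100.0 * (transmitted - received) / transmitted
# Extra copies seen for each reply that did arrive.
dups_per_reply = duplicates / received

print("loss: %.1f%%" % loss_pct)
print("duplicates per received reply: %.2f" % dups_per_reply)
```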
Here is the 'netstat -i | grep -e Name -e Link' output for dom0 and for
dom3 (dom5 is actually just a restarted dom2; please see the scheme
above):
Name  Mtu   Network Address             Ipkts  Ierrs    Opkts   Oerrs Colls
bge0  1500  <Link>  00:30:48:84:cf:98  172243    862   406381       0     0
bge1  1500  <Link>  00:30:48:84:cf:99       0      0       20       0     0
lo0   33192 <Link>                       1524      0     1524       0     0
bridg 1500  <Link>                     625468      0  1175737  181978     0
bridg 1500  <Link>                         17      0       27       0     0
xvif1 1500  <Link>  aa:00:00:21:be:8b  121192  87493   188969       0     0
xvif3 1500  <Link>  aa:00:00:05:e7:86  111157  88327   179549       0     0
xvif4 1500  <Link>  aa:00:00:27:74:18   69277  87680   144414       0     0
xvif5 1500  <Link>  aa:00:00:51:08:e4  166169  30719   252683       0     0
xvif5 1500  <Link>  aa:00:00:51:08:e5      11      0        2       0     0

Name  Mtu   Network Address             Ipkts  Ierrs    Opkts   Oerrs Colls
lo0   33192 <Link>                         28      0       28       0     0
xenne 1500  <Link>  aa:00:00:04:e7:86  179545      0   111152   88327     0
Please note the errors on dom3/xennet0 = dom0/xvif3.0: the 88327 Oerrs
on xennet0 match the 88327 Ierrs on xvif3. Also, the errors come in
bursts: everything works fine, then I get a hangup, and afterwards the
stats show a lot of errors. After that everything works fine again.
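To make that comparison easier to repeat on later snapshots, here is a small sketch (the helper parse_link_row and the variable names are made up for illustration) that reads the two '<Link>' rows quoted above and checks that the frontend's output errors equal the backend's input errors. It takes the counters from the right-hand end of each row, since the Address column is empty for lo0 and the bridges. The ratio it prints is simply Ierrs divided by Ipkts; whether Ipkts already includes the errored frames depends on the driver, so treat it as a rough indicator only.

```python
# Rows copied verbatim from the netstat output above.
DOM0_XVIF3 = "xvif3 1500 <Link> aa:00:00:05:e7:86 111157 88327 179549 0 0"
DOM3_XENNET0 = "xenne 1500 <Link> aa:00:00:04:e7:86 179545 0 111152 88327 0"


def parse_link_row(line):
    """Parse one '<Link>' row of `netstat -i` into a dict of counters.

    Hypothetical helper: the last five whitespace-separated fields are
    taken as Ipkts, Ierrs, Opkts, Oerrs, Colls, so rows without an
    Address column parse the same way.
    """
    fields = line.split()
    if len(fields) < 8 or fields[2] != "<Link>":
        raise ValueError("not a <Link> row: %r" % line)
    ipkts, ierrs, opkts, oerrs, colls = (int(f) for f in fields[-5:])
    return {"name": fields[0], "ipkts": ipkts, "ierrs": ierrs,
            "opkts": opkts, "oerrs": oerrs, "colls": colls}


backend = parse_link_row(DOM0_XVIF3)     # dom0 side of the virtual link
frontend = parse_link_row(DOM3_XENNET0)  # dom3 side of the same link

# Frames dom3 failed to send should show up as input errors in dom0.
errors_match = backend["ierrs"] == frontend["oerrs"]
# Rough indicator only: input errors per counted input packet on xvif3.
ierr_ratio = backend["ierrs"] / backend["ipkts"]

print("errors match on both ends:", errors_match)
print("xvif3 Ierrs/Ipkts: %.2f" % ierr_ratio)
```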
Previously (9 days uptime, daily output):
Name         Ipkts Ierrs     Opkts Oerrs Colls
bge0      19158819     0  20321849     0     0
bge1          4813     0        53     0     0
lo0          42589     0     42589     0     0
bridge0   39464623     0  43880371     0     0
xvif1.0     278513     0   1390149     0     0
xvif3.0     329503     0   1419529     0     0
xvif4.0     529427     0   1291286     0     0
bridge1       4792     0      4762     0     0
xvif13.0  15701265     0  14390989     0     0
xvif13.1         0     0      4599     0     0
xvif15.0         0     0    407593     0     0
Also, I've run into kernel panics on a domU machine (see below; btw,
how can I save a core dump? "sync" doesn't work: dump device bad, and
/netbsd is a copy of the actually booted kernel).
WTF is going on here?! What am I doing wrong? Any help or advice on how
to debug this would be very much appreciated.
--
Mishka.
P.S. Here are some details about the physical interfaces:
bge0 at pci2 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
bge0: interrupting at irq 16, event channel 7
bge0: ASIC BCM5751 A1 (0x4101), Ethernet address 00:30:48:84:cf:98
brgphy0 at bge0 phy 1: BCM5750 1000BASE-T media interface, rev. 0
bge1 at pci3 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
bge1: interrupting at irq 17, event channel 12
bge1: ASIC BCM5751 A1 (0x4101), Ethernet address 00:30:48:84:cf:99
brgphy1 at bge1 phy 1: BCM5750 1000BASE-T media interface, rev. 0
Panic message:
panic: m_makewritable: length changed
Stopped at netbsd:cpu_Debugger+0x4: leave
cpu_Debugger(c03f8d38,38,c03f8ce8,c03f8d38,0) at netbsd:cpu_Debugger+0x4
panic(c0331900,1,0,0,0) at netbsd:panic+0x121
m_makewritable(c03f8d38,0,3b9aca00,1,c0871500) at netbsd:m_makewritable+0x6b
fr_check_wrapper(0,c03f8d38,c072d038,1,c0871800) at netbsd:fr_check_wrapper+0x1b
pfil_run_hooks(c036e9e0,c03f8da0,c072d038,1,c03f8dc8) at netbsd:pfil_run_hooks+0x6e
ip_input(c0871800,c01142b2,9,202,0) at netbsd:ip_input+0x93b
ipintr(fffffffe,20,4,1,c03f8e10) at netbsd:ipintr+0xad
DDB lost frame for netbsd:Xsoftnet+0x4f, trying 0xc03f8dd0
Xsoftnet() at netbsd:Xsoftnet+0x4f
--- interrupt ---
emul_freebsd_object(c03f8e4c,0,3b9a0000,ca00) at 0xc03fe000
Bad frame pointer: 0xc02ad848
ds 0x11
es 0x11
fs 0x31
gs 0x11
edi 0x1
esi 0x100
ebp 0xc03f8c98 emul_freebsd_object+0x6f9c4
ebx 0x1
edx 0xc03fe000 emul_freebsd_object+0x74d2c
ecx 0xfffffff8
eax 0x9fd
eip 0xc02ab8fc cpu_Debugger+0x4
cs 0x9
eflags 0x202
esp 0xc03f8c98 emul_freebsd_object+0x6f9c4
ss 0x11
netbsd:cpu_Debugger+0x4: leave
Stopped at netbsd:cpu_Debugger+0x4: leave
db>