NetBSD-Bugs archive


port-xen/45728: NetBSD/xen network loss



>Number:         45728
>Category:       port-xen
>Synopsis:       NetBSD/xen loses network connectivity
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 21 09:30:01 +0000 2011
>Originator:     Brian Marcotte
>Release:        NetBSD 5.1
>Organization:
        Panix
>Environment:
System: NetBSD panix5.panix.com 5.1 NetBSD 5.1 (PANIX-XEN3U-USER) #3: Wed Mar 9 20:45:57 EST 2011 root%juggler.panix.com@localhost:/devel/netbsd/5.1-shellkernel/src/sys/arch/i386/compile/PANIX-XEN3U-USER i386
Architecture: i386
Machine: i386
>Description:

For some time, my machines have had very occasional network problems
which I have not been able to diagnose or reproduce. In the past I
thought it was specific to NFS, but now it looks like the NFS issues
are just a symptom of a network issue.

It only happens under Xen, or I can only reproduce it under Xen. I've
also tried -current and there is no change in behavior.

What happens is that the machine either goes off the net entirely (with
feature-rx-notify), or starts to experience major packet loss (without
feature-rx-notify). 

>How-To-Repeat:

Two servers are required to reproduce the problem. The first is the
NetBSD system to be diagnosed. The second needs to be running
telnetd. I used another NetBSD system for this, but that doesn't seem
to matter. The problem also happens when suspended processes continue to
receive data from the network, but this telnet example is a very simple
way to reproduce the problem.
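
The stalled-reader condition described above (a process that stops
reading while its peer keeps sending) can be sketched without telnet.
What follows is a minimal illustration, not the procedure used in this
report: the function name fill_receive_queue and its parameters are
hypothetical, and it runs over loopback, so it only shows how the
queues back up and will not by itself trigger the xennet problem. To
exercise the bug, the sending side would presumably have to run on a
second host, as in the telnet steps below.

```python
import socket


def fill_receive_queue(chunk_size=4096, max_bytes=64 * 1024 * 1024):
    """Send to a peer that never reads.

    Returns (bytes_accepted, blocked), where blocked is True if the
    kernel refused further data before max_bytes was reached.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # loopback only; port chosen by the kernel
    listener.listen(1)

    # The "reader" connects but never calls recv(), much like the
    # telnet client after Control-S or a suspended process.
    reader = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    reader.connect(listener.getsockname())

    writer, _ = listener.accept()
    writer.setblocking(False)

    sent = 0
    blocked = False
    chunk = b"x" * chunk_size
    try:
        while sent < max_bytes:
            try:
                sent += writer.send(chunk)
            except BlockingIOError:
                # The socket buffers cannot take more data right now.
                blocked = True
                break
    finally:
        writer.close()
        reader.close()
        listener.close()
    return sent, blocked


if __name__ == "__main__":
    sent, blocked = fill_receive_queue()
    print("bytes accepted before the sender stalled:", sent, "blocked:", blocked)
```

The number of bytes accepted depends on the socket buffer sizes of the
system it runs on, so no particular value should be expected.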

   First, you need to make sure that flow control characters are
   making it to the system to be tested. I did this by ssh-ing in. It
   should probably also work if you had a local xterm or console. You
   should be able to enter Control-V, Control-S and see the "^S"
   appear.

   Telnet to the machine running telnetd and log in as some user.

   Run this on the remote end: while :; do date ; sleep .1; done

   Type Control-S. In my testing, this is processed on the system
   running the telnet client, not the remote system. This is key to
   reproducing the problem.

   Wait a few minutes. Running "netstat -f inet -n" should show the
   "Recv-Q" filling up on the connection. Eventually, when the queue
   becomes full, the system should go off the network (NetBSD
   defaults to using feature-rx-notify).
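
Watching "Recv-Q" by eye can be replaced by a small filter over the
netstat output. This is a rough sketch under stated assumptions: the
helper name recv_queues is hypothetical, the column layout (Proto,
Recv-Q, Send-Q, Local Address, Foreign Address, State) is assumed, and
the sample text uses made-up documentation addresses and queue sizes,
not output captured from the affected machine.

```python
def recv_queues(netstat_output, threshold=0):
    """Return (local, foreign, recv_q) for TCP lines with Recv-Q above threshold."""
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue
        if recv_q > threshold:
            results.append((fields[3], fields[4], recv_q))
    return results


# Illustrative input only: invented addresses and values, assumed layout.
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        State
tcp        0      0  192.0.2.10.22          192.0.2.1.51234        ESTABLISHED
tcp    32768      0  192.0.2.10.65010       192.0.2.20.23          ESTABLISHED
"""

if __name__ == "__main__":
    for local, foreign, recv_q in recv_queues(SAMPLE):
        print(local, foreign, recv_q)
```

On a live system the same function could be fed the text produced by
"netstat -f inet -n" at intervals, to see whether the queue on the
telnet connection keeps growing.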

   You may need to log in on the console and kill the telnet client to
   fix things.

The behavior when not using feature-rx-notify (by modifying
if_xennet_xenbus.c) is somewhat different. Instead of the machine
going off the network entirely, there is severe packet loss, which
makes NFS grind to a halt. I can see the loss by running "netstat -i"
in my Linux dom0.

>Fix:
   Unknown.


