Port-xen archive


NetBSD/xen goes off the network - reproducible



For some time, my machines have had very occasional network problems
which I have not been able to diagnose or reproduce. In the past I
thought it was specific to NFS, but now it looks like the NFS issues
are just a symptom of a network issue.

It only happens under Xen, or I can only reproduce it under Xen. I've
also tried -current and there is no change in behavior.

What happens is that the machine either goes off the net entirely (with
feature-rx-notify), or starts to experience major packet loss (without
feature-rx-notify).

I can now reliably reproduce the problem using telnet. Note that this
is just for demonstration. I've seen this happen even when there is no
telnet running. Suspended processes that continue to receive from the
network sometimes cause this to happen, but I've not been able to
reproduce it that way.

-----

Two servers are required to reproduce the problem. The first is the
NetBSD system to be diagnosed. The second needs to be running
telnetd. I used another NetBSD system for this, but that doesn't seem
to matter.

   First, you need to make sure that flow control characters are
   making it to the system to be tested. I did this by ssh-ing in. It
   should probably also work if you had a local xterm or console. You
   should be able to enter Control-V, Control-S and see the "^S"
   appear.

   Telnet from the system under test to the machine running telnetd
   and log in as some user.

   Run this on the remote end: while :; do date ; sleep .1; done

   Type Control-S. In my testing, this is processed on the system
   running the telnet client, not the remote system. This is key to
   reproducing the problem.

   Wait a few minutes. Running "netstat -f inet -n" on the system
   under test should show the "Recv-Q" filling up on the telnet
   connection. Eventually, when the queue becomes full, the system
   should go off the network (NetBSD defaults to using
   feature-rx-notify).

   You may need to log in on the console and kill the telnet client to
   fix things.
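
To make the "Recv-Q" growth easier to watch, the netstat output can be
filtered down to the telnet connection. This is only a minimal sketch:
the function name recvq_for_port is made up for illustration, and it
assumes the BSD-style "netstat -f inet -n" layout (Proto, Recv-Q,
Send-Q, Local Address, Foreign Address, State) with the port appended
to the address after a final dot.

```shell
# Hypothetical helper: print "Recv-Q foreign-address" for every TCP
# connection whose foreign port matches the first argument.
# Reads "netstat -f inet -n" output on stdin.
recvq_for_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ {
            # BSD netstat prints addresses as a.b.c.d.port
            n = split($5, parts, /\./)
            if (parts[n] == port) print $2, $5
        }'
}

# Example use on the system under test, sampling once a second:
#   while :; do netstat -f inet -n | recvq_for_port 23; sleep 1; done
```

If the first column keeps climbing after Control-S and then stops
changing, that should correspond to the point where the queue is full.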

The behavior when not using feature-rx-notify (by modifying
if_xennet_xenbus.c) is somewhat different. Instead of the machine
going off the network entirely, there is severe packet loss. I can
see this by running "netstat -i" in my Linux dom0.

----

I have packet traces taken while running the above test. The client is
166.84.1.74 and the server is 166.84.1.3.

With feature-rx-notify (NetBSD default):

  http://www.panix.com/~marcotte/with-rx-notify.pcap

Without feature-rx-notify:

  http://www.panix.com/~marcotte/without-rx-notify.pcap

In the middle of the second test, I started an ftp so you can see the
effect of the packet loss.

Thanks.

--
- Brian

