Subject: Fix for bug in vr(4)
To: None <tech-kern@NetBSD.org>
From: Julio M. Merino Vidal <jmmv84@gmail.com>
List: tech-kern
Date: 01/22/2005 17:27:25
Hi all,

vr(4) has been bothering me for a very long time (since I've been using
it, i.e., around last July).  It randomly crashes the system when
running dhclient with a:

kernel: page fault trap, code=0
Call trace: Xintr_ioapic_level5, vr_intr, vr_rxeof, bpf_mtap,
bpf_filter, m_xhalf

This happens rarely, but it does happen eventually.

However, there is an easier way to reproduce the problem.  If the card
is in promiscuous mode (running tcpdump, having a bridge configured...),
running dhclient makes it crash ~95% of the time.  In fact, I've been
using this script to make it crash at will:

dhclient vr0; sleep 5
pkill dhclient; sleep 5
ping -c 5 sun; sleep 5
tcpdump -i vr0 & sleep 5
ping -c 1 sun; sleep 5
dhclient vr0

After a whole day looking for the problem, I found it.  For some
reason, the card gets zero-length packets which, when passed to bpf,
cause the failure.  These packets always come with the VR_RXSTAT_RLINK
flag set, and sometimes with VR_RXSTAT_FIRSTFRAG too.  (I can
understand the meaning of the latter, but not the former.)
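
For reference, here is a minimal standalone sketch of how the receive
status word breaks down into the fields mentioned above.  The bit values
are assumptions written from memory of if_vrreg.h and must be checked
against the real header; rx_is_zero_length() and rx_describe() are
hypothetical helpers for illustration, not part of the driver.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * ASSUMED values, recalled from if_vrreg.h; verify against the real
 * header before relying on them.
 */
#define VR_RXSTAT_FIRSTFRAG	0x00000200u	/* first fragment of a frame */
#define VR_RXSTAT_RLINK		0x00000400u	/* the flag seen on the bad packets */
#define VR_RXSTAT_RXLEN		0x07FF0000u	/* received length field */
#define VR_RXBYTES(x)		(((x) & VR_RXSTAT_RXLEN) >> 16)

/* Hypothetical helper: nonzero if the status word reports zero bytes. */
static int
rx_is_zero_length(uint32_t status)
{
	return VR_RXBYTES(status) == 0;
}

/* Hypothetical helper: print the fields discussed in this message. */
static void
rx_describe(uint32_t status)
{
	printf("status 0x%08x: len=%u%s%s\n",
	    (unsigned)status, (unsigned)VR_RXBYTES(status),
	    (status & VR_RXSTAT_RLINK) ? " RLINK" : "",
	    (status & VR_RXSTAT_FIRSTFRAG) ? " FIRSTFRAG" : "");
}
```

Calling rx_describe(VR_RXSTAT_RLINK | VR_RXSTAT_FIRSTFRAG) shows the kind
of descriptor described above: both flags set and a length of zero.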

I don't know why this happens, but ignoring these packets avoids the
crashes.  However, just ignoring them leaves the card in an
inconsistent state: the driver keeps "receiving" these packets
periodically and the card does not work (it won't transmit anything).

Issuing a reset command after receiving any of these zero-length
packets solves the problem.  After that, the driver does not receive
them any more and the card works fine.  (I.e., the conditional gets
triggered only once.)

Here is the patch:

Index: if_vr.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_vr.c,v
retrieving revision 1.71
diff -u -r1.71 if_vr.c
--- if_vr.c	13 Jan 2005 14:51:28 -0000	1.71
+++ if_vr.c	22 Jan 2005 16:23:23 -0000
@@ -650,6 +650,13 @@
 
 		/* No errors; receive the packet. */
 		total_len = VR_RXBYTES(le32toh(d->vr_status));
+		if (total_len == 0) {
+			printf("%s: got packet of zero length; status = 0x%x\n",
+			       sc->vr_dev.dv_xname, d->vr_status);
+			printf("%s: restarting\n", sc->vr_dev.dv_xname);
+			(void) vr_init(ifp);
+			continue;
+		}
 
 #ifdef __NO_STRICT_ALIGNMENT
 		/*

I don't know if this is correct, but it avoids crashes and makes the
card work.

So unless someone knows what the root cause of the problem might be,
I'd like to commit it.

Thanks,

-- 
Julio M. Merino Vidal <jmmv84@gmail.com>
http://www.livejournal.com/users/jmmv/
The NetBSD Project - http://www.NetBSD.org/