Subject: Re: frozen networking with libpcap-0.9.3nb3
To: Jeremy C. Reed <reed@reedmedia.net>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-net
Date: 10/11/2005 19:34:34

Yes, you should send-pr, but it would help to do the following
experiments and include/reference results, because anyone who could
fix this will probably want to know the answers:

  Check outgoing traffic via tcpdump on another machine.  Probably
  it's there, but if not that's interesting.

  Run 'netstat -ain' to see what multicast groups are joined.

  With the system in the broken state, try tcpdump (instead of
  trafshow) without -p.  Capture ifconfig; it should show PROMISC but
  not ALLMULTI.  Capture netstat -ain.

  Reboot, but don't power cycle.  Don't start trafshow or tcpdump.
  Does it work, or is it still in the no-packets-received state?

  If reboot doesn't result in a working interface, power cycle and
  retest.

  From a known good state, run tcpdump (no -p) and exit.  Working
  network, or broken?  Capture ifconfig and netstat
  before/during/after.

  If tcpdump doesn't break the interface, from a known good state, run
  ktrace trafshow and exit.  Is the failure repeatable?  Capture
  ifconfig and netstat before/during/after.
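
The before/during/after captures could be scripted roughly as below.
This is only a sketch: the interface name tlp0, the output directory,
and the helper names snapshot and has_flag are my assumptions, not
anything from the driver or the tools.  It just saves the output of
'ifconfig tlp0' and 'netstat -ain' under a label so the states can be
diffed later.

```shell
#!/bin/sh
# Sketch: snapshot interface state around an experiment.
# IFACE and OUTDIR are hypothetical defaults; adjust for your machine.
IFACE="${IFACE:-tlp0}"
OUTDIR="${OUTDIR:-./capture}"

# snapshot LABEL: save 'ifconfig IFACE' and 'netstat -ain' output
# into OUTDIR/LABEL.ifconfig and OUTDIR/LABEL.netstat.
snapshot() {
    label="$1"
    mkdir -p "$OUTDIR" || return 1
    ifconfig "$IFACE" > "$OUTDIR/$label.ifconfig" 2>&1
    netstat -ain > "$OUTDIR/$label.netstat" 2>&1
}

# has_flag LABEL FLAG: succeed if FLAG (e.g. PROMISC) appears in the
# ifconfig output previously saved under LABEL.
has_flag() {
    grep -q -- "$2" "$OUTDIR/$1.ifconfig"
}
```

Typical use would be 'snapshot before', start tcpdump (no -p) in
another terminal, 'snapshot during', exit tcpdump, 'snapshot after',
and then diff the saved files.  'has_flag during PROMISC' and
'has_flag during ALLMULTI' give a quick check of the flags.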

You might also read the cvs logs for sys/dev/ic/tulip.c, and in
particular see revision 1.133, and perhaps enable the debugging code
referenced in 1.127.

I think if you compile with TLP_DEBUG, and then 'ifconfig tlp0 debug',
you'll get debugging printfs of the filter setup code.  While it's
tough to say what's wrong, I'd focus on tlp_filter_setup.  What's odd
is that the chip seems to be in a permanently bad state such that it
is failing to receive the station address.  I'd expect perfect
filtering, though, because probably few multicast groups are joined.

See the comment about broken hashperfect filters:

	/*
	 * Some 21140 chips have broken Hash-Perfect modes.  On these
	 * chips, we simply use Hash-Only mode, and put our station
	 * address into the filter.
	 */

You might try adding "1 ||" to the condition guarded by that comment,
to force Hash-Only mode on your chip.

It seems like something is putting the ethernet chip in a bad state
so that the filter programming that used to work is now failing.  It
will be very interesting to see if tcpdump fails to provoke this but
trafshow does, and if so what trafshow is doing.

Apparently the ALLMULTI bit is set as a side effect of PROMISC, since
that forces the 'allmulti:' code path.

-- 
        Greg Troxel <gdt@ir.bbn.com>