Subject: Re: trouble with an rtk
To: Sean Finney <seanius@seanius.net>
From: Steven M. Bellovin <smb@research.att.com>
List: current-users
Date: 06/13/2002 12:17:04
In message <20020613113846.A4101@sccs.swarthmore.edu>, Sean Finney writes:
>so at home i have a setup along the lines of
>
>rest of the world <==> | rtk0 (router/firewall) rtk1 | <==> internal net
>
>recently, I've been getting really strange behavior from what I've pinned
>down to being my realtek 8139 card at rtk1.  the router has been up for a
>few months just quietly sitting in the corner of the living room doing
>what it does, but recently, I've been having trouble with sending a little
>over a MB of data over the wire.  Often, it will just stall and crap out
>a couple hundred kb into a transfer.  Sometimes, it will transfer all
>the data, just really slowly (this morning, 2MB took 5 minutes), and 
>sometimes it works just fine (though less of that lately)
>
>The really strange thing is that any existing connection independent of
>the individual crapping-out transfer is just fine.  For example, my
>xforwarded gaim and licq work just fine during these episodes, though
>they themselves will similarly periodically crap out over the course of
>the day.  For the necessities, a while loop and rsync can still get the
>job done, but it's excruciatingly annoying...
>
>I don't have this problem downloading stuff off the net straight onto
>the router/fw (over rtk0), and I don't have a problem sending from
>one computer to another in the internal net, and I've checked the cable...
>
>could there be some cause of this other than just flaky hardware?  I've
>turned on the debug flag on rtk1 to see if it'd do anything, edited 
>syslog.conf to drop everything I could think of somewhere in /var/log,
>and grep'd through /var/log/* and don't see anything that points to a
>likely cause.  
>
>so, if it is dying hardware, is there anything I can do to make things
>more bearable while I'm waiting for my new (3com) cards?


The most common cause of slow TCP throughput is packet loss.  By 
design, TCP interprets loss as congestion and slows down; it can't 
tell the difference between congestion and flaky hardware.  I'd expect 
things like gaim to keep working -- first, it's a separate connection; 
second, since it's not a bulk data transfer, it would recover very quickly.

The issue is to figure out where the packet loss is.  For that, ping 
and traceroute are your friends, though mtr (from pkgsrc) is often 
better than traceroute at this sort of thing.  Try different packet 
sizes, too.
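
A minimal sketch of the packet-size test, assuming Python is available on
one of the internal machines and that ping accepts -c (count) and -s
(payload size), as the NetBSD and Linux versions do.  The function names
and the payload sizes chosen are illustrative assumptions, not part of any
existing tool.

```python
import re
import subprocess

# Matches the loss figure in ping's summary line, e.g.
# "5 packets transmitted, 5 packets received, 0.0% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Return the reported loss percentage as a float, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(host, size, count=20):
    """Send `count` echo requests with `size` payload bytes; return loss %."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-s", str(size), host],
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


def sweep(host, sizes=(56, 512, 1024, 1472)):
    """Measure loss at several payload sizes (sizes are assumed values)."""
    return {size: ping_loss(host, size) for size in sizes}
```

Calling sweep() with the router's rtk1 address from an internal machine,
and again with an outside host, should show whether loss appears only on
the rtk1 hop and whether it gets worse as packets get larger.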

		--Steve Bellovin, http://www.research.att.com/~smb (me)
		http://www.wilyhacker.com ("Firewalls" book)