Subject: Re: SOLVED! The cause of puzzling TCP (eg. WHOIS) connection failures with some InterNIC.net hosts
To: NetBSD Networking Technical Discussion List <tech-net@netbsd.org>
From: Greg A. Woods <woods@most.weird.com>
List: tech-net
Date: 11/21/1998 01:21:50
[ On Sat, November 21, 1998 at 00:34:50 (-0600), Henry Miller wrote: ]
> Subject: Re: SOLVED!  The cause of puzzling TCP (eg. WHOIS) connection failures with some InterNIC.net hosts
>
> > I do think it would be more friendly for a NetBSD router to optionally
> > ignore the "DF" bit if the same oversize packet is re-transmitted even
> > after the ICMP "needs frag" reply has been sent, perhaps after "N"
> > retransmissions where "N" is calculated based on some magical formula
> > that uses the packet size and the delay between retransmissions in order
> > to guess at how long it would take the ICMP reply to get back to the
> > originator and for a smaller packet to arrive.
> 
> And how do we remember which hosts (of the 4 billion possible on the
> internet, let's not consider IPv6) we should ignore the DF bit for?  How
> do we know that it isn't a case of someone setting the DF bit, and then
> getting disconnected (from their dynamic IP), and a different host comes
> online soon after and tries to find the problem?

No memory necessary; well, not very much anyway.  If you get a packet
with the DF bit set, *and* you have to return an ICMP "needs frag" reply
for it, *and* the packet is part of an already established TCP circuit,
then you record this instance in a small cache of such things.  If you
get a second packet of the same size and have to do the same again, then
you increment a counter in the first record and mark the time of the
second event.  When the N'th identically sized packet comes through from
the same source, you set a flag in the cache item, turn off the DF bit,
and pass the packet on (in fragments).  The flag says to ignore the DF
bit on all subsequent packets from that source address.

If the cache is full you toss out the least recently used entry and
reuse its slot.
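In rough C, the mechanism might look something like the sketch below.
All of the names, the table size, and the threshold N are made up here
purely for illustration; this isn't proposed kernel code, just the shape
of the idea:

```c
/* Hypothetical sketch of the per-source DF-override cache described
 * above.  Sizes, names, and the threshold are illustrative only. */
#include <stdint.h>
#include <string.h>
#include <time.h>

#define DF_CACHE_SIZE 8   /* "half a dozen or so" entries */
#define DF_THRESHOLD  3   /* N: identical retransmissions before we give up */

struct df_cache_entry {
	uint32_t src;        /* source IP of the offending packets */
	uint16_t pktsize;    /* size of the oversized packet */
	unsigned count;      /* identically sized retransmissions seen */
	int      ignore_df;  /* flag: strip DF from this source from now on */
	time_t   last_seen;  /* for LRU eviction (and aging by a cleaner) */
};

static struct df_cache_entry df_cache[DF_CACHE_SIZE];

/*
 * Called when we're about to send an ICMP "needs frag" for a packet
 * on an established TCP circuit.  Returns nonzero if the caller
 * should instead clear the DF bit and forward the packet in fragments.
 */
int
df_cache_note(uint32_t src, uint16_t pktsize, time_t now)
{
	struct df_cache_entry *e, *lru = &df_cache[0];
	int i;

	for (i = 0; i < DF_CACHE_SIZE; i++) {
		e = &df_cache[i];
		if (e->count != 0 && e->src == src) {
			e->last_seen = now;
			if (e->ignore_df)
				return 1;	/* already overriding this source */
			if (e->pktsize == pktsize &&
			    ++e->count >= DF_THRESHOLD)
				e->ignore_df = 1;  /* N'th time: fragment it */
			return e->ignore_df;
		}
		/* track the empty or least recently used slot as we go */
		if (e->count == 0 || e->last_seen < lru->last_seen)
			lru = e;
	}

	/* not found: evict the LRU (or free) slot for a fresh record */
	memset(lru, 0, sizeof(*lru));
	lru->src = src;
	lru->pktsize = pktsize;
	lru->count = 1;
	lru->last_seen = now;
	return 0;
}
```

With N = 3, the first two "needs frag" events for a given source and
size just count; the third trips the flag, and from then on every packet
from that source gets its DF bit cleared regardless of size.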

Periodically, really old entries could be flushed from the cache by a
cleaner daemon/thread/whatever, though this is just gravy (it might cut
down the logging of cache overflows a bit, thus facilitating tuning).

In my case the cache wouldn't need to hold any more than about a half
dozen or so such records.  On a really busy host I'd guess a couple
hundred would be enough, though I've no idea how widespread this
problem really is.

There should of course be some logging of such things, especially if
the cache keeps filling up, and some way to peek at the cache contents
would be handy if each new source isn't logged (so that a diligent admin
could bug those who are misbehaving).

The only serious potential for DoS I see at the moment would be in the
logging.

I realize this is a hack to work around stupid bugs and "broken"
firewalls and such, but it does seem like one of those instances where
the robustness principle can be safely applied, especially since even if
every implementation were fixed there would still be a few ignorant
security admins to work around.

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>