Subject: Re: pppoe problem
To: None <tech-net@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-net
Date: 12/16/2001 20:45:43
>>> [...PMTU-D black holes...]
>> I know of at least one person who's had trouble sending me large
>> emails because the relevant outgoing mailhost has the won't-frag
>> disease, [...]
> I wonder if it possible to locate the 'black hole' by sending large
> ICMP echo packets with 'dont fragment' and a small 'hop count',
> comparing the result to the usual 'traceroute' sequence.

I doubt it.

For ease of writing, I'll use the following names:
	Server: the machine (webserver, mailhost, etc.) that wants to
		send large packets.
	Client: the machine that wants to receive what Server sends.
	GMTU: the gateway in the Server->Client direction that has to
		drop packets when DF is set.
	GICMP: the machine responsible for dropping need-to-frag ICMP
		unreachables in the Client->Server direction.

The problem arises only by (unwitting) collaboration of all these
machines.  The real problem, of course, lies with GICMP (which in one
case I know of proved to be the same host as Server - it had broken
filtering rules installed).  But it doesn't show up unless GMTU is
present, which is why it's a problem at all - GMTU doesn't exist for
most users; for them, the MTU of Server's outgoing link is no larger
than the MTU the rest of the way to Client.  (Server and Client are
involved because the problem doesn't arise until Server wants to send
bulk data to someone like Client on the far side of GMTU.)
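
To make the collaboration concrete, here is a toy model in Python.  It
is only a sketch: the Path class, its field names, and the retry count
are invented for illustration, and it assumes the need-to-frag
unreachable carries the next-hop MTU so that Server can drop straight
to the right size.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Toy Server->Client path; all names here are illustrative."""
    server_link_mtu: int   # MTU of Server's outgoing link
    narrowest_mtu: int     # smallest MTU further along (GMTU's outgoing link)
    icmp_filtered: bool    # True if some GICMP drops need-to-frag unreachables


def send_with_df(path: Path, max_tries: int = 5) -> str:
    """Simulate Server sending full-sized DF packets and reacting to ICMP."""
    size = path.server_link_mtu
    for _ in range(max_tries):
        if size <= path.narrowest_mtu:
            return f"delivered at {size}"
        # GMTU has to drop the packet and send a need-to-frag unreachable.
        if path.icmp_filtered:
            # GICMP eats the unreachable, so Server learns nothing and
            # retransmits at the same size.
            continue
        # Assumption: the unreachable reports the next-hop MTU.
        size = path.narrowest_mtu
    return "black hole"


for path in (Path(1500, 1500, True),    # no GMTU: GICMP is harmless
             Path(1500, 1492, False),   # GMTU but no GICMP: PMTU-D works
             Path(1500, 1492, True)):   # both present: black hole
    print(path, "->", send_with_df(path))
```

Only the last case fails, which is the point: GICMP alone does no
visible harm, and GMTU alone is handled by ordinary PMTU-D.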

Now, from Server, you can locate GMTU with traceroute -P or moral
equivalent.  But that won't tell you anything about where GICMP, the
real problem, is - and if GICMP is present, traceroute -P will not
actually "work", in that it won't be able to work out the real path
MTU; it will just star out ("* * *") when it gets to GMTU.

And from Client, your suggestion may locate the place on the
Client->Server path (if there is one) where MTU is lowered.  This may
or may not be the reverse direction of the link GMTU is at the
Server-side end of (in the common PPPoE case it will be, but the path
does not have to be symmetric).  And it does not necessarily have
anything to do with where GICMP is - in the common case it won't.

If GICMP is simply dropping _all_ ICMPs, you can locate it by doing a
traceroute in ICMP mode, or doing a normal traceroute and then pinging
successive hosts on it.  This will be defeated by a gateway closer to
Client that drops pings but not need-frag unreachables, and if GICMP
lets pings through it won't work either.
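
The hop-by-hop check amounts to something like the following Python
sketch.  The answers_ping callable is a hypothetical stand-in for
really pinging a host, and the addresses are made-up documentation
addresses, not anything from a real path.

```python
def first_silent_hop(hops, answers_ping):
    """Return (index, hop) of the first hop that does not answer, or None.

    hops: hop addresses in path order, e.g. copied from a normal traceroute.
    answers_ping: hypothetical callable, hop -> bool, standing in for a ping.
    """
    for index, hop in enumerate(hops):
        if not answers_ping(hop):
            return index, hop
    return None


# Made-up example: everything from the third hop onward is silent.
hops = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
reachable = {"192.0.2.1", "192.0.2.2"}
print(first_silent_hop(hops, lambda hop: hop in reachable))
```

The hop it reports is only a candidate for GICMP, subject to the two
caveats just mentioned.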

And you can't find GICMP by forging need-to-frag ICMPs with low TTLs,
because routers "MUST NOT" (RFC 1122 for hosts, RFC 1812 for routers)
send an ICMP error in response to an ICMP unreachable (or certain other
types of ICMPs).  (The reason
traceroute -I works is that echo-request is not one of those "certain
other" types.)
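
Roughly, the rule looks like this in Python.  This is a sketch of the
rule as I read it, covering only the ICMP-type part (the RFCs list
other exclusions too); the function name is invented, and the type
numbers are the standard ICMPv4 ones.

```python
# ICMPv4 message types that are error messages.
ICMP_ERROR_TYPES = {
    3: "destination unreachable",   # includes need-to-frag (code 4)
    4: "source quench",
    5: "redirect",
    11: "time exceeded",
    12: "parameter problem",
}
ICMP_ECHO_REQUEST = 8


def may_trigger_icmp_error(icmp_type: int) -> bool:
    """True if a packet of this ICMP type may be answered with an ICMP error.

    Only models the "no ICMP error about an ICMP error" rule.
    """
    return icmp_type not in ICMP_ERROR_TYPES


# A forged need-to-frag unreachable with a low TTL dies silently ...
print(may_trigger_icmp_error(3))
# ... while an echo request can still draw a time-exceeded reply.
print(may_trigger_icmp_error(ICMP_ECHO_REQUEST))
```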

Thus, I think there is no way to find GICMP, even with collaboration of
Server's admins, unless it happens to be part of Server's organization
(and thus Server's admins are also GICMP's admins).  Sniffing traffic
on the various hops will, of course, locate it, but that requires the
ability to so sniff.  Fortunately, in the few cases where I've been
able to identify GICMP, it's always been part of Server's organization,
and despite having a very small sample size, I would hazard a guess
that it is so for the majority of PMTU-D black holes in today's
Internet.

The reason having GMTU (or more often the host on Client's end of the
link to GMTU, which I'll call GCMTU) do MSS clamping helps is that it
makes Server start out using packets small enough that GMTU doesn't have to
fragment-or-drop them; it doesn't actually fix the real problem, though
from a naïve Client's perspective it may seem like it.  It does,
however, depend on GCMTU knowing what GMTU's outgoing MTU is; I suspect
existing MSS clamping implementations simply assume that it matches
GCMTU's outgoing MTU over that link.  (It also depends on Server paying
attention to the MSS option, and won't work in the presence of IPSEC
AH, never mind ESP....)
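
The arithmetic behind the clamp is small enough to sketch in Python.
Assumptions, none of which come from any particular implementation:
IPv4 and TCP headers without options (20 bytes each), the usual PPPoE
MTU of 1492, and an invented function name.

```python
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options


def clamp_mss(advertised_mss: int, link_mtu: int) -> int:
    """MSS value GCMTU would write into a SYN it forwards over the link.

    The MSS is never raised, only lowered to what fits in link_mtu.
    """
    return min(advertised_mss, link_mtu - IP_HEADER - TCP_HEADER)


# Client on Ethernet advertises 1460; the PPPoE link only carries 1492.
print(clamp_mss(1460, 1492))
# A peer that already advertises a small MSS is left alone.
print(clamp_mss(536, 1492))
```

Note that link_mtu here is GCMTU's own outgoing MTU, which is exactly
the assumption described above: it is right only if GMTU's outgoing MTU
over that link is the same.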

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B