tech-net: Summary of earlier ICMP rate-limit discussion

Subject: Summary of earlier ICMP rate-limit discussion
To: None <tech-net@netbsd.org>
From: John Hawkinson <jhawk@MIT.EDU>
List: tech-net
Date: 07/09/2000 10:57:28
Back on Sat, 1 Jul 2000 22:56:43 -0400 (EDT), I wrote:

> I believe we still need to resolve the issue of type-based
> rate-limiting for 1.5; do you want me to summarize the earlier
> discussion for tech-net?

Some time later, here's the summary. This is long.

An unrelated discussion among developers on a private list gradually
turned into one about ICMP rate-limiting; I'll attempt to summarize it
here. It actually took place in late May, but I had not realized it
was that long ago.

Some sections:

	I.   ==> BACKGROUND
	II.  ==> ISSUES
	III. ==> MOVING FORWARD

I.   ==> BACKGROUND

Kimmo Suominen <kim@tac.nyc.ny.us> observed an operational annoyance
with the current ICMP rate-limiting code (for IPv4):

| I'm not sure if this is related, or if some design change was made,
| but ever since upgrading my gateway from 1.4H or something early
| like it to 1.4V I've seen it not sending ICMPs back for at least one
| of the three packets traceroute sends out (looking like a Cisco).

Darren Reed <darrenr@reed.wattle.id.au> explained why:

| This is due to rate-limiting being added to ICMP replies for IPv4,
| mirroring the same feature as available under IPv6.  There's a
| sysctl to control the rate of this.


II.  ==> ISSUES

John Hawkinson <jhawk@MIT.EDU> (me) raises a bunch of issues:

| It appears to have been added as:
| 
| netinet/ip_icmp.c:
| ----------------------------
| revision 1.40
| date: 2000/02/15 04:03:49;  author: thorpej;  state: Exp;  lines: +58 -3
| Add ICMP error rate limiting, based on the same for ICMP6.
| 
| Note, we're reusing the previously unused slot for "MTU discovery" (which
| was moved to the "net.inet.ip" branch of the sysctl tree quite some time
| ago).
| ----------------------------
| 
| 
| And the V6 code was in there initially when committed.
| 
| Anyhow, the v6 code asks a very astute question (XXX):
| 
| netinet6/icmp6.c:
|  /*
|   * Perform rate limit check.
|   * Returns 0 if it is okay to send the icmp6 packet.
|   * Returns 1 if the router SHOULD NOT send this icmp6 packet due to rate
|   * limitation.
|   *
|   * XXX per-destination/type check necessary?
|   */
|  static int
|  icmp6_ratelimit(dst, type, code)
|  
| 
| Indeed, at the very least a type check is necessary.
| 
| Suppose I have a default-free router to which someone starts
| shoveling, say, 300pps worth of traffic for a garbage destination
| (say, 10.0.0.5); we'll start to generate ICMP host unreachables, and
| rate limit them appropriately.
| 
| Then I try to traceroute through that router. I'll get no replies
| back because all the ICMP resources are being used by the 300pps and
| my 3pps traceroute doesn't stand a chance.
| 
| Of course, a destination check would also seem to serve, but is
| inadequate.  Suppose I have a legit 300pps flow of streaming mumble
| to some internet destination, and then suddenly the route goes away
| (link goes down somewhere else). My flow is still going. I start
| debugging it from the same host to try to find out why, and start
| off with traceroute. Unless there's a per-type check, I'll get
| extremely misleading information back.
| 
| 
| I think a destination check might be nice, but I don't know how it
| would scale reasonably. IOS doesn't do one, which is why it's hard
| to get a host unreachable out of a busy Internet router these days.
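To make the starvation argument concrete, here is a minimal userland sketch in C (not kernel code). It assumes a single hypothetical budget of 100 errors per second (`ICMP_LIMIT_PPS`), and for simplicity assumes the 300pps flood uses up that budget before the traceroute probes arrive within the same second. `pps_allow`, `simulate`, and the kind enum are hypothetical names for illustration. With one shared counter the probes get nothing back; with one counter per ICMP kind they all do.

```c
#include <stdbool.h>

/* Hypothetical budget: ICMP errors allowed per second. */
#define ICMP_LIMIT_PPS 100

/* A simple per-second counter. */
struct pps_counter {
    long second; /* which second the count belongs to */
    int count;   /* errors sent so far in that second */
};

/* Allow at most `limit` events per second; reset on a new second. */
static bool pps_allow(struct pps_counter *c, long now_sec, int limit)
{
    if (now_sec != c->second) {
        c->second = now_sec;
        c->count = 0;
    }
    if (c->count >= limit)
        return false;
    c->count++;
    return true;
}

enum icmp_kind { KIND_UNREACH, KIND_TIMXCEED, KIND_MAX };

/*
 * Model one second in which `flood` host unreachables are generated
 * before `probes` traceroute probes arrive. Returns how many time
 * exceededs are actually sent, using either one shared counter or
 * one counter per kind.
 */
static int simulate(bool per_type, int flood, int probes)
{
    struct pps_counter shared = { 0, 0 };
    struct pps_counter by_kind[KIND_MAX] = { { 0, 0 } };
    int sent = 0;

    for (int i = 0; i < flood; i++) {
        struct pps_counter *c = per_type ? &by_kind[KIND_UNREACH] : &shared;
        (void)pps_allow(c, 0, ICMP_LIMIT_PPS);
    }
    for (int i = 0; i < probes; i++) {
        struct pps_counter *c = per_type ? &by_kind[KIND_TIMXCEED] : &shared;
        if (pps_allow(c, 0, ICMP_LIMIT_PPS))
            sent++;
    }
    return sent;
}
```

For example, simulate(false, 300, 3) returns 0 (the shared budget is already spent), while simulate(true, 300, 3) returns 3.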

Jun-ichiro itojun Hagino <itojun@iijlab.net> replies with:

/ For IPv6, RFC2463 (ICMPv6) requires us to implement icmp6 rate
/ limit.  see very last part of page 5.  The document does not specify
/ how we should perform actual rate-limit, though.
...
/ I do keep thinking something should be done here, to allow
/ traceroute6 probes to succeed, for example.  However, thinking about
/ this always leaves me with a fuzzy feeling - if we put too much
/ processing here, it's not good as a rate-limiting algorithm (it is
/ unwise to chew too much CPU time for rate-limiting).

In the same message, jhawk also says:

| Bill Sommerfeld pointed out that the rate should be smoother, at
| least for the case of traceroute. That is, we'd like all three
| traceroute probes (or all N if you use traceroute -q N, where N is a
| "reasonable value") to be returned if at all feasible.  So 3 (or 10)
| packets in one second is fine, but 30 packets over 10 seconds
| probably is not. This should be for both time exceededs generated
| (tracerouting through a router) and port unreachables (tracerouting
| to a host).

And Itojun said:

/ hmm.  do we need another ratecheck(9) variant?

Hence ppsratecheck(9), which came into being recently.
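Bill's point about smoothness, and Kimmo's missing traceroute replies, come down to how the limit is expressed. The sketch below contrasts a minimum-interval check in the style of ratecheck(9) with a per-second count in the style of ppsratecheck(9). Both functions are hypothetical userland models of that behaviour, not the kernel implementations or their signatures; time is a plain microsecond counter, and the probe spacing and limits used below are made-up values.

```c
#include <stdbool.h>

typedef long long usec_t;          /* time in microseconds */
#define USEC_PER_SEC 1000000LL

/* Interval style: allow an event only if at least `min_gap` usec
 * have passed since the last allowed one. */
struct interval_state {
    bool primed;   /* has any event been allowed yet? */
    usec_t last;   /* time of the last allowed event */
};

static bool interval_check(struct interval_state *s, usec_t now, usec_t min_gap)
{
    if (!s->primed || now - s->last >= min_gap) {
        s->primed = true;
        s->last = now;
        return true;
    }
    return false;
}

/* Per-second style: allow up to `max_pps` events per one-second window. */
static bool pps_check(usec_t *window_start, int *count, usec_t now, int max_pps)
{
    if (now - *window_start >= USEC_PER_SEC) {
        *window_start = now;
        *count = 0;
    }
    if (*count >= max_pps)
        return false;
    (*count)++;
    return true;
}

/* How many of `n` probes, `spacing` usec apart, get an ICMP reply? */
static int answered_interval(int n, usec_t spacing, usec_t min_gap)
{
    struct interval_state s = { false, 0 };
    int ok = 0;
    for (int i = 0; i < n; i++)
        if (interval_check(&s, i * spacing, min_gap))
            ok++;
    return ok;
}

static int answered_pps(int n, usec_t spacing, int max_pps)
{
    usec_t window_start = 0;
    int count = 0, ok = 0;
    for (int i = 0; i < n; i++)
        if (pps_check(&window_start, &count, i * spacing, max_pps))
            ok++;
    return ok;
}
```

With three probes 10 ms apart, answered_interval(3, 10000, 200000) returns 1, while answered_pps(3, 10000, 10) returns 3. The per-second check still caps a longer run: answered_pps(30, 10000, 10) returns 10.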

Replying to Itojun's RFC2463 citation for IPv6, jhawk notes:

/ Concurrence. This is also well-accepted practice with commercial
/ IPv4 routers today. Note that RFC1812 (router requirements) labels
/ rate-limiting a SHOULD for most icmp types:
/ 
/ 4.3.2.8 Rate Limiting
/ 
/    A router which sends ICMP Source Quench messages MUST be able to
/    limit the rate at which the messages can be generated.  A router
/    SHOULD also be able to limit the rate at which it sends other
/    sorts of ICMP error messages (Destination Unreachable, Redirect,
/    Time Exceeded, Parameter Problem).  The rate limit parameters
/    SHOULD be settable as part of the configuration of the router.
/    How the limits are applied (e.g., per router or per interface) is
/    left to the implementor's discretion.
...
/ I do not think that different rate-limits by type incur significant
/ CPU overhead.
/ 
/ I concur that rate-limiting by destination may incur such an
/ overhead -- that's probably why it is in general not done. It's
/ certainly not a requirement.

Itojun questions the utility of type-based limits:

/ type-based rate limiting does not help traceroute{,6} to succeed,
/ since intermediate routers always transmit "time exceeded".  so
/ type-based will not become our choice.

This led to jhawk's clarification:

/ I'm not sure we're referring to the same case here.
/ 
/ My concern is that a large flow of traffic generating "host
/ unreachable"s should not prevent "time exceeded"s from being
/ transmitted. Type-based rate limit fixes this. I haven't seen
/ another solution proposed which does.
/ 
/ I am not terribly concerned with the case of huge numbers of users
/ tracerouting through such an intermediate router so as to overflow
/ the rate limit. I would think that, for an average router, allowing
/ a rate like 50pps of icmp time exceededs would handle all cases
/ without severely impacting performance.

Itojun proposes an implementation:

/ I'm now testing packet-per-second (like 100 error packets/s) +
/ minimal interval (like at least 100us between two error packets).
/ an error packet needs to pass both tests to leave the node.  does it
/ sound reasonable?

jhawk queries:

/ Suppose two traceroutes are run simultaneously (i.e. within 100us of
/ each other) -- will they not be affected by this limit? That doesn't
/ seem to be a good effect.
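Itojun's proposal, using the example figures he gives (100 error packets/s, at least 100us between two error packets), can be sketched as below. `struct err_limiter` and `err_allow` are hypothetical names, and this is a userland model rather than his test code. The first check in the usage reproduces jhawk's concern: an error for a second traceroute arriving 50us after the first is dropped even though the per-second budget is nearly untouched.

```c
#include <stdbool.h>

typedef long long usec_t;          /* time in microseconds */
#define USEC_PER_SEC 1000000LL

/* Example values quoted in the proposal. */
#define ERR_MAX_PPS 100
#define ERR_MIN_GAP 100            /* microseconds */

struct err_limiter {
    bool   primed;        /* has any error been sent yet? */
    usec_t last_sent;     /* time of the last error sent */
    usec_t window_start;  /* start of the current one-second window */
    int    window_count;  /* errors sent in the current window */
};

/* An error packet leaves the node only if it passes BOTH tests. */
static bool err_allow(struct err_limiter *l, usec_t now)
{
    if (!l->primed || now - l->window_start >= USEC_PER_SEC) {
        l->window_start = now;
        l->window_count = 0;
    }
    if (l->primed && now - l->last_sent < ERR_MIN_GAP)
        return false;             /* minimal-interval test failed */
    if (l->window_count >= ERR_MAX_PPS)
        return false;             /* packets-per-second test failed */
    l->primed = true;
    l->last_sent = now;
    l->window_count++;
    return true;
}
```

With a fresh limiter, err_allow at t=0 succeeds, at t=50 fails the minimal-interval test, and at t=200 succeeds again. Errors spaced 200us apart pass the interval test, so only the per-second budget stops them after the 100th.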

There was some discussion of the user interface to these rate
limits. Itojun asks:

/ hmm, I understand your point.  how many configuration variables do
/ we need?  do we need per-type pps/rate configuration?  I'm still not
/ sure how much detailed icmp6 classification we want to put into the
/ kernel - maybe add some directive in ipf to do rate-limits?.

jhawk replies:

/ We certainly need to be able to specify the host unreachable limit
/ as distinct from the time exceeded limit.
/ 
/ It could be argued that we should have one limit per typecode. I
/ think that's a bit excessive.
/ 
/ The minimum necessary, I think, would be one limit for unreachables,
/ and one limit for everything else.
/ 
/ I would prefer something in between, with one limit for Unreachables
/ and Redirects (i.e. ICMP sent in response to regular traffic
/ designed to be routed), one limit for Echo Replies and Time
/ Exceededs ("probe" traffic, i.e. ping and traceroute), and one limit
/ for everything else.
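jhawk's preferred split into three classes could look roughly like the sketch below. The type numbers are the standard ICMP values from RFC 792; the enum names, `icmp_classify`, `class_allow`, and the per-class limits are hypothetical. Each class gets its own per-second budget, so a flood of unreachables cannot consume the budget for time exceededs.

```c
#include <stdbool.h>

/* ICMP type numbers from RFC 792. */
enum {
    T_ECHOREPLY = 0,
    T_UNREACH   = 3,
    T_REDIRECT  = 5,
    T_TIMXCEED  = 11,
};

/* The three buckets proposed above. */
enum icmp_class {
    CLASS_ROUTED,   /* Unreachables and Redirects */
    CLASS_PROBE,    /* Echo Replies and Time Exceededs */
    CLASS_OTHER,    /* everything else */
    CLASS_COUNT
};

static enum icmp_class icmp_classify(int type)
{
    switch (type) {
    case T_UNREACH:
    case T_REDIRECT:
        return CLASS_ROUTED;
    case T_ECHOREPLY:
    case T_TIMXCEED:
        return CLASS_PROBE;
    default:
        return CLASS_OTHER;
    }
}

/* One independent per-second budget per class. */
struct class_limit {
    long second;   /* which second `count` belongs to */
    int count;     /* messages sent in that second */
    int max_pps;   /* hypothetical configured limit */
};

static bool class_allow(struct class_limit lim[CLASS_COUNT], int type, long now_sec)
{
    struct class_limit *c = &lim[icmp_classify(type)];
    if (c->second != now_sec) {
        c->second = now_sec;
        c->count = 0;
    }
    if (c->count >= c->max_pps)
        return false;
    c->count++;
    return true;
}
```

Compared with a single global limit, the extra per-packet work is one switch on the type and an array index, which fits the earlier point that per-type limits need not cost significant CPU. The 50pps figure for the probe class in a configuration like this would come from jhawk's estimate for time exceededs.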

III. ==> MOVING FORWARD

There was some further discussion, as well as some more complicated
proposals for rate-limiting implementations, and questioning of the
utility of supporting traceroute. I don't think they are critical-path
at this point. There was also some discussion of why this code was
in there at all (denial of service issues).

At this point, I would like to convert the IPv4 code to ppsratecheck(9),
and then bring in type-specific limits. I think this work should then
be pulled up to 1.5, because I am particularly unhappy with the current
state of 1.5, in which rate-limiting kicks in on a simple traceroute
to the box.

If a pullup is deemed unwise at this stage, I'd rather disable the
rate limits by default (router requirements/ipv6 rfcs notwithstanding)
for the 1.5 release.

Opinions?

--jhawk