Subject: Re: Summer of Code: Policy routing / Implement IPv6 ipflow_fastforward
To: Miles Nordin <carton@Ivy.NET>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 06/17/2005 01:08:33
In message <oqy8994j3k.fsf@castrovalva.Ivy.NET>,
Miles Nordin writes:

>I did some secret testing with a proprietary packet generator
>(Imperfect Networks ``ThreatEX''), and in my Alpha PWS433, I found the
>machine froze up (became completely unresponsive at the console, and
>lost RTC time) at:
>
>    interface      pps
>    tlp            6500
>    bge            70000 - 90000
>
>For the tlp, the serial console filled with ``receive overrun''---no
>rate limiting of the error messages, apparently.  For bge, I'd tuned
>the hw.bge.rx_lvl to 5, the highest value it would accept.  For the
>tlp, the thing collapsed at about the same spot whether it was
>forwarding the traffic or blocking it.  For bge, I had only one card,
>so I had to block the traffic---otherwise it'd be forwarded out a tlp,
>and that brought the pps down to 6500 again.



>I think that enshrining existing code ought to be done _after_ taking

[... snip one very large run-on single paragraph, mostly of questions,
some of which are, how to put it, already widely discussed here over
the past few days...]

>I think in the spirit of NetBSD's claiming we
>optimize things that matter rather than boasting about excessively
>clever hand-tuned algorithms in spots that aren't a bottleneck,

Huh?  Who said anything about hand-tuned algorithms?  Or about what is
or isn't a bottleneck?  Who says that a naive implementation of
policy-based' routing guided by someone ignorant of the last 10 years
of relevant research isn't a bottleneck?  Or do *you* think a linear-list
lookup is an adequate approach for a multi-dimensional lookup problem?


> this
>interrupt/softint/ipl thing needs to get better understood before
>applying excessive pressure to people working on the routing table,
>because it looks pretty clear that interrupts-per-second is the
>limiting value right now, not size/scalability of the routing table.

Huh?? What are you smoking? 

I have P4 /Xeon machines which sustain packet rates, with 1500-byte
packets, at about four times that rate without difficulty, on multiple
bge devices. The bge interrupt rate, at maximum interrupt mitigation
(set to 5, as above) never, *ever* reaches 4,000 interrupts/sec/bge;
never mind exceeding that value.  wm(4) can support similar packet
rates. if_wm shows rather higher interrupt rate, circa 6.7k/sec for
1500-byte packets filling a gigabit link, but somewhat lower CPU
utilization.

Your claim that interrupts-per-second is a limiting value isn't just
incorrect; it's unsupportable. At least, for normal, non-DOS traffic.
this is well-known, has been for years.  Or perhaps decades:
algorithmic analysis and profiling data of patricia-tree lookups goes
back to Knuth, iirc, and profiling data of the BSD routing
implementation goes back decades.

Now, lets consider policy-based routing.  Policy-based routing,
by definition, changes the lookup for "next hop" from a one-dimensional
lookup (longest-match look on destination address, for which BSD
kernels use a PATRICIA tree) to a multi-dimensional lookup
problem. That's inherently a much harder problem.  Commercial routers[
use content-addressable memory (CAM) hardware, typically ternary
CAMs. Software approaches include grid-of-tries, multi-bit vectors,
and other approaches.

Here are a few references that may serve as a useful starting point;
(not meant to be exhaustive: a small sample of recent work, not
complete or up-to-date by oany means, but it does touches several
recent authors in the field):
   * The SIGCOMM '97 paper by Waldvogel, Varghese et. al
   * The SIGCOMM '98 paper by Srinvasan, Suri, and Vargese 
   * The SIGCOMM '99 paper by  Baboescu and Vargese 
     (available at http://www-cse.ucsd.edu/users/varghese/publications.html,
     along with other more recent work)
   *  Any of several papers by p. Gupta and N. Mckeown 
      between 1999 and 2001.

Note that at least some of Varghese et al.'s work is patented, and may
therefore be unsuitable for an open-source project.

Now look at what David states as his requirement: *five* dimensions,
in addition to source and destination IP addresses.  In other words,
that's  a seven-dimensional lookup. Here it is again: 


On Tue, 14 Jun 2005 12:52:34 CDT, David Young wrote:
>On Tue, Jun 07, 2005 at 12:56:12PM +0200, Ivo Vachkov wrote:

[...]
>> Policy Routing:
>> - extend "struct rtentry" to include additional information for TOS
>> fields, Source based routing, maybe even protocol based routing, ttl
>> routing, packet lenght routing
>
>IMO, these are the bare minimum fields we must be able to route by:
>
>        * ToS field
>        * protocol/port number
>        * packet length
>        * packet labels (tokens attached to a packet by IPFilter, pf,
>          or the input interface---e.g., m_pkthdr.rcvif)
>
>A good solution should be easily extensible.  And fast.  A big question in
>my mind is, "what is the architecture of fast, extensible policy routing?"
>How is this (not) accomplished in other systems?

One can make a good running start at an answer to the first part of
that so-called "big question" by 5 minutes with Goegle or Citeseer
searches for the term "packet classification" , and then reading some
of the relevant literature.  The second part of the "big question" can
be addressed by examining either router vendor datasheets, or other
open-source projects, depending on just which kind of "other systems"
one is considering.

If David is short on clue with regard to policy-basd routing (and his
phrasing of some readily-answered questions, above, as being a "big
question" strongly suggests that he isn't) then how is David qualified
to be setting requirements for a project on policy-based routing?