Subject: Re: Summer of Code: Policy routing / Implement IPv6 ipflow_fastforward
To: NetBSD Network Mailing List <tech-net@netbsd.org>
From: Ivo Vachkov <ivo.vachkov@gmail.com>
List: tech-net
Date: 06/17/2005 12:25:35
Mr. Stone,

With all my respect, I think this is supposed to be one-summer student
project. So I think NetBSD team expects far less than a "perfect
solution" TM in the realisation of the project. I (and probably all
WE) agree and thank you for the directions and documentation on the
subject, but please don't start flamewars on that.

After all, IMHO, these projects are supposed to make CS students
familiar with NetBSD and optionally gain some ideas or implement some
feature. I doubt someone expects proffesional level in design and
algorithms or 100% bug free implementation.

On 6/17/05, Jonathan Stone <jonathan@dsg.stanford.edu> wrote:
>=20
> In message <oqy8994j3k.fsf@castrovalva.Ivy.NET>,
> Miles Nordin writes:
>=20
> >I did some secret testing with a proprietary packet generator
> >(Imperfect Networks ``ThreatEX''), and in my Alpha PWS433, I found the
> >machine froze up (became completely unresponsive at the console, and
> >lost RTC time) at:
> >
> >    interface      pps
> >    tlp            6500
> >    bge            70000 - 90000
> >
> >For the tlp, the serial console filled with ``receive overrun''---no
> >rate limiting of the error messages, apparently.  For bge, I'd tuned
> >the hw.bge.rx_lvl to 5, the highest value it would accept.  For the
> >tlp, the thing collapsed at about the same spot whether it was
> >forwarding the traffic or blocking it.  For bge, I had only one card,
> >so I had to block the traffic---otherwise it'd be forwarded out a tlp,
> >and that brought the pps down to 6500 again.
>=20
>=20
>=20
> >I think that enshrining existing code ought to be done _after_ taking
>=20
> [... snip one very large run-on single paragraph, mostly of questions,
> some of which are, how to put it, already widely discussed here over
> the past few days...]
>=20
> >I think in the spirit of NetBSD's claiming we
> >optimize things that matter rather than boasting about excessively
> >clever hand-tuned algorithms in spots that aren't a bottleneck,
>=20
> Huh?  Who said anything about hand-tuned algorithms?  Or about what is
> or isn't a bottleneck?  Who says that a naive implementation of
> policy-based' routing guided by someone ignorant of the last 10 years
> of relevant research isn't a bottleneck?  Or do *you* think a linear-list
> lookup is an adequate approach for a multi-dimensional lookup problem?
>=20
>=20
> > this
> >interrupt/softint/ipl thing needs to get better understood before
> >applying excessive pressure to people working on the routing table,
> >because it looks pretty clear that interrupts-per-second is the
> >limiting value right now, not size/scalability of the routing table.
>=20
> Huh?? What are you smoking?
>=20
> I have P4 /Xeon machines which sustain packet rates, with 1500-byte
> packets, at about four times that rate without difficulty, on multiple
> bge devices. The bge interrupt rate, at maximum interrupt mitigation
> (set to 5, as above) never, *ever* reaches 4,000 interrupts/sec/bge;
> never mind exceeding that value.  wm(4) can support similar packet
> rates. if_wm shows rather higher interrupt rate, circa 6.7k/sec for
> 1500-byte packets filling a gigabit link, but somewhat lower CPU
> utilization.
>=20
> Your claim that interrupts-per-second is a limiting value isn't just
> incorrect; it's unsupportable. At least, for normal, non-DOS traffic.
> this is well-known, has been for years.  Or perhaps decades:
> algorithmic analysis and profiling data of patricia-tree lookups goes
> back to Knuth, iirc, and profiling data of the BSD routing
> implementation goes back decades.
>=20
> Now, lets consider policy-based routing.  Policy-based routing,
> by definition, changes the lookup for "next hop" from a one-dimensional
> lookup (longest-match look on destination address, for which BSD
> kernels use a PATRICIA tree) to a multi-dimensional lookup
> problem. That's inherently a much harder problem.  Commercial routers[
> use content-addressable memory (CAM) hardware, typically ternary
> CAMs. Software approaches include grid-of-tries, multi-bit vectors,
> and other approaches.
>=20
> Here are a few references that may serve as a useful starting point;
> (not meant to be exhaustive: a small sample of recent work, not
> complete or up-to-date by oany means, but it does touches several
> recent authors in the field):
>    * The SIGCOMM '97 paper by Waldvogel, Varghese et. al
>    * The SIGCOMM '98 paper by Srinvasan, Suri, and Vargese
>    * The SIGCOMM '99 paper by  Baboescu and Vargese
>      (available at http://www-cse.ucsd.edu/users/varghese/publications.ht=
ml,
>      along with other more recent work)
>    *  Any of several papers by p. Gupta and N. Mckeown
>       between 1999 and 2001.
>=20
> Note that at least some of Varghese et al.'s work is patented, and may
> therefore be unsuitable for an open-source project.
>=20
> Now look at what David states as his requirement: *five* dimensions,
> in addition to source and destination IP addresses.  In other words,
> that's  a seven-dimensional lookup. Here it is again:
>=20
>=20
> On Tue, 14 Jun 2005 12:52:34 CDT, David Young wrote:
> >On Tue, Jun 07, 2005 at 12:56:12PM +0200, Ivo Vachkov wrote:
>=20
> [...]
> >> Policy Routing:
> >> - extend "struct rtentry" to include additional information for TOS
> >> fields, Source based routing, maybe even protocol based routing, ttl
> >> routing, packet lenght routing
> >
> >IMO, these are the bare minimum fields we must be able to route by:
> >
> >        * ToS field
> >        * protocol/port number
> >        * packet length
> >        * packet labels (tokens attached to a packet by IPFilter, pf,
> >          or the input interface---e.g., m_pkthdr.rcvif)
> >
> >A good solution should be easily extensible.  And fast.  A big question =
in
> >my mind is, "what is the architecture of fast, extensible policy routing=
?"
> >How is this (not) accomplished in other systems?
>=20
> One can make a good running start at an answer to the first part of
> that so-called "big question" by 5 minutes with Goegle or Citeseer
> searches for the term "packet classification" , and then reading some
> of the relevant literature.  The second part of the "big question" can
> be addressed by examining either router vendor datasheets, or other
> open-source projects, depending on just which kind of "other systems"
> one is considering.
>=20
> If David is short on clue with regard to policy-basd routing (and his
> phrasing of some readily-answered questions, above, as being a "big
> question" strongly suggests that he isn't) then how is David qualified
> to be setting requirements for a project on policy-based routing?
>=20


--=20
"UNIX is basically a simple operating system, but you have to be a
genius to understand the simplicity." Dennis Ritchie