Subject: Re: Multipath
To: None <tech-net@NetBSD.org>
From: David Young <dyoung@pobox.com>
List: tech-net
Date: 08/23/2007 03:59:09
On Thu, Aug 23, 2007 at 10:51:55AM +0300, Mihai Chelaru wrote:
> Hi,
> 
> I'm working on a small patch [1] to get route multipath working in order to 
> make some private tests. While I'm not happy with the approach (chained 
> rtentries), I did try to stay away from radix tree implementation because I 
> didn't feel very comfortable working directly there :)

It is good that you stayed away from the radix trie, and you should stay
far away. :-) Both experience and another developer have convinced me
that it is a bad place to add this functionality.

> But now I wonder how should a rtsock response to a RTM_GET query look like if 
> there are multiple paths for the same destination ? For now I return only one 
> path in a round-robin fashion but I don't know if this is the correct thing 
> to do. I wonder how other BSDs are doing this.

At least at one time, OpenBSD used RADIX_MPATH.  Besides the problem
that RADIX_MPATH fiddles with the radix trie, I found a couple of other
problems that arose when you had a mixture of direct & indirect routes
(link-level nexthop or IP nexthop) to the same destination.

IMO, RTM_GET should return the route that the kernel would choose, given
the information you provide (including source address...), or else an
error if its choice is ambiguous or inconstant.

BTW, round-robin is not ordinarily a good multipath routing policy: it
spreads the packets of a single flow across paths, which can reorder
them.  RADIX_MPATH references gateway "selection by Modulo-N Hash
(RFC2991)".  Check that out.
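
For illustration, here is a minimal userland sketch of modulo-N
selection in the spirit of RFC 2991: hash the fields that identify a
flow, then take the hash modulo the number of next hops, so that every
packet of a flow takes the same path.  The names struct flow_key,
flow_hash() and mpath_select() are hypothetical, FNV-1a is just one
convenient hash, and this is not the RADIX_MPATH code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: IPv4 source and destination addresses. */
struct flow_key {
	uint32_t src;
	uint32_t dst;
};

/* 32-bit FNV-1a over the bytes of the key (an assumed hash choice). */
static uint32_t
flow_hash(const struct flow_key *k)
{
	const uint32_t words[2] = { k->src, k->dst };
	uint32_t h = 2166136261u;	/* FNV offset basis */

	for (size_t i = 0; i < 2; i++) {
		for (int shift = 0; shift < 32; shift += 8) {
			h ^= (words[i] >> shift) & 0xff;
			h *= 16777619u;	/* FNV prime */
		}
	}
	return h;
}

/*
 * Modulo-N selection: return the index of the next hop to use for
 * this flow.  The same key always maps to the same index as long as
 * npaths does not change.
 */
static size_t
mpath_select(const struct flow_key *k, size_t npaths)
{
	assert(npaths > 0);
	return flow_hash(k) % npaths;
}
```

One property to keep in mind: when the number of paths changes, the
modulus changes too, so most flows move to a different path.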

> [1] - http://kefren.ngnetworks.ro/multipath.diff

I see a couple of places where you clear the RNF_ACTIVE bit in
rtrequest1(); resist the temptation.  You also memcpy the radix_nodes.
Do not touch the radix trie data structures directly; use the
radix_node_head methods instead.  I suggest doing a delete+add with
the old and new "head" rtentry.

I don't think that you should RTFREE() chained rtentries in rtflush(),
unless you are going to increase the reference count on an rtentry's
chained rtentries when you rtcache() it.
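
A toy model of that pairing, not the kernel code: if caching takes a
reference on the head and on every chained entry, then flushing may
release exactly those references and no others.  struct toy_rt,
toy_cache() and toy_flush() are hypothetical stand-ins, not the real
struct rtentry or the real rtcache()/rtflush().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an rtentry with a chain of alternative paths. */
struct toy_rt {
	int refcnt;
	struct toy_rt *chain;	/* next chained entry, or NULL */
};

/* Caching takes one reference on the head and on each chained entry. */
static void
toy_cache(struct toy_rt *head)
{
	for (struct toy_rt *rt = head; rt != NULL; rt = rt->chain)
		rt->refcnt++;
}

/* Flushing drops only the references that toy_cache() took. */
static void
toy_flush(struct toy_rt *head)
{
	for (struct toy_rt *rt = head; rt != NULL; rt = rt->chain) {
		assert(rt->refcnt > 0);
		rt->refcnt--;
	}
}
```

If toy_cache() referenced only the head, then toy_flush() would have
to leave the chained entries alone, or their counts would drop below
what their other holders expect.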

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933 ext 24