tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Improving use of rt_refcnt



On Fri, Jul 10, 2015 at 6:10 AM, Joerg Sonnenberger
<joerg%britannica.bec.de@localhost> wrote:
> On Sun, Jul 05, 2015 at 10:33:02PM -0700, Dennis Ferguson wrote:
>> If you don't want it to work this way then you'll need to replace the
>> radix tree with something that permits changes while readers are
>> concurrently operating.  To take best advantage of a more modern data
>> structure, however, you are still not going to want readers to ever
>> write the shared data structure if that can be avoided.  The two
>> atomic operations needed to increment and decrement a reference count
>> greatly exceed the cost of a (well-cached) route lookup.
>
> Let me pick the discussion up at this point since David mentioned that
> my last reply was somewhat terse. I think the current radix tree serves
> three different purposes right now:
>
> (1) Manage the view of the connectivity to the outside world in a way
> coherent with the administrator's intention or some routing
> protocol/daemon.
>
> (2) Provide a mechanism for finding the next-hop for traffic to not
> directly attached networks.
>
> (3) Provide a mechanism for finding L2 addresses on directly attached
> networks.
>
> Using a single data structure for this has the advantage of code sharing
> and can make detailed accounting very easy. It has the problem of
> overhead and mixing data of different levels of volatility. I would like
> to see the three mechanisms to be separated with appropiate data
> structures for each case. The first point would be moved out completely
> from the hot path, the actual packet handling case. It would then be no
> longer as performance sensitive, so options for storage can be more
> focused on size.
>
> For finding the next-hop, the problem is simplified. The number of
> next-hop addresses is (normally) limited by the size of the network
> neighborhood. Even a backend router at one of the major Internet
> exchange points will not have more than a few thousand next-hops,
> compared to having 200k routes or more. This can be exploited to reduce
> the data size of the BMP lookup data structure and by removing redundant
> entries, e.g. a longer prefix with the same next-hop as a shorter
> prefix. As I mentioned in my earlier email, the next-hop entry can and
> should store a reference to whatever L2 data is needed, so that no
> additional search is needed.
>
> For the L3->L2 address mapping, the problem changes from BMP search to
> an exact match search. If the mapping is managed correctly, it makes
> sense to do this (cheap) search first and skip the whole BMP lookup on
> a match as redundant. Hash tables and the like have also nice properties
> for read-mostly updates and cache density.
>
> Joerg

http://www.netbsd.org/~ozaki-r/separate-l2-nexthop.diff

I've written a POC patch toward the nexthop cache separation.
It is mostly based on FreeBSD implementation.

It's still not mature; there remain debug codes and many places
are probably broken. But it works anyway, at least it passes
most of ATF tests.

This is a demonstration to show what we need to do if we go
the direction. I don't intend to commit it soon.

  ozaki-r


Home | Main Index | Thread Index | Old Index