Re: Improving use of rt_refcnt

To: Dennis Ferguson <dennis.c.ferguson%gmail.com@localhost>
Subject: Re: Improving use of rt_refcnt
From: Joerg Sonnenberger <joerg%britannica.bec.de@localhost>
Date: Thu, 9 Jul 2015 23:10:54 +0200

On Sun, Jul 05, 2015 at 10:33:02PM -0700, Dennis Ferguson wrote:
> If you don't want it to work this way then you'll need to replace the
> radix tree with something that permits changes while readers are
> concurrently operating.  To take best advantage of a more modern data
> structure, however, you are still not going to want readers to ever
> write the shared data structure if that can be avoided.  The two
> atomic operations needed to increment and decrement a reference count
> greatly exceed the cost of a (well-cached) route lookup.

Let me pick the discussion up at this point since David mentioned that
my last reply was somewhat terse. I think the current radix tree serves
three different purposes right now:

(1) Manage the view of the connectivity to the outside world in a way
coherent with the administrator's intention or some routing
protocol/daemon.

(2) Provide a mechanism for finding the next-hop for traffic to not
directly attached networks.

(3) Provide a mechanism for finding L2 addresses on directly attached
networks.

Using a single data structure for this has the advantage of code sharing
and can make detailed accounting very easy. It has the problem of
overhead and mixing data of different levels of volatility. I would like
to see the three mechanisms to be separated with appropiate data
structures for each case. The first point would be moved out completely
from the hot path, the actual packet handling case. It would then be no
longer as performance sensitive, so options for storage can be more
focused on size.

For finding the next-hop, the problem is simplified. The number of
next-hop addresses is (normally) limited by the size of the network
neighborhood. Even a backend router at one of the major Internet
exchange points will not have more than a few thousand next-hops,
compared to having 200k routes or more. This can be exploited to reduce
the data size of the BMP lookup data structure and by removing redundant
entries, e.g. a longer prefix with the same next-hop as a shorter
prefix. As I mentioned in my earlier email, the next-hop entry can and
should store a reference to whatever L2 data is needed, so that no
additional search is needed.

For the L3->L2 address mapping, the problem changes from BMP search to
an exact match search. If the mapping is managed correctly, it makes
sense to do this (cheap) search first and skip the whole BMP lookup on
a match as redundant. Hash tables and the like have also nice properties
for read-mostly updates and cache density.

Joerg

Follow-Ups:
- Re: Improving use of rt_refcnt
  - From: Ryota Ozaki

References:
- Improving use of rt_refcnt
  - From: Ryota Ozaki
- Re: Improving use of rt_refcnt
  - From: David Young
- Re: Improving use of rt_refcnt
  - From: Ryota Ozaki
- Re: Improving use of rt_refcnt
  - From: Joerg Sonnenberger
- Re: Improving use of rt_refcnt
  - From: Ryota Ozaki
- Re: Improving use of rt_refcnt
  - From: Dennis Ferguson

Prev by Date: Re: kgdb on amd64
Next by Date: Re: [PATCH] Fixing soft NFS umount -f, round 5
Previous by Thread: Re: Improving use of rt_refcnt
Next by Thread: Re: Improving use of rt_refcnt
Indexes:

Home | Main Index | Thread Index | Old Index