Subject: cloned route handling
To: None <tech-net@netbsd.org>
From: Jun-ichiro itojun Hagino <itojun@iijlab.net>
List: tech-net
Date: 01/27/2001 03:29:32
There are a couple of open PRs regarding cloned routes.
I'm trying to solve them, but there are way too many design
choices, so I will try to summarize the current situation first.
Sorry for the low readability.
In this email, I'll use the term "clone parent" for the source of cloned
routes (interface routes like 10.0.0.0/24), and "clone child" for
the cloned routes (10.0.0.1/32 generated by ARP or other mechanisms).
See the summary at the end for differences among *BSD.
Here are the proposed changes: (1) and (2) should be perfectly adequate
changes. (3) needs some debate; I'm not sure about it at this moment.
Related to (4), we may want to add DoS prevention code for redirect
floods.
Reasons behind the proposal:
- Identification of clone children, as stated in (1). It seems to me
we don't need an additional flag for marking clone children. We
just need to check rtentry.rt_parent.
- I believe the current NetBSD behavior for (2) is rather confusing. I
believe ARP entries should go away when the interface address goes
away (if really necessary, we can issue ARP again to re-resolve).
- If we introduce behavior (3), PR11916 will be solved, but I'm not sure
if it is okay to do this. It seems to me that (3) has to be decided by
each AF (for example, it should be safe to allow overwriting an
incomplete ARP entry for netinet). I'm not sure how it should be done.
- Inactivity timer implementation as in (4). I believe the current
NetBSD situation is fine. We have code to phase out redirected
routes, and also have high/low watermarks on the number of redirected
routes, to avoid remote (onlink) DoS.
- I don't like behavior (5) of FreeBSD and BSD/OS; local DoS is not nice.
It should be okay to clone when ICMP more fragment messages come.
We also have good validation code against ICMP more fragment packets,
as well as high/low watermarks for PMTUD routes.
- (6): not really necessary, it seems to me.
Also, we will want to make sure rt_ifa does not point to an obsolete
interface address after interface address removal.
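
The ideas behind (1) and (2) can be illustrated with a small userland
sketch. Everything here (struct rt_model, rt_is_clone_child(),
rt_model_remove_parent(), the fixed-size table) is a hypothetical
stand-in invented for illustration, not kernel code: a route counts as a
clone child simply because its rt_parent is set, and removing the parent
sweeps away every child that points back to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rtentry; NOT the kernel layout. */
struct rt_model {
	uint32_t dst;			/* destination, host byte order */
	uint32_t mask;			/* netmask, host byte order */
	int in_use;			/* slot is occupied */
	struct rt_model *rt_parent;	/* non-NULL only for clone children */
};

#define RT_TABLE_SIZE 8
static struct rt_model rt_table[RT_TABLE_SIZE];

/* Hypothetical helper: a route is a clone child iff it records a parent. */
static int
rt_is_clone_child(const struct rt_model *rt)
{
	return rt->rt_parent != NULL;
}

/* Add a route; parent is NULL for interface routes and manual routes. */
static struct rt_model *
rt_model_add(uint32_t dst, uint32_t mask, struct rt_model *parent)
{
	for (size_t i = 0; i < RT_TABLE_SIZE; i++) {
		if (!rt_table[i].in_use) {
			rt_table[i] = (struct rt_model){ dst, mask, 1, parent };
			return &rt_table[i];
		}
	}
	return NULL;			/* table full */
}

/* Remove a clone parent and all of its children; returns children removed. */
static int
rt_model_remove_parent(struct rt_model *parent)
{
	int removed = 0;

	assert(parent != NULL && parent->in_use);
	for (size_t i = 0; i < RT_TABLE_SIZE; i++) {
		struct rt_model *rt = &rt_table[i];

		if (rt->in_use && rt->rt_parent == parent) {
			rt->in_use = 0;
			rt->rt_parent = NULL;
			removed++;
		}
	}
	parent->in_use = 0;
	return removed;
}

/* Count occupied slots. */
static int
rt_model_count(void)
{
	int n = 0;

	for (size_t i = 0; i < RT_TABLE_SIZE; i++)
		n += rt_table[i].in_use;
	return n;
}
```

With 10.0.0.0/24 as the parent and two /32 children, removing the parent
in this model also removes both children, while an unrelated manually
added host route (rt_parent == NULL) stays in the table.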
itojun
Among the 4 BSDs, there are certain differences in cloned route handling:
- NetBSD and OpenBSD routing code is very close to 4.4BSD code.
(1) It only has RTF_CLONING for clone parent, and does not mark clone
children.
(2) Even when the clone parent gets removed (by "ifconfig -alias"), clone
children stay there (removing them is hard since we do not mark
clone children).
(3) Suppose we have an incomplete ARP entry for 10.0.0.1. If we try to
add a host route manually for 10.0.0.1, it will fail with EEXIST.
(4) Inactivity timer for clone children: supplied by rt_timer_add(),
but it only provides hooks to code outside of route.c.
sys/netinet or sys/netinet6 can exercise detailed control over
inactivity timer behavior (flexible, but needs customized code).
(5) in_pcbconnect() does not ask for a cloned route.
(6) clone API: rtalloc() clones a route if RTF_CLONING is set on the parent.
We have no "force cloning" API.
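
The difference in behavior (3) can be sketched in userland as follows.
This is only a model under stated assumptions: struct host_route,
host_route_lookup(), host_route_add(), the allow_overwrite switch and the
MODEL_* flag values are all hypothetical names made up for this sketch,
not anything from a real kernel header. With allow_overwrite == 0 it
mimics the EEXIST failure described above; with allow_overwrite != 0 it
mimics a policy where a manual add may replace a cloned or redirected
entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define MODEL_RTF_CLONED	0x1	/* entry was cloned from a clone parent */
#define MODEL_RTF_DYNAMIC	0x2	/* entry was created by a redirect */

struct host_route {
	uint32_t dst;			/* host address, host byte order */
	int flags;			/* MODEL_RTF_* bits */
	int in_use;			/* slot is occupied */
};

#define NROUTES 4
static struct host_route routes[NROUTES];

/* Find an existing host route for dst, or NULL. */
static struct host_route *
host_route_lookup(uint32_t dst)
{
	for (size_t i = 0; i < NROUTES; i++)
		if (routes[i].in_use && routes[i].dst == dst)
			return &routes[i];
	return NULL;
}

/*
 * Add a host route.  Returns 0 on success or an errno value.
 * An existing entry is replaced only when allow_overwrite is set AND the
 * existing entry is a cloned or redirected one; otherwise EEXIST.
 */
static int
host_route_add(uint32_t dst, int flags, int allow_overwrite)
{
	struct host_route *rt = host_route_lookup(dst);

	if (rt != NULL) {
		int replaceable =
		    rt->flags & (MODEL_RTF_CLONED | MODEL_RTF_DYNAMIC);

		if (!allow_overwrite || !replaceable)
			return EEXIST;
		rt->flags = flags;	/* overwrite in place */
		return 0;
	}
	for (size_t i = 0; i < NROUTES; i++) {
		if (!routes[i].in_use) {
			routes[i] = (struct host_route){ dst, flags, 1 };
			return 0;
		}
	}
	return ENOBUFS;			/* table full */
}
```

Note that in this model a manually added route is never overwritten by a
later manual add, even with allow_overwrite set; only cloned or
redirected entries are considered replaceable.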
- BSD/OS: has some tricks, implemented in sys/net/route.c.
(1) It has RTF_CLONING for clone parent, and RTF_CLONED for children.
rtentry.rt_parent refers to the parent from the child.
(2) If clone parent route gets nuked, clone children will be removed
automatically by sys/net/route.c.
(3) If we try to add a route manually, it can overwrite clone child
routes or redirected routes (RTF_CLONED | RTF_DYNAMIC).
(4) sys/net has an inactivity timer for all clone children (those generated
by ARP, PMTUD and rtredirect) by default. Each AF reuses the
sys/net behavior. All AFs use the same inactivity timer duration.
(5) in_pcbconnect() asks for a cloned route. It has a very bad side effect:
local denial-of-service (a non-root user can fill up the kernel routing
table and make it impossible for the node to use the network, just by
running tons of sendto(2)). The good things are that we can collect PMTUD
results more easily, inpcb.inp_route has a host route so lookup is simpler,
and we can verify inbound ICMP more fragment messages by looking at the
routing table (if we have a clone child, we are sure that we have contacted
the destination in the past).
(6) clone API: rtcalloc() always clones a route, regardless of the
RTF_CLONING bit on the parent. rtalloc() clones a route if the parent has
RTF_CLONING set. The last arg of rtalloc1() is overloaded to mean multiple
things (I find it hard to read).
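
A single inactivity duration shared by all clone children, as in (4)
above, can be modelled roughly like this. It is a userland sketch, not
the BSD/OS implementation: struct clone_child, expire_idle_children() and
the caller-supplied clock are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-child bookkeeping for this sketch. */
struct clone_child {
	int in_use;		/* entry is present in the table */
	long last_used;		/* time of last use, in seconds */
};

/*
 * Expire every in-use child that has been idle for at least max_idle
 * seconds.  The same max_idle applies to every entry, regardless of how
 * the child was created.  Returns the number of entries expired.
 */
static int
expire_idle_children(struct clone_child *tab, size_t n, long now,
    long max_idle)
{
	int expired = 0;

	assert(max_idle >= 0);
	for (size_t i = 0; i < n; i++) {
		if (tab[i].in_use && now - tab[i].last_used >= max_idle) {
			tab[i].in_use = 0;
			expired++;
		}
	}
	return expired;
}
```

The per-AF hook approach described for NetBSD would instead let each
protocol pick its own duration and expiry action, at the cost of extra
code in each AF.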
- FreeBSD: has some tricks, implemented mostly in sys/netinet/in_rmx.c.
(1) It has RTF_CLONING and RTF_PRCLONING for clone parent,
and RTF_WASCLONED for children. RTF_PRCLONING means "the protocol asks for
cloning", but why is it separate from RTF_CLONING? Not sure...
rtentry.rt_parent refers to the parent from the child.
(2) If clone parent route gets nuked, clone children will be removed
automatically by sys/net/route.c.
(3) If we try to add a route manually, it can overwrite clone child
routes.
(4) sys/netinet/in_rmx.c has inactivity timer for cloned children
generated by ARP or PMTUD (no timer for redirected routes).
(5) in_pcbconnect() asks for a cloned route. It has a bad side effect and
some good points, just like BSD/OS.
(6) clone API: rtalloc1() has an additional argument, "ignflags".
rtalloc() and rtalloc_ign() are supplied. With rtalloc_ign() the caller
can ask it to ignore certain cloning flag bits (RTF_CLONING or
RTF_PRCLONING).
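
The effect of an "ignflags"-style argument can be sketched as a pure
decision function. This is a simplified model based only on the
description above, not a copy of the FreeBSD source: should_clone(), the
report parameter and the MODEL_* bit values are names and values assumed
for illustration.

```c
#include <assert.h>

/* Illustrative bit values for this sketch; real kernels define their own. */
#define MODEL_RTF_CLONING	0x1UL
#define MODEL_RTF_PRCLONING	0x2UL
#define MODEL_CLONE_BITS	(MODEL_RTF_CLONING | MODEL_RTF_PRCLONING)

/*
 * Decide whether a lookup that matched a parent route should create a
 * clone child.
 *   parent_flags: flags of the matched parent route
 *   report:       nonzero if the caller is willing to have a route cloned
 *   ignflags:     cloning bits the caller asks the lookup to ignore
 * Returns 1 if a clone should be created, 0 otherwise.
 */
static int
should_clone(unsigned long parent_flags, int report, unsigned long ignflags)
{
	unsigned long nflags;

	/* Only cloning bits may be ignored in this model. */
	assert((ignflags & ~MODEL_CLONE_BITS) == 0);

	nflags = parent_flags & ~ignflags;
	return report && (nflags & MODEL_CLONE_BITS) != 0;
}
```

In this model, a caller that passes MODEL_RTF_PRCLONING as ignflags still
gets a clone from a parent that has MODEL_RTF_CLONING set, but not from a
parent that only has MODEL_RTF_PRCLONING set.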