Subject: Bugs in PF_KEY marshalling, socket-buffer overflow
To: None <tls@rek.tjls.com, tech-net@NetBSD.org>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 05/19/2004 17:33:20
In message <20040518025301.GA13564@panix.com>, Thor Lancelot Simon writes:
>On Mon, May 17, 2004 at 03:59:30PM -0400, der Mouse wrote:

>The PF_KEY interface is _broken_ due to a bug in the kernel code for
>marshalling SAs into sequences of messages.  Rather than fix that,
>some months ago a developer chose to make kernfs _required_ for correct 
>operation instead.
>
>Now you can use kernfs (which is optional) or sysctl (which isn't, but
>which is *not* the standards-defined interface for this task) but you
>still can't use PF_KEY.

The situation is even worse than that.  Even with kernfs/sysctl as a
kludge-around for the underlying bugs with dump requests, IPsec still
just doesn't work properly under moderate-to-heavy load. Why? Because
the same basic design flaws pop up in other places.

I'm not looking to flame; I'm looking for ideas on how to fix the
inadequacies of our PF_KEY: basically the same inadequacies across
all four of {Free,Net}BSD * {KAME,FAST-} IPsec.

Racoon still uses the PF_KEY API to import the entire SPD on startup.
So with even a quite modest SPD (modest by the performance standards
FAST_IPSEC can sustain), racoon will still hit the KAME bug in handling
PF_KEY DUMP requests.  I've modified the default PF_KEY socket-buffer
limits, but even so I find racoon fails to load the SPD at just over
2,000 SPD entries.

It's also ... trivial to trigger ACQUIREs to racoon at a sufficiently
high rate that (at least for my FAST_IPSEC tree) racoon starts
emitting warnings about malformed ACQUIREs.

I believe it's the same basic flaw in the KAME PF_KEY implementation:
PF_KEY sockets are not reliable, and the implementation is not robust
against socket-buffer overflow (truncation or message drop).  If you
send enough ACQUIREs fast enough, the ACQUIRE messages build up faster
than racoon can read the ACQUIRE and process IKE exchanges.
Eventually the ACQUIREs overflow the socket queue, leading (in my
tree, at least) to truncated or invalid messages.
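
To make the overflow argument concrete, here is a toy model of a
byte-bounded receive queue with a fast producer (the kernel emitting
ACQUIREs) and a slow consumer (the key daemon).  It is only a sketch:
the function name, the message size, the buffer size, and the per-tick
rates are all made-up illustrative values, not measurements from any
tree, and it models whole-message drops only, not the truncation case.

```python
from collections import deque


def simulate_acquire_queue(n_messages, msg_bytes, buf_bytes,
                           arrivals_per_tick, reads_per_tick):
    """Toy model of a byte-bounded socket queue (hypothetical, not kernel code).

    Returns (delivered, dropped). A message that does not fit in the
    remaining buffer space is dropped whole.
    """
    if reads_per_tick <= 0:
        raise ValueError("reads_per_tick must be positive")

    queue = deque()        # sizes of the messages currently queued
    queued_bytes = 0
    delivered = 0
    dropped = 0
    remaining = n_messages

    while remaining > 0 or queue:
        # Producer side: the kernel tries to enqueue a burst of messages.
        burst = min(arrivals_per_tick, remaining)
        for _ in range(burst):
            if queued_bytes + msg_bytes <= buf_bytes:
                queue.append(msg_bytes)
                queued_bytes += msg_bytes
            else:
                dropped += 1   # buffer full: the message is lost
        remaining -= burst

        # Consumer side: the daemon reads a few messages per tick.
        for _ in range(min(reads_per_tick, len(queue))):
            queued_bytes -= queue.popleft()
            delivered += 1

    return delivered, dropped


if __name__ == "__main__":
    # All numbers below are illustrative assumptions.
    print(simulate_acquire_queue(600, 200, 8192, 50, 5))
    print(simulate_acquire_queue(600, 200, 600 * 200, 50, 5))
```

In this model a bigger buffer only helps if it can hold the whole
burst; as long as arrivals outpace reads, any fixed limit is eventually
hit, which is why raising the socket-buffer limits alone only moves the
failure point.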

This happens very reproducibly at levels that are really quite modest,
compared to the performance FAST_IPSEC can deliver. I see it with
simultaneous connection requests from between 400 and 600 IPsec peers
(more easily reproduced from one peer, with 600 distinct IP addresses,
600 matching SPDs, and 600 carefully-crafted ICMP echo requests).

Again: I'm not looking to flame. I am looking for ways to fix these
problems.