Subject: netkey API has severe problems
To: None <tech-kern@netbsd.org>
From: Charles M. Hannum <root@ihack.net>
List: tech-kern
Date: 04/03/2000 12:03:16
Let's review some of the bad things that happen with racoon.  Many of
these problems will affect *any* IKE/ISAKMP daemon, not just racoon.

* It's possible to get the SPD and racoon out of sync, because racoon
  does not manage the SPD itself.

* When this happens, key_acquire() starts continuously sending acquire
  requests to racoon, causing it to negotiate many SAs.  (E.g. try a
  ping flood in this case.  You'll end up with thousands of useless
  SAs.)

* It's possible for SAD entries to expire (or reach their soft limit)
  at a different time in the kernel than they do in racoon (cf. the
  timeout() lossage mentioned in my racoon PR).

* Because we always use the oldest SA to send a packet, when one
  machine is rebooted, or the SAs get out of sync in some other way,
  no communication is possible until all of the stale SAs time out.
  This is unacceptable.

* Furthermore, even in the `working' case, you can still lose packets
  while the old SAs are expiring.  This is an obvious race condition.

In short, the netkey API appears to be extremely suboptimal.

I propose a few things:

* If racoon is running, SPD entries should be managed by racoon.  It's
  certainly quite wrong to have it done in both places, and it is
  better to have all configuration done by racoon in this case.

* The following bit of lossage, which resends the acquire message
  whenever newacq->count exceeds key_blockacq_count, should go away.
  We should NEVER resend an acquire message unless racoon has been
  restarted, or the previous message was lost due to lack of buffer
  space in the kernel.  (This is very similar to one of the key
  mistakes made in the Morris worm, actually...)

                if (key_blockacq_count < newacq->count) {
                        /* reset counter and do send message. */
                        newacq->count = 0;
                } else {
                        ...
                }
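
A minimal user-space sketch of the proposed rule: acquire suppression
with no packet counter at all.  Everything in it is hypothetical and
not part of the netkey code: struct acq_table, acq_should_send(),
acq_mark_failed(), acq_complete(), and a daemon_gen counter that is
assumed to be bumped whenever racoon (re)registers.  An unsigned int
stands in for the real SA index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pending-acquire record (not netkey's own structure). */
struct acq_entry {
	bool          in_use;
	unsigned int  saidx;		/* stand-in for the real SA index */
	unsigned long daemon_gen;	/* daemon generation at last send */
	bool          send_failed;	/* last send failed: no buffer space */
};

#define ACQ_TABLE_SIZE 16

struct acq_table {
	struct acq_entry e[ACQ_TABLE_SIZE];
	unsigned long    daemon_gen;	/* assumed bumped on daemon restart */
};

void
acq_init(struct acq_table *t)
{
	memset(t, 0, sizeof(*t));
}

static struct acq_entry *
acq_lookup(struct acq_table *t, unsigned int saidx)
{
	for (size_t i = 0; i < ACQ_TABLE_SIZE; i++)
		if (t->e[i].in_use && t->e[i].saidx == saidx)
			return &t->e[i];
	return NULL;
}

/*
 * Should a packet that needs SA `saidx' trigger an acquire message?
 * Ask once per SA index; ask again only if the daemon restarted or
 * the previous message was lost.
 */
bool
acq_should_send(struct acq_table *t, unsigned int saidx)
{
	struct acq_entry *a;

	assert(t != NULL);
	if ((a = acq_lookup(t, saidx)) != NULL) {
		if (a->daemon_gen == t->daemon_gen && !a->send_failed)
			return false;	/* already asked; wait for daemon */
		a->daemon_gen = t->daemon_gen;
		a->send_failed = false;
		return true;
	}
	for (size_t i = 0; i < ACQ_TABLE_SIZE; i++) {
		if (!t->e[i].in_use) {
			t->e[i].in_use = true;
			t->e[i].saidx = saidx;
			t->e[i].daemon_gen = t->daemon_gen;
			t->e[i].send_failed = false;
			return true;
		}
	}
	return false;	/* table full: drop rather than flood */
}

/* Record that sending the acquire failed (e.g. no buffer space). */
void
acq_mark_failed(struct acq_table *t, unsigned int saidx)
{
	struct acq_entry *a = acq_lookup(t, saidx);

	if (a != NULL)
		a->send_failed = true;
}

/* Forget the request once the daemon installs (or rejects) the SA. */
void
acq_complete(struct acq_table *t, unsigned int saidx)
{
	struct acq_entry *a = acq_lookup(t, saidx);

	if (a != NULL)
		a->in_use = false;
}
```

In this sketch a ping flood produces exactly one acquire per SA index.
The daemon is assumed to report failed negotiations too, so that
acq_complete() runs and a later packet can start a fresh request.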

* Expiration of keys (both soft and hard limits) should be handled
  entirely by the kernel -- sending a message to racoon when a new key
  needs to be negotiated.
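
Kernel-side lifetime checking might look roughly like this.  The sketch
assumes time-based and byte-based soft and hard limits, where 0 means
"no limit"; struct sa_lifetime, sa_check_lifetime() and the SA_EV_*
values are hypothetical.  The kernel would report the soft event once,
as the cue for racoon to negotiate a new key, and delete the SA itself
on the hard event, so racoon would need no expiry timers of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* What the kernel should report to the key daemon, if anything. */
enum sa_event { SA_EV_NONE, SA_EV_SOFT, SA_EV_HARD };

/* Hypothetical per-SA lifetime state; a limit of 0 means "no limit". */
struct sa_lifetime {
	time_t   addtime;		/* when the SA was installed */
	time_t   soft_secs, hard_secs;
	uint64_t soft_bytes, hard_bytes;
	uint64_t bytes;			/* bytes processed so far */
	bool     soft_notified;		/* soft event already reported */
};

static bool
limit_hit(time_t age, uint64_t bytes, time_t secs, uint64_t lim_bytes)
{
	return (secs != 0 && age >= secs) ||
	    (lim_bytes != 0 && bytes >= lim_bytes);
}

/*
 * Called from the kernel's timer and on each packet.  SA_EV_SOFT is
 * returned at most once; on SA_EV_HARD the caller deletes the SA.
 */
enum sa_event
sa_check_lifetime(struct sa_lifetime *lt, time_t now)
{
	time_t age;

	assert(lt != NULL);
	age = now - lt->addtime;
	if (limit_hit(age, lt->bytes, lt->hard_secs, lt->hard_bytes))
		return SA_EV_HARD;
	if (!lt->soft_notified &&
	    limit_hit(age, lt->bytes, lt->soft_secs, lt->soft_bytes)) {
		lt->soft_notified = true;
		return SA_EV_SOFT;
	}
	return SA_EV_NONE;
}
```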

* Rekeying MUST improve.  In theory, a key must remain valid for at
  least 2MSL after we stop sending with it.  This implies that either
  the hard limit must be implicitly extended for 2MSL, or we must
  switch to the new key 2MSL before the hard limit.  (We'll probably
  need a new `state' value for this.)
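
The second option (stop sending with a key 2MSL before its hard limit)
can be sketched as a pure function of time.  SA_DRAINING stands in for
the hypothetical new state: the SA is no longer used for output but is
still accepted on input until the hard limit.  These enum values are
illustrative and are not the PF_KEY SADB_SASTATE_* constants; MSL is a
parameter, with no particular value implied.  The sketch also assumes
the soft limit fires early enough that the replacement SA is installed
before the draining window begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical SA states for this sketch only. */
enum sa_state { SA_MATURE, SA_DRAINING, SA_DEAD };

struct sa_timing {
	time_t hard_expire;	/* absolute time of the hard limit */
	time_t msl;		/* maximum segment lifetime, seconds */
};

/*
 * Stop sending 2MSL before the hard limit, but keep accepting
 * inbound packets until the hard limit itself.
 */
enum sa_state
sa_state_at(const struct sa_timing *t, time_t now)
{
	assert(t != NULL && t->msl >= 0);
	if (now >= t->hard_expire)
		return SA_DEAD;
	if (now >= t->hard_expire - 2 * t->msl)
		return SA_DRAINING;
	return SA_MATURE;
}

bool
sa_usable_for_output(const struct sa_timing *t, time_t now)
{
	return sa_state_at(t, now) == SA_MATURE;
}

bool
sa_usable_for_input(const struct sa_timing *t, time_t now)
{
	return sa_state_at(t, now) != SA_DEAD;
}
```

The alternative (implicitly extending the hard limit by 2MSL) is the
same function with hard_expire shifted later by 2 * msl.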

* The above suggestion does not, however, deal with the case of one
  machine rebooting.  I merely note that the race condition during SA
  expiration is just as bad as the race condition during SA setup, and
  therefore there is no reason -- even with the current code -- to
  choose the oldest SA over the newest SA.
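
Choosing the newest SA for output instead of the oldest is a small
change in the selection loop.  In the sketch below, struct sa_cand and
sa_select_newest() are hypothetical; a real implementation would walk
the SAD entries for the SA index and skip any that are not in a state
usable for output.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical candidate SA for one SA index. */
struct sa_cand {
	unsigned int spi;
	time_t       addtime;	/* when the SA was installed */
	int          usable;	/* nonzero if usable for output */
};

/*
 * Return the most recently installed usable SA, or NULL if none.
 * After a peer reboots, the newest SA is the one the peer still has.
 */
const struct sa_cand *
sa_select_newest(const struct sa_cand *v, size_t n)
{
	const struct sa_cand *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!v[i].usable)
			continue;
		if (best == NULL || v[i].addtime > best->addtime)
			best = &v[i];
	}
	return best;
}
```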

These problems need to be solved ASAP if we are going to support IPsec
in 1.5.