Subject: Re: FAST_IPsec policy refcnt: "refcount" or "TTL", but not both
To: None <tech-net@NetBSD.org>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-net
Date: 05/19/2004 14:15:24
I reproduced the problem, I've backported a lightly-modified version
of the split key_sp_dead()/key_sp_unlink() from sys/netkey/key.c rev 1.119.

But I beleive there are still two quite elementary bugs in our KAME
sys/netkey/key.c.  The first is in key_spdflush(), at line 2267.

       for (sp = TAILQ_FIRST(&sptailq); sp; sp = nextsp) {
                nextsp = TAILQ_NEXT(sp, tailq);
                if (sp->persist)
                        continue;
                if (sp->state == IPSEC_SPSTATE_DEAD)
                        continue;
                key_sp_dead(sp);
                key_sp_unlink(sp);
                sp = NULL;
        }

When the loop hits the first non-dead deletable entry, it calls
key_sp_dead(), key_sp_unlink, and sets sp to NULL. Since now sp ==
NULL, the loop will exit immediately. Thus flushing the SPD will mark
at most SPD entry for deletion per SADB_SPDFLUSH operation.

Note that if the SPD entry is in use, it will have a nonzero refcount
after the unlink.  As far as I can tell, key_timehandler has another
instance of the same elementary bug:

	for (dir = 0; dir < IPSEC_DIR_MAX; dir++) {
                for (sp = LIST_FIRST(&sptree[dir]);
                     sp != NULL;
                     sp = nextsp) {
                        nextsp = LIST_NEXT(sp, chain);

                        if (sp->state == IPSEC_SPSTATE_DEAD) {
                                key_sp_unlink(sp);      /*XXX*/
                                sp = NULL;
                                continue;
                        }
		....
	}

The loop will delete at most one dead-but-not-deleted SPD, set sp to NULL,
and continue; sp == NULL so the loop exits.

If you manually delete a modest number (say, 2000) in-use SPD entries,
and key_timehanlder() is called at 1Hz, it'll take about 40 minutes to
drain them all for them.

When I patch this KAME (ill)logic into FAST_IPSEC, thats exactly what
I see.  So I beleive the patch below should be applied, and pulled up
to NetBSD 2.0.

[[Note: I haven't tested this with our KAME. I have tried the KAME key.c
rev 1.119 code in FAST_IPSEC, confirmed that it behaves as described,
and that the patch below fixes both problems.]]


--- key.c	2004-05-17 19:05:55.000000000 -0700
+++ key.c.new	2004-05-19 14:13:22.000000000 -0700
@@ -2264,7 +2264,8 @@
 			continue;
 		key_sp_dead(sp);
 		key_sp_unlink(sp);
-		sp = NULL;
+		/* Continue loop at nextsp */
+		sp = nextsp;	/* anything non-NULL */
 	}
 
 	/* invalidate all cached SPD pointers on pcb */
@@ -4334,7 +4335,8 @@
 
 			if (sp->state == IPSEC_SPSTATE_DEAD) {
 				key_sp_unlink(sp);	/*XXX*/
-				sp = NULL;
+				/* continue loop from nextsp */
+				sp = nextsp;	/* anything non-NULL */
 				continue;
 			}
 

The comments about sp == nextsp could be improved. Any other comments?