Subject: Re: Racoon (or UVM?) problem with -current
To: Urban Boquist <urban@boquist.net>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 02/11/2002 14:54:40
On Mon, Jan 14, 2002 at 09:39:52PM +0100, Urban Boquist wrote:
> >>>>> Paul Dokas writes:
> 
> Paul> I had a rather strange panic on Friday night.  My work x86
> Paul> -current (128.101.AAA.BBB) machine paniced while I was
> Paul> attempting to create an IPSec session between it and my X86
> Paul> -current home machine (66.41.CCC.DDD).
> 
> This panic look vaguely similar to those in PR kern/13813 (by Luke
> Mewburn). I personally haven't been able to use ipsec at all for about
> half a year. At the end of the PR you'll see the exact cvs commit that
> started the breakage for me.

I think that I might have found the problem here (and possibly a sol'n).


In /sys/netinet/in_pcb.c, line 438, there is the following bit of code:

------------------------------SNIP-SNIP------------------------------
void
in_pcbdisconnect(v)
        void *v;
{
        struct inpcb *inp = v;
 
        inp->inp_faddr = zeroin_addr;
        inp->inp_fport = 0;
        in_pcbstate(inp, INP_BOUND);
        if (inp->inp_socket->so_state & SS_NOFDREF)
                in_pcbdetach(inp);
#ifdef IPSEC 
        ipsec_pcbdisconn(inp->inp_sp);
#endif  
}
------------------------------SNIP-SNIP------------------------------


From my reading of in_pcbdetach() and ipsec_pcbdisconn(), it seems that
they both end up calling key_freesp() on what I think are the same structures.
I'm pretty sure that they're the same because of how the cache is filled.
Here's the ipsec_fillpcbcache() from /sys/netinet6/ipsec.c:

------------------------------SNIP-SNIP------------------------------
static int
ipsec_fillpcbcache(pcbsp, m, sp, dir) 
        struct inpcbpolicy *pcbsp;
        struct mbuf *m;
        struct secpolicy *sp;
        int dir;
{ 

        switch (dir) {
        case IPSEC_DIR_INBOUND:
        case IPSEC_DIR_OUTBOUND:
                break;
        default:
                return EINVAL;
        }
#ifdef DIAGNOSTIC
        if (dir >= sizeof(pcbsp->cache)/sizeof(pcbsp->cache[0]))
                panic("dir too big in ipsec_checkpcbcache");
#endif

        if (pcbsp->cache[dir])
                key_freesp(pcbsp->cache[dir]);
        pcbsp->cache[dir] = NULL;
        if (ipsec_setspidx(m, &pcbsp->cacheidx[dir], 1) != 0) {
                return EINVAL;
        }
        pcbsp->cache[dir] = sp;
        if (pcbsp->cache[dir]) {
                pcbsp->cache[dir]->refcnt++;
                KEYDEBUG(KEYDEBUG_IPSEC_STAMP,
                        printf("DP ipsec_fillpcbcache cause refcnt++:%d SP:%p\n",
                        pcbsp->cache[dir]->refcnt, pcbsp->cache[dir]));
        }
        pcbsp->cachegen[dir] = sp_cachegen;

        return 0;
}
------------------------------SNIP-SNIP------------------------------

Notice that when a the cache line is filled, it's with an assignment:

        pcbsp->cache[dir] = sp;

This is what's then eventually passed to key_freesp(), twice.  Once via
in_pcbdetach() and then once via ipsec_pcbdisconn().


Also, all of the stack traces that I've seen seem to revolve around key_freesp()
and/or they seem to indicate that the memory has already been freed.

I haven't tested a fix yet, but I suspect that it might be as simple as not
calling key_freesp() from ipsec_fillpcbcache(), ipsec_invalpcbcache() or any
of the other places that it's called from.  See the patches mentioned in
kern/13813.


Could someone more familiar with this section of code please take a look and
see if I'm anywhere near right on this?


Paul
-- 
Paul Dokas                                            dokas@cs.umn.edu
======================================================================
Don Juan Matus:  "an enigma wrapped in mystery wrapped in a tortilla."