Subject: Re: Racoon (or UVM?) problem with -current
To: None <itojun@iijlab.net>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 02/19/2002 15:49:03
On Thu, Feb 14, 2002 at 07:38:43AM +0900, itojun@iijlab.net wrote:
> >> On Tue, Feb 12, 2002 at 10:20:08AM -0800, Bill Studenmund wrote:
> >> >
> >> > Another solution is, do the freers check to see if the value is NULL
> >> > before freeing? If so, have the pointer set to NULL after it is freed.
> >> > Then it gets freed only once.
> >>
> >> Looking over the code, I kinda doubt that key_freesp() is being called
> >> twice on the same entry outside of the cache.  I'm pretty sure that it's
> >> the cache that's doing it.  Also, the pointers to the entry appear to be
> >> copied all over, not just in the cache code.  I don't know that it would
> >> be possible to set all of the pointer to NULL, let alone know where they
> >> all are.
> >
> >Then we might need to start doing refcounting.
> 
> 	we do reference-count struct inpcbpolicy.  also, I couldn't find the
> 	problem described in earlier email (pcbsp->cache[x] freed twice).
> 	key_freesp() is called in ipsec_invalpcbcache(), however, after calling
> 	key_freesp() pcbsp->cache[x] is set to NULL.

Yes, you're right.  From a much more indepth examination of the code, I
don't think that the problem is that the same SP is being freed twice.
The ref. counting appears sound.

However, there *is* a bug in there somewhere, I just intentionally caused my
NetBSD-current machine to panic with this bug.  Here's what I've found:

  
Backtrace:

  key_delsp(c070f000,c6e3ec48,c0657000,c03b2100,c06af80c at key_delsp+0x63
  key_freesp(c070f000,c6e3ec48,c6e3ed50,c01ef95c) at key_freesp+0x57
  ipsec_invalpcbcache(c06af800,1,5857,87640101,c0657300) at ipsec_invalpcbcache+0x44
  gcc2_compiled.(c0657000,c006af800,1,1,c0657300) at gcc2_compiled.+0x46
  ipsec4_getpolicybysock(c0657000,1,c06ae294,c6e3ed0c,c0657300) at ipsec4_getpolicybysock+0x7e
  ipsec4_in_reject_so(c0657000,c06ae294,c0657300,c0657300) at ipsec4_in_reject_so+0x35
  rip_input() at rip_input+0x196
  icmp_input() at icmp_input+0x449
  ip_input() at ip_input+0x668
  ipintr() at ipintr+0x6b
  Xsoftnet() at Xsoftnet+0x2c


Here's the assembly for key_freesp() as created by 'cc -S ...'
I've marked the place that it's dying with a '<=-'

===========================================================
.section        .rodata
        .align 32
.LC27:
        .ascii "key_delsp: NULL pointer is passed.\12\0"
.LC28:
        .ascii "DP delsp calls free SA:%p\12\0"
.text
        .align 4
        .type    key_delsp,@function
key_delsp:
        pushl %ebp
        movl %esp,%ebp
        subl $28,%esp
        pushl %edi
        pushl %esi
        pushl %ebx
        movl 8(%ebp),%edi
        testl %edi,%edi
        jne .L368
        addl $-12,%esp
        pushl $.LC27
        call panic
        .align 4
.L368:
        movl $0,280(%edi)
        cmpl $0,8(%edi)
        jg .L367
        movl cpl,%eax
        movl %eax,-4(%ebp)
        orl imask+28,%eax
        movl %eax,cpl
#APP
#NO_APP
        movl (%edi),%edx
        testl %edx,%edx
        jne .L390
        cmpl $0,4(%edi)
        je .L371
        jmp .L376
        .align 4
.L390:
        movl 4(%edi),%eax
        movl %eax,4(%edx)
.L376:
        movl 4(%edi),%edx
        movl (%edi),%eax
        movl %eax,(%edx)       <=-
.L371:
        movl 288(%edi),%esi
        movl -4(%ebp),%eax
        movl %eax,-8(%ebp)
        notl -8(%ebp)
        testl %esi,%esi
        je .L379
        .align 4
.L380:
        movl 272(%esi),%edx
        testl %edx,%edx
        je .L381
        movl key_debug_level,%eax
        andl $65,%eax
        cmpl $65,%eax
        jne .L383
        addl $-8,%esp
        pushl %edx
        pushl $.LC28
        call printf
        addl $16,%esp
        .align 4
.L383:
        addl $-12,%esp
        pushl 272(%esi)
        call key_freesav
        movl $0,272(%esi)
        addl $16,%esp
.L381:
        movl (%esi),%ebx
        addl $-8,%esp
        pushl $95
        pushl %esi
        call free
        movl %ebx,%esi
        addl $16,%esp
        testl %esi,%esi
        jne .L380
.L379:
        addl $-12,%esp
        pushl %edi
        call keydb_delsecpolicy
        addl $16,%esp
#APP
#NO_APP
        movl -4(%ebp),%eax
        movl %eax,cpl
        movl ipending,%eax
        testl %eax,-8(%ebp)
        je .L367
        call Xspllower
.L367:
        leal -40(%ebp),%esp
        popl %ebx
        popl %esi
        popl %edi
        leave
        ret
.Lfe10:
        .size    key_delsp,.Lfe10-key_delsp
.section        .rodata
        .align 32
.LC29:
        .ascii "key_getsp: NULL pointer is passed.\12\0"
===========================================================

Here's the registers after the panic:

ds	0x10
es	0x10
fs	0x10
gs	0x10
edi	0xc070f000 end+0x352ba0
esi	0x01
ebp	0xc6e3eb10 end+0x6a826b0
ebx	0xc070f000 end+0x352ba0
edx	0x01
ecx	0
eax	0xc0676800 end+0x2ba3a0
eip	0xc0225637 key_delsp+0x63
cs	0x8
eflags	0x10286
esp	0xc6e3eae8 end+0x6a82688
ss	0x10

===========================================================

And the contents of memory pointed to by %edi and %eax
0xc070f000:	c0676800	1		0		c05ffc00	c0fd600		0
0xc070f018:	0		0		0		0		0		0

0xc0676800:	0xc0773c00	1		1		c05ff880	c05fd600	0



What's interesting is that %edi and %eax appear to point to
secpolicy structs, in which the first 2 long words are le_next
and le_prev pointers from LIST_ENTRY(secpolicy).  And, the place
that it's dying appears to be in the middle of the LIST_REMOVE(sp, chain),
found in key_freesp().

So, why is it dying?  Well, it looks like *le_prev is 0x01, which is
obviously not a valid pointer (le_prev is a pointer to a pointer).


Perhaps there's a missing LIST_INIT() somewhere?  Or worse yet, there's
a buffer overflow somewhere that is overwriting one of the secpolicy structs.

Personally, I'm a little suspicious of a buffer overflow, le_prev is the
*2nd* entry in the structure and the first is not a problem.

One other thing that might be happening here is that perhaps there are
some missing calls to splsoftnet()/splx(), like, for instance, the
call to LIST_INSERT_TAIL() at line 1621 of netkey/key.c, or the call to
LIST_INSERT_HEAD() at line 1998 of netkey/key.c, neither of which are
protected by splsoftnet()/splx().

In fact, there appear to be several more LIST_() operations that are
not splsoftnet()/splx() protected throughout netkey/key.c.  Is it not necessary?
I'm just not experienced enough with this code to know, but I would imagine
that it would be a bad thing to be half way through a LIST_FOREACH() that is
not protected by splsoftnet()/splx() and then have LIST_REMOVE() called from
elsewhere.


Paul
-- 
Paul Dokas                                            dokas@cs.umn.edu
======================================================================
Don Juan Matus:  "an enigma wrapped in mystery wrapped in a tortilla."