Subject: Re: Racoon (or UVM?) problem with -current
To: None <email@example.com>
From: Paul Dokas <firstname.lastname@example.org>
Date: 02/19/2002 15:49:03
On Thu, Feb 14, 2002 at 07:38:43AM +0900, email@example.com wrote:
> >> On Tue, Feb 12, 2002 at 10:20:08AM -0800, Bill Studenmund wrote:
> >> >
> >> > Another solution is, do the freers check to see if the value is NULL
> >> > before freeing? If so, have the pointer set to NULL after it is freed.
> >> > Then it gets freed only once.
> >> Looking over the code, I kinda doubt that key_freesp() is being called
> >> twice on the same entry outside of the cache. I'm pretty sure that it's
> >> the cache that's doing it. Also, the pointers to the entry appear to be
> >> copied all over, not just in the cache code. I don't know that it would
> >> be possible to set all of the pointer to NULL, let alone know where they
> >> all are.
> >Then we might need to start doing refcounting.
> we do reference-count struct inpcbpolicy. also, I couldn't find the
> problem described in earlier email (pcbsp->cache[x] freed twice).
> key_freesp() is called in ipsec_invalpcbcache(), however, after calling
> key_freesp() pcbsp->cache[x] is set to NULL.
Yes, you're right. From a much more indepth examination of the code, I
don't think that the problem is that the same SP is being freed twice.
The ref. counting appears sound.
However, there *is* a bug in there somewhere, I just intentionally caused my
NetBSD-current machine to panic with this bug. Here's what I've found:
key_delsp(c070f000,c6e3ec48,c0657000,c03b2100,c06af80c at key_delsp+0x63
key_freesp(c070f000,c6e3ec48,c6e3ed50,c01ef95c) at key_freesp+0x57
ipsec_invalpcbcache(c06af800,1,5857,87640101,c0657300) at ipsec_invalpcbcache+0x44
gcc2_compiled.(c0657000,c006af800,1,1,c0657300) at gcc2_compiled.+0x46
ipsec4_getpolicybysock(c0657000,1,c06ae294,c6e3ed0c,c0657300) at ipsec4_getpolicybysock+0x7e
ipsec4_in_reject_so(c0657000,c06ae294,c0657300,c0657300) at ipsec4_in_reject_so+0x35
rip_input() at rip_input+0x196
icmp_input() at icmp_input+0x449
ip_input() at ip_input+0x668
ipintr() at ipintr+0x6b
Xsoftnet() at Xsoftnet+0x2c
Here's the assembly for key_freesp() as created by 'cc -S ...'
I've marked the place that it's dying with a '<=-'
.ascii "key_delsp: NULL pointer is passed.\12\0"
.ascii "DP delsp calls free SA:%p\12\0"
movl %eax,(%edx) <=-
.ascii "key_getsp: NULL pointer is passed.\12\0"
Here's the registers after the panic:
edi 0xc070f000 end+0x352ba0
ebp 0xc6e3eb10 end+0x6a826b0
ebx 0xc070f000 end+0x352ba0
eax 0xc0676800 end+0x2ba3a0
eip 0xc0225637 key_delsp+0x63
esp 0xc6e3eae8 end+0x6a82688
And the contents of memory pointed to by %edi and %eax
0xc070f000: c0676800 1 0 c05ffc00 c0fd600 0
0xc070f018: 0 0 0 0 0 0
0xc0676800: 0xc0773c00 1 1 c05ff880 c05fd600 0
What's interesting is that %edi and %eax appear to point to
secpolicy structs, in which the first 2 long words are le_next
and le_prev pointers from LIST_ENTRY(secpolicy). And, the place
that it's dying appears to be in the middle of the LIST_REMOVE(sp, chain),
found in key_freesp().
So, why is it dying? Well, it looks like *le_prev is 0x01, which is
obviously not a valid pointer (le_prev is a pointer to a pointer).
Perhaps there's a missing LIST_INIT() somewhere? Or worse yet, there's
a buffer overflow somewhere that is overwriting one of the secpolicy structs.
Personally, I'm a little suspicious of a buffer overflow, le_prev is the
*2nd* entry in the structure and the first is not a problem.
One other thing that might be happening here is that perhaps there are
some missing calls to splsoftnet()/splx(), like, for instance, the
call to LIST_INSERT_TAIL() at line 1621 of netkey/key.c, or the call to
LIST_INSERT_HEAD() at line 1998 of netkey/key.c, neither of which are
protected by splsoftnet()/splx().
In fact, there appear to be several more LIST_() operations that are
not splsoftnet()/splx() protected throughout netkey/key.c. Is it not necessary?
I'm just not experienced enough with this code to know, but I would imagine
that it would be a bad thing to be half way through a LIST_FOREACH() that is
not protected by splsoftnet()/splx() and then have LIST_REMOVE() called from
Paul Dokas firstname.lastname@example.org
Don Juan Matus: "an enigma wrapped in mystery wrapped in a tortilla."