tech-kern archive


Re: rtsock.c:rt_msg2() appears to be overwriting buffers it does not own



In article <20081024012342.GG12416%baea.com.au@localhost>,
Brett Lymn  <blymn%baesystems.com.au@localhost> wrote:
>Folks,
>
>I have been looking at a reported memory corruption problem when pppd
>dies.  This was the motivation for the malloclog changes I posted
>previously.  With those changes in place all the reported instances of
>data being modified on the freelist had a neighbour allocation that
>was done in rtsock.c at line 365.
>
>Inspecting the code there, we can see that, just before the
>allocation, rt_msg2() is called to calculate the length of the buffer,
>and the buffer is then, effectively, reallocated if it is too small.
>rt_msg2() is then called again with the resized buffer.  Called this
>way, rt_msg2() reaches a memcpy() at line 658; the comments indicate
>that the buffer is assumed to be large enough to hold the
>information.  From what I can see, nothing prevents the routing
>information from being updated between the first call to rt_msg2(),
>which calculates the size of the buffer, and the second call, which
>uses the allocated buffer.
>
>To try and test this theory, I put a few printfs around the suspect
>code (see attached patch...) and it produced the following results:
>
>Oct 23 14:15:27 t61 pppd[969]: Terminating on signal 15
>Oct 23 14:15:27 t61 pppd[969]: Connect time 0.6 minutes.
>Oct 23 14:15:27 t61 pppd[969]: Sent 9936 bytes, received 13408 bytes.
>Oct 23 14:15:27 t61 /netbsd: first rt_msg2, len = 224, rtm_msglen = 208
>Oct 23 14:15:27 t61 /netbsd: Second rt_msg2, len = 224
>Oct 23 14:15:27 t61 /netbsd: len + dlen exceeds *lenp, 400 > 224
>Oct 23 14:15:27 t61 /netbsd: len + dlen exceeds *lenp, 416 > 224
>Oct 23 14:15:27 t61 /netbsd: len + dlen exceeds *lenp, 432 > 224
>Oct 23 14:15:27 t61 /netbsd: len + dlen exceeds *lenp, 448 > 224
>....
>Oct 23 14:15:27 t61 /netbsd: Data modified on freelist: word 0 of object 0xffff8000045ba800 size 172 previous type bar (0x80bfcce0 != 0xdeadbeef)
>Oct 23 14:15:27 t61 /netbsd: malloc log entry 34694:
>....
>Oct 23 14:16:06 t61 pppd[1663]: Terminating on signal 15
>Oct 23 14:16:06 t61 pppd[1663]: Connect time 0.1 minutes.
>Oct 23 14:16:06 t61 pppd[1663]: Sent 65 bytes, received 0 bytes.
>Oct 23 14:16:06 t61 /netbsd: first rt_msg2, len = 224, rtm_msglen = 208
>Oct 23 14:16:06 t61 /netbsd: Second rt_msg2, len = 224
>Oct 23 14:16:06 t61 /netbsd: len + dlen exceeds *lenp, 400 > 224
>Oct 23 14:16:06 t61 /netbsd: len + dlen exceeds *lenp, 416 > 224
>Oct 23 14:16:06 t61 /netbsd: len + dlen exceeds *lenp, 432 > 224
>Oct 23 14:16:06 t61 /netbsd: len + dlen exceeds *lenp, 448 > 224
>...
>Oct 23 14:16:07 t61 /netbsd: Data modified on freelist: word 0 of object 0xffff8000045cf100 size 176 previous type bar (0x80bfcce0 != 0xdeadbeef)
>
>I think that this does pretty firmly point the finger at a routing
>table update that is exceeding its allocated memory and stomping the
>memory after it.  If this is really the case, I am not sure how to fix
>it - the whole routing update thing looks like a bit of an MP disaster
>as there appears to be no locking around any of the updates.  Ideas?

Are you sure it is the routing code? It could be the sysctl_iflist() path
too. Can you add some more debugging code to verify? I say that because
sysctl_rtable() is protected with splsoftnet(), but sysctl_iflist() is
not, and perhaps it should be.

christos
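
The size-then-fill pattern described in the quoted message can be
sketched in user-space C as follows. This is only an illustration under
stated assumptions, not the code in rtsock.c: struct records,
serialize(), snapshot() and grow_once() are hypothetical names, and
grow_once() merely simulates an update landing between the sizing call
and the filling call. The point it tries to show is that the filling
pass rechecks the length against the buffer capacity and the caller
retries, instead of assuming the earlier size still holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for data that may change between the two passes. */
struct records {
    const char *data;   /* bytes to serialize */
    size_t      len;    /* current length of data */
};

/*
 * Size-or-fill helper: with buf == NULL it only reports the space needed;
 * otherwise it copies, but refuses to write past cap.
 * Returns 0 on success, -1 if the data no longer fits in the buffer.
 */
static int
serialize(const struct records *r, char *buf, size_t cap, size_t *needed)
{
    assert(r != NULL && needed != NULL);
    *needed = r->len;
    if (buf == NULL)
        return 0;               /* sizing pass only */
    if (r->len > cap)
        return -1;              /* data grew since sizing: do not overrun */
    memcpy(buf, r->data, r->len);
    return 0;
}

/* Hypothetical hook simulating a concurrent update that enlarges the data. */
static void
grow_once(struct records *r)
{
    static const char bigger[] = "abcdefghijkl";

    if (r->len < sizeof bigger - 1) {
        r->data = bigger;
        r->len = sizeof bigger - 1;
    }
}

/*
 * Size, allocate, fill; retry when the data grew between the passes.
 * Returns a malloc'ed buffer (caller frees) or NULL on allocation failure.
 */
static char *
snapshot(struct records *r, void (*update)(struct records *), size_t *outlen)
{
    char *buf = NULL;
    size_t cap = 0, needed;

    for (;;) {
        serialize(r, NULL, 0, &needed);          /* pass 1: how big? */
        if (buf == NULL || needed > cap) {
            size_t want = needed ? needed : 1;   /* avoid a zero-size realloc */
            char *nbuf = realloc(buf, want);

            if (nbuf == NULL) {
                free(buf);
                return NULL;
            }
            buf = nbuf;
            cap = want;
        }
        if (update != NULL)
            update(r);                           /* simulated race window */
        if (serialize(r, buf, cap, &needed) == 0) {  /* pass 2: checked fill */
            *outlen = needed;
            return buf;
        }
        /* The data outgrew the buffer between passes; size it again. */
    }
}
```

Calling snapshot(&r, grow_once, &n) on a 4-byte record forces one retry:
the first fill is rejected because the data has grown to 12 bytes, the
buffer is resized, and the second fill succeeds. In the kernel the
alternative is to hold off updates across both passes, which is what the
splsoftnet() protection mentioned above is for.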


