Subject: Re: Further news on gated 3.5b4 vs. NetBSD-1.2
To: None <gated-people@merit.edu, tech-net@NetBSD.ORG>
From: Curt Sampson <cjs@portal.ca>
List: tech-net
Date: 03/10/1997 20:59:43
On Fri, 7 Mar 1997, Curt Sampson wrote:

> After some work with gdb, it appears that the source of the problem
> lies in the call to krt_ifread(task_state) in krt_recv_route.
> Commenting that call out fixes the problem.
> 
> I found this because I noticed that something very strange is
> happening with the adip parameter passed to krt_recv_route. In this
> function before the call to krt_ifread the adip parameter is
> `normal,' similar to:
> {rti_addrs = 3, rti_info = {0x124dd0, 0x124dd8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}
> 
> After the call, the addresses are different, similar to:
> {rti_addrs = 16, rti_info = {0x0, 0x0, 0x0, 0x0, 0x1248b0, 0x0, 0x0, 0x0}}

I believe I've found the problem. 

When an interface address add or delete occurs, a message gets sent
on the kernel routing socket, and krt_recv() is called. Krt_recv()
processes this message, and one of the first things it does is call
krt_xaddrs() to get the addresses, if any, from that message:

    while (n_packets-- && !task_receive_packet(tp, &size)) {
        struct rt_msghdr *rtp = task_get_recv_buffer(struct rt_msghdr *);
        krt_addrinfo *adip;
 
        adip = krt_xaddrs(rtp, size);
	...

The value returned by krt_xaddrs(), which I will call adip, is a
pointer into a static buffer of type krt_addrinfo.

After that, since this is an address add/delete, krt_recv_route()
is called and passed adip. Krt_recv_route() passes adip to
krt_rtaddrs(), which sets some pointers to sockaddrs from adip in
variables rtparms and author:

    switch (krt_rtaddrs(adip, &rtparms, &author, (flag_t) rtp->rtm_flags)) {

So now rtparms also has pointers into krt_xaddrs()'s static buffer.
krt_recv_route() then goes on its merry way and eventually does a
rescan of the interfaces:

    /* Rescan the interfaces to make sure we have the latest information */
    /* Even if we are getting interface status change messages, there are */
    /* cases where the kernel does not send them, or we could have missed one */
    krt_ifread(task_state);

Krt_ifread() does the rescan by calling sysctl to fill up a buffer
with routing-socket-type messages giving information on all the
addresses currently assigned to interfaces in the system. Once it
has got its buffer-full, it loops through that buffer processing
each address. Part of the processing involves calling krt_xaddrs():

         /* Pick out the addresses */
        adip = krt_xaddrs((struct rt_msghdr *) ifap, ifap->ifm_msglen);

(This is a different adip here.)

Krt_xaddrs() stomps all over its own static buffer many times as
it's called throughout this loop, and leaves behind there whatever
it leaves when all this is done.

So then krt_ifread() is done and we return back to krt_recv_route().
This goes on to use the information in rtparms, parts of which use
sockaddr pointers in krt_xaddrs()'s now more-or-less-randomised
static buffer. If you're lucky something mildly wierd and possibly
unnoticable happens. If you're not lucky, as I'm not when I've got
a lot of interfaces going, one of the sockaddr pointers that adip
and rtparms are lifting out of that buffer is zeroed. When that
zeroed rtparms pointer is used by someone else, you get an abort
trap (if it checked it first) or a segfault (if it didn't).

I think this is all correct. However, it's very late, I've spent
far too much time working on this, and it seems very odd that such
a major problem could go undetected for so long. If I've gotten
anything wrong here, please let me know. If this analysis is correct,
and you've got a good solution, let me know about it too, because
I'm far too tired to think of one right now.

cjs

Curt Sampson    cjs@portal.ca	   Info at http://www.portal.ca/
Internet Portal Services, Inc.	   Through infinite myst, software reverberates
Vancouver, BC  (604) 257-9400	   In code possess'd of invisible folly.