Subject: Gated assertion "len" failed at str.c line 810
To: None <gated-people@merit.edu, tech-net@NetBSD.ORG>
From: Curt Sampson <cjs@portal.ca>
List: tech-net
Date: 01/17/1997 15:13:03

I've recently upgraded a couple of machines here from NetBSD 1.0
to NetBSD 1.2 and NetBSD 1.2B (current) respectively. Since then,
I've been seeing a problem with gated on the 1.2 machine. It occurs
in both the old binary compiled under 1.0 and new binary compiled
under 1.2, and it occurs in both gated 3.5beta2 and 3.5beta4.
These machines are using RIP.

The machine on which this occurs has one Ethernet card and quite
a few aliases. As well as the IP address for the local network,
the Ethernet interface also has a single address (broadcast as a
host route) from an entirely different subnet (this is for backward
compatibility with some dial-in users who have an old nameserver
address in their configurations) and about 160 addresses aliased
on the loopback interface. These latter addresses are all from a
separate subnet dedicated to virtual web servers on this machine.

After running for anything from half a minute to several days, gated
gives me the following messages and dumps core:

    Assertion failed gated[7298]: file "str.c", line 810: "len"
    Abort gated[7298] version R3_5Beta_4: Invalid argument

Looking at it with gdb, I get the following backtrace:

#0  0x101397bb in kill ()
#1  0x10139641 in abort ()
#2  0x2847a in task_quit (code=22) at task.c:2089
#3  0x28566 in task_assert (file=0x22c6b "str.c", line=810, 
    test=0x22c78 "*Unspecified") at task.c:2118
#4  0x2362c in gd_vsprintf (dest=0x104a10 "", 
    fmt0=0x18d44 "REDIRECT: redirect from %A: %A/%A via %A", ap=0xf7bfd48c "")
    at str.c:818
#5  0x32969 in parse_low_bit_set () at parse.c:1156
#6  0x18f26 in redirect (dst=0x117500, mask=0x104f48, gateway=0x117508, 
    src=0x117510) at rt_redirect.c:92
#7  0x10adf in krt_recv_route (tp=0x118348, rtp=0x123000, adip=0xf53a8)
    at krt_rt_sock.c:1328
#8  0x10fea in krt_recv (tp=0x118348) at krt_rt_sock.c:1639
#9  0x2b2cc in task_process_sockets (numset=0, read_bits=0xf7bfd634, 
    write_bits=0x0, except_bits=0x0) at task.c:3352
#10 0x307c7 in trace_types () at trace.c:195

The direct cause of the abort is the following bit of code:

                len = socksize(addr);
                assert(len);

Looking at the sockaddr structure, it does indeed have a length of
zero.  Tracing back, this goes back quite some ways; in this
particular instance redirect() was passed a pointer to a sockaddr
struct with a len and family of zero as the `src' argument. I've
also seen this happen with the `dst' argument.
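
To make the failure mode concrete, here is a minimal sketch in C. It is
not gated's code: the struct and every demo_* name are hypothetical
stand-ins, and I am assuming that socksize() essentially reports the
length byte at the front of the sockaddr. The sketch shows why a zeroed
sockaddr trips an assert(len) check, and what a defensive check ahead of
the formatting call might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a BSD-style sockaddr: length byte, then family. */
struct demo_sockaddr {
    unsigned char len;
    unsigned char family;
    char data[14];
};

/* Stand-in for socksize(); ASSUMED to just report the length byte. */
static size_t demo_socksize(const struct demo_sockaddr *addr)
{
    return addr->len;
}

/* Mirrors the failing pattern: aborts when the length is zero. */
static size_t demo_strict_len(const struct demo_sockaddr *addr)
{
    size_t len = demo_socksize(addr);
    assert(len);
    return len;
}

/* Defensive check: 1 if the address can be formatted, 0 if NULL or empty. */
static int demo_addr_usable(const struct demo_sockaddr *addr)
{
    return addr != NULL && demo_socksize(addr) != 0;
}

/* Formats a short description, substituting a placeholder for empty input. */
static int demo_describe(const struct demo_sockaddr *addr,
                         char *buf, size_t buflen)
{
    if (!demo_addr_usable(addr))
        return snprintf(buf, buflen, "(empty address)");
    return snprintf(buf, buflen, "len=%zu family=%d",
                    demo_strict_len(addr), addr->family);
}
```

Calling demo_strict_len() on a sockaddr whose len is zero aborts the
same way the assertion above does, while demo_describe() writes a
placeholder instead. This only papers over the symptom; it says nothing
about why an empty sockaddr reaches redirect() in the first place.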

Now I'm not really sure what's going on in krt_recv_route that's
causing this, and not being terribly familiar with gated internals,
it looks to me like I've got several hours of work ahead if I want
to track this down and understand it. Before I do that, has anyone
seen this problem before, or does anyone have any pointers as to
why this might be happening?

cjs

Curt Sampson    cjs@portal.ca		Info at http://www.portal.ca/
Internet Portal Services, Inc.	
Vancouver, BC   (604) 257-9400		De gustibus, aut bene aut nihil.