Subject: stray ifnet pointers in mcast membership records & cloning -> crash
To: None <tech-net@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-net
Date: 03/01/2005 15:43:33
I have some systems running current from 20050202.

Two systems run pppd(8), and once a day one end brings down the link
and reestablishes it (to work around a buggy PBX that drops long calls
at inconvenient times).  The system runs quagga, with ospfd and
ripngd.  The system has frequently panicked around the link down/up
time.

One crash occurred when apparently trying to leave a group that had
been joined on the ppp interface.  The ifp had if_softc and others set
to 0xdeadbeef, indicating it had been freed, and on a similar crash
the other end had pppioctl as if_ioctl.  But, the struct in_multi
still had an ifp reference.

Other crashes have occurred in in6_selecthlim.  I didn't manage to get
dumps from those, but I suspect that ifp->if_afdata[AF_INET6] has been
freed and therefore this code:

	else if (ifp)
		return (ND_IFINFO(ifp)->chlim);

dereferences a NULL pointer.  The DDB backtrace looked like the ripng
transmit path.
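
To illustrate the suspected failure, here is a minimal userland sketch
of a hop-limit lookup that checks the per-address-family slot before
following it.  Everything prefixed model_/MODEL_ is hypothetical and
only imitates the shape of the kernel structures; the slot index and
the default hop limit are placeholder values.  It is a sketch of the
idea, not the kernel's in6_selecthlim.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_AF_INET6	24	/* illustrative slot index (assumption) */
#define MODEL_AF_MAX	33	/* slot count, as seen in the gdb output */
#define MODEL_DEFHLIM	64	/* stand-in for the system default hop limit */

struct model_nd_ifinfo { int chlim; };
struct model_in6_ifextra { struct model_nd_ifinfo *nd_ifinfo; };
struct model_ifnet { void *if_afdata[MODEL_AF_MAX]; };

/*
 * Return the per-interface hop limit when the IPv6 per-interface data
 * is present, and fall back to the default when any link is NULL.
 */
static int
model_selecthlim(const struct model_ifnet *ifp)
{
	const struct model_in6_ifextra *ext;

	assert(MODEL_AF_INET6 < MODEL_AF_MAX);
	if (ifp == NULL)
		return MODEL_DEFHLIM;
	ext = ifp->if_afdata[MODEL_AF_INET6];
	if (ext == NULL || ext->nd_ifinfo == NULL)
		return MODEL_DEFHLIM;
	return ext->nd_ifinfo->chlim;
}
```

A check like this only helps while the ifnet itself is still valid
memory.  If the ifnet has been freed, reading if_afdata from it is
already a use after free, so the stale reference still has to be dealt
with at its source.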

The machine that receives the ppp calls also crashes, due to the stray
ifp in multicast memberships.


Long ago, struct ifnets were created and never destroyed.  With the
cloning changes, they are more aggressively freed.  I note that there
is code in sys/net/if.c:if_detach to prune a lot of state that might
reference a struct ifnet that is about to be destroyed, but it seems
that some references remain.

I can see three strategies and one kludge:

a) refcount ifp, and add an IFP_PRESENT macro so dangling references
can be checked and discarded.  This is of course unappealing.

b) Find all the rest of the ifp references and be able to prune them.
This probably makes ifnet deletion more expensive, but it isn't that
frequent.

c) add a routine 'int ifp_valid(struct ifnet *)' that returns 1 if the
given pointer is in the index2ifnet array.  Use this routine whenever
presented with an ifp that might not be valid, because it's in a place
not cleaned up during if_detach.  This feels awkward, but could be
quick to code.

d) Make pppd take the interface down, and then wait 10 seconds or so,
to allow quagga's down interface cleanup procedures to run before the
struct ifnet is freed.  This doesn't fix the underlying problem, but
it might make my boxes crash less often.
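
For concreteness, option (a) might look roughly like the following
userland model.  The struct, the field names if_refcnt and if_present,
and the model_if_* helpers are all hypothetical; only the IFP_PRESENT
name comes from the text above.  There is no locking, so this shows the
bookkeeping only, not something that could go into the kernel as is.

```c
#include <assert.h>
#include <stdlib.h>

struct model_ifnet {
	int if_refcnt;	/* references held by other structures */
	int if_present;	/* cleared once the interface is detached */
};

#define IFP_PRESENT(ifp)	((ifp)->if_present)

static struct model_ifnet *
model_if_alloc(void)
{
	struct model_ifnet *ifp = calloc(1, sizeof(*ifp));

	if (ifp != NULL) {
		ifp->if_refcnt = 1;	/* the interface list's own reference */
		ifp->if_present = 1;
	}
	return ifp;
}

/* Take a reference, e.g. when a membership record stores the pointer. */
static void
model_if_hold(struct model_ifnet *ifp)
{
	assert(ifp->if_refcnt > 0);
	ifp->if_refcnt++;
}

/* Drop one reference; returns 1 if this freed the structure. */
static int
model_if_rele(struct model_ifnet *ifp)
{
	assert(ifp->if_refcnt > 0);
	if (--ifp->if_refcnt > 0)
		return 0;
	free(ifp);
	return 1;
}

/* Detach: mark the interface gone and drop the list's reference. */
static int
model_if_detach(struct model_ifnet *ifp)
{
	ifp->if_present = 0;
	return model_if_rele(ifp);
}
```

A holder that later finds !IFP_PRESENT(ifp) would discard its record
and call model_if_rele(); the memory is released only when the last
such holder lets go.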

With option c, group memberships that can't be left will continue,
since there's no way to come up with the 'ifnet *' in the multicast
membership structure in a join/leave call once detached (address can't
match, and ifindex is not valid).
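
A minimal sketch of the option (c) check, written as a userland model.
The table model_index2ifnet and its size MODEL_NIFS are stand-ins for
the kernel's index-to-ifnet array, and struct model_ifnet is a
placeholder; none of these names are taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

struct model_ifnet { char if_xname[16]; };

/* Stand-in for the kernel's index-to-ifnet table (assumption). */
#define MODEL_NIFS	8
static struct model_ifnet *model_index2ifnet[MODEL_NIFS];

/*
 * Return 1 if ifp is currently registered in the table, 0 otherwise.
 * The pointer is only compared, never dereferenced, so it is safe to
 * call with a pointer to memory that has already been freed.
 */
static int
ifp_valid(const struct model_ifnet *ifp)
{
	size_t i;

	assert(MODEL_NIFS > 0);
	if (ifp == NULL)
		return 0;
	for (i = 0; i < MODEL_NIFS; i++)
		if (model_index2ifnet[i] == ifp)
			return 1;
	return 0;
}
```

One limitation of comparing addresses: if a new ifnet happens to be
allocated at the same address as the freed one, a stale pointer will
pass the check and silently refer to the new interface.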

Doing (b) implies automatic leaving of groups.  Or perhaps just
updating the ifp in the membership structure to point someplace else,
perhaps lo0.  But then leaving based on the join information will
fail, so perhaps the group membership record should just be dropped.
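
Dropping the records at detach time, as in (b), could be modelled like
this.  The membership array and the model_* names are hypothetical; in
the kernel the references live in struct in_multi and in the per-socket
imo_membership entries, which this sketch does not attempt to
reproduce.  It also does not model sending (or suppressing) an IGMP
leave for the dropped groups.

```c
#include <assert.h>
#include <stddef.h>

struct model_ifnet { int if_index; };

struct model_membership {
	unsigned int group;		/* group address, illustrative */
	struct model_ifnet *ifp;	/* interface the group was joined on */
};

/*
 * Remove every record that references ifp, compacting the array in
 * place.  Returns the number of records that remain.
 */
static size_t
model_purge_memberships(struct model_membership *recs, size_t n,
    const struct model_ifnet *ifp)
{
	size_t in, out = 0;

	assert(recs != NULL || n == 0);
	for (in = 0; in < n; in++) {
		if (recs[in].ifp == ifp)
			continue;	/* drop: its interface is going away */
		recs[out++] = recs[in];
	}
	return out;
}
```

Called from the detach path before the ifnet is freed, a pass like this
leaves no record pointing at the dead interface, at the cost of walking
every membership list once per interface deletion.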

This means that:

  join group on ppp0
  ppp0 destroyed
  ppp0 created
  [group is not joined]

But perhaps that's correct, and routing protocol implementations like
quagga see this as a potentially new interface, and deal with
interfaces coming/going anyway.




The igmp crash: 

#4  0xc0102de1 in calltrap ()
#5  0xc010f6db in igmp_sendpkt (inm=0xc0955080, type=23)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/igmp.c:575
#6  0xc010f3b6 in igmp_leavegroup (inm=0xc0955080)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/igmp.c:454
#7  0xc0110ea1 in in_delmulti (inm=0xc0955080)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/in.c:1142
#8  0xc011ad90 in ip_freemoptions (imo=0xc0b29680)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/ip_output.c:1829
#9  0xc0111758 in in_pcbdetach (v=0xc0b4c514)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/in_pcb.c:501
#10 0xc0128379 in udp_usrreq (so=0xc0b53528, req=1, m=0x0, nam=0x0, 
    control=0x0, p=0x0)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/udp_usrreq.c:1059
#11 0xc0332bb6 in soclose (so=0xc0b53528)
    at /n0/gdt/SINEW-current/netbsd/src/sys/kern/uipc_socket.c:604
#12 0xc0324029 in soo_close (fp=0xc8e8a008, p=0xc8a5bb30)
    at /n0/gdt/SINEW-current/netbsd/src/sys/kern/sys_socket.c:238
#13 0xc02f6d0f in closef (fp=0xc8e8a008, p=0xc8a5bb30)
    at /n0/gdt/SINEW-current/netbsd/src/sys/kern/kern_descrip.c:1424
#14 0xc02f6b0b in fdfree (p=0xc8a5bb30)
    at /n0/gdt/SINEW-current/netbsd/src/sys/kern/kern_descrip.c:1290
#15 0xc02fac2d in exit1 (l=0xc80ebce4, rv=15)
    at /n0/gdt/SINEW-current/netbsd/src/sys/kern/kern_exit.c:267
#16 0xc030b234 in postsig (signum=15)
    at /n0/gdt/SINEW-current/netbsd/src/sys/kern/kern_sig.c:1852
#17 0xc03b0a2c in syscall_plain (frame=0xc8b78fa8)
    at /n0/gdt/SINEW-current/netbsd/src/sys/sys/userret.h:93
(gdb) 

(gdb) fr 5
#5  0xc010f6db in igmp_sendpkt (inm=0xc0955080, type=23)
    at /n0/gdt/SINEW-current/netbsd/src/sys/netinet/igmp.c:575
(gdb) print *inm->inm_ifp
$5 = {if_softc = 0xdeadbeef, if_list = {tqe_next = 0xc062a820, 
    tqe_prev = 0xc0bfe000}, if_addrlist = {tqh_first = 0xdeadbeef, 
    tqh_last = 0xdeadbeef}, if_xname = "ï¾­Þï¾­Þï¾­Þ\0\0\0", if_pcount = 0, 
  if_bpf = 0x0, if_index = 0, if_timer = 0, if_flags = 0, if_extflags = 0, 
  if_data = {ifi_type = 0 '\0', ifi_addrlen = 0 '\0', ifi_hdrlen = 0 '\0', 
    ifi_link_state = 0, ifi_mtu = 0, ifi_metric = 0, ifi_baudrate = 0, 
    ifi_ipackets = 0, ifi_ierrors = 0, ifi_opackets = 0, ifi_oerrors = 0, 
    ifi_collisions = 0, ifi_ibytes = 0, ifi_obytes = 0, ifi_imcasts = 0, 
    ifi_omcasts = 0, ifi_iqdrops = 0, ifi_noproto = 0, ifi_lastchange = {
      tv_sec = 0, tv_usec = 0}}, if_output = 0, if_input = 0, if_start = 0, 
  if_ioctl = 0, if_init = 0, if_stop = 0, if_watchdog = 0, if_drain = 0, 
  if_snd = {ifq_head = 0x0, ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 0, 
    ifq_drops = 0, altq_type = 0, altq_flags = 0, altq_disc = 0x0, 
    altq_ifp = 0x0, altq_enqueue = 0, altq_dequeue = 0, altq_request = 0, 
    altq_clfier = 0x0, altq_classify = 0, altq_tbr = 0x0, altq_cdnr = 0x0}, 
  if_sadl = 0x0, if_broadcastaddr = 0x0, if_bridge = 0x0, if_dlt = 0, 
  if_pfil = {ph_in = {tqh_first = 0x0, tqh_last = 0x0}, ph_out = {
      tqh_first = 0x0, tqh_last = 0x0}, ph_ifaddr = {tqh_first = 0x0, 
      tqh_last = 0x0}, ph_ifnetevent = {tqh_first = 0x0, tqh_last = 0x0}, 
    ph_type = 0, ph_un = {phu_val = 0, phu_ptr = 0x0}, ph_list = {
      le_next = 0x0, le_prev = 0x0}}, if_capabilities = 0, if_capenable = 0, 
  if_csum_flags_tx = 0, if_csum_flags_rx = 0, if_afdata = {
    0x0 <repeats 33 times>}, if_mowner = 0x0}
(gdb) 

The ip_freemoptions crash:

#0  0x1fd07000 in ?? ()
#1  0xc03a8923 in cpu_reboot (howto=260, bootstr=0x0)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/arch/i386/i386/machdep.c:754
#2  0xc031cb48 in panic (fmt=0xc05a935f "trap")
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/subr_prf.c:242
#3  0xc03b10e5 in trap (frame=0xcca8f890)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/arch/i386/i386/trap.c:296
#4  0xc0102de1 in calltrap ()
#5  0xc0168585 in ipsec4_in_reject_so (m=0xc184ed00, so=0xc148adb8)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet6/ipsec.c:1825
#6  0xc011aef3 in rip_input (m=0xc184ed00)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet/raw_ip.c:208
#7  0xc0113e53 in ip_input (m=0xc184ed00)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet/ip_input.c:1028
#8  0xc0113866 in ipintr ()
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet/ip_input.c:467
#9  0xc0102aa1 in Xsoftnet ()
#10 0xc03a3e61 in softintr_dispatch (which=0) at x86/intr.h:160
#11 0xc0102af6 in Xsoftclock ()
#12 0xc0341456 in vfs_shutdown () at x86/intr.h:160
#13 0xc03a8937 in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/arch/i386/i386/machdep.c:740
#14 0xc031cb48 in panic (fmt=0xc05a935f "trap")
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/subr_prf.c:242
#15 0xc03b10e5 in trap (frame=0xcca8fc40)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/arch/i386/i386/trap.c:296
#16 0xc0102de1 in calltrap ()
#17 0xc011ad90 in ip_freemoptions (imo=0xc1421500)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet/ip_output.c:1829
#18 0xc0111758 in in_pcbdetach (v=0xc13ed4a8)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet/in_pcb.c:501
#19 0xc011b77d in rip_usrreq (so=0xc148adb8, req=1, m=0x0, nam=0x0, 
    control=0x0, p=0x0)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/netinet/raw_ip.c:579
#20 0xc0332bb6 in soclose (so=0xc148adb8)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/uipc_socket.c:604
#21 0xc0324029 in soo_close (fp=0xccaf319c, p=0xccdaf00c)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/sys_socket.c:238
#22 0xc02f6d0f in closef (fp=0xccaf319c, p=0xccdaf00c)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/kern_descrip.c:1424
#23 0xc02f6b0b in fdfree (p=0xccdaf00c)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/kern_descrip.c:1290
#24 0xc02fac2d in exit1 (l=0xccaa1320, rv=11)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/kern_exit.c:267
#25 0xc030b234 in postsig (signum=11)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/kern/kern_sig.c:1852
#26 0xc03b1330 in trap (frame=0xcca8ffa8)
    at /usr/home/gdt/SINEW-current/netbsd/src/sys/sys/userret.h:93

print * inp->inp_moptions->imo_membership[2].inm_ifp
$30 = {if_softc = 0xdeadbeef, if_list = {tqe_next = 0xc06236a0, 
    tqe_prev = 0xc1a10400}, if_addrlist = {tqh_first = 0xdeadbeef, 
    tqh_last = 0xdeadbeef}, if_xname = "ï¾­Þï¾­Þï¾­Þ\0\0\0", if_pcount = 0, 
  if_bpf = 0x0, if_index = 0, if_timer = 0, if_flags = 0, if_extflags = 0, 
  if_data = {ifi_type = 0 '\0', ifi_addrlen = 0 '\0', ifi_hdrlen = 0 '\0', 
    ifi_link_state = 0, ifi_mtu = 0, ifi_metric = 0, ifi_baudrate = 0, 
    ifi_ipackets = 0, ifi_ierrors = 0, ifi_opackets = 0, ifi_oerrors = 0, 
    ifi_collisions = 0, ifi_ibytes = 0, ifi_obytes = 0, ifi_imcasts = 0, 
    ifi_omcasts = 0, ifi_iqdrops = 0, ifi_noproto = 0, ifi_lastchange = {
      tv_sec = 0, tv_usec = 0}}, if_output = 0, if_input = 0, if_start = 0, 
  if_ioctl = 0, if_init = 0, if_stop = 0, if_watchdog = 0, if_drain = 0, 
  if_snd = {ifq_head = 0x0, ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 0, 
    ifq_drops = 0, altq_type = 0, altq_flags = 0, altq_disc = 0x0, 
    altq_ifp = 0x0, altq_enqueue = 0, altq_dequeue = 0, altq_request = 0, 
    altq_clfier = 0x0, altq_classify = 0, altq_tbr = 0x0, altq_cdnr = 0x0}, 
  if_sadl = 0x0, if_broadcastaddr = 0x0, if_bridge = 0x0, if_dlt = 0, 
  if_pfil = {ph_in = {tqh_first = 0x0, tqh_last = 0x0}, ph_out = {
      tqh_first = 0x0, tqh_last = 0x0}, ph_ifaddr = {tqh_first = 0x0, 
      tqh_last = 0x0}, ph_ifnetevent = {tqh_first = 0x0, tqh_last = 0x0}, 
    ph_type = 0, ph_un = {phu_val = 0, phu_ptr = 0x0}, ph_list = {
      le_next = 0x0, le_prev = 0x0}}, if_capabilities = 0, if_capenable = 0, 
  if_csum_flags_tx = 0, if_csum_flags_rx = 0, if_afdata = {
    0x0 <repeats 33 times>}, if_mowner = 0x0}
(gdb)