Subject: Re: stray ifnet pointers in mcast membership records & cloning -> crash
To: Bill Studenmund <wrstuden@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-net
Date: 03/02/2005 21:47:03
> From: Bill Studenmund <wrstuden@netbsd.org>
> 
> On Wed, Mar 02, 2005 at 12:30:10PM -0500, Greg Troxel wrote:
> >=20
> > Sure.  My real concern is that interface deletion is not safe.  Given
> > that these invariants are becoming kind of hairy, I think the only
> > safe thing to do (from a software maintenance viewpoint) is to have
> > interface destruction always scan all data structures that might have
> > ifnbet pointers.
> 
> Perhaps I'm reading too much detail into your suggestion. I agree that we
> need to nudge everything that has an ifp, so that it will notice that the
> interface's gone. I think though that the ifdetach routine may not be the
> best place to do it; it may be happening in an interrupt routine. So we do
> enough to get the interface coming down (no new traffic for instnace) then
> start things draining off.

That's not what I was getting at.

Currently, sys/net/if.c:if_detach() (which says it must be called with
thread context) does a lot of things to find and remove references to
the ifp.  I think it's almost correct - it just doesn't scan protocols
with address families that weren't on the interface.  So the following
sequence leads to trouble, and I think it's what happening in my case:

  ppp0 created

  v4 address configure on ppp0

  join of 224.0.0.5 with local address of ppp0, or via ifindex,
  resulting in mcast membership (224.0.0.5, ppp0-ifp)

  pppd gets sighup and shuts down gracefully

  v4 address removed on ppp0

  ppp0 destroyed
     if_detach doesn't invoke PRU_PURGEIF on rip_usrreq, since there
     are no AF_INET addresses

  Now, there is bad/stale ifp in multicast options on the socket
  still.  quagga tries to leave the group, or ospfd hits a bug of its
  own and segfaults and the system-invoked close of the socket leaves
  the group.

I see two ways to fix this:

a) in if_detach, invoke PRU_PURGEIF on all protocols in all AFs,
regardless of what addresses are configured.

b) Declare and ensure an invariant that is stronger than the current
"all reachable ifp pointers point to a struct ifnet that is valid and
in ifindex2ifnet", which would additionally require

  all ifp pointers in use in storage belonging to AF x point to a
  struct ifnet which has an address of AF x configured

To get b, we would have to do purgeif processing on address removal.
This seems like an overly complicated invariant, and one not as likely
to remain true in the face of future code changes.  Having a central
"all places that have ifps get called from here to clean up" place
seems preferable.