Subject: Re: stray ifnet pointers in mcast membership records & cloning -> crash
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-net
Date: 03/02/2005 17:58:30
--6sX45UoQRIJXqkqR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Mar 01, 2005 at 03:43:33PM -0500, Greg Troxel wrote:
>=20
> Long ago, struct ifnets were created and never destroyed.  With the
> cloning changes, they are more aggressively freed.  I note that there
> is code in sys/net/if.c:if_detach to prune a lot of state that might
> reference a struct ifnet that is about to be destroyed, but it seems
> that some references remain.

I don't think it was the cloning changes but the changes to support=20
detaching interfaces. The cloning changes just make this more common.

> I can see three strategies and one kludge:
>=20
> a) refcount ifp, and add a IFP_PRESENT macro so dangling references
> can be chceked and discarded.  This is of course unappealing.

My initial if_detach() changes did reference counting. However you don't=20
need an IFP_PRESENT macro - you just don't free the ifp until all the=20
refcounts are gone. So deleting goes: you start shutting down the=20
interface (unhook it, set flags, etc.), then you let stuff drain, then the=
=20
last reference goes away and you free the structure then.

> b) Find all the rest of the ifp references and be able to prune them.
> This probably makes ifnet deletion more expensive, but it isn't that
> frequent.

You'll need to do something like this no matter what. The original=20
reference counting idea had the initial detach, which say would happen in=
=20
interrupt context, trigger everything else noticing the interface's going=
=20
away and cleanup.

> c) add a routine 'int ifp_valid(struct ifnet *)' that returns 1 if the
> given pointer is in the index2ifnet array.  Use this routine whenever
> presented with an ifp that might not be valid, because it's in a place
> not cleaned up during if_detach.  This feels awkward, but could be
> quick to code.

I think you don't need such a pointer, you just need to look and see if=20
the bits in the ifp are still valid.

> d) Make pppd take the interface down, and then wait 10 seconds or so,
> to allow quagga's down interface cleanup procedures to run before the
> struct ifnet is freed.  This doesn't fix the underlying problem, but
> it might make my boxes crash less often.
>=20
> With option c, group memberships that can't be left will continue,
> since there's no way to come up with the 'ifnet *' in the multicast
> membership structure in a join/leave call once detached (address can't
> match, and ifindex is not valid).
>=20
> Doing (b) implies automatic leaving of groups.  Or perhaps just
> updating the ifp in the membership structure to point someplace else,
> perhaps lo0.  But then leaving based on the join information will
> fail, so perhaps the group membership record should just be dropped.
>=20
> This means that=20
>=20
>   join group on ppp0
>   ppp0 destroyed
>   ppp0 created
>   [group is not joined]
>=20
> But perhaps that's correct, and routing protocol implementations like
> quagga see this as a potentially new interface, and deal with
> interfaces coming/going anyway.

I'd expect that needing to join the group when the new "ppp0" shows up is=
=20
the right thing to do. Among other thing, you could be connected to an=20
entirely different endpoint, so you need to redo the group membership=20
anyway. :-)

Take care,

Bill

--6sX45UoQRIJXqkqR
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD4DBQFCJm9GWz+3JHUci9cRAjoJAJj+7gSOSQaMDxVhvl7PNyNVkSsPAKCFeGBS
cyFD57lsR110gJ7P9QeZ8g==
=pEWi
-----END PGP SIGNATURE-----

--6sX45UoQRIJXqkqR--