Subject: Re: stray ifnet pointers in mcast membership records & cloning -> crash
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-net
Date: 03/02/2005 17:58:30
--6sX45UoQRIJXqkqR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Tue, Mar 01, 2005 at 03:43:33PM -0500, Greg Troxel wrote:
>=20
> Long ago, struct ifnets were created and never destroyed. With the
> cloning changes, they are more aggressively freed. I note that there
> is code in sys/net/if.c:if_detach to prune a lot of state that might
> reference a struct ifnet that is about to be destroyed, but it seems
> that some references remain.
I don't think it was the cloning changes but the changes to support=20
detaching interfaces. The cloning changes just make this more common.
> I can see three strategies and one kludge:
>=20
> a) refcount ifp, and add a IFP_PRESENT macro so dangling references
> can be chceked and discarded. This is of course unappealing.
My initial if_detach() changes did reference counting. However you don't=20
need an IFP_PRESENT macro - you just don't free the ifp until all the=20
refcounts are gone. So deleting goes: you start shutting down the=20
interface (unhook it, set flags, etc.), then you let stuff drain, then the=
=20
last reference goes away and you free the structure then.
> b) Find all the rest of the ifp references and be able to prune them.
> This probably makes ifnet deletion more expensive, but it isn't that
> frequent.
You'll need to do something like this no matter what. The original=20
reference counting idea had the initial detach, which say would happen in=
=20
interrupt context, trigger everything else noticing the interface's going=
=20
away and cleanup.
> c) add a routine 'int ifp_valid(struct ifnet *)' that returns 1 if the
> given pointer is in the index2ifnet array. Use this routine whenever
> presented with an ifp that might not be valid, because it's in a place
> not cleaned up during if_detach. This feels awkward, but could be
> quick to code.
I think you don't need such a pointer, you just need to look and see if=20
the bits in the ifp are still valid.
> d) Make pppd take the interface down, and then wait 10 seconds or so,
> to allow quagga's down interface cleanup procedures to run before the
> struct ifnet is freed. This doesn't fix the underlying problem, but
> it might make my boxes crash less often.
>=20
> With option c, group memberships that can't be left will continue,
> since there's no way to come up with the 'ifnet *' in the multicast
> membership structure in a join/leave call once detached (address can't
> match, and ifindex is not valid).
>=20
> Doing (b) implies automatic leaving of groups. Or perhaps just
> updating the ifp in the membership structure to point someplace else,
> perhaps lo0. But then leaving based on the join information will
> fail, so perhaps the group membership record should just be dropped.
>=20
> This means that=20
>=20
> join group on ppp0
> ppp0 destroyed
> ppp0 created
> [group is not joined]
>=20
> But perhaps that's correct, and routing protocol implementations like
> quagga see this as a potentially new interface, and deal with
> interfaces coming/going anyway.
I'd expect that needing to join the group when the new "ppp0" shows up is=
=20
the right thing to do. Among other thing, you could be connected to an=20
entirely different endpoint, so you need to redo the group membership=20
anyway. :-)
Take care,
Bill
--6sX45UoQRIJXqkqR
Content-Type: application/pgp-signature
Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)
iD4DBQFCJm9GWz+3JHUci9cRAjoJAJj+7gSOSQaMDxVhvl7PNyNVkSsPAKCFeGBS
cyFD57lsR110gJ7P9QeZ8g==
=pEWi
-----END PGP SIGNATURE-----
--6sX45UoQRIJXqkqR--