tech-kern archive


Re: Locking strategy for device deletion (also see PR kern/48536)



   Date: Wed, 8 Jun 2016 17:34:33 +0000
   From: David Holland <dholland-tech%netbsd.org@localhost>

   On Tue, Jun 07, 2016 at 06:28:11PM +0800, Paul Goyette wrote:
    > Can anyone suggest a reliable way to ensure that a device-driver module can
    > be _really_ safely detached?

   There's a pserialize scheme for this; see e.g. an old thread called
   "kicking everybody out of the softc".

Skimming that thread prompted me to write a sketch -- attached -- of
the per-CPU counts I've mentioned several times in this thread, which
is a bit simpler than dyoung@'s proposal back in 2010.

The reason I use per-CPU counts here instead of psref(9) or per-CPU
reference lists is to allow the caller of device_lookup_acquire to
migrate to another CPU before device_release, which is fairly likely
for operations that wait for I/O, but which is forbidden for psref(9).

   The catch for arbitrary drivers is that you need to do the enter
   business using the softc *before* using [cb]devsw to jump into the
   driver, but the current way all of that works is the other way
   around. This really needs to be restructured...

This is not a catch -- this is just the distinction between autoconf
device state (device_lookup) and userland-exposed device nodes
(bdev/cdev).  Detach both before unmapping the module code and you're
safe.

I think the order doesn't even matter: if you detach the autoconf
device first, then xyzread's call to device_lookup will fail; if you
detach the devsw first, then no more calls to xyzread can happen.  But
there may be other constraints that imply some order, which I don't
have in my head at the moment.

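As a sketch of what I mean (kernel code, not compilable standalone; the
xyz_* names are hypothetical placeholders for a real driver's tables),
a module's fini path would do both detaches before the text goes away:

```c
static int
xyz_fini(void)
{
	device_t dev;
	int unit, error;

	/* Remove the device nodes first: no further xyzread calls. */
	error = devsw_detach(&xyz_bdevsw, &xyz_cdevsw);
	if (error)
		return error;

	/* Then detach each autoconf instance: device_lookup fails. */
	for (unit = 0; unit < xyz_cd.cd_ndevs; unit++) {
		if ((dev = device_lookup(&xyz_cd, unit)) != NULL) {
			error = config_detach(dev, DETACH_FORCE);
			if (error)
				return error;	/* XXX reattach devsw? */
		}
	}
	return 0;
}
```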
struct device {
	...
	struct percpu	*dv_usecounts;		/* int64_t */
	int64_t		*dv_usecount_total;
	bool		dv_dying;
	...
};

/*
 * device_diediedie:
 *
 *	Prevent new references to dv and wait for all existing
 *	references to drain.  Caller must hold alldevs_mtx, which will
 *	be released during the wait.
 *
 *	XXX To be called from config_devunlink, I think, as the first
 *	thing it does.
 */
void
device_diediedie(device_t dev)
{
	int64_t total = 0;

	KASSERT(mutex_owned(&alldevs_mtx));
	KASSERT(!dev->dv_dying);

	/* Mark it dying and record where the total tally goes.  */
	dev->dv_usecount_total = &total;
	dev->dv_dying = true;

	/*
	 * Count up the total on all CPUs.  xc_broadcast serves as a
	 * global memory barrier to ensure visibility of dv_dying = true
	 * on all CPUs before the cv_wait.
	 */
	mutex_exit(&alldevs_mtx);
	xc_wait(xc_broadcast(0, &device_usecount_xc, dev, NULL));
	mutex_enter(&alldevs_mtx);

	KASSERTMSG((0 <= total), "negatively referenced device: %p, %"PRId64,
	    dev, total);
	while (total != 0)
		cv_wait(&device_release_cv, &alldevs_mtx);

	/* Cause subsequent attempts to add to the total to crash.  */
	dev->dv_usecount_total = NULL;
}

static void
device_usecount_xc(void *cookie0, void *cookie1 __unused)
{
	device_t dev = cookie0;
	int64_t *usecountp;

	mutex_enter(&alldevs_mtx);
	usecountp = percpu_getref(dev->dv_usecounts);
	*dev->dv_usecount_total += *usecountp;
	percpu_putref(dev->dv_usecounts);
	mutex_exit(&alldevs_mtx);
}

/*
 * device_lookup_acquire:
 *
 *	Look up a device instance for a given driver and add a
 *	reference to it.
 *
 *	If found, return the device private, and store the device
 *	pointer in *devp, which the caller must later release with
 *	device_release.  Otherwise, return NULL and store NULL in
 *	*devp.
 *
 *	XXX In principle, this could be done with pserialize(9) and
 *	involve no interprocessor synchronization at all, by changing
 *	the management of cd->cd_ndevs and cd->cd_devs.
 */
void *
device_lookup_acquire(cfdriver_t cd, int unit, device_t *devp)
{
	device_t dev;

	mutex_enter(&alldevs_mtx);
	if (unit < 0 || unit >= cd->cd_ndevs) {
		dev = NULL;
	} else if ((dev = cd->cd_devs[unit]) == NULL) {
		/* nothing to do */
	} else if (dev->dv_del_gen != 0 || dev->dv_dying) {
		dev = NULL;
	} else {
		int64_t *usecountp = percpu_getref(dev->dv_usecounts);
		*usecountp += 1;
		percpu_putref(dev->dv_usecounts);
	}
	mutex_exit(&alldevs_mtx);

	*devp = dev;
	return dev ? device_private(dev) : NULL;
}

/*
 * device_release:
 *
 *	Release a device acquired with device_lookup_acquire.  If dev
 *	is not in the process of dying, involves no interprocessor
 *	synchronization.
 */
void
device_release(device_t dev)
{
	int64_t *usecountp;

	if (__predict_false(dev->dv_dying)) {
		/*
		 * Slow path: adjust local delta and global total in
		 * serial with respect to device_usecount_xc so we
		 * don't inadvertently double-count.
		 *
		 * Broadcast unconditionally, not conditional on
		 * whether the local delta is zero, because our
		 * reference may have migrated from another CPU.
		 */
		mutex_enter(&alldevs_mtx);
		usecountp = percpu_getref(dev->dv_usecounts);
		*usecountp -= 1;
		percpu_putref(dev->dv_usecounts);
		*dev->dv_usecount_total -= 1;
		cv_broadcast(&device_release_cv);
		mutex_exit(&alldevs_mtx);
	} else {
		/* Fast path: adjust only local delta.  */
		usecountp = percpu_getref(dev->dv_usecounts);
		*usecountp -= 1;
		percpu_putref(dev->dv_usecounts);
	}
}
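
For completeness, the intended usage in a driver entry point would look
something like this (sketch only; the xyz_* names and xyz_do_read are
hypothetical):

```c
static int
xyzread(dev_t dev, struct uio *uio, int flags)
{
	struct xyz_softc *sc;
	device_t self;
	int error;

	/* Take a reference; fails once device_diediedie has run. */
	sc = device_lookup_acquire(&xyz_cd, minor(dev), &self);
	if (sc == NULL)
		return ENXIO;

	/* May sleep for I/O and migrate CPUs; both are fine. */
	error = xyz_do_read(sc, uio, flags);

	device_release(self);
	return error;
}
```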

