tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

PROPOSAL: config_* with device_t references



I propose to add new config_*_acquire/release functions:

	dev = config_found_acquire(...);
is essentially
	dev = config_found(...);
	device_acquire(dev);

and

	config_detach_release(dev);
is essentially
	device_release(dev);
	config_detach(dev);

but they work atomically without races with, e.g., concurrent drvctl
or physical device removal in order to eliminate a class of driver
bugs.  Similarly config_attach_acquire, config_attach_pseudo_acquire.

Longer-term, we can eliminate the non-acquire/release versions (or
make config_attach/found return void since that's safe) once every
driver is converted, but adding new names enables the compiler to
assist with auditing the conversion until we're done.

Thoughts?


DETAILS

The logic

	sc->sc_child = config_found(...);

and

	config_detach(sc->sc_child, flags);

found in many drivers, is fundamentally buggy on any system with MP or
kpreemption.

It's buggy because, for example, a parallel call to `drvctl -d childN'
can detach the device and free it, before sc->sc_child has been
initialized on return from config_found or after it has been read but
before control has entered config_detach.  This can lead to
use-after-free.

The kernel lock provides weak protection against this class of bugs
via

	KERNEL_LOCK(1, NULL);
	sc->sc_child = config_found(...);
	KERNEL_UNLOCK_ONE(NULL);

because it has the opportunity to exclude concurrent .ca_childdetached
callbacks.  But we need to get more things off the kernel lock, not
add new dependencies on it.  And it only helps as long as nothing in
config_found blocks (e.g., on an adaptive mutex) before it returns the
device_t.


We can rewrite this instead as:

	child = config_found_acquire(...);

	mutex_enter(&sc->sc_childlock);
	sc->sc_child = child;	
	mutex_exit(&sc->sc_childlock);

	device_release(child);

and

	mutex_enter(&sc->sc_childlock);
	child = sc->sc_child;
	device_acquire(child);
	mutex_exit(&sc->sc_childlock);

	config_detach_release(child);

Here sc->sc_childlock is also taken by the device's .ca_childdetached
routine:

	mutex_enter(&sc->sc_childlock);
	if (dev == sc->sc_child)
		sc->sc_child = NULL;
	mutex_exit(&sc->sc_childlock);

Other use of the child, e.g. in a keyboard interrupt routine calling
wskbd_input, can safely use sc_childlock to coordinate access to
sc_child without worrying about spinning for the kernel lock or
sleeping while another thread runs a detach routine -- it's only
needed to serialize access to sc->sc_child, so it can use a driver's
own fine-grained locks.

(It's the driver's responsibility to ensure that .ca_detach routine
can't return while device_acquire is still possible, e.g. by quiescing
interrupt handlers that could call device_acquire.)

A minor additional change in this is to pass a cookie through
config_attach_pseudo_acquire as the aux argument to the .ca_attach
function.
From 066f934f6c0dd3c3884b827d1b78f02c4b50fc3e Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Fri, 26 Aug 2022 10:35:08 +0000
Subject: [PATCH] autoconf(9): New functions for referenced attach/detach.

New functions:

- config_found_acquire(dev, aux, print, cfargs)
- config_attach_acquire(parent, cf, aux, print, cfargs)
- config_attach_pseudo_acquire(cf, aux)
- config_detach_release(dev, flags)
- device_acquire(dev)

The config_*_acquire functions are like the non-acquire versions, but
they return a referenced device_t, which is guaranteed to be safe to
use until released.  The device's detach function may run while it is
referenced, but the device_t will not be freed and the parent's
.ca_childdetached routine will not be called.

=> config_attach_pseudo_acquire additionally lets you pass an aux
   argument to the device's .ca_attach routine, unlike
   config_attach_pseudo which always passes NULL.

=> Eventually, config_found, config_attach, and config_attach_pseudo
   should be made to return void, because use of the device_t they
   return is fundamentally unsafe.

config_detach_release is like device_release and then config_detach,
but avoids the race inherent with that sequence.

=> Eventually, config_detach should be eliminated, because it's
   fundamentally MP-unsafe (and difficult to use safely even in a UP
   system or with the kernel lock).

device_acquire acquires a reference to a device.  It never fails.
The caller is responsible for ensuring that the device_t cannot be
freed.  Typically this will be used in a read section (mutex, rwlock,
pserialize, &c.) in a data structure lookup, with corresponding logic
in the .ca_detach function to remove the device from the data
structure and wait for all read sections to complete.
---
 sys/kern/subr_autoconf.c | 137 +++++++++++++++++++++++++++++++++++++--
 sys/sys/device.h         |   7 ++
 2 files changed, 137 insertions(+), 7 deletions(-)

diff --git a/sys/kern/subr_autoconf.c b/sys/kern/subr_autoconf.c
index e60be2d7aad8..c8cfae8ab114 100644
--- a/sys/kern/subr_autoconf.c
+++ b/sys/kern/subr_autoconf.c
@@ -1259,7 +1259,7 @@ static const char * const msgs[] = {
  * not configured, call the given `print' function and return NULL.
  */
 device_t
-config_found(device_t parent, void *aux, cfprint_t print,
+config_found_acquire(device_t parent, void *aux, cfprint_t print,
     const struct cfargs * const cfargs)
 {
 	cfdata_t cf;
@@ -1286,6 +1286,29 @@ config_found(device_t parent, void *aux, cfprint_t print,
 	return NULL;
 }
 
+/*
+ * config_found(parent, aux, print, cfargs)
+ *
+ *	Legacy entry point for callers whose use of the returned
+ *	device_t is not delimited by device_release.  This is
+ *	inherently racy -- either they must ignore the return value, or
+ *	they must be converted to config_found_acquire with a matching
+ *	device_release.
+ */
+device_t
+config_found(device_t parent, void *aux, cfprint_t print,
+    const struct cfargs * const cfargs)
+{
+	device_t dev;
+
+	dev = config_found_acquire(parent, aux, print, cfargs);
+	if (dev == NULL)
+		return NULL;
+	device_release(dev);
+
+	return dev;
+}
+
 /*
  * As above, but for root devices.
  */
@@ -1708,6 +1731,8 @@ config_add_attrib_dict(device_t dev)
 
 /*
  * Attach a found device.
+ *
+ * Returns the device referenced, to be released with device_release.
  */
 static device_t
 config_attach_internal(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
@@ -1778,6 +1803,12 @@ config_attach_internal(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
 	 */
 	config_pending_incr(dev);
 
+	/*
+	 * Prevent concurrent detach from destroying the device_t until
+	 * the caller has released the device.
+	 */
+	device_acquire(dev);
+
 	/* Call the driver's attach function.  */
 	(*dev->dv_cfattach->ca_attach)(parent, dev, aux);
 
@@ -1813,7 +1844,7 @@ config_attach_internal(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
 }
 
 device_t
-config_attach(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
+config_attach_acquire(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
     const struct cfargs *cfargs)
 {
 	struct cfargs_internal store;
@@ -1824,6 +1855,29 @@ config_attach(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
 	    cfargs_canonicalize(cfargs, &store));
 }
 
+/*
+ * config_attach(parent, cf, aux, print, cfargs)
+ *
+ *	Legacy entry point for callers whose use of the returned
+ *	device_t is not delimited by device_release.  This is
+ *	inherently racy -- either they must ignore the return value, or
+ *	they must be converted to config_attach_acquire with a matching
+ *	device_release.
+ */
+device_t
+config_attach(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
+    const struct cfargs *cfargs)
+{
+	device_t dev;
+
+	dev = config_attach_acquire(parent, cf, aux, print, cfargs);
+	if (dev == NULL)
+		return NULL;
+	device_release(dev);
+
+	return dev;
+}
+
 /*
  * As above, but for pseudo-devices.  Pseudo-devices attached in this
  * way are silently inserted into the device tree, and their children
@@ -1834,7 +1888,7 @@ config_attach(device_t parent, cfdata_t cf, void *aux, cfprint_t print,
  * name by the attach routine.
  */
 device_t
-config_attach_pseudo(cfdata_t cf)
+config_attach_pseudo_acquire(cfdata_t cf, void *aux)
 {
 	device_t dev;
 
@@ -1867,8 +1921,14 @@ config_attach_pseudo(cfdata_t cf)
 	 */
 	config_pending_incr(dev);
 
+	/*
+	 * Prevent concurrent detach from destroying the device_t until
+	 * the caller has released the device.
+	 */
+	device_acquire(dev);
+
 	/* Call the driver's attach function.  */
-	(*dev->dv_cfattach->ca_attach)(ROOT, dev, NULL);
+	(*dev->dv_cfattach->ca_attach)(ROOT, dev, aux);
 
 	/*
 	 * Allow other threads to acquire references to the device now
@@ -1892,6 +1952,28 @@ out:	KERNEL_UNLOCK_ONE(NULL);
 	return dev;
 }
 
+/*
+ * config_attach_pseudo(cf)
+ *
+ *	Legacy entry point for callers whose use of the returned
+ *	device_t is not delimited by device_release.  This is
+ *	inherently racy -- either they must ignore the return value, or
+ *	they must be converted to config_attach_pseudo_acquire with a
+ *	matching device_release.
+ */
+device_t
+config_attach_pseudo(cfdata_t cf)
+{
+	device_t dev;
+
+	dev = config_attach_pseudo_acquire(cf, NULL);
+	if (dev == NULL)
+		return dev;
+	device_release(dev);
+
+	return dev;
+}
+
 /*
  * Caller must hold alldevs_lock.
  */
@@ -2000,9 +2082,12 @@ config_detach_exit(device_t dev)
  * Note that this code wants to be run from a process context, so
  * that the detach can sleep to allow processes which have a device
  * open to run and unwind their stacks.
+ *
+ * Caller must hold a reference with device_acquire or
+ * device_lookup_acquire.
  */
 int
-config_detach(device_t dev, int flags)
+config_detach_release(device_t dev, int flags)
 {
 	struct alldevs_foray af;
 	struct cftable *ct;
@@ -2031,6 +2116,7 @@ config_detach(device_t dev, int flags)
 	 * attached.
 	 */
 	rv = config_detach_enter(dev);
+	device_release(dev);
 	if (rv) {
 		KERNEL_UNLOCK_ONE(NULL);
 		return rv;
@@ -2185,6 +2271,22 @@ out:
 	return rv;
 }
 
+/*
+ * config_detach(dev, flags)
+ *
+ *	Legacy entry point for callers that have not acquired a
+ *	reference to dev.  This is inherently racy -- you must convert
+ *	such callers to acquire a reference and then use
+ *	config_detach_release instead.
+ */
+int
+config_detach(device_t dev, int flags)
+{
+
+	device_acquire(dev);
+	return config_detach_release(dev, flags);
+}
+
 /*
  * config_detach_commit(dev)
  *
@@ -2740,7 +2842,7 @@ retry:	if (unit < 0 || unit >= cd->cd_ndevs ||
 			mutex_enter(&alldevs_lock);
 			goto retry;
 		}
-		localcount_acquire(dv->dv_localcount);
+		device_acquire(dv);
 	}
 	mutex_exit(&alldevs_lock);
 	mutex_exit(&config_misc_lock);
@@ -2748,10 +2850,31 @@ retry:	if (unit < 0 || unit >= cd->cd_ndevs ||
 	return dv;
 }
 
+/*
+ * device_acquire:
+ *
+ *	Acquire a reference to a device.  It is the caller's
+ *	responsibility to ensure that the device's .ca_detach routine
+ *	cannot return before calling this.  Caller must release the
+ *	reference with device_release or config_detach_release.
+ */
+void
+device_acquire(device_t dv)
+{
+
+	/*
+	 * No lock because the caller has promised that this can't
+	 * change concurrently with device_acquire.
+	 */
+	KASSERTMSG(!dv->dv_detach_done, "%s",
+	    dv == NULL ? "(null)" : device_xname(dv));
+	localcount_acquire(dv->dv_localcount);
+}
+
 /*
  * device_release:
  *
- *	Release a reference to a device acquired with
+ *	Release a reference to a device acquired with device_acquire or
  *	device_lookup_acquire.
  */
 void
diff --git a/sys/sys/device.h b/sys/sys/device.h
index 4f47e063d0a6..91d994a9ff88 100644
--- a/sys/sys/device.h
+++ b/sys/sys/device.h
@@ -554,14 +554,20 @@ device_t config_found(device_t, void *, cfprint_t, const struct cfargs *);
 device_t config_rootfound(const char *, void *);
 device_t config_attach(device_t, cfdata_t, void *, cfprint_t,
 	    const struct cfargs *);
+device_t config_found_acquire(device_t, void *, cfprint_t,
+	    const struct cfargs *);
+device_t config_attach_acquire(device_t, cfdata_t, void *, cfprint_t,
+	    const struct cfargs *);
 int	config_match(device_t, cfdata_t, void *);
 int	config_probe(device_t, cfdata_t, void *);
 
 bool	ifattr_match(const char *, const char *);
 
 device_t config_attach_pseudo(cfdata_t);
+device_t config_attach_pseudo_acquire(cfdata_t, void *);
 
 int	config_detach(device_t, int);
+int	config_detach_release(device_t, int);
 int	config_detach_children(device_t, int flags);
 void	config_detach_commit(device_t);
 bool	config_detach_all(int);
@@ -588,6 +594,7 @@ device_t	device_lookup(cfdriver_t, int);
 void		*device_lookup_private(cfdriver_t, int);
 
 device_t	device_lookup_acquire(cfdriver_t, int);
+void		device_acquire(device_t);
 void		device_release(device_t);
 
 void		device_register(device_t, void *);


Home | Main Index | Thread Index | Old Index