kicking everybody out of the softc



Currently, device detachment is racy: a kernel thread[1] that has looked
up a softc races to read or write it before a second thread detaches the
corresponding device_t and reclaims the softc's storage.  I've been
working in spare moments on lockless code to prevent a softc's storage
from going away while a driver is still using it.  I submit the idea and
the code here for review.

Stopping the races against detachment is tricky because many drivers
pass in and out of their softc over and over again today without any
synchronization with detachment whatsoever.  Using atomic operations or
locking to synchronize use of the softc with its reclamation can add
costly locked memory transactions to many fast paths that previously had
none, for a performance loss.

I took an approach that avoids locked memory transactions in the
common case.  I keep count of threads entering each softc on each
CPU using per-CPU counters in the corresponding device_t: an LWP
calls device_acquire(dev) as it enters a softc.  I also keep count
of threads leaving each softc using per-CPU counters: an LWP calls
device_release(dev) as it leaves a softc.  I call the counters
"turnstiles."  Turnstile-counts only increase.  Because turnstiles
are per-CPU, incrementing one only has to be atomic with respect to
other threads on that same CPU, so no locked memory transactions are
necessary.

After an LWP enters a softc through its turnstile, it passes through a
"gate" implemented by a pointer from the device_t to an object of type
device_gate_t that contains a kernel mutex, among other things.  This
happens in device_acquire().  Normally, the gate is open: the device_t
points to the default device_gate_t, gate_open.  It is not necessary for
device_acquire() to acquire gate_open's mutex, but device_acquire() must
acquire every other gate's mutex.
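
To show how this looks from a driver's perspective, here is a minimal
sketch of thread-context code bracketing a softc access with
device_acquire()/device_release(); the xyz driver, its softc, and
xyz_set_speed() are hypothetical names, not part of the proposal:

/* Hypothetical thread-context driver routine (e.g. an ioctl handler). */
int
xyz_set_speed(device_t dev, int speed)
{
        struct xyz_softc *sc;
        int error;

        /* Count ourselves in through the turnstile; fail if detached. */
        if ((error = device_acquire(dev, true)) != 0)
                return error;           /* ENXIO, EINTR, or ERESTART */

        sc = device_private(dev);
        sc->sc_speed = speed;           /* softc cannot be reclaimed here */

        /* Count ourselves out again. */
        device_release(dev);

        return 0;
}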

In the rare event that our thread wants to reclaim the softc (say that
it is completing config_detach(9)), first it "closes the gate" on the
softc's corresponding device_t.  To close the gate, our thread creates
a closed device_gate_t, acquires its mutex, and points the device_t at
it.  With the gate closed, our thread unlinks the device_t and softc.
Finally, it "seals" the gate by pointing the device_t at a special
device_gate_t, gate_sealed, and it wakes all of the threads that wait
to acquire the closed gate's mutex.  Since the device_t and softc are
unreachable, no new thread can enter them.  All of the threads that wake
holding the closed gate see that the device_t now points to gate_sealed
and leave the softc through a turnstile---device_acquire() returns in
those threads with ENXIO.  Our thread safely reclaims the softc when
the number of threads that have entered through its turnstiles balances
the number that have left.
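
To make the sequence concrete, here is a minimal sketch of the
detach-side path under the proposed API; example_detach_tail() and the
"detach" wait message are hypothetical, and the unlinking step is only
indicated by a comment:

/* Sketch: the tail of a hypothetical config_detach() path, after the
 * driver's detach routine has already disestablished its interrupt
 * handlers.
 */
int
example_detach_tail(device_t dev)
{
        device_gate_t *dg;
        int error;

        /* Close the gate: new arrivals now sleep on dg's mutex. */
        if ((error = device_gate_close(dev, "detach", &dg)) != 0)
                return error;   /* ENXIO: another thread already sealed it */

        /* Unlink dev from alldevs and from its cfdriver_t (not shown). */

        /* Seal the gate, wake the sleepers, and wait until the turnstile
         * in- and out-counts balance; then the softc can be freed.
         */
        device_gate_seal(dev, dg);

        return 0;
}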

Anyway, that's the gist of the idea.  I've attached the untested (and
uncompiled) code for the details.  Comments?

Dave

[1] I'm using the term "thread" loosely to mean any thread of execution,
    be it an LWP, software or hardware interrupt.

-- 
David Young             OJC Technologies
dyoung%ojctech.com@localhost      Urbana, IL * (217) 278-3933
/* I. DEVICE ATTACHMENT
 *
 * Steps for config_attach*() to follow:
 *
 * Call device_attach_gate(dv) and check for error before making dv
 * visible by linking it to alldevs or cfdriver_t.
 *
 * II. DEVICE DETACHMENT
 *
 * Steps for config_detach(9) to detach a device_t, dv:
 *
 * 0) Call the driver detach routine, dv->dv_cfattach->ca_detach.  It is
 * important for the detach routine to disestablish interrupt handlers
 * and to make sure interrupt handling on all CPUs is complete (by
 * making a low-priority cross-call to all CPUs, for example).
 *
 * 1) Install gate with device_gate_close(dv, "config_detach", &dgp).
 *
 * 2) Unlink the device_t from alldevs and from cfdriver_t.
 *
 * 3) Call device_gate_seal(dv, dgp).
 *
 * III. REFERENCING/READING/WRITING THE SOFTWARE STATE (device_t & softc)
 *
 * Steps for kernel threads, hard- and soft-interrupt handlers to
 * protect a device_t, dv, and its softc against reclamation:
 *
 * In "sleepable" LWP contexts, rc = device_acquire(dv, interruptible),
 * where interruptible is true or false, and check rc for errors.
 * Every successful device_acquire() call must be matched with a
 * device_release().  Between device_acquire() and device_release(),
 * neither dv nor its softc can be reclaimed.
 *
 * In hardware/software interrupts, device_enter(dv).  Every device_enter(dv)
 * must be matched with a device_exit(dv).  Between device_enter(dv) and
 * device_exit(dv), neither dv nor its softc can be reclaimed.
 *
 * IV. LOOKING UP THE SOFTWARE STATE (device_t & softc)
 *
 * Routines that look up a device_t or softc by unit name or number,
 * such as device_lookup() and device_lookup_private(), should return
 * a device_t or softc that is protected against reclamation.  Callers
 * of lookup routines should match each successful call with a
 * device_release() or device_exit(), according to guidance given in
 * (III).
 */
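
/*
 * Example (not part of the patch): a minimal sketch of an interrupt
 * handler following the rules in (III).  The xyz names and xyz_intr()
 * are hypothetical.
 */
#if 0
static int
xyz_intr(void *arg)
{
        device_t dev = arg;
        struct xyz_softc *sc;

        device_enter(dev);              /* turnstile in */

        sc = device_private(dev);
        /* ... service the hardware, update sc ... */

        device_exit(dev);               /* turnstile out */

        return 1;                       /* claimed the interrupt */
}
#endif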

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/atomic.h>
#include <sys/condvar.h>
#include <sys/device.h>
#include <sys/kmem.h>
#include <sys/mutex.h>
#include <sys/once.h>
#include <sys/percpu.h>

/*
 * New device_t members.
 */
#if 0
struct device {
        /* ... */
        device_gate_t * volatile dv_gate;
        percpu_t        *dv_turnstiles;
        /* ... */
};
#endif

/*
 * Gate and turnstile types.
 */
typedef struct device_gate {
        kmutex_t                dg_mtx;
        kcondvar_t              dg_enter;
        kcondvar_t              dg_exit;
} device_gate_t;

typedef struct device_tstiles {
        volatile uint64_t       dt_in;
        volatile uint64_t       dt_out;
} device_tstiles_t;

/*
 * Public API
 */
int device_attach_gate(device_t);
int device_acquire(device_t, bool);
void device_release(device_t);
void device_enter(device_t);
void device_exit(device_t);
int device_gate_close(device_t, const char *, device_gate_t **);
void device_gate_seal(device_t, device_gate_t *);

static int device_gate_enter(device_t, bool);
static void device_gate_exit(device_t);
static int device_gate_admission_wait(device_t, bool);
static void device_gate_init(device_gate_t *, const char *);
static void device_gate_destroy(device_gate_t *);
static device_gate_t *device_gate_alloc(const char *);
static void device_gate_free(device_gate_t *);
static void device_gate_sum(void *, void *, struct cpu_info *);
static int device_gate_default_init(void);

static device_gate_t gate_sealed, gate_open;

/* If `enter' is true, increase by one the number of threads that have
 * entered the softc.  If `enter' is false, increase by one the number
 * of threads that have left the softc.
 *
 * device_turn() may be called from thread or interrupt context.
 *
 * device_turn() synchronizes with threads and interrupt handlers at
 * every priority on the same CPU.  Synchronization with other CPUs is
 * not necessary because device_turn() modifies only per-CPU counters.
 */
static void
device_turn(device_t dv, bool enter)
{
        uint64_t ots;
        volatile uint64_t *tsp;
        device_tstiles_t *dt;

        dt = percpu_getref(dv->dv_turnstiles);

        tsp = enter ? &dt->dt_in : &dt->dt_out;

        /* XXX This should be atomic_inc_64_ni(tsp), but there is no
         * XXX such routine!
         */
        do {
                ots = *tsp;
        } while (__predict_false(atomic_cas_64_ni(tsp, ots, ots + 1) != ots));

        percpu_putref(dv->dv_turnstiles);
}

/* Increase by one the number of threads that have entered the softc.
 *
 * See device_turn() for information about synchronization and calling
 * context.
 */
void
device_enter(device_t dv)
{
        device_turn(dv, true);
}

/* Increase by one the number of threads that have left the softc.
 *
 * See device_turn() for information about synchronization and calling
 * context.
 */
void
device_exit(device_t dv)
{
        device_turn(dv, false);
}

/* Caller must hold dv->dv_gate->dg_mtx. */
static int
device_gate_admission_wait(device_t dv, bool interruptible)
{
        int error;
        device_gate_t *dg;

        /* When we enter the routine, dv->dv_gate is either
         * &gate_open, &gate_sealed, or X, where X is neither
         * &gate_open nor &gate_sealed.
         *
         * If dv_gate is X, it can change only to &gate_sealed,
         * and only during the cv_wait_sig(9) call, because
         * our caller holds dv->dv_gate->dg_mtx. 
         *
         * If dv->dv_gate does change from X to &gate_sealed,
         * we will now call cv_wait_sig(9) a second time
         * with the new dv->dv_gate->dg_mtx that the caller
         * does NOT hold.
         */

        if (dv->dv_gate == &gate_open)
                return 0;

        for (;;) {
                if ((dg = dv->dv_gate) == &gate_sealed)
                        return ENXIO;
                if (!interruptible) {
                        cv_wait(&dg->dg_enter, &dg->dg_mtx);
                        continue;
                }
                if ((error = cv_wait_sig(&dg->dg_enter, &dg->dg_mtx)) != 0)
                        return error;
        }
}

static int
device_gate_enter(device_t dv, bool interruptible)
{
        int error;
        device_gate_t *dg;

        device_enter(dv);

        dg = dv->dv_gate;

        mutex_enter(&dg->dg_mtx);
        error = device_gate_admission_wait(dv, interruptible);
        mutex_exit(&dg->dg_mtx);

        if (error != 0)
                device_release(dv);

        return error;
}

static void
device_gate_exit(device_t dv)
{
        device_gate_t *dg;

        dg = dv->dv_gate;

        mutex_enter(&dg->dg_mtx);
        device_exit(dv);
        cv_signal(&dg->dg_exit);
        mutex_exit(&dg->dg_mtx);
}

/* If `interruptible' is false, return 0 on success, or ENXIO if the
 * device has been detached.  If `interruptible' is true, return 0 on
 * success, ENXIO if the device has been detached, or EINTR or ERESTART
 * if the call has been interrupted by a signal.
 *
 * Only call device_acquire() from thread context.
 */
int
device_acquire(device_t dv, bool interruptible)
{
        device_gate_t *dg;

        if (__predict_true((dg = dv->dv_gate) == &gate_open)) {
                device_enter(dv);
                return 0;
        }

        /* Slow path. */
        return device_gate_enter(dv, interruptible);
}

void
device_release(device_t dv)
{
        device_gate_t *dg;

        if (__predict_true((dg = dv->dv_gate) == &gate_open))
                device_exit(dv);
        else
                device_gate_exit(dv); /* Slow path. */
}

static void
device_gate_init(device_gate_t *dg, const char *wmesg)
{
        mutex_init(&dg->dg_mtx, MUTEX_DEFAULT, IPL_NONE);
        cv_init(&dg->dg_enter, wmesg);
        cv_init(&dg->dg_exit, wmesg);
}

static void
device_gate_destroy(device_gate_t *dg)
{
        cv_destroy(&dg->dg_exit);
        cv_destroy(&dg->dg_enter);
        mutex_destroy(&dg->dg_mtx);
}

static device_gate_t *
device_gate_alloc(const char *wmesg)
{
        device_gate_t *dg;

        if ((dg = kmem_alloc(sizeof(*dg), KM_NOSLEEP)) == NULL)
                return NULL;

        device_gate_init(dg, wmesg);

        mutex_enter(&dg->dg_mtx);

        return dg;
}

static void
device_gate_free(device_gate_t *dg)
{
        mutex_exit(&dg->dg_mtx);
        device_gate_destroy(dg);
        kmem_free(dg, sizeof(*dg));
}

int
device_gate_close(device_t dv, const char *wmesg, device_gate_t **dgp)
{
        int error;
        device_gate_t *dg, *ndg;

        if ((ndg = device_gate_alloc(wmesg)) == NULL)
                return ENOMEM;

        dg = dv->dv_gate;
        mutex_enter(&dg->dg_mtx);

        /* Now we hold the lock on some gate, but some thread may
         * have re-assigned dv->dv_gate between our reading dv_gate
         * and acquiring dg's mutex.  If we don't hold the mutex
         * on the current dv_gate, release and try again.
         */
        while (__predict_false(dg != dv->dv_gate)) {
                mutex_exit(&dg->dg_mtx);
                dg = dv->dv_gate;
                mutex_enter(&dg->dg_mtx);
        }

        /* If some other thread has already sealed a gate on dv,
         * there is nothing left to do.
         */
        if (dg == &gate_sealed) {
                mutex_exit(&dg->dg_mtx);
                device_gate_free(ndg);
                return ENXIO;
        }

        device_enter(dv);

        /* If the gate is still open, put a closed gate on dv and
         * return to caller.
         */
        if (dg == &gate_open) {
                dv->dv_gate = ndg;
                mutex_exit(&dg->dg_mtx);
                *dgp = ndg;
                return 0;
        }

        /* Some other thread has already closed the gate.  We won't
         * need our own gate, so destroy it.  Wait for admission
         * to the other thread's gate.
         */
        *dgp = dg;
        device_gate_free(ndg);

        error = device_gate_admission_wait(dv, false);
        KASSERT(error == ENXIO);

        mutex_exit(&dg->dg_mtx);
        device_release(dv);

        return error;
}

static void
device_gate_sum(void *p, void *arg, struct cpu_info *ci)
{
        device_tstiles_t *dt = p;
        device_tstiles_t *sum = arg;

        sum->dt_in += dt->dt_in;
        sum->dt_out += dt->dt_out;
}

void
device_gate_seal(device_t dv, device_gate_t *dg)
{
        device_tstiles_t sum;

        KASSERT(mutex_owned(&dg->dg_mtx));
        KASSERT(dv->dv_gate == dg);

        dv->dv_gate = &gate_sealed;
        cv_broadcast(&dg->dg_enter);
        device_exit(dv);
        for (;;) {
                sum.dt_in = sum.dt_out = 0;
                percpu_foreach(dv->dv_turnstiles, device_gate_sum, &sum);
                if (sum.dt_in == sum.dt_out)
                        break;
                cv_wait(&dg->dg_exit, &dg->dg_mtx);
        }
        }
        device_gate_free(dg);
}

static int
device_gate_default_init(void)
{
        device_gate_init(&gate_sealed, "sealgate");
        device_gate_init(&gate_open, "opengate");
        return 0;
}

ONCE_DECL(initgates);

int
device_attach_gate(device_t dv)
{
        int rc;
        device_tstiles_t *dt;

        rc = RUN_ONCE(&initgates, device_gate_default_init);

        if (rc != 0)
                return rc;

        if ((dt = percpu_alloc(sizeof(*dt))) == NULL)
                return ENOMEM;

        dv->dv_turnstiles = dt;
        dv->dv_gate = &gate_open;

        return 0;
}

