Re: reference counts



   Date: Wed, 15 Apr 2015 11:27:16 +0900
   From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>

   I'm trying refcount(9) with bridge. It works well so far.
   (I got a kernel freeze once, but I'm not sure it's due to refcount(9),
   because my testing VM sometimes freezes around uvm :-/)

   The diff is here: http://www.netbsd.org/~ozaki-r/bridge-refcount.diff

Looks nice!  I spotted one race that was there to begin with: nothing
prevents multiple concurrent calls to bridge_delete_member on the same
member.  I think bridge_delete_member should require the caller to
hold the lock, and drop it internally to wait for users to drain:

bridge_delete_member(sc, bif)
{

	KASSERT(BRIDGE_LOCKED(sc));
	...
	LIST_REMOVE(bif, bif_next);
	BRIDGE_PSZ_PERFORM(sc);

	BRIDGE_UNLOCK(sc);
	mutex_enter(sc->sc_iflist_intr_lock);
	refcount_dec_drain(&bif->bif_refcount, sc->sc_iflist_intr_lock,
	    &sc->sc_iflist_cv);
	mutex_exit(sc->sc_iflist_intr_lock);
	...
	BRIDGE_LOCK(sc);
}

bridge_ioctl_del(sc, arg)
{

	BRIDGE_LOCK(sc);
	LIST_FOREACH(bif, &sc->sc_iflist, bif_next)
		...
	bridge_delete_member(sc, bif);
	BRIDGE_UNLOCK(sc);
	...
}

bridge_clone_destroy(sc)
{

	...
	BRIDGE_LOCK(sc);
	while ((bif = LIST_FIRST(&sc->sc_iflist)) != NULL)
		bridge_delete_member(sc, bif);
	BRIDGE_UNLOCK(sc);
	...
}

   Two comments on refcount(9) from me:
   - Can we have the cv in struct refcount? I think we don't need to
     share it with anything outside the refcount, unlike the mutex.

I'd rather not do that because:

(a) It doesn't hurt to share the condvar for most users -- all
condvars share sleepqs with some other condvars anyway, and everyone
must be prepared for spurious wakeups.

(b) Not all users need the condvar, so it would waste space for them.
Consider, e.g., kauth_cred_t, which would use refcount_dec_local
instead of refcount_dec_{signal,broadcast}.
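
As a sketch of what such a refcount_dec_local user might look like --
the struct layout and the cred_rele name here are invented for
illustration, not the real kauth code:

struct cred {
	struct refcount	c_refcount;
	/* ... immutable credential data ... */
};

void
cred_rele(struct cred *cred)
{

	/*
	 * Not in any global table, so no interlock or condvar is
	 * needed: whoever drops the last reference frees the object.
	 */
	if (refcount_dec_local(&cred->c_refcount)) {
		refcount_fini(&cred->c_refcount);
		kmem_free(cred, sizeof(*cred));
	}
}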

   - It seems that the API assumes that refcount_inc is not called
     after refcount_dec_drain, as the KASSERT(0 < old) in refcount_inc
     implies. Don't we need to guarantee that assumption somehow?
     Or do users have to guarantee it themselves (e.g., by using
     pserialize, as bridge does)? If so, we should document that
     in refcount.9.

Users should remove the object from any table holding a reference to
it, or mark it dying, so that nobody else can get to it, before
calling refcount_dec_drain.
I added a note to the man page -- new draft attached.

This also enables a little optimization in refcount_dec_drain: if the
reference count is 1, it can avoid the atomic.  This means, e.g.,
emptying a large table (say, the vnode cache) need not issue atomic
operations for every entry, only for those that are currently in use.
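
For the `mark it dying' variant, the shape would be something like
this (obj_dying and the names around it are invented for
illustration):

	/* Lookup side: refuse to hand out references to dying objects. */
	mutex_enter(obj->obj_lock);
	if (obj->obj_dying || refcount_inc(&obj->obj_refcount) != 0) {
		mutex_exit(obj->obj_lock);
		return NULL;
	}
	mutex_exit(obj->obj_lock);

	/* Destruction side: mark it dying, then drain and free. */
	mutex_enter(obj->obj_lock);
	obj->obj_dying = true;
	refcount_dec_drain(&obj->obj_refcount, obj->obj_lock, &obj->obj_cv);
	mutex_exit(obj->obj_lock);
	refcount_fini(&obj->obj_refcount);
	/* ... free obj ... */

Holders of existing references would release them with
refcount_dec_broadcast(&obj->obj_refcount, obj->obj_lock,
&obj->obj_cv) so that the drain gets woken.
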
/*	$NetBSD$	*/

/*-
 * Copyright (c) 2015 The NetBSD Foundation, Inc.
 * All rights reserved.
 *
 * This code is derived from software contributed to The NetBSD Foundation
 * by Taylor R. Campbell.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#ifndef	_SYS_REFCOUNT_H
#define	_SYS_REFCOUNT_H

#include <sys/types.h>
#include <sys/atomic.h>
#include <sys/condvar.h>
#include <sys/errno.h>
#include <sys/mutex.h>
#include <sys/systm.h>

#include <machine/limits.h>

struct refcount {
	volatile unsigned	rc_value;
};

#if DIAGNOSTIC
static inline bool
refcount_referenced_p(const struct refcount *refcount)
{

	return 0 < refcount->rc_value;
}

static inline bool
refcount_exclusive_p(const struct refcount *refcount)
{

	KASSERT(refcount_referenced_p(refcount));
	return refcount->rc_value == 1;
}
#endif

static inline void
refcount_init(struct refcount *refcount)
{

	refcount->rc_value = 1;
}

static inline void
refcount_fini(struct refcount *refcount)
{

	KASSERT(!refcount_referenced_p(refcount));
	refcount->rc_value = UINT_MAX;
}

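/*
 * Try to acquire a new reference.  Fails with EBUSY if the count
 * would overflow; note that UINT_MAX doubles as the poison value set
 * by refcount_fini.
 */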
static inline int
refcount_inc(struct refcount *refcount)
{
	unsigned old, new;

	do {
		old = refcount->rc_value;
		if (old == UINT_MAX)
			return EBUSY;
		KASSERT(0 < old);
		new = old + 1;
	} while (atomic_cas_uint(&refcount->rc_value, old, new) != old);

	return 0;
}

static inline bool
refcount_dec_local(struct refcount *refcount)
{
	unsigned old, new;

	do {
		old = refcount->rc_value;
		KASSERT(0 < old);
		if (old == 1) {
			/*
			 * Avoid an atomic if we don't need it.  Caller
			 * guarantees that if the reference count is 1,
			 * nobody else can acquire new references.
			 */
			refcount->rc_value = 0;
			return true;
		}
		KASSERT(1 < old);
		new = old - 1;
	} while (atomic_cas_uint(&refcount->rc_value, old, new) != old);

	KASSERT(0 < new);
	return false;
}

static inline bool
refcount_dec_lock(struct refcount *refcount, kmutex_t *interlock)
{
	unsigned old, new;

	do {
		old = refcount->rc_value;
		KASSERT(0 < old);
		if (old == 1) {
			/*
			 * Transition from 1->0 is allowed only under
			 * the interlock.
			 */
			mutex_enter(interlock);
			new = atomic_dec_uint_nv(&refcount->rc_value);
			KASSERT(new != UINT_MAX);
			if (new == 0)
				return true;
			mutex_exit(interlock);
			return false;
		}
		KASSERT(1 < old);
		new = old - 1;
	} while (atomic_cas_uint(&refcount->rc_value, old, new) != old);

	KASSERT(0 < new);
	return false;
}

static inline void
refcount_dec_signal(struct refcount *refcount, kmutex_t *interlock,
    kcondvar_t *cv)
{

	if (refcount_dec_lock(refcount, interlock)) {
		cv_signal(cv);
		mutex_exit(interlock);
	}
}

static inline void
refcount_dec_broadcast(struct refcount *refcount, kmutex_t *interlock,
    kcondvar_t *cv)
{

	if (refcount_dec_lock(refcount, interlock)) {
		cv_broadcast(cv);
		mutex_exit(interlock);
	}
}

static inline void
refcount_dec_drain(struct refcount *refcount, kmutex_t *interlock,
    kcondvar_t *cv)
{

	KASSERT(mutex_owned(interlock));

	/*
	 * Caller guarantees if our reference is exclusive, nobody can
	 * acquire new references.  (If there are other references
	 * already, then they may turn into new references.)
	 */
	if (refcount->rc_value == 1) {
		/* Exclusive -- skip the atomic and the wait.  */
		refcount->rc_value = 0;
		return;
	}

	if (0 < atomic_dec_uint_nv(&refcount->rc_value)) {
		do {
			cv_wait(cv, interlock);
		} while (0 < refcount->rc_value);
	}
}

#endif	/* _SYS_REFCOUNT_H */
.\"	$NetBSD$
.\"
.\" Copyright (c) 2015 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Taylor R. Campbell.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd April 11, 2015
.Dt REFCOUNT 9
.Os
.Sh NAME
.Nm refcount
.Nd reference counting
.Sh SYNOPSIS
.In sys/refcount.h
.Ft void
.Fn refcount_init "struct refcount *refcount"
.Ft void
.Fn refcount_fini "struct refcount *refcount"
.Ft int
.Fn refcount_inc "struct refcount *refcount"
.Ft bool
.Fn refcount_dec_local "struct refcount *refcount"
.Ft bool
.Fn refcount_dec_lock "struct refcount *refcount" "kmutex_t *interlock"
.Ft void
.Fn refcount_dec_signal "struct refcount *refcount" "kmutex_t *interlock" \
        "kcondvar_t *cv"
.Ft void
.Fn refcount_dec_broadcast "struct refcount *refcount" "kmutex_t *interlock" \
        "kcondvar_t *cv"
.Ft void
.Fn refcount_dec_drain "struct refcount *refcount" "kmutex_t *interlock" \
        "kcondvar_t *cv"
.Fd "#if DIAGNOSTIC"
.Ft bool
.Fn refcount_referenced_p "const struct refcount *refcount"
.Ft bool
.Fn refcount_exclusive_p "const struct refcount *refcount"
.Fd "#endif  /* DIAGNOSTIC */"
.Sh DESCRIPTION
.Nm
is an abstraction for maintaining reference counts to objects that may
be shared between multiple threads.
Each object with a reference count should embed a
.Li "struct refcount" ,
which users should treat as opaque and should not inspect or copy.
The
.Li "struct refcount"
must be initialized with
.Fn refcount_init
before use, and finalized with
.Fn refcount_fini
when done.
When acquiring a reference to an object, use
.Fn refcount_inc ,
which increments its reference count.
When releasing a reference to an object, use one of the
.Fn refcount_dec_*
routines.
.Pp
The
.Nm
abstraction supports multiple approaches to freeing objects when they
are no longer referenced:
.Bl -dash
.It
The object's lifetime is ended by an explicit free operation, and the
reference count serves only to notify the free operation when the
object is no longer in use.
.Pp
Users of the object should use
.Fn refcount_dec_signal
or
.Fn refcount_dec_broadcast
to notify the free operation when they are done.
The free operation should remove the object from any global tables
first, and then use
.Fn refcount_dec_drain
to wait until there are no users left before freeing its memory.
.It
The object is stored in a global cache, and has no explicit free
operation -- its lifetime ends when the last user is done with it, at
which point it must be removed from the cache and freed.
.Pp
Each user of the object should use
.Fn refcount_dec_lock
when done, which returns true and enters a mutex if and only if that
user is the last one.
This allows the last user to safely remove the object from the global
cache without a window during which new references can be acquired;
a sketch of this pattern follows the list.
.It
The object has no explicit free operation and is not stored in any
global tables.
.Pp
Each user of the object should use
.Fn refcount_dec_local
when done, which returns true if and only if that user is the last one,
who can then free the object.
.El
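.Pp
For illustration, the second pattern might look like the following
sketch, in which the
.Va obj_cache
table and the names around it are hypothetical:
.Bd -literal -offset indent
if (refcount_dec_lock(&obj->obj_refcount, &obj_cache.oc_lock)) {
	/* Last user: unhook from the cache and free. */
	LIST_REMOVE(obj, obj_entry);
	mutex_exit(&obj_cache.oc_lock);
	refcount_fini(&obj->obj_refcount);
	obj_free(obj);
}
.Ed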
.Pp
The transition from nonzero to zero references is permanent.
It is also guaranteed to happen under an interlock excluding new
references whenever there is a possibility that one thread may acquire
a new reference at the same time another is releasing what would have
been the last one.
.Pp
Note that
.Fn refcount_dec_local
is
.Em not
simply a primitive out of which the other
.Fn refcount_dec_*
functions are built.
It is
.Em not
correct to replace
.Bd -literal -offset indent
if (refcount_dec_lock(&obj->obj_refcount, &obj->obj_lock)) {
	...
}
.Ed
.Pp
by:
.Bd -literal -offset indent
if (refcount_dec_local(&obj->obj_refcount)) {
	mutex_enter(&obj->obj_lock);
	...
}
.Ed
.Pp
The reason is that
.Fn refcount_dec_lock
guarantees the transition from nonzero to zero happens under the
interlock, which the above fragment does not.
The name
.Fn refcount_dec_local
was chosen instead of the alternative
.Fn refcount_dec
to emphasize that it is not a part of
.Fn refcount_dec_lock
but a different variant altogether, which is safe only when the object
is not reachable from any global table from which it would have to be
removed, under a mutex, when it becomes unreferenced.
.Sh FUNCTIONS
.Bl -tag -width abcd
.It Fn refcount_init refcount
Initialize
.Fa refcount
for use with
.Nm .
.It Fn refcount_fini refcount
Finalize
.Fa refcount .
Further use with
.Nm
is not allowed.
.It Fn refcount_inc refcount
Try to increment
.Fa refcount .
If there are too many other references
.Pq Dv UINT_MAX ,
return
.Dv EBUSY .
Otherwise, return zero.
.It Fn refcount_dec_local refcount
Decrement
.Fa refcount .
If it drops to zero, return true.
Otherwise return false.
.Pp
This is useful
.Em only
when the object whose references are counted by
.Fa refcount
is not stored in any global tables or caches, so that, without
interlocks, if the reference count
.Em would
drop to zero, there is no way for other threads to acquire new
references.
.It Fn refcount_dec_lock refcount interlock
Decrement
.Fa refcount .
If it drops to zero, enter the mutex
.Fa interlock
and return true.
Otherwise return false.
.Pp
May enter
.Fa interlock
even if it returns false, in case another thread acquired a reference
at the same time.
.It Fn refcount_dec_signal refcount interlock cv
Decrement
.Fa refcount .
If it drops to zero, enter the mutex
.Fa interlock ,
signal
.Fa cv ,
and exit
.Fa interlock .
.Pp
May enter
.Fa interlock
even if it does not signal
.Fa cv ,
in case another thread acquired a reference at the same time.
.It Fn refcount_dec_broadcast refcount interlock cv
Decrement
.Fa refcount .
If it drops to zero, enter the mutex
.Fa interlock ,
broadcast
.Fa cv ,
and exit
.Fa interlock .
.Pp
May enter
.Fa interlock
even if it does not broadcast
.Fa cv ,
in case another thread acquired a reference at the same time.
.It Fn refcount_dec_drain refcount interlock cv
Decrement
.Fa refcount ,
and wait under
.Fa interlock
on
.Fa cv
until the reference count drops to zero.
.Pp
The caller must hold
.Fa interlock ,
and must guarantee that if the reference count is already at one, then
no new references can be made -- e.g., by first removing the enclosing
object from its global table.
.Pp
May sleep.
.It Fn refcount_referenced_p refcount
Return true if the reference count of
.Fa refcount
is positive, false if zero.
.Pp
May be used only under
.Dv DIAGNOSTIC
for assertions.
Do not make run-time decisions on the basis of
.Fn refcount_referenced_p .
.It Fn refcount_exclusive_p refcount
Return true if caller's reference to
.Fa refcount
is exclusive -- that is, if the reference count of
.Fa refcount
is one.
Otherwise return false.
.Fa refcount
must be referenced: the reference count may not be zero.
.Pp
May be used only under
.Dv DIAGNOSTIC
for assertions.
Do not make run-time decisions on the basis of
.Fn refcount_exclusive_p .
.El
.Sh CODE REFERENCES
The
.Nm
abstraction is implemented in
.Pa sys/sys/refcount.h .
.Sh SEE ALSO
.Xr condvar 9 ,
.Xr mutex 9
.Sh BUGS
The implementation is untested.

