threadpool(9) -- an API for scheduling jobs on shared pools of threads

To: Tech-kern <tech-kern%netbsd.org@localhost>
Subject: threadpool(9) -- an API for scheduling jobs on shared pools of threads
From: Jason Thorpe <thorpej%me.com@localhost>
Date: Wed, 26 Dec 2018 08:08:32 -0800

Hey folks ...

The other day I checked in a new kernel API - threadpool(9) - an API for scheduling jobs on shared pools of threads.  This new API was written by Taylor Campbell, but had been languishing as a WIP draft for some years now.  As it happens, I have a need for the (forthcoming, still being debugged) task(9) API, also written by Taylor, that's built on top of threadpool(9), so I decided to whip it into shape.

threadpool(9) basically makes it easy to create jobs that run in thread context, but that don't necessarily need to have a thread waiting around all the time to do work.  Kernel threads are created as-needed, and the threads will idle out and exit after a period of time if there is no work for them to do.  Thread pools are a shared system resource, and you can use unbound pools (that have no particular CPU affinity) or pools that are bound to a specific CPU (in the case of per-CPU pools, each CPU gets its own private pool for each priority).

The pools themselves are also created on-demand, only when requested by something else.  When requesting a reference to a pool, the caller specifies the priority that the threads should run at: PRI_NONE (the default timesharing priority) or up to MAXPRI_KERNEL_RT.
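To make the "created on demand, handed out by reference" idea concrete, here is a small userland model. It is a sketch under assumptions, not the kernel code: toy_pool, toy_pool_get and toy_pool_put are hypothetical names, priorities are reduced to a small integer range, and destroying a pool when its last reference is dropped is an assumption made for the illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_NPRI 4			/* toy priority range: 0..3 */

/* Hypothetical model of a shared pool; not the kernel's structure. */
struct toy_pool {
	int pri;
	unsigned refcnt;
};

/* One slot per priority; a pool exists only while someone references it. */
static struct toy_pool *toy_pools[TOY_NPRI];

/* Look up the pool for pri, creating it on first use; returns 0 on success. */
static int
toy_pool_get(struct toy_pool **poolp, int pri)
{
	if (pri < 0 || pri >= TOY_NPRI)
		return -1;
	if (toy_pools[pri] == NULL) {
		struct toy_pool *p = calloc(1, sizeof(*p));

		if (p == NULL)
			return -1;
		p->pri = pri;
		toy_pools[pri] = p;
	}
	toy_pools[pri]->refcnt++;
	*poolp = toy_pools[pri];
	return 0;
}

/* Drop a reference; the last one out frees the pool (an assumption here). */
static void
toy_pool_put(struct toy_pool *pool)
{
	if (--pool->refcnt == 0) {
		toy_pools[pool->pri] = NULL;
		free(pool);
	}
}
```

Two callers asking for the same priority get the same pool pointer back, which is the sense in which the pools are a shared resource.
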

The threadpool(9) work abstraction is built around the concept of a "job", threadpool_job_t.  This is an opaque structure that the caller needs to allocate storage for.  A job can be scheduled on a pool from any context, including hard interrupt context up to and including IPL_VM.  The job will run until completion, and can take an arbitrarily long time, and sleep an arbitrarily long time; additional threads will be created in the pool for other jobs on-demand.  Note: there is no hard limit on the number of threads a pool will create.  Once scheduled, a job cannot be scheduled again until it has completed, at which point it needs to notify the system of this fact by calling threadpool_job_done().  Job cancellation is possible if the job has not yet run, but once a job is running, cancellation must wait for it to complete.  Among other things, this provides a deterministic way to ensure that a job is not running.  More information on job lifecycle and cancellation semantics can be found in the man page.
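The lifecycle rules in the paragraph above can be summarized as a tiny state machine. This is a toy model only: the toy_* names and the three states are hypothetical, and the return values are choices made for the illustration (the man page remains the authority on what the real calls do).

```c
#include <stdbool.h>

/* Hypothetical job states for illustration. */
enum toy_job_state { TOY_IDLE, TOY_SCHEDULED, TOY_RUNNING };

struct toy_job {
	enum toy_job_state state;
};

/* Schedule a job; refused while it is already scheduled or running. */
static bool
toy_schedule(struct toy_job *j)
{
	if (j->state != TOY_IDLE)
		return false;
	j->state = TOY_SCHEDULED;
	return true;
}

/* A pool thread picks up a scheduled job and starts running it. */
static bool
toy_start(struct toy_job *j)
{
	if (j->state != TOY_SCHEDULED)
		return false;
	j->state = TOY_RUNNING;
	return true;
}

/* The job function reports completion; the job may be scheduled again. */
static void
toy_done(struct toy_job *j)
{
	if (j->state == TOY_RUNNING)
		j->state = TOY_IDLE;
}

/*
 * Try to cancel without waiting.  Returns true if the job is known not
 * to be running afterwards; false means it is running and the caller
 * would have to wait for it to complete.
 */
static bool
toy_try_cancel(struct toy_job *j)
{
	switch (j->state) {
	case TOY_RUNNING:
		return false;
	case TOY_SCHEDULED:
		j->state = TOY_IDLE;
		return true;
	case TOY_IDLE:
	default:
		return true;
	}
}
```

The interesting case is cancelling a running job: the model can only report failure, which mirrors the point that cancellation then has to wait for the job to finish.
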

Job functions provided by the caller are passed a pointer to the threadpool_job_t corresponding to the work they're doing.  It is expected that this threadpool_job_t is embedded in the caller's state structure, and this state can be recovered by using the "container-of" access pattern, e.g.:

struct my_job_state {
	kmutex_t mutex;
	int some_counter;
	threadpool_job_t the_job;
};

threadpool_t *unbound_lopri_pool;

...

void
my_job_setup_routine(struct my_job_state *state)
{
	...
	error = threadpool_get(&unbound_lopri_pool, PRI_NONE);
	...
	/* threadpool(9) also documents a printf-style job name argument. */
	threadpool_job_init(&state->the_job, my_job_function, &state->mutex,
	    "myjob");
	...
}

void
my_interrupt_handler(struct my_job_state *state)
{
	...
	mutex_enter(&state->mutex);
	...
	threadpool_schedule_job(unbound_lopri_pool, &state->the_job);
	...
	mutex_exit(&state->mutex);
	...
}

void
my_job_function(threadpool_job_t *job)
{
	struct my_job_state *state =
	    container_of(job, struct my_job_state, the_job);

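	/* state->mutex is unlocked upon entering job */

	/* do whatever needs to be done. */

	/*
	 * threadpool_job_done() expects the job's interlock to be held
	 * (see threadpool(9)).
	 */
	mutex_enter(&state->mutex);
	threadpool_job_done(job);
	mutex_exit(&state->mutex);

	/*
	 * The thread we ran on goes idle and stays available for future
	 * jobs until it times out and exits.
	 */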
}
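For readers who have not seen the "container-of" pattern before, here is a userland sketch of what it does. The macro my_container_of and the type struct fake_job are stand-ins invented for this illustration (the kernel supplies its own container_of(), and threadpool_job_t is opaque); only the pointer arithmetic is the point.

```c
#include <stddef.h>

/* Userland stand-in for container_of(): step back from member to container. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the opaque threadpool_job_t. */
struct fake_job {
	int placeholder;
};

/* Caller state with the job embedded in it, as in the example above. */
struct my_job_state {
	int some_counter;
	struct fake_job the_job;
};

/* Job function: recover the enclosing state from the job pointer. */
static void
my_job_function(struct fake_job *job)
{
	struct my_job_state *state =
	    my_container_of(job, struct my_job_state, the_job);

	state->some_counter++;
}
```

Because the job is embedded rather than pointed to, no separate argument pointer has to be stored alongside it; the job's address alone is enough to find the state.
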

In addition to the foundation for task(9) (and eventually workqueue(9) -- Taylor already has a prototype implementation of that API on top of threadpool(9)), this API could be very useful for some of the other uses of ephemeral or mostly-idle kernel threads... the two that jumped into my mind immediately were scsibus_discover_thread and scsipi_completion_thread ... each of those could be easily converted to jobs running on an unbound PRI_NONE thread pool.  atabusconfig_thread and atabus_thread are other obvious candidates.  I actually think those 4 examples are really great applications of where to use threadpool(9) directly, because they are infrequent, but potentially arbitrarily long-running, and thus not suitable for task(9).  If someone wants to tackle those, I'd be happy to answer questions about how to use threadpool(9) in those situations (or perhaps I'll just do it so that there's a concrete in-tree example of how to use the API that's not as complex as task(9) is).

Anyway, feel free to reach out if you have questions!  And kudos to Taylor for the great work... it didn't require much effort to get the never-even-compile-tested draft up and running.

-- thorpej
