The attached patch adapts workqueue(9) to use a threadpool job instead
of a worker kthread in each queue. There are a couple of advantages
to this change:
1. No kthreads for unused workqueues.
A lot of drivers and subsystems create workqueues to handle tasks
in thread context -- often per-CPU workqueues. Even if these
subsystems aren't in use, they currently require kthreads to be
allocated, which may use up a substantial amount of kva.
With this change, kthreads are created only according to demand by
threadpool(9) -- but the workqueues are still guaranteed the same
concurrency as before, as long as kthreads can be created.
2. Workqueues can be created earlier (and are ready for CPU hotplug).
Right now, workqueue_create with WQ_PERCPU cannot be used until
after all CPUs have been detected. I accidentally started creating
such a workqueue too early in wg(4), because I've been running with
this patch applied (and the mistake doesn't show up in the rump-based
atf tests).
For now I'll apply a workaround to wg(4), but it would be nice if
modules could create workqueues before configure (and if, should
anyone make CPU hotplug happen, workqueues were not a barrier to
that).
This change uses percpu_create and threadpool_job_init, rather than
explicit allocation of ncpu-sized arrays and kthread_create. Using
percpu_create means percpu(9) takes care of running initialization
when CPUs are attached, and using threadpool_job_init instead of
kthread_create means we don't have to worry about failure when
initializing each CPU's queue in the percpu_create constructor.
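To illustrate, here is a minimal sketch of what the per-CPU
constructor could look like. The struct layout and the names
(workqueue_queue, workqueue_job, wq_name, wq_ipl, and so on) are
hypothetical, not necessarily what the patch uses; the point is that
threadpool_job_init, unlike kthread_create, cannot fail, so the
constructor has no error case:

```c
struct workqueue_queue {
	kmutex_t		q_lock;
	SIMPLEQ_HEAD(, work_impl)	q_pending;
	struct threadpool_job	q_job;	/* runs workqueue_job() */
};

static void
workqueue_queue_init(void *vq, void *vwq, struct cpu_info *ci)
{
	struct workqueue_queue *q = vq;
	struct workqueue *wq = vwq;	/* cookie passed to percpu_create */

	mutex_init(&q->q_lock, MUTEX_DEFAULT, wq->wq_ipl);
	SIMPLEQ_INIT(&q->q_pending);
	/* Cannot fail -- no kthread is created here.  */
	threadpool_job_init(&q->q_job, workqueue_job, &q->q_lock,
	    "%s/%u", wq->wq_name, cpu_index(ci));
}

	/* In workqueue_create: one call sets up every CPU's queue,
	 * including CPUs attached later.  */
	wq->wq_percpu = percpu_create(sizeof(struct workqueue_queue),
	    workqueue_queue_init, workqueue_queue_fini, wq);
```

Enqueueing work then amounts to appending to q_pending under q_lock
and calling threadpool_schedule_job, which is what defers kthread
creation to threadpool(9).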
The downside, of course, is that workqueue_create no longer guarantees
preallocation of all the threads needed to run the workqueues --
instead, if there is a shortage of threads dynamically assigned to do
work under load, the threadpool(9) logic will block until they can be
created.
Thoughts?