Subject: Re: workqueue(9), per-CPU queues [was: Re: soc zfs: taskqueue / workqueue]
To: None <rmind@NetBSD.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 07/11/2007 13:51:45
On Wed, Jul 11, 2007 at 02:54:58PM +0900, YAMAMOTO Takashi wrote:

> > I have modified workqueue(9) interface to have per-CPU queues and threads,
> > when a WQ_PERCPU flag is used - as intended by design. Here is a patch:
> > http://www.netbsd.org/~rmind/workqueue-2.diff
> > Comments?
> > 
> > -- 
> > Best regards,
> > Mindaugas
> > www.NetBSD.org
> 
> - thanks for taking a look at it.
> 
> - i'm not happy with iterating a list on each enqueue operations.
>   probably it's ok for now, but, at least, please make it a function
>   rather than having two copies.
> 
> 	static struct workqueue_queue *
> 	workqueue_queue_lookup(struct workqueue *, struct cpu_info *);
> 
>   for long term, probably it's useful to have "dense" cpuid which can be
>   used as an array index.
> 
> - use-after-free in workqueue_finiqueue.
> 
> - why s/kmem_alloc/kmem_zalloc/ ?
> 
> - why removed the check of kmem_alloc failure?
>   iirc, at least currently, it can fail.
> 
> - i doubt if it's worth to have a pool for workqueue_queue.

- Agreed on all of the above, but I would personally like to see us get this
  right the first time and that means having the "indexing cpu id". If items
  are being enqueued thousands of times a second the loop is too expensive.
  I think the ID could be assigned in mi_cpu_attach().

- I'd prefer the kthreads named ("%s/%d", wq->wq_name, ci->ci_cpuid)

- If the cpu is not specified, and the workqueue has a thread per-CPU, the
  item should by default go onto the local CPU's workqueue.

If you are to use an index, it probably means dynamically allocating and
sizing struct workqueue. For the maximum benefit, you could also include the
workqueue_queues in that block, and ensure that each one is aligned on a
64-byte boundary. With the lock and queue structures in each workqueue_queue
they are fairly cache-intensive items. Ensuring each is in its own line can
reduce cache ping-ping between CPUs.

Andrew