Subject: Re: percpu storage allocator
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 11/12/2007 21:38:02
On Mon, Nov 12, 2007 at 08:41:35PM +0900, YAMAMOTO Takashi wrote:

> the attached files contain an implementation of percpu storage.
> (and some users of it.)
> the patch is against vmlocking branch.

Nice! This would be very useful for a distributed reader/writer lock or
making evcnts per-cpu.

> static void
> percpu_cpu_swap(void *p1, void *p2)
> {
> 	struct cpu_info * const ci = p1;
> 	percpu_cpu_t * const newpcc = p2;
> 	percpu_cpu_t * const pcc = cpu_percpu(ci);
> 
> 	rw_enter(&percpu_swap_lock, RW_WRITER);
> 	/* unless anyone has beaten us... */
> 	if (newpcc->pcc_size > pcc->pcc_size) {
> 		percpu_cpu_t tmp;
> 		int s;
> 
> 		s = splhigh();
> 		/* copy data to new storage */
> 		memcpy(newpcc->pcc_data, pcc->pcc_data, pcc->pcc_size);
> 		/* swap */
> 		tmp = *pcc;
> 		*pcc = *newpcc;
> 		splx(s);
> 		*newpcc = tmp;
> 	}
> 	rw_exit(&percpu_swap_lock);
> }

I think you could get away with not taking the rwlock in the cross call
thread by deferring the lock to percpu_cpu_enlarge():

	if (pcc.pcc_data != NULL) {
		rw_enter(&percpu_swap_lock, RW_WRITER);
		/* Ensures old value of pcc_data is no longer used. */
		rw_exit(&percpu_swap_lock);
		kmem_free(pcc.pcc_data, pcc.pcc_size);
	}

Threads traversing will see either old or new values, but in either case the
pointer dereference will be safe. It may not be worth it; if so we should
probably document that threads doing a traversal should be careful not to
delay the xcall thread.

> static void
> percpu_cpu_enlarge(size_t size)
> {
> 	CPU_INFO_ITERATOR cii;
> 	struct cpu_info *ci;
> 
> 	for (CPU_INFO_FOREACH(cii, ci)) {
> 		percpu_cpu_t pcc;
> 
> 		pcc.pcc_data = kmem_zalloc(size, KM_SLEEP);
> 		pcc.pcc_size = size;
> 		if (!mp_online) {
> 			percpu_cpu_swap(ci, &pcc);
> 		} else {
> 			uint64_t where;
> 
> 			where = xc_unicast(0, percpu_cpu_swap, ci, &pcc, ci);
> 			xc_wait(where);
> 		}
> 		KASSERT(pcc.pcc_size < size);
> 		if (pcc.pcc_data != NULL)
> 			kmem_free(pcc.pcc_data, pcc.pcc_size);
> 	}
> }

The caller's stack could be swapped out before the xcall completes so &pcc
isn't safe.

Thanks,
Andrew