tech-kern archive

New cpu_count races; seqlocks?



It is great to get rid of global contention over a bunch of counters
that don't need to be read exactly.  However, the new cpu_count logic
is racy -- particularly on 32-bit platforms, where it's not simply a
matter of reading stale values but a matter of reading total garbage:

/* sys/cpu_data.h */
static inline int64_t
cpu_count_get(enum cpu_count idx)
{
	extern int64_t cpu_counts[];
	return cpu_counts[idx];
}

On a 32-bit platform, this will generally be compiled into two
separate 32-bit loads from memory, which can be interrupted in the
middle and so return a torn value.  For example,
cpu_count_get(CPU_COUNT_NINTR) generates the following i386 code
(confirmed in procfs_linux.c, procfs_docpustat):

     65a:       8b 0d 20 00 00 00       mov    0x20,%ecx
                        65c: R_386_32   cpu_counts
     660:       8b 1d 24 00 00 00       mov    0x24,%ebx
                        662: R_386_32   cpu_counts
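
To make the hazard concrete (values here purely illustrative): suppose
the counter is updated across the 32-bit boundary between those two
loads:

	reader: load low word  -> 0xffffffff	(old value)
	<counter advances 0x00000000ffffffff -> 0x0000000100000000>
	reader: load high word -> 0x00000001	(new value)

	reader's result: 0x00000001ffffffff, off by nearly 2^32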

Perhaps the words will be written in an order such that on x86 CPUs
we'll read them in the correct order without garbage, but that's not
reliable in general.  To make this reliable in general, for quantities
that may overflow the 32-bit range, we might use a seqlock -- perhaps
conditionalized on !_LP64, with the cheaper atomic_load/store_relaxed
on _LP64 instead:

/* kern_cpu.c */
volatile uint32_t cpu_count_gen;
int64_t cpu_counts[...];

void
cpu_count_sync_all(void)
{
	uint32_t gen;

	/* Claim exclusive access for modifications.  */
	do {
		while (1 & (gen = atomic_load_relaxed(&cpu_count_gen)))
			SPINLOCK_BACKOFF_HOOK;
	} while (atomic_cas_32(&cpu_count_gen, gen, gen | 1) != gen);

	membar_producer(); /* store cpu_count_gen then cpu_counts[i] */
	...
		cpu_counts[i] = ...
	...
	membar_producer(); /* store cpu_counts[i] then cpu_count_gen */

	/* Relinquish exclusive access for modifications.  */
	KASSERT(cpu_count_gen == (gen | 1));
	atomic_store_relaxed(&cpu_count_gen, (gen | 1) + 1);
}

int64_t
cpu_count_get(enum cpu_count i)
{
	uint32_t gen;
	int64_t value;

	do {
		/* Wait for modifications to settle.  */
		while (1 & (gen = atomic_load_relaxed(&cpu_count_gen)))
			SPINLOCK_BACKOFF_HOOK;

		membar_consumer(); /* load cpu_count_gen then cpu_counts[i] */
		value = cpu_counts[i];
		membar_consumer(); /* load cpu_counts[i] then cpu_count_gen */

		/* Retry if modified in the interim.  */
	} while (__predict_false(gen != atomic_load_relaxed(&cpu_count_gen)));

	return value;
}
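
On _LP64, where the 64-bit load and store are single instructions, the
read side could presumably collapse to a relaxed atomic load -- a
sketch only, not tested:

int64_t
cpu_count_get(enum cpu_count i)
{
#ifdef _LP64
	return atomic_load_relaxed(&cpu_counts[i]);
#else
	/* seqlock read loop as above */
	...
#endif
}

and cpu_count_sync_all would correspondingly skip the generation dance
on _LP64, storing each counter with atomic_store_relaxed.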

There is, of course, a chance that a 32-bit modification counter
(cpu_count_gen) rolls all the way over -- if a reader is preempted in
cpu_count_get long enough for the generation to wrap back to the value
it first sampled, the retry check will falsely conclude nothing
changed.  The probability of that happening is low, but it might also
be prudent to kpreempt_disable across the read.
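
Concretely, that would just mean bracketing the retry loop in
cpu_count_get, something like:

	kpreempt_disable();
	do {
		...	/* read loop as above */
	} while (__predict_false(gen != atomic_load_relaxed(&cpu_count_gen)));
	kpreempt_enable();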

Perhaps we should have a seqlock(9) API for broader use.  I drafted an
implementation of Linux's seqlock API for a drm import from Linux
4.19:

https://github.com/riastradh/netbsd-src/blob/riastradh-drm419/sys/external/bsd/drm2/include/linux/seqlock.h
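
With an API along those lines (names here follow Linux's, as in that
draft; a native seqlock(9) might well spell them differently), usage
would look roughly like:

	seqlock_t lock;		/* would protect count */
	int64_t count;

	/* writer */
	write_seqlock(&lock);
	count++;
	write_sequnlock(&lock);

	/* reader */
	unsigned seq;
	int64_t value;
	do {
		seq = read_seqbegin(&lock);
		value = count;
	} while (read_seqretry(&lock, seq));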

