NetBSD-Bugs archive
Re: port-sparc/56788 Rump kernel panic: kernel diagnostic assertion "old == LOCK_LOCKED" failed
The following reply was made to PR port-sparc/56788; it has been noted by GNATS.
From: Tom Lane <tgl%sss.pgh.pa.us@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-hppa-maintainer%netbsd.org@localhost
Subject: Re: port-sparc/56788 Rump kernel panic: kernel diagnostic assertion "old == LOCK_LOCKED" failed
Date: Fri, 17 Jun 2022 17:46:19 -0400
I've spent a great deal of time digging into the t_ping test case,
and I think I see what is happening on HPPA ... but I'm not sure
if it explains anything for SPARC.
In the first place, my theory about bogus RAS setup is all wet.
What's actually being used in librump for HPPA is
sys/rump/librump/rumpkern/atomic_cas_generic.c
which does not use the RAS mechanism. Instead it relies on
__cpu_simple_lock(), a simple LDCW spinlock that looks
entirely bulletproof --- but by experiment, it fails intermittently
in various t_ping test cases. I eventually realized that each
of the problem tests is forking a child process and then running
rump kernels in both the parent and child processes. The two
kernels communicate via a memory-mapped host file (cf
rump_pub_shmif_create) and we're trying to synchronize via
atomic_cas_32() applied to a word in that shared memory. That's
fine, but what makes atomic_cas_32 actually atomic? AFAICS, the
LDCW spinlock storage is a static array in atomic_cas_generic.c,
which means that *each rump kernel has its own copy* after the
fork. Therefore, __cpu_simple_lock() successfully interlocks
between threads in each rump kernel, but not between threads in
the two kernels, making atomic_cas_32() not at all atomic for
if_shmem.c's "bus" lock.
According to Makefile.rumpkern, atomic_cas_generic.c is used on
.if (${MACHINE_CPU} == "arm" && "${FEAT_LDREX}" != "yes") \
|| ${MACHINE_ARCH} == "coldfire" || ${MACHINE_CPU} == "hppa" \
|| ${MACHINE_CPU} == "mips" || ${MACHINE_CPU} == "sh3" \
|| ${MACHINE_ARCH} == "vax" || ${MACHINE_ARCH} == "m68000"
so I'd kind of expect these tests to fail on all of those arches.
(Their spinlock mechanisms vary of course, but the problem of
the spinlock data being process-local will be the same for all.)
It's striking though that SPARC is not in this list. Perhaps
it has some related failure mechanism? I see that
src/common/lib/libc/arch/sparc/atomic/atomic_cas.S
makes use of a "locktab array" that looks like it'd have the
same problem of being process-local, but I am not sure if that
code is used in a rump kernel, or whether it applies to the
SPARC variant being complained of here.
Anyway, I'm not sure that I see a practical solution to make
these test cases work on these arches. Potentially we could
add a spinlock field to struct shmif_mem, but getting
atomic_cas_32 to use it would entail some ugly API changes.
Maybe it's better just to skip the problematic test cases
on these arches.
regards, tom lane