Subject: Re: Efficiency of nfsiod
To: Gordon W. Ross <gwr@netbsd.org>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 11/11/1998 22:28:14
>So, why can't the ARM or MIPS do this if desired?

i answered ross in gory detail but:


cache consistency, with virtually-indexed caches. get two writable
copies of the same physical address in the cache (or stale data
at the `right' virtual address) and you've already lost.

suppose the cache has no ASID bits (you do the TLB lazy-eval thing,
fine, and then you end up executing stale I-cache footprint from the
same random address in some _other_ process.)  see the sa-110 code.  I
forget the exact details of Mark Brinicombe's explanation, but that's
the gist of it.

iirc, solaris on sparcs restricts shared mmap()s to be at the same
offset, modulo the size of the largest supported virtually-indexed
cache. that's basically the problem but in a different guise.
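
a rough sketch of that modulo rule, in C. the page size, the cache
size and every name below are made up for illustration (they are not
taken from solaris or from any NetBSD port); the only point is that
two virtual mappings of one physical page index the same cache lines
when they agree in the index bits above the page offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED values, for illustration only. */
#define SKETCH_PAGE_BYTES   4096u           /* hypothetical page size */
#define SKETCH_ALIAS_BYTES  (64u * 1024u)   /* hypothetical size of the largest
                                               virtually-indexed cache */

/* The virtual-address bits that pick a cache index but are not already
 * pinned down by the offset within the page. */
static inline uintptr_t
sketch_cache_colour(uintptr_t va)
{
    return va & (SKETCH_ALIAS_BYTES - 1) & ~(uintptr_t)(SKETCH_PAGE_BYTES - 1);
}

/* True if two mappings of the same physical page would share cache
 * lines, i.e. the addresses are equal modulo the cache size at page
 * granularity, so no alias can form. */
static inline bool
sketch_mappings_alias_safely(uintptr_t va1, uintptr_t va2)
{
    return sketch_cache_colour(va1) == sketch_cache_colour(va2);
}
```

with these assumed sizes, mappings at 0x10000 and 0x20000 are fine,
while 0x10000 and 0x21000 would be an alias waiting to happen.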

your trick should still work with kernel-only threads which use
vmspace0 (and thus access only the globally-shared kernel text/data
mappings). but we were talking about nfsiod's, and they (a) have
userspace cache footprint from before they called sys_nfssvc(), and
(b) they sometimes can return to userspace...

but as best i understand /arch/arm32/arm32/{cpufunc_asm,cpuswitch}.S
(as badly as someone who knew 6502 some 20 years ago, ie not at all:)
it looks like an sa-110 will invalidate the entire userspace cache on
every cpu_switch, even if it's switching between one kernel-only
vmspace0 thread and another...  let alone one nfsiod and another.

and mips: mips3 has split i and d cache, both virtually-indexed.  some
systems also have physically-indexed L2 cache. the L2 will trigger a
cache-coherency exception if cache aliases are detected. the exception
handler forces writeback, if needed, and then invalidates the `old' line.

other mips3 systems have no l2 cache, and you don't get the VCE exception.

the kernel makes sure that shared writable pages are mapped
as uncacheable; but Soda-san reports weird bugs on Arc machines without
l2 cache. I'm not sanguine that we get the uncached-mapping correct in
all cases.  Forcing d-cache writeback/invalidate on context-switch is
one alternative. yes, we can elide that for vmspace0, but (again) not
for nfsiods, unless we get better hints from the kernel about exactly
which processes can safely short-cut the heavyweight TLB/cache
frobbing protocol needed for correct context-switches of full-weight
user address spaces.
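
to make the "better hints" idea concrete, here is a deliberately
conservative sketch in C. the structures, the may_touch_userspace
flag and the function names are all hypothetical stand-ins, not the
real NetBSD proc/vmspace code; the flag plays the role of the hint
the kernel would have to supply. the flush is skipped only when both
the outgoing and the incoming thread are kernel-only.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; NOT the real NetBSD structures. */
struct sketch_vmspace { int unused; };

struct sketch_proc {
    struct sketch_vmspace *vm;
    bool may_touch_userspace;   /* the assumed "hint": true for anything
                                   with user cache footprint or that can
                                   return to userspace, e.g. an nfsiod */
};

/* Stands in for the globally-shared kernel address space. */
static struct sketch_vmspace sketch_vmspace0;

static bool
sketch_kernel_only(const struct sketch_proc *p)
{
    return p->vm == &sketch_vmspace0 && !p->may_touch_userspace;
}

/* Conservative policy: elide the cache/TLB work only when switching
 * between two kernel-only threads; everything else takes the
 * heavyweight path. */
static bool
sketch_switch_needs_flush(const struct sketch_proc *oldp,
                          const struct sketch_proc *newp)
{
    return !(sketch_kernel_only(oldp) && sketch_kernel_only(newp));
}
```

under this policy an nfsiod still pays for the full flush in both
directions, which is exactly the cost being complained about above.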

(there're lots of options here, like deferring the cache-flush till
right before you go back to userspace, rather than when you set
curproc; but that could potentially break copyin/copyout access,
or old 1.3-style signal delivery, ...)
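
the deferred-flush option might look roughly like the toy model
below. all names are invented for illustration and the "flush" is
just a counter, not a real cache operation. note that it does nothing
about the copyin/copyout or signal-delivery problems mentioned above;
any kernel path that touches user addresses before the return to
userspace would need the same check.

```c
#include <stdbool.h>

/* Toy model only; every name here is hypothetical. */
static bool sketch_flush_pending;    /* set at switch time instead of flushing */
static unsigned sketch_flush_count;  /* counts simulated flushes */

/* Stand-in for the real writeback/invalidate of the user cache. */
static void
sketch_flush_user_cache(void)
{
    sketch_flush_count++;
    sketch_flush_pending = false;
}

/* Called at the point where curproc would be set: just note the debt. */
static void
sketch_on_switch(void)
{
    sketch_flush_pending = true;
}

/* Called right before going back to userspace: pay the debt once,
 * no matter how many switches happened in between. */
static void
sketch_before_user_return(void)
{
    if (sketch_flush_pending)
        sketch_flush_user_cache();
}
```

several back-to-back switches between kernel threads then cost one
flush in total instead of one per switch.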