NetBSD-Bugs archive


Re: kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES



On 2020/02/27 7:13, Jason Thorpe wrote:
On Feb 26, 2020, at 7:13 AM, Rin Okuyama <rokuyama.rk%gmail.com@localhost> wrote:

Certainly. Then, what should we do?

So far, we've learned:

(1) uarea_poolpage_alloc() can fall back to uvm_km_alloc():

	https://nxr.netbsd.org/xref/src/sys/uvm/uvm_glue.c#269

    This does not work if low-level routines need physically
    contiguous (i.e., direct-mapped) pages for the u-area.

(2) However, all ports with __HAVE_CPU_UAREA_ROUTINES actually do
    *not* need a contiguous u-area anymore, as far as we can see.

AFAIK, they *never* did.  Certainly, Alpha does not require a physically-contiguous u-area, and neither does x86.  Heck, neither does MIPS, assuming wired TLB entries are used to keep the kernel stack mapped.  A physically contiguous u-area is ONLY required if you are using a direct-mapped segment to provide the address of the u-area to the CPU.

OK

(3) Unfortunately, (2) does not mean that the fallback in (1) is safe.
    If a port that needs a direct-mapped u-area bumps UPAGES from
    1 to 2 (or more), the fallback to uvm_km_alloc() results in memory
    corruption. This is what we observed on powerpc/ibm4xx.

So, we have a few options:

(a) Add an MD flag to forbid the fallback to uvm_km_alloc().

Or, if this seems like too much,

(b) Leave some comments in uarea_poolpage_alloc().

Thoughts?

We need to understand why the fallback fails on the platforms where it does fail.  The following statements should all be true:

1- If physically-contiguous pages for the u-area can be allocated and mapped with a direct-mapped segment, we should be able to use that.

2- If physically-contiguous pages for the u-area cannot be allocated, then the system should be able to use a u-area that is virtually mapped but not physically contiguous.

(2) used to be the way the system always worked for UPAGES > 1.

As far as I can see, all archs except for powerpc/ibm4xx satisfy both
(1) and (2). (More precisely, they do not seem to require direct-mapped
memory for the u-area.)

For ibm4xx, the external interrupt handler uses the kernel stack before
enabling MMU translation:

    https://nxr.netbsd.org/xref/src/sys/arch/powerpc/ibm4xx/trap_subr.S#INTR_SAVE

I managed to enable the MMU before touching the stack, but this causes a
kernel panic due to a TLB miss in the interrupt handler (see details below).

Thanks,
rin

Details:

With the MMU enabled before the kernel stack is used in the interrupt
handler, a kernel panic occurs when UPAGES == 2 and __HAVE_FAST_SOFTINTS
is defined. This is due to a TLB miss on the second page of the u-area.

However, I do not understand the situation yet: (a) why such a TLB miss
does not result in a kernel panic in other exception handlers, which
already enable the MMU in a similar manner:

    https://nxr.netbsd.org/xref/src/sys/arch/powerpc/ibm4xx/trap_subr.S#FRAME_SETUP

and (b) why the panic does not occur without __HAVE_FAST_SOFTINTS.

This looks like a conflict between __HAVE_FAST_SOFTINTS and powerpc/ibm4xx.

This is not a very urgent matter, because there is no problem with
UPAGES == 1 for ibm4xx. But it should get fixed, of course.

