NetBSD-Bugs archive


Re: kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES



The following reply was made to PR kern/54994; it has been noted by GNATS.

From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
To: Jason Thorpe <thorpej%me.com@localhost>
Cc: Nick Hudson <nick.hudson%gmx.co.uk@localhost>, kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, gnats-bugs%netbsd.org@localhost
Subject: Re: kern/54994: Critical bug in uarea_poolpage_alloc() for archs with
 __HAVE_CPU_UAREA_ROUTINES
Date: Mon, 2 Mar 2020 09:03:39 +0900

 On 2020/02/27 7:13, Jason Thorpe wrote:
 >> On Feb 26, 2020, at 7:13 AM, Rin Okuyama <rokuyama.rk%gmail.com@localhost> wrote:
 >>
 >> Certainly. Then, what should we do?
 >>
 >> Until now, we've learned:
 >>
 >> (1) uarea_poolpage_alloc() can fall back into uvm_km_alloc():
 >>
 >> 	https://nxr.netbsd.org/xref/src/sys/uvm/uvm_glue.c#269
 >>
 >>     This does not work if low-level routines need physically
 >>     contiguous (i.e., direct-mapped) pages for u-area.
 >>
 >> (2) However, all ports with __HAVE_CPU_UAREA_ROUTINES actually do
 >>     *not* need contiguous u-area anymore, as far as we can see.
 > 
 > AFAIK, they *never* did.  Certainly, Alpha does not require a physically-contiguous u-area, neither does x86.  Heck, neither does MIPS, assuming wired TLB entries are used to keep the kernel stack mapped.  A physically contiguous u-area is ONLY required if you are using a direct-mapped segment to provide the address of the u-area to the CPU.
 
 OK
 
 >> (3) Unfortunately, (2) does not mean that fallback of (1) is safe.
 >>     If some ports, that need direct-mapped u-area, bump USPACE from
 >>     1 to 2 (or more), fallback of uvm_km_alloc() results in memory
 >>     corruption. This is what we observed on powerpc/ibm4xx.
 >>
 >> So, we have some options:
 >>
 >> (a) Add MD flag to forbid fallback of uvm_km_alloc().
 >>
 >> Or if this seems too much,
 >>
 >> (b) Leave some comments in uarea_poolpage_alloc().
 >>
 >> Thoughts?
 > 
 > We need to understand why the fallback fails on the platforms where it does fail.  The following statements should all be true:
 > 
 > 1- If physically-contiguous pages for the u-area can be allocated and mapped with a direct-mapped segment, we should be able to use that.
 > 
 > 2- If physically-contiguous pages for the u-area cannot be allocated, then the system should be able to use a u-area that is virtually mapped but not physically contiguous.
 > 
 > (2) used to be the way the system always worked for UPAGES > 1.
 
 As far as I can see, all archs except for powerpc/ibm4xx satisfy both
 (1) and (2). (More precisely, they do not seem to require direct-mapped
 memory for the u-area.)
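 
 To make the two statements concrete, here is a minimal user-space
 sketch of the allocation policy, including the MD flag from option (a).
 It is not the code in uvm_glue.c: uarea_choose(), struct uarea_policy,
 and forbid_fallback are names invented for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which kind of u-area the (hypothetical) policy hands out. */
enum uarea_kind {
	UAREA_NONE,	/* allocation fails */
	UAREA_DIRECT,	/* physically contiguous, direct-mapped */
	UAREA_VIRTUAL	/* virtually mapped, not necessarily contiguous */
};

struct uarea_policy {
	bool direct_available;	/* contiguous direct-mapped pages were found */
	bool forbid_fallback;	/* hypothetical MD flag from option (a) */
};

static enum uarea_kind
uarea_choose(const struct uarea_policy *p)
{
	assert(p != NULL);

	/* Statement 1: use direct-mapped pages whenever we can get them. */
	if (p->direct_available)
		return UAREA_DIRECT;

	/* Ports that need a direct-mapped u-area must not fall back. */
	if (p->forbid_fallback)
		return UAREA_NONE;

	/* Statement 2: otherwise a virtually mapped u-area is fine. */
	return UAREA_VIRTUAL;
}
```

 In this model, a port like ibm4xx with USPACE > PAGE_SIZE would set
 forbid_fallback, so a failed contiguous allocation is reported as a
 failure instead of silently returning a virtually mapped u-area.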
 
 For ibm4xx, the external interrupt handler uses the kernel stack before
 enabling address translation by the MMU:
 
      https://nxr.netbsd.org/xref/src/sys/arch/powerpc/ibm4xx/trap_subr.S#INTR_SAVE
 
 I managed to enable the MMU before the stack manipulation, but this
 causes a kernel panic due to a TLB miss in the interrupt handler (see
 details below).
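 
 The reason a handler running with translation disabled needs a
 physically contiguous u-area can be shown with a toy model. This is
 only a sketch under assumptions: the 4 KiB page size, struct toy_map,
 and the toy_*() helpers are made up for illustration and do not
 correspond to anything in the ibm4xx pmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u		/* assumed page size for this model */

/* Toy page table: frame[i] is the physical frame backing virtual page i. */
struct toy_map {
	const uint32_t *frame;
	uint32_t npages;
};

/* Address the MMU would produce for va when translation is enabled. */
static uint32_t
toy_translate(const struct toy_map *m, uint32_t va)
{
	uint32_t vpn = va / PAGE_SIZE;

	assert(vpn < m->npages);
	return m->frame[vpn] * PAGE_SIZE + va % PAGE_SIZE;
}

/*
 * With translation disabled, code can only compute "physical base +
 * offset".  That reaches the right memory only if every page of the
 * range is backed by consecutive physical frames.
 */
static bool
toy_realmode_safe(const struct toy_map *m, uint32_t va, uint32_t len)
{
	uint32_t base_pa, off;

	assert(va % PAGE_SIZE == 0);	/* u-area is page aligned here */
	base_pa = toy_translate(m, va);
	for (off = 0; off < len; off += PAGE_SIZE) {
		if (toy_translate(m, va + off) != base_pa + off)
			return false;
	}
	return true;
}
```

 With a one-page u-area the check always passes, which matches the
 observation that UPAGES == 1 is fine. With two pages backed by
 non-adjacent frames, base + offset lands in somebody else's page.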
 
 Thanks,
 rin
 
 Details:
 
 When the MMU is enabled before the kernel stack is used in the
 interrupt handler, a kernel panic occurs if UPAGES == 2 and
 __HAVE_FAST_SOFTINTS is defined. This is due to a TLB miss on the 2nd
 page of the u-area.
 
 However, I do not understand the situation yet: (a) why such a TLB miss
 does not result in a kernel panic for the other exception handlers,
 which already enable the MMU in a similar manner:
 
      https://nxr.netbsd.org/xref/src/sys/arch/powerpc/ibm4xx/trap_subr.S#FRAME_SETUP
 
 and (b) why the panic does not occur without __HAVE_FAST_SOFTINTS.
 
 This looks like a problem in how __HAVE_FAST_SOFTINTS interacts with
 powerpc/ibm4xx.
 
 This is not a very urgent matter, because there is no problem with
 UPAGES == 1 for ibm4xx. But it should get fixed, of course.
 

