kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
From: rokuyama.rk%gmail.com@localhost
Date: Fri, 21 Feb 2020 11:15:00 +0000 (UTC)

>Number:         54994
>Category:       kern
>Synopsis:       Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 21 11:15:00 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.47 and netbsd-9 at least
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD obs266 9.99.47 NetBSD 9.99.47 (OBS266) #154: Thu Feb 20 17:28:10 JST 2020  rin@latipes:/build/src/sys/arch/evbppc/compile/OBS266 evbppc
>Description:
For archs with __HAVE_CPU_UAREA_ROUTINES, i.e., alpha, mips, powerpc,
and riscv, uarea_poolpage_alloc() falls back to uvm_km_alloc() if
cpu_uarea_alloc() fails:

    https://nxr.netbsd.org/xref/src/sys/uvm/uvm_glue.c#uarea_poolpage_alloc

This behavior is incorrect.

For these archs, cpu_uarea_alloc() allocates direct-mapped, physically
contiguous pages for the u-area by using uvm_pglistalloc():

    https://nxr.netbsd.org/source/s?refs=cpu_uarea_alloc&project=src

Here, uvm_pglistalloc() may give up allocating memory even if waitok is set:

    https://netbsd.gw.com/cgi-bin/man-cgi?uvm_pglistalloc++NetBSD-current

This is because there's no way to wake up a process when contiguous
pages become available. In this case, cpu_uarea_alloc() returns NULL,
and uvm_km_alloc() is used instead. However, pages allocated by
uvm_km_alloc() are not physically contiguous, of course.

Since low-level routines for these archs assume a physically contiguous
u-area, this results in a catastrophe: pages that happen to follow the
first page of the u-area are randomly overwritten.
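
The overwrite can be illustrated with a small user-space model. This is
only a sketch, not kernel code: phys[], the MODEL_* constants, and both
functions are hypothetical names invented for the illustration, and the
page size is shrunk to 16 bytes to keep it readable.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16u   /* tiny pages keep the model readable */
#define MODEL_UPAGES    2u    /* the u-area spans two pages */
#define MODEL_NFRAMES   8u

/* "Physical memory": MODEL_NFRAMES frames laid out back to back. */
static unsigned char phys[MODEL_NFRAMES * MODEL_PAGE_SIZE];

/*
 * Model of a low-level routine that touches the whole u-area through
 * the direct map: it starts at the frame of the first page and assumes
 * that the remaining pages follow it physically.
 */
static void
model_fill_uarea_direct(unsigned first_frame, unsigned char val)
{
	assert(first_frame + MODEL_UPAGES <= MODEL_NFRAMES);
	memset(&phys[first_frame * MODEL_PAGE_SIZE], val,
	    MODEL_UPAGES * MODEL_PAGE_SIZE);
}

/* Returns nonzero if every byte of the given frame equals val. */
static int
model_frame_is(unsigned frame, unsigned char val)
{
	for (unsigned i = 0; i < MODEL_PAGE_SIZE; i++)
		if (phys[frame * MODEL_PAGE_SIZE + i] != val)
			return 0;
	return 1;
}
```

If the two u-area pages actually live in frames 3 and 6, calling
model_fill_uarea_direct(3, ...) writes frames 3 and 4: the unrelated
frame 4 is clobbered, and the real second page in frame 6 is never
touched.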

I found that this is the cause of random kernel crashes under heavy
load on an ibm4xx box, which has little memory and a large page size.
>How-To-Repeat:
Fork a lot of processes/threads on the above-mentioned archs. On my
ibm4xx box with 128MB memory, /usr/tests/lib/librumpclient/h_execthr
almost always crashes the kernel in multi-user mode.

>Fix:
Stop uarea_poolpage_alloc() from falling back to uvm_km_alloc(), and
let it return NULL if cpu_uarea_alloc() fails. Then fork(2) correctly
fails with ENOMEM when contiguous pages are not available, instead of
causing mysterious kernel crashes.
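
The change in control flow can be sketched as follows. This is a
user-space model under stated assumptions, not the actual patch: every
model_* name is hypothetical, and the two static buffers merely stand in
for memory returned by cpu_uarea_alloc() and uvm_km_alloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two allocation back ends. */
static bool model_contig_available;
static char model_contig_backing[64];     /* "physically contiguous" */
static char model_noncontig_backing[64];  /* "not contiguous" */

static void *
model_cpu_uarea_alloc(void)
{
	return model_contig_available ? model_contig_backing : NULL;
}

static void *
model_km_alloc(void)
{
	return model_noncontig_backing;
}

/* Old behavior: silently fall back to non-contiguous memory. */
static void *
model_uarea_alloc_old(void)
{
	void *va = model_cpu_uarea_alloc();

	if (va == NULL)
		va = model_km_alloc();
	return va;
}

/* Proposed behavior: report the failure to the caller. */
static void *
model_uarea_alloc_new(void)
{
	return model_cpu_uarea_alloc();
}

/* Model of fork(2): fails cleanly when no u-area can be allocated. */
static int
model_fork(void)
{
	void *ua = model_uarea_alloc_new();

	if (ua == NULL)
		return ENOMEM;
	/* A successful fork only ever sees contiguous memory. */
	assert(ua == (void *)model_contig_backing);
	return 0;
}
```

With model_contig_available set to false, the old variant hands out the
non-contiguous buffer, while the new one returns NULL and model_fork()
reports ENOMEM.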

However, a minor problem remains. Since the u-area is allocated
with PR_WAITOK,

    https://nxr.netbsd.org/xref/src/sys/uvm/uvm_glue.c#uvm_uarea_alloc

the pool subsystem assumes that uarea_poolpage_alloc() never returns
NULL; otherwise, KASSERT failures occur. We therefore need to add a new
flag, PR_MAYFAIL for example, to indicate that pa_alloc() may fail
even with PR_WAITOK.
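
A rough model of the proposed flag semantics is shown below. It is a
sketch only: the MODEL_* flag values and all function names are
hypothetical and are not taken from the pool subsystem or from the
patch. It assumes that a non-waiting request (flags without
MODEL_PR_WAITOK) is always allowed to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PR_WAITOK   0x01  /* caller may sleep */
#define MODEL_PR_MAYFAIL  0x02  /* hypothetical: may fail even if WAITOK */

typedef void *(*model_alloc_fn)(void);

/* A back end that never finds memory, e.g. no contiguous pages left. */
static void *
model_backend_fail(void)
{
	return NULL;
}

/* Is a NULL result from the back end acceptable for these flags? */
static bool
model_null_allowed(int flags)
{
	if ((flags & MODEL_PR_WAITOK) == 0)
		return true;	/* non-waiting requests may always fail */
	return (flags & MODEL_PR_MAYFAIL) != 0;
}

/* Model of the pool layer: checks its assumption, then passes NULL up. */
static void *
model_pool_get(model_alloc_fn backend, int flags)
{
	void *p = backend();

	if (p == NULL)
		assert(model_null_allowed(flags));
	return p;
}
```

In this model, a failing back end with MODEL_PR_WAITOK alone trips the
assertion, which mirrors the KASSERT failures described above; adding
MODEL_PR_MAYFAIL lets the NULL propagate to the caller.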

Here's a patch for uvm as well as the pool subsystem:

    http://www.netbsd.org/~rin/uarea_poolpage_20200221.patch

Note that for ibm4xx, we need only a single page for the u-area;
PAGE_SIZE = 16KB is the same as the u-area size for other powerpc
processors. I will change UPAGES from 2 to 1 for ibm4xx as a workaround,
but this problem still needs to be fixed at a fundamental level.
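
The arithmetic behind the workaround can be checked with a short sketch.
The helper names are hypothetical, and the sketch assumes that the
u-area size is simply UPAGES * PAGE_SIZE.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_IBM4XX_PAGE_SIZE (16u * 1024u)  /* 16KB pages on ibm4xx */

/* Size of the u-area in bytes, assuming it is upages * page_size. */
static unsigned
model_uspace(unsigned upages, unsigned page_size)
{
	assert(page_size != 0 && upages <= UINT_MAX / page_size);
	return upages * page_size;
}

/* A single-page u-area needs no physically contiguous allocation. */
static int
model_needs_contiguous(unsigned upages)
{
	return upages > 1;
}
```

With UPAGES = 2 the ibm4xx u-area is 32KB and needs two contiguous
pages; with UPAGES = 1 it is 16KB and fits in one page, so the
contiguity requirement disappears on that platform.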

Follow-Ups:
- Re: kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
  - From: Jason Thorpe
