NetBSD-Bugs archive
kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
>Number: 54994
>Category: kern
>Synopsis: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Feb 21 11:15:00 +0000 2020
>Originator: Rin Okuyama
>Release: 9.99.47 and netbsd-9 at least
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD obs266 9.99.47 NetBSD 9.99.47 (OBS266) #154: Thu Feb 20 17:28:10 JST 2020 rin@latipes:/build/src/sys/arch/evbppc/compile/OBS266 evbppc
>Description:
For archs with __HAVE_CPU_UAREA_ROUTINES, i.e., alpha, mips, powerpc,
and riscv, uarea_poolpage_alloc() falls back to uvm_km_alloc() if
cpu_uarea_alloc() fails:
https://nxr.netbsd.org/xref/src/sys/uvm/uvm_glue.c#uarea_poolpage_alloc
This behavior is incorrect.
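The fallback pattern described above can be sketched in user space as follows. This is not the actual uvm_glue.c code: `mock_cpu_uarea_alloc()` and `mock_km_alloc()` are hypothetical stand-ins for cpu_uarea_alloc() and uvm_km_alloc(), and the `contiguous` field exists only so the sketch can report which path produced the memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Result of a (simulated) u-area allocation. */
struct uarea_result {
	void *va;		/* address handed back to the caller */
	bool contiguous;	/* true only if the contiguous path succeeded */
};

/* Hypothetical stand-in for cpu_uarea_alloc(): fails on request. */
static void *
mock_cpu_uarea_alloc(bool simulate_failure, size_t size)
{
	return simulate_failure ? NULL : malloc(size);
}

/* Hypothetical stand-in for uvm_km_alloc(): no contiguity guarantee. */
static void *
mock_km_alloc(size_t size)
{
	return malloc(size);
}

/*
 * The problematic pattern: if the contiguous allocator fails, silently
 * fall back to an allocator that makes no contiguity guarantee.
 */
static struct uarea_result
uarea_alloc_with_fallback(bool simulate_failure, size_t size)
{
	struct uarea_result r;

	r.va = mock_cpu_uarea_alloc(simulate_failure, size);
	r.contiguous = (r.va != NULL);
	if (r.va == NULL)
		r.va = mock_km_alloc(size);	/* caller cannot tell */
	return r;
}
```

When the contiguous path fails, the function still returns a non-NULL pointer, so the caller has no way to notice that the contiguity assumption has been broken.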
For these archs, cpu_uarea_alloc() allocates direct-mapped, physically
contiguous pages for the u-area by using uvm_pglistalloc():
https://nxr.netbsd.org/source/s?refs=cpu_uarea_alloc&project=src
Here, uvm_pglistalloc() may give up allocating memory even if waitok is set:
https://netbsd.gw.com/cgi-bin/man-cgi?uvm_pglistalloc++NetBSD-current
This is because there is no way to wake up a process when contiguous
pages become available. In this case, cpu_uarea_alloc() returns NULL,
and uvm_km_alloc() is used instead. However, pages allocated by
uvm_km_alloc() are, of course, not guaranteed to be physically contiguous.
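The following is a simplified model, not UVM's actual allocator, of why a contiguous request can fail even when plenty of memory is free. Physical memory is represented as a hypothetical free-page map, and `find_contig_run()` looks for a run of consecutive free pages the way a contiguous allocator must.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count free pages in a simulated physical page map (true = free). */
static size_t
count_free(const bool *free_map, size_t npages)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		if (free_map[i])
			n++;
	return n;
}

/*
 * Find the first run of `want` consecutive free pages.
 * Returns the starting index, or -1 if no such run exists.
 */
static long
find_contig_run(const bool *free_map, size_t npages, size_t want)
{
	size_t run = 0;

	if (want == 0)
		return 0;
	for (size_t i = 0; i < npages; i++) {
		run = free_map[i] ? run + 1 : 0;
		if (run == want)
			return (long)(i + 1 - want);
	}
	return -1;
}
```

For an alternating map such as {free, used, free, used, ...}, count_free() reports half of the pages free, while find_contig_run(map, n, 2) returns -1: a two-page request fails despite ample free memory, and freeing an unrelated page does not necessarily create a run.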
Since the low-level routines for these archs assume a physically
contiguous u-area, this results in a catastrophe: the pages that happen
to physically follow the first page of the u-area are randomly overwritten.
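A toy model of the corruption: physical memory is a flat byte array, and a u-area's pages are identified by frame numbers. `write_assuming_contiguous()` is a hypothetical stand-in for low-level code that knows only the first frame and adds the offset directly. Page size and frame numbers are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE	16	/* tiny page size, for illustration only */
#define SIM_NPAGES	8

/* Simulated physical memory: SIM_NPAGES frames laid out back to back. */
static unsigned char phys_mem[SIM_NPAGES * SIM_PAGE_SIZE];

/* Start of physical frame `frame` in the simulated memory. */
static unsigned char *
sim_page(size_t frame)
{
	return &phys_mem[frame * SIM_PAGE_SIZE];
}

/*
 * Hypothetical low-level routine that knows only the u-area's first
 * frame and assumes the rest follows contiguously: it adds `offset`
 * directly to the first frame's physical address.
 * Returns 0 on success, -1 if the write would leave simulated memory.
 */
static int
write_assuming_contiguous(size_t first_frame, size_t offset,
    unsigned char val, size_t len)
{
	size_t start = first_frame * SIM_PAGE_SIZE + offset;

	if (start > sizeof(phys_mem) || len > sizeof(phys_mem) - start)
		return -1;
	memset(&phys_mem[start], val, len);
	return 0;
}
```

If the u-area actually consists of frames 2 and 5 (as could happen after the uvm_km_alloc() fallback), a write to its second page computed from frame 2 lands in frame 3, which belongs to someone else, while frame 5 is never touched.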
I found that this is the cause of random kernel crashes under heavy
load on an ibm4xx box, which has little memory and a large page size.
>How-To-Repeat:
Fork a lot of processes/threads on the above-mentioned archs. On my
ibm4xx box with 128MB of memory, /usr/tests/lib/librumpclient/h_execthr
causes a kernel crash almost every time in multi-user mode.
>Fix:
Stop uarea_poolpage_alloc() from falling back to uvm_km_alloc(), and let
it return NULL if cpu_uarea_alloc() fails. Then fork(2) correctly fails
with ENOMEM when contiguous pages are not available, instead of causing
mysterious kernel crashes.
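A minimal sketch of the proposed behavior, under the same simplifications as a user-space model: `mock_cpu_uarea_alloc()`, `uarea_alloc_no_fallback()` and `mock_fork_uarea()` are hypothetical names, not kernel functions. The point is only that a NULL from the contiguous allocator is propagated and turned into ENOMEM rather than masked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for cpu_uarea_alloc(): fails on request. */
static void *
mock_cpu_uarea_alloc(bool simulate_failure, size_t size)
{
	return simulate_failure ? NULL : malloc(size);
}

/* Proposed behavior: no fallback; a failure is reported as NULL. */
static void *
uarea_alloc_no_fallback(bool simulate_failure, size_t size)
{
	return mock_cpu_uarea_alloc(simulate_failure, size);
}

/* Hypothetical fork-path caller: map allocation failure to ENOMEM. */
static int
mock_fork_uarea(bool simulate_failure, size_t size, void **out)
{
	void *ua = uarea_alloc_no_fallback(simulate_failure, size);

	if (ua == NULL)
		return ENOMEM;
	*out = ua;
	return 0;
}
```

On failure, `*out` is left unchanged and the caller receives ENOMEM, which it can report to userland instead of proceeding with a u-area that violates the contiguity assumption.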
However, a minor problem still remains. Since the u-area is allocated
with PR_WAITOK,
https://nxr.netbsd.org/xref/src/sys/uvm/uvm_glue.c#uvm_uarea_alloc
the pool subsystem assumes that uarea_poolpage_alloc() never returns
NULL; otherwise, KASSERT failures occur. We therefore need to add a new
flag, PR_MAYFAIL for example, to indicate that pa_alloc() may fail
even with PR_WAITOK.
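The intended semantics of such a flag can be sketched as a simplified model of the pool's check. The flag values, `sim_null_allowed()` and `sim_pool_page_get()` are hypothetical and do not reflect the real pool(9) code or the contents of the linked patch; the `assert()` plays the role of the KASSERT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; NOT NetBSD's actual definitions. */
#define SIM_PR_WAITOK	0x01
#define SIM_PR_MAYFAIL	0x02	/* hypothetical, modeled on PR_MAYFAIL */

/* Back-end page allocator hook, in the spirit of pa_alloc(). */
typedef void *(*sim_pa_alloc_fn)(void);

/*
 * May the back-end allocator return NULL for these flags?
 * Without WAITOK the caller already handles failure; with WAITOK,
 * only an explicit MAYFAIL permits it.
 */
static bool
sim_null_allowed(int flags)
{
	return !(flags & SIM_PR_WAITOK) || (flags & SIM_PR_MAYFAIL);
}

/* Simplified page fetch: the assert stands in for the pool KASSERT. */
static void *
sim_pool_page_get(sim_pa_alloc_fn pa_alloc, int flags)
{
	void *pg = pa_alloc();

	assert(pg != NULL || sim_null_allowed(flags));
	return pg;
}

/* Back-end that always fails, like cpu_uarea_alloc() under pressure. */
static void *
sim_alloc_fail(void)
{
	return NULL;
}

/* Back-end that always succeeds. */
static unsigned char sim_page_buf[64];

static void *
sim_alloc_ok(void)
{
	return sim_page_buf;
}
```

With SIM_PR_WAITOK alone, a NULL from `sim_alloc_fail()` would trip the assertion; adding SIM_PR_MAYFAIL lets the NULL propagate to the caller instead.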
Here is a patch for UVM as well as the pool subsystem:
http://www.netbsd.org/~rin/uarea_poolpage_20200221.patch
Note that for ibm4xx, we need only a single page for the u-area; its
PAGE_SIZE of 16KB equals the u-area size on other powerpc processors. I
will change UPAGES from 2 to 1 for ibm4xx as a workaround, but this
problem still needs to be fixed at a fundamental level.
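The arithmetic behind the workaround, as a small sketch: with a 16KB page, UPAGES = 2 means a 32KB u-area spanning two pages that must be physically contiguous, while UPAGES = 1 gives 16KB in a single page, which is trivially contiguous. `uarea_bytes()` and `needs_contig_alloc()` are hypothetical helpers, not kernel macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* u-area size in bytes for a given page count (a USPACE-style product). */
static size_t
uarea_bytes(size_t upages, size_t page_size)
{
	return upages * page_size;
}

/*
 * A u-area spanning more than one page needs a physically contiguous
 * allocation; a single-page u-area does not.
 */
static bool
needs_contig_alloc(size_t upages)
{
	return upages > 1;
}
```

For comparison, a hypothetical configuration with 4KB pages and UPAGES = 4 would also yield a 16KB u-area, but spread across four pages, so it would still depend on a contiguous allocation.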