kern/54994: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
>Synopsis: Critical bug in uarea_poolpage_alloc() for archs with __HAVE_CPU_UAREA_ROUTINES
>Arrival-Date: Fri Feb 21 11:15:00 +0000 2020
>Originator: Rin Okuyama
>Release: 9.99.47 and netbsd-9 at least
Department of Physics, Meiji University
NetBSD obs266 9.99.47 NetBSD 9.99.47 (OBS266) #154: Thu Feb 20 17:28:10 JST 2020 rin@latipes:/build/src/sys/arch/evbppc/compile/OBS266 evbppc
>Description:
For archs with __HAVE_CPU_UAREA_ROUTINES, i.e., alpha, mips, powerpc,
and riscv, uarea_poolpage_alloc() falls back to uvm_km_alloc() if
cpu_uarea_alloc() fails. This behavior is incorrect.
For these archs, cpu_uarea_alloc() allocates direct-mapped, physically
contiguous pages for the u-area by using uvm_pglistalloc().
Here, uvm_pglistalloc() may give up allocating memory even if waitok
is set. This is because there's no way to wake up a process when
contiguous pages become available. In this case, cpu_uarea_alloc()
returns NULL, and uvm_km_alloc() is used instead. However, pages
allocated by uvm_km_alloc() are, of course, not physically contiguous.
Since low-level routines for these archs assume a physically contiguous
u-area, this results in a catastrophe: pages that happen to follow the
1st page of the u-area will be randomly overwritten.
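The failure mode can be illustrated with a small user-space model. This is
only a sketch, not kernel code: the frame array and every name in it
(toy_phys, NFRAMES, TOY_PAGE_SIZE, TOY_UPAGES, toy_uarea_fill(),
toy_frame_is()) are hypothetical and exist purely for illustration. It models
a low-level routine that is given only the first page of a two-page u-area
and assumes that the second page follows it physically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* hypothetical, tiny "page" for illustration */
#define NFRAMES       8    /* hypothetical number of physical frames */
#define TOY_UPAGES    2    /* the u-area spans two pages in this model */

/* Toy "physical memory": NFRAMES frames laid out back to back. */
static unsigned char toy_phys[NFRAMES * TOY_PAGE_SIZE];

/*
 * Models a low-level routine that knows only the first frame of the
 * u-area and assumes the remaining frames follow it physically.
 */
static void
toy_uarea_fill(size_t first_frame, unsigned char pattern)
{
	assert(first_frame + TOY_UPAGES <= NFRAMES);
	memset(&toy_phys[first_frame * TOY_PAGE_SIZE], pattern,
	    TOY_UPAGES * TOY_PAGE_SIZE);
}

/* Returns 1 if every byte of the given frame equals value, else 0. */
static int
toy_frame_is(size_t frame, unsigned char value)
{
	size_t i;

	for (i = 0; i < TOY_PAGE_SIZE; i++)
		if (toy_phys[frame * TOY_PAGE_SIZE + i] != value)
			return 0;
	return 1;
}
```

If a non-contiguous allocator hands out frames 3 and 6 for one u-area, a call
such as toy_uarea_fill(3, 0xAA) writes frames 3 and 4. Frame 4 belongs to
some other owner and is clobbered, while frame 6 is never touched.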
I found that this is the cause of random kernel crashes under heavy
load on an ibm4xx box, which has little memory and a large page size.
>How-To-Repeat:
Fork a lot of processes/threads on the above-mentioned archs. On my
ibm4xx box with 128MB memory, /usr/tests/lib/librumpclient/h_execthr
almost always crashes the kernel in multi-user mode.
>Fix:
Stop uarea_poolpage_alloc() from falling back to uvm_km_alloc(), and let
it return NULL if cpu_uarea_alloc() fails. Then, fork(2) correctly fails
with ENOMEM when contiguous pages are not available, instead of causing
mysterious kernel crashes.
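The intended behavior can be sketched with a user-space model. This is not
the kernel patch; toy_used, toy_uarea_alloc(), toy_fork(), NFRAMES, and
TOY_UPAGES are hypothetical names made up for this illustration. The
allocator either finds a contiguous run of frames or reports failure, and
the caller turns that failure into ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NFRAMES    8   /* hypothetical number of physical frames */
#define TOY_UPAGES 2   /* contiguous frames needed per u-area */

static int toy_used[NFRAMES];   /* 1 if the frame is in use */

/*
 * Finds TOY_UPAGES adjacent free frames, marks them used, and returns
 * the index of the first one. Returns -1 if no contiguous run exists;
 * it deliberately does NOT fall back to scattered frames.
 */
static int
toy_uarea_alloc(void)
{
	int i, j;

	for (i = 0; i + TOY_UPAGES <= NFRAMES; i++) {
		for (j = 0; j < TOY_UPAGES; j++)
			if (toy_used[i + j])
				break;
		if (j == TOY_UPAGES) {
			for (j = 0; j < TOY_UPAGES; j++)
				toy_used[i + j] = 1;
			return i;
		}
	}
	return -1;
}

/* Models fork(2): returns 0 on success, or ENOMEM if no u-area. */
static int
toy_fork(void)
{
	int first = toy_uarea_alloc();

	if (first < 0)
		return ENOMEM;
	assert(first + TOY_UPAGES <= NFRAMES);
	return 0;
}
```

With every other frame in use, half of the memory is free but no two free
frames are adjacent, so toy_fork() reports ENOMEM rather than handing out a
scattered u-area.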
However, a minor problem still remains. Since the u-area is allocated
with PR_WAITOK, the pool subsystem assumes that uarea_poolpage_alloc()
never returns NULL; KASSERT failures occur otherwise. We therefore need
to add a new flag, PR_MAYFAIL for example, to indicate that pa_alloc()
may fail even for PR_WAITOK.
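The role of such a flag can be sketched as follows. This is a user-space
model under stated assumptions, not the pool(9) implementation: the flag
values, the toy_alloc_fn type, and all toy_* functions are hypothetical, and
TOY_PR_MAYFAIL only mirrors the flag proposed above. A NULL page from the
backend is accepted for a sleeping allocation only when the caller has
declared that failure is possible.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PR_WAITOK  0x01   /* caller may sleep (models PR_WAITOK) */
#define TOY_PR_MAYFAIL 0x02   /* hypothetical: backend may still fail */

typedef void *(*toy_alloc_fn)(int flags);

static unsigned char toy_page[64];   /* stand-in for one pool page */

/* Backend that always succeeds. */
static void *
toy_backend_ok(int flags)
{
	(void)flags;
	return toy_page;
}

/* Backend that always fails, like an exhausted contiguous allocator. */
static void *
toy_backend_fail(int flags)
{
	(void)flags;
	return NULL;
}

/* Returns 1 if a NULL page is a legitimate outcome for these flags. */
static int
toy_null_is_allowed(int flags)
{
	if ((flags & TOY_PR_WAITOK) == 0)
		return 1;   /* non-sleeping allocations may always fail */
	return (flags & TOY_PR_MAYFAIL) != 0;
}

/* Gets one page from the backend, asserting the pool's assumption. */
static void *
toy_pool_get_page(toy_alloc_fn backend, int flags)
{
	void *page = backend(flags);

	if (page == NULL)
		assert(toy_null_is_allowed(flags));   /* models the KASSERT */
	return page;
}
```

In this model, toy_pool_get_page(toy_backend_fail, TOY_PR_WAITOK |
TOY_PR_MAYFAIL) simply returns NULL to the caller, whereas the same call
without TOY_PR_MAYFAIL trips the assertion.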
Here's a patch for uvm as well as the pool subsystem:
Note that for ibm4xx, we need only a single page for the u-area;
PAGE_SIZE = 16KB is the same as the u-area size on other powerpc
processors. I will change UPAGES from 2 to 1 for ibm4xx as a workaround,
but this problem still needs to be fixed at a fundamental level.