NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
kern/53817: Random panics in vfs_mountroot()
>Number: 53817
>Category: kern
>Synopsis: Random panics in vfs_mountroot()
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 29 21:35:00 +0000 2018
>Originator: Andreas Gustafsson
>Release: NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: aarch64
Machine: evbarm
>Description:
The automated evbarm-aarch64 test runs on lyta.netbsd.org have ended
with a panic in vfs_mountroot in 36 out of 98 runs:
lyta /bracket/evbarm-aarch64/results/2018 $ zgrep vfs_mountroot */test.log.gz
2018.10.26.22.10.15/test.log.gz:[ 2.1539321] fp ffffffc000979e00 vfs_mountroot() at ffffffc00043b524 netbsd:vfs_mountroot+0x214
2018.10.27.13.20.21/test.log.gz:[ 2.3229166] fp ffffffc000979e00 vfs_mountroot() at ffffffc00043b4f4 netbsd:vfs_mountroot+0x214
2018.10.28.00.44.37/test.log.gz:[ 2.1098840] fp ffffffc000979e00 vfs_mountroot() at ffffffc00043b4f4 netbsd:vfs_mountroot+0x214
[...]
2018.12.24.08.40.33/test.log.gz:[ 2.4557637] fp ffffffc000b8adf0 vfs_mountroot() at ffffffc0004581a4 netbsd:vfs_mountroot+0x214
2018.12.24.22.47.19/test.log.gz:[ 2.3375370] fp ffffffc000b8bdf0 vfs_mountroot() at ffffffc00045a0c4 netbsd:vfs_mountroot+0x214
2018.12.25.11.56.14/test.log.gz:[ 2.4641909] fp ffffffc000b8bdf0 vfs_mountroot() at ffffffc00045a0c4 netbsd:vfs_mountroot+0x214
2018.12.27.04.54.03/test.log.gz:[ 2.4717805] fp ffffffc000b8bdf0 vfs_mountroot() at ffffffc00045a054 netbsd:vfs_mountroot+0x214
Console output from the first recorded failure:
[ 2.0451998] mountroot: trying ffs...
[ 2.0561633] root file system type: ffs
[ 2.0561633] panic: Trap: Data Abort (EL1): Translation Fault L1 with read access for 0000000000000070: pc ffffffc00043b524: opcode f9403800: ldr x0, [x0,#112]
[ 2.0672672] cpu0: Begin traceback...
[ 2.0672672] trace fp ffffffc000979b50
[ 2.0773923] fp ffffffc000979b70 vpanic() at ffffffc0003efd00 netbsd:vpanic+0x190
[ 2.0773923] fp ffffffc000979bd0 panic() at ffffffc0003efdcc netbsd:panic+0x44
[ 2.0889589] fp ffffffc000979c60 data_abort_handler() at ffffffc00005fcf8 netbsd:data_abort_handler+0x480
[ 2.0889589] tf ffffffc000979cd0 el1_trap() at ffffffc00005d3f8 netbsd:el1_trap
[ 2.1014251] ---- trapframe 0xffffffc000979cd0 (304 bytes) ----
[ 2.1014251] pc=ffffffc00043b524, spsr=0000000060000005
[ 2.1014251] esr=0000000096000005, far=0000000000000070
[ 2.1114037] x0=0000000000000000, x1=ffff00005fab2bd0
[ 2.1114037] x2=0000000000000004, x3=00000000000000f0
[ 2.1114037] x4=ffffffc000820558, x5=0000000000000000
[ 2.1114037] x6=0000000000000004, x7=0000000000000001
[ 2.1225739] x8=0000000000000004, x9=ffffffc000979da0
[ 2.1225739] x10=ffff00005f9f9f58, x11=000000000000003f
[ 2.1225739] x12=fffffc00017e7e7c, x13=fffffc00017e7e80
[ 2.1225739] x14=ffffffffffffffe8, x15=ffff00005f9fa000
[ 2.1338631] x16=0000000000000021, x17=0000000000000083
[ 2.1338631] x18=0000000000001000, x19=0000000000000000
[ 2.1338631] x20=ffffffc000a1c000, x21=ffffffc000a8bce8
[ 2.1338631] x22=ffffffc000854a00, x23=ffffffc0006ef800
[ 2.1449961] x24=ffffffc000856000, x25=0000000000000000
[ 2.1449961] x26=ffffffc000a1c000, x27=ffffffc000971000
[ 2.1449961] x28=0000000000000000, fp=x29=ffffffc000979e00
[ 2.1449961] lr=x30=ffffffc00043b518, sp=ffffffc000979e00
[ 2.1449961] ------------------------------------------------
[ 2.1539321] fp ffffffc000979e00 vfs_mountroot() at ffffffc00043b524 netbsd:vfs_mountroot+0x214
[ 2.1539321] fp ffffffc000979e60 main() at ffffffc00050f3e4 netbsd:main+0x44c
[ 2.1539321] fp 0000000000000000 aarch64_start() at ffffffc00000183c netbsd:aarch64_start+0x103c
[ 2.1675298] cpu0: End traceback...
Stopped in pid 0.1 (system) at netbsd:cpu_Debugger+0x4: ret
Disassembling around the faulting instruction (0x214 is decimal 532):
0xffffffc000443824 <vfs_mountroot+508>: bl 0xffffffc000451210 <vref>
0xffffffc000443828 <vfs_mountroot+512>: ldr x0, [x21, #80]
0xffffffc00044382c <vfs_mountroot+516>: bl 0xffffffc00045d388 <VOP_UNLOCK>
0xffffffc000443830 <vfs_mountroot+520>: str xzr, [x22, #8]
0xffffffc000443834 <vfs_mountroot+524>: ldr x0, [x20, #2376]
0xffffffc000443838 <vfs_mountroot+528>: ldr x1, [x21, #80]
0xffffffc00044383c <vfs_mountroot+532>: ldr x0, [x0, #112]
0xffffffc000443840 <vfs_mountroot+536>: str x1, [x0]
0xffffffc000443844 <vfs_mountroot+540>: ldr x0, [x20, #2376]
0xffffffc000443848 <vfs_mountroot+544>: ldr x0, [x0, #112]
0xffffffc00044384c <vfs_mountroot+548>: ldr x0, [x0]
0xffffffc000443850 <vfs_mountroot+552>: bl 0xffffffc000451210 <vref>
This looks like the source line
/*
* Now that root is mounted, we can fixup initproc's CWD
* info. All other processes are kthreads, which merely
* share proc0's CWD info.
*/
initproc->p_cwdi->cwdi_cdir = rootvnode;
and indeed the offset #112 in the faulting instruction matches the offset
of the p_cwdi field in struct proc. Also, the faulting address is 0x70,
which is at a 112 bytes offset from a null pointer.
This looks like a race condition to me: main() calls fork1() to create
process 1 and then calls vfs_mountroot() which uses the global
variable initproc, but initproc is initialized in start_init() which
is executed by process 1, so it may or may not have been initialized
by the time it is used by vfs_mountroot().
I suspect the bug may have been introduced by:
revision 1.496
date: 2018-04-16 17:18:16 +0300; author: kamil; state: Exp; lines: +5 -3; commitid: YY8XhMArR8bEOFyA;
Set initproc inside start_init()
This allows us to stop using the rnewprocp argument in fork1(9).
The rnewprocp argument will be removed soon from the API, as it can cause
use-after-free scenarios.
No functional change intended.
Noted by <Mateusz Guzik>
Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html
Sponsored by <The NetBSD Foundation>
Although I have only seen this happen on evbarm-aarch64 under qemu,
the bug looks machine independent, so I'm filing it in category kern.
>How-To-Repeat:
>Fix:
Home |
Main Index |
Thread Index |
Old Index