NetBSD-Bugs archive
kern/54880: -current hangs in mountroot
>Number: 54880
>Category: kern
>Synopsis: -current hangs in mountroot
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jan 20 18:35:00 +0000 2020
>Originator: Michael van Elst
>Release: NetBSD 9.99.39
>Organization:
>Environment:
System: NetBSD tazz 9.99.39 NetBSD 9.99.39 (GENERIC) #33: Mon Jan 20 16:34:50 UTC 2020 mlelstv@slowpoke:/scratch2/obj.amd64/scratch/netbsd-current/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
When booting -current, the system stops when trying to mount root.
I've added debug printfs and a call to db_stacktrace().
[ 12.4556277] uvm_km_alloc(65536): no VM
[ 12.7114329] vm_map_lock_try(0xffffffff81d79820) = false (busy=0x0)
[ 12.7228386] uvm_map() at netbsd:uvm_map+0x6b
[ 12.7343650] uvm_km_alloc() at netbsd:uvm_km_alloc+0xff
[ 12.7463020] pool_grow() at netbsd:pool_grow+0x88
[ 12.7585233] pool_get() at netbsd:pool_get+0xa8
[ 12.7703830] allocbuf() at netbsd:allocbuf+0xe4
[ 12.7822311] getblk() at netbsd:getblk+0x143
[ 12.7940378] bio_doread() at netbsd:bio_doread+0x1d
[ 12.8060855] bread() at netbsd:bread+0x18
[ 12.8178598] lfs_mountfs() at netbsd:lfs_mountfs+0x9c
[ 12.8301169] lfs_mountroot() at netbsd:lfs_mountroot+0x6b
[ 12.8420919] vfs_mountroot() at netbsd:vfs_mountroot+0xf1
[ 12.8540742] main() at netbsd:main+0x4c6
This is caused by vm_map_lock_try() calling rw_tryenter(), which is
defective on amd64 without LOCKDEBUG. As a result uvm_km_alloc(),
and thus pool_get(), fails, and allocbuf() retries this indefinitely.
rw_tryenter() is implemented as an assembler stub; here is the
writer case:
	/*
	 * Writer: if the compare-and-set fails, don't bother retrying.
	 */
2:	movq	CPUVAR(CURLWP), %rcx
	xorq	%rax, %rax
	orq	$RW_WRITE_LOCKED, %rcx
	LOCK
	cmpxchgq %rcx, (%rdi)
	movl	$0, %eax
	setz	%al
The owner field, addressed by %rdi, is atomically compared against zero
and, if equal, overwritten with (curlwp | RW_WRITE_LOCKED).
However, without LOCKDEBUG, the owner field is initialized to RW_NODEBUG,
not zero. The check always fails and the new value is never written. The new
value would also lack the RW_NODEBUG flag.
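
To illustrate the failure mode outside the kernel, here is a minimal
userland sketch using C11 atomics. It is not the kernel code and not a
proposed fix: fake_rwlock, both function names and the numeric flag values
are assumptions made up for the example. The first function mirrors the
stub's single compare-and-set against zero; the second shows one possible
way a try-lock could tolerate and preserve RW_NODEBUG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values (assumed, not copied from rwlock.h). */
#define RW_WRITE_LOCKED	((uintptr_t)0x04)
#define RW_NODEBUG	((uintptr_t)0x10)

/* Hypothetical stand-in for the owner word of a krwlock_t. */
typedef struct {
	_Atomic uintptr_t owner;
} fake_rwlock;

/*
 * Mirrors the assembler stub: expect owner == 0 and, if so, store
 * (curlwp | RW_WRITE_LOCKED).  One attempt, no retry.
 */
static inline bool
tryenter_writer_stub(fake_rwlock *rw, uintptr_t curlwp)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&rw->owner, &expected,
	    curlwp | RW_WRITE_LOCKED);
}

/*
 * Hypothetical variant: treat the lock as free when only RW_NODEBUG
 * is set, and carry that flag over into the new owner value.
 */
static inline bool
tryenter_writer_nodebug_aware(fake_rwlock *rw, uintptr_t curlwp)
{
	uintptr_t expected = atomic_load(&rw->owner);

	if ((expected & ~RW_NODEBUG) != 0)
		return false;	/* held, or other flag bits set */
	return atomic_compare_exchange_strong(&rw->owner, &expected,
	    curlwp | RW_WRITE_LOCKED | (expected & RW_NODEBUG));
}
```

With owner initialized to RW_NODEBUG, the first function can never
succeed and leaves the owner word untouched, which matches the
"vm_map_lock_try(...) = false" line in the trace above.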
The rw_enter() stub has the same flaw, but it only handles the first
check; when that fails, it continues with the C version, rw_vector_enter().
The C code handles the RW_NODEBUG case.
The error was introduced with rwlock.h 1.13; previously RW_NODEBUG was
set to zero when not compiling with LOCKDEBUG.
>How-To-Repeat:
Boot -current. The hang occurs when the buffer pool needs to be grown
and other buffers cannot be freed.
>Fix:
>Unformatted: