NetBSD-Bugs archive


kern/54880: -current hangs in mountroot

>Number:         54880
>Category:       kern
>Synopsis:       -current hangs in mountroot
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 20 18:35:00 +0000 2020
>Originator:     Michael van Elst
>Release:        NetBSD 9.99.39
System: NetBSD tazz 9.99.39 NetBSD 9.99.39 (GENERIC) #33: Mon Jan 20 16:34:50 UTC 2020 mlelstv@slowpoke:/scratch2/obj.amd64/scratch/netbsd-current/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64

When booting -current the system stops when trying to mount root.
I've added debug printfs and a call to db_stacktrace().

[  12.4556277] uvm_km_alloc(65536): no VM
[  12.7114329] vm_map_lock_try(0xffffffff81d79820) = false (busy=0x0)
[  12.7228386] uvm_map() at netbsd:uvm_map+0x6b
[  12.7343650] uvm_km_alloc() at netbsd:uvm_km_alloc+0xff
[  12.7463020] pool_grow() at netbsd:pool_grow+0x88
[  12.7585233] pool_get() at netbsd:pool_get+0xa8
[  12.7703830] allocbuf() at netbsd:allocbuf+0xe4
[  12.7822311] getblk() at netbsd:getblk+0x143
[  12.7940378] bio_doread() at netbsd:bio_doread+0x1d
[  12.8060855] bread() at netbsd:bread+0x18
[  12.8178598] lfs_mountfs() at netbsd:lfs_mountfs+0x9c
[  12.8301169] lfs_mountroot() at netbsd:lfs_mountroot+0x6b
[  12.8420919] vfs_mountroot() at netbsd:vfs_mountroot+0xf1
[  12.8540742] main() at netbsd:main+0x4c6

This is caused by vm_map_lock_try() calling rw_tryenter(), which is
defective on amd64 without LOCKDEBUG. As a result uvm_km_alloc(),
and thus pool_get(), fails, and allocbuf() retries indefinitely.

rw_tryenter() is implemented as an assembler stub; here is the
writer case:

         * Writer: if the compare-and-set fails, don't bother retrying.
2:      movq    CPUVAR(CURLWP), %rcx
        xorq    %rax, %rax
        orq     $RW_WRITE_LOCKED, %rcx
        cmpxchgq %rcx, (%rdi)
        movl    $0, %eax
        setz    %al

The owner field, addressed by %rdi, is atomically compared against zero
and, if it matches, overwritten with (curlwp | RW_WRITE_LOCKED).

However, without LOCKDEBUG, the owner field is initialized as RW_NODEBUG,
not zero. The check always fails and the new value is never written. The new
value would also lack the RW_NODEBUG flag.
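
To illustrate the failure described above, here is a minimal user-space
sketch in C11, not the kernel code. The struct demo_rwlock, the demo_*
function names and the numeric flag values are assumptions made for the
example; the real definitions are in sys/rwlock.h. The first function
mirrors the stub's compare against zero. The second shows one possible way
to accept and preserve RW_NODEBUG; it is not necessarily the fix that
should be committed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only; see sys/rwlock.h for the real ones. */
#define RW_WRITE_LOCKED ((uintptr_t)0x04)
#define RW_NODEBUG      ((uintptr_t)0x10)

/* Simplified stand-in for krwlock_t: just the owner word. */
struct demo_rwlock {
	_Atomic uintptr_t owner;
};

/*
 * Mirrors the assembler stub: the compare-and-swap expects the owner
 * word to be exactly zero, so it can never succeed while RW_NODEBUG
 * is set in an otherwise unowned lock.
 */
static bool
demo_tryenter_writer_stub(struct demo_rwlock *rw, uintptr_t curlwp)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&rw->owner, &expected,
	    curlwp | RW_WRITE_LOCKED);
}

/*
 * Hypothetical variant: treat "unowned" as "no bits set other than
 * RW_NODEBUG" and carry that bit over into the new owner word.
 */
static bool
demo_tryenter_writer_nodebug(struct demo_rwlock *rw, uintptr_t curlwp)
{
	uintptr_t expected = atomic_load(&rw->owner) & RW_NODEBUG;
	uintptr_t desired = curlwp | RW_WRITE_LOCKED | expected;

	return atomic_compare_exchange_strong(&rw->owner, &expected,
	    desired);
}
```

With owner initialized to RW_NODEBUG, demo_tryenter_writer_stub() returns
false and leaves the owner word untouched, which corresponds to the
"vm_map_lock_try() = false" line in the log. With owner initialized to
zero, the same function succeeds.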

The rw_enter() stub has the same flaw, but there it is harmless: the stub
only attempts the first check, and when that fails it falls through to the
C version, rw_vector_enter(). The C code handles the RW_NODEBUG case.

The error was introduced with rwlock.h 1.13; previously RW_NODEBUG was
set to zero when not compiling with LOCKDEBUG.


Boot -current. The hang occurs when the buffer pool needs to be grown
and other buffers cannot be freed.


