NetBSD-Bugs archive
kern/59870: kernel lock runtime diagnostics are difficult
>Number: 59870
>Category: kern
>Synopsis: kernel lock runtime diagnostics are difficult
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Dec 31 04:20:00 +0000 2025
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
The NetBSD Locker, Inc.
>Environment:
>Description:
Sometimes the legacy kernel lock is held for an unreasonably
long time.
How do you tell who holds it when this happens? If you're
lucky, you can enter ddb and find the threads running on each
cpuN with `ps' and switch to `mach cpu N' and run `bt' to find
a code path that obviously holds the kernel lock.
If you're not lucky, you get a heartbeat panic because some
softint tried to take the kernel lock and waited too long for
it, and then the crash dump fails because suspendsched has
KASSERT(!cpu_intr_p()) and the heartbeat panic happens within
an interrupt handler, as in:
https://mail-index.NetBSD.org/current-users/2025/12/27/msg047183.html
If you have enabled LOCKDEBUG, and the kernel lock is held for
more than 10 seconds, you get a kernel lock spinout and an IPI
is sent to the hogging CPU to get a stack trace. But since
autoconf(9) runs kernel-locked, loading a module for a driver
can trigger this panic.
It's also annoying when the kernel lock is held long enough to
make the system flaky (partly because, e.g., wscons(4) and
pckbport(4) run with it held, and so do some network drivers
like iwm(4)), but not long enough to trigger other diagnostics.
However,
attempts to dtrace the kernel_lock function, along the lines of
https://mail-index.netbsd.org/tech-kern/2022/10/30/msg028499.html,
only show that it was taken in sleepq_block because something
that held the kernel lock slept and then woke up again.
>How-To-Repeat:
- chase bad interactive system latency due to kernel lock hogs
- try to diagnose panics like
https://mail-index.netbsd.org/current-users/2025/12/27/msg047183.html
>Fix:
1. Enable the logic to provoke an IPI to dump a stack trace
_without_ LOCKDEBUG.
2. Pass a cookie across the unlock/sleep/relock logic so that
dtrace can tell on whose behalf the relock happened.
Other ideas welcome!