Current-Users archive
Re: Call for testing: New kernel heartbeat(9) checks
On 7/7/23 22:11, Taylor R Campbell wrote:
> FYI: In 10.99.5, I just added a new kernel diagnostic subsystem called
> heartbeat(9) that will make the system crash rather than hang when
> CPUs are stuck in certain ways that hardware watchdog timers can't
> detect (or on systems without hardware watchdog timers)
> [...]

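The quoted description does not say how "stuck" is detected. Below is a simplified userspace model in C, not NetBSD's heartbeat(9) code, offered only to make the idea concrete. It assumes that each CPU records the uptime at which its own soft interrupt last ran, and that a periodic hard-clock tick on each CPU compares that timestamp, plus the timestamp of one online peer, against kern.heartbeat.max_period. The struct, enum, and function names are hypothetical. Treating max_period=0 as "checks disabled" is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified userspace model of a per-CPU heartbeat check.
 * Hypothetical names; this is NOT NetBSD's heartbeat(9) code.
 */
struct cpu_beat {
	bool online;          /* offline CPUs are not expected to beat */
	uint64_t last_beat;   /* uptime (s) when this CPU's softint last ran */
};

enum beat_verdict {
	BEAT_OK,              /* everything has beaten recently */
	BEAT_SOFTINTS_STUCK,  /* this CPU's own softint has not run */
	BEAT_PEER_STOPPED     /* an online peer CPU has stopped beating */
};

/*
 * Model of the check run on CPU `self` from a periodic hard-clock tick
 * at uptime `now` (assumed >= every last_beat).  max_period == 0 is
 * assumed to disable the check, like kern.heartbeat.max_period=0.
 */
static enum beat_verdict
heartbeat_check(const struct cpu_beat *cpus, int ncpu, int self,
    uint64_t now, uint64_t max_period)
{
	assert(ncpu > 0 && self >= 0 && self < ncpu);
	assert(cpus[self].online);

	if (max_period == 0)
		return BEAT_OK;

	/* Local check: this tick ran, but has our own softint run recently? */
	if (now - cpus[self].last_beat > max_period)
		return BEAT_SOFTINTS_STUCK;

	/* Cross-CPU check: look at the next online CPU, skipping offline ones. */
	for (int i = 1; i < ncpu; i++) {
		int peer = (self + i) % ncpu;
		if (!cpus[peer].online)
			continue;
		if (now - cpus[peer].last_beat > max_period)
			return BEAT_PEER_STOPPED;
		break;  /* only one peer is examined per tick in this model */
	}
	return BEAT_OK;
}
```

Under these assumptions, a CPU whose timestamp is 16 seconds old with max_period=15 is flagged. That matches the shape of the "found cpu1 heart stopped beating after 16 seconds" and "softints stuck for 16 seconds" panics in the results. An offline peer is simply skipped, which would be consistent with tests 1 to 3 not panicking.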
This is a NetBSD/amd64 guest with 2 virtual CPUs, running on VMware:
1. cpuctl offline 0
   sleep 20
   cpuctl online 0

No panics.

2. cpuctl offline 1
   sleep 20
   cpuctl online 1

No panics.

3. cpuctl offline 0
   sysctl -w kern.heartbeat.max_period=5
   sleep 10
   sysctl -w kern.heartbeat.max_period=0
   sleep 10
   sysctl -w kern.heartbeat.max_period=15
   sleep 20
   cpuctl online 0

No panics.

4. sysctl -w debug.crashme_enable=1
   sysctl -w debug.crashme.spl_spinout=2 # IPL_SOFTCLOCK
   # verify system panics after 15sec

Setting spl_spinout hangs the sysctl command, and the kernel panics about 15 seconds later:

    Jul 8 22:16:13 netbsd-current /netbsd: [ 231.3581695] crashme_sysctl_forwarder:208: invoking "spl_spinout" (infinite loop at raised spl)
    Jul 8 22:16:13 netbsd-current /netbsd: [ 231.3581695] crashme_spl_spinout: raising ipl to 2
    Jul 8 22:16:13 netbsd-current /netbsd: [ 231.3581695] crashme_spl_spinout: raised ipl to 2, s=0
    Jul 8 22:16:13 netbsd-current /netbsd: [ 247.0084882] cpu0: found cpu1 heart stopped beating after 16 seconds
    Jul 8 22:16:13 netbsd-current /netbsd: [ 247.0084882] panic: cpu1[1743 sysctl]: heart stopped beating

5. sysctl -w debug.crashme_enable=1
   sysctl -w debug.crashme.spl_spinout=6 # IPL_SCHED
   # verify system panics after 15sec

Same as 4, but the panic message is different:

    Jul 8 22:23:24 netbsd-current /netbsd: [ 411.0078445] panic: cpu0: softints stuck for 16 seconds

6. cpuctl offline 0
   sysctl -w debug.crashme_enable=1
   sysctl -w debug.crashme.spl_spinout=2 # IPL_SOFTCLOCK
   # verify system panics after 15sec

It panics after 15 seconds:

    Jul 8 22:27:04 netbsd-current /netbsd: [ 200.0060379] panic: cpu1: softints stuck for 16 seconds

7. cpuctl offline 0
   sysctl -w debug.crashme_enable=1
   sysctl -w debug.crashme.spl_spinout=5 # IPL_VM
   # verify system panics after 15sec

It panics after 15 seconds:

    Jul 8 22:29:45 netbsd-current /netbsd: [ 142.0029650] panic: cpu1: softints stuck for 16 seconds