NetBSD-Bugs archive


Re: kern/60029: panic: cpu0: softints stuck for 16 seconds



The following reply was made to PR kern/60029; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Thomas Klausner <wiz%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost, chs%NetBSD.org@localhost
Subject: Re: kern/60029: panic: cpu0: softints stuck for 16 seconds
Date: Sun, 22 Feb 2026 19:24:59 +0000

 > Date: Sun, 22 Feb 2026 14:02:33 +0100
 > From: Thomas Klausner <wiz%netbsd.org@localhost>
 > 
 > 0    >  935 7  16     40200   ffff8fb6be107400           pgdaemon
 > [...]
 > 0    >    4 7   0       200   ffff8fc60b25f800          softbio/0
 
 These two look suspicious.
 
 Now, I think the stack traces won't be reliable, because the threads
 in question are running on a CPU.  What we really want is `mach cpu N'
 followed by `bt', but crash(8) doesn't support `mach cpu N' to examine
 the registers/stack of other CPUs (PR bin/58010).  Just in case,
 here's the stack trace for softbio/0:
 
 > crash> bt/a ffff8fc60b25f800
 > trace: pid 0 lid 4 at 0xffffa72469e83fa0
 > uvm_aio_aiodone() at uvm_aio_aiodone+0xbe
 > dkiodone() at dkiodone+0xa8
 > lddone() at lddone+0xf
 > nvme_q_complete() at nvme_q_complete+0xf2
 > softint_dispatch() at softint_dispatch+0x112
 
 If you have netbsd.gdb, can you get output of `info line
 *(uvm_aio_aiodone+0xbe)' in gdb from it?
 
 And can you get the stack trace for pgdaemon too?
 
 crash> bt/a ffff8fb6be107400
 
 
 My guess is the softbio thread might be waiting for this lock:
 
     528 	if (write && (bp->b_cflags & BC_AGE) != 0) {
  -> 529 		mutex_enter(bp->b_objlock);
     530 		vwakeup(bp);
     531 		mutex_exit(bp->b_objlock);
     532 	}
 
 https://nxr.netbsd.org/xref/src/sys/uvm/uvm_pager.c?r=1.131#529
 
 This is probably either buffer_lock or vp->v_interlock for some vnode.
 In case it's buffer_lock, can you show this?
 
 crash> x/Lx buffer_lock
 
 (We should really record in struct lwp or struct cpu_info what lock a
 thread on the CPU is spinning for, so that we can just find who holds
 it right now.)
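 
 To illustrate the kind of bookkeeping I mean, here is a rough userland
 sketch, not kernel code and not a proposed patch.  Everything named
 toy_* is hypothetical: struct toy_cpu stands in for struct cpu_info,
 and struct toy_lock for a spin mutex that remembers its owner.  Each
 CPU notes which lock it is trying to take before it tries, and clears
 the note once it has the lock, so a debugger can walk from the
 spinning CPU to the lock and from the lock to the holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_NCPU 4	/* hypothetical CPU count for the sketch */

/* Hypothetical toy spin lock: owner is CPU index + 1, or 0 if free. */
struct toy_lock {
	atomic_int owner;
};

/* Hypothetical per-CPU record; NOT the real struct cpu_info. */
struct toy_cpu {
	struct toy_lock *_Atomic spinning_on;	/* lock this CPU wants */
};

static struct toy_cpu toy_cpus[TOY_NCPU];

/*
 * Make one attempt to take the lock.  A real spin loop would call
 * this repeatedly until it returns 1.  The note of what we are
 * spinning for is set before the attempt and cleared only on success.
 */
static int
toy_lock_try(int cpu, struct toy_lock *lk)
{
	int expected = 0;

	atomic_store(&toy_cpus[cpu].spinning_on, lk);
	if (atomic_compare_exchange_strong(&lk->owner, &expected, cpu + 1)) {
		atomic_store(&toy_cpus[cpu].spinning_on, NULL);
		return 1;
	}
	return 0;	/* still spinning; the note stays set */
}

static void
toy_lock_exit(int cpu, struct toy_lock *lk)
{
	assert(atomic_load(&lk->owner) == cpu + 1);
	atomic_store(&lk->owner, 0);
}

/*
 * What a debugger could then do: return the CPU holding the lock
 * that `cpu' is spinning for, or -1 if it is not spinning or the
 * lock has just been released.
 */
static int
toy_blocker_of(int cpu)
{
	struct toy_lock *lk = atomic_load(&toy_cpus[cpu].spinning_on);
	int owner;

	if (lk == NULL)
		return -1;
	owner = atomic_load(&lk->owner);
	return owner == 0 ? -1 : owner - 1;
}
```

 With something like this, the question `who is softbio/0 waiting
 for?' would be a lookup like toy_blocker_of(0) rather than guesswork
 from a list of running LWPs.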
 
 
 Since they're not _sleeping_ on a wait-channel, whoever holds the lock
 (whether it's buffer_lock or vp->v_interlock) is probably running on
 another CPU.  That narrows it down to:
 
 PID     LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 20857>20857 7  17   8060000   ffff8fa8dbeac000                moc
 2482 > 2482 7  23   8060000   ffff8fb50065c800                moc
 26437>26437 7   9   8060000   ffff8fb50065c000                moc
 12057>12057 7  31   8020000   ffff8fbf931ab000                moc
 18565>18565 7  21   8060000   ffff8fc129ed9c00                moc
 12544>12544 7  22   8060000   ffff8fc129ed9000                moc
 13508>13508 7   2   8060000   ffff8fb908ee8800                moc
 3522 > 3522 7  20   8020000   ffff8fc1d885ac00                moc
 23657>23657 7   3   8020000   ffff8fb90950c800                moc
 3031 > 3031 7  28   8060000   ffff8fb10d2b3400                moc
 13759>13759 7   5   8020000   ffff8faac5fec000                moc
 25301>25301 7  15   8060000   ffff8fc2d6fb1400                moc
 13934>13934 7  14   8060000   ffff8fb8f1b68800                moc
 9619 > 9619 7  11   8060000   ffff8fb8f1b68000                moc
 11474>11474 7  13   8020000   ffff8fb7c17d2c00                moc
 23339>23339 7  29   8060000   ffff8fb7b9e4cc00                moc
 5510 > 5510 7  10   8060000   ffff8fc36cf3ec00                moc
 2567 > 2567 7  27   8020000   ffff8fc331b27c00                moc
 29924>29924 7   4   8020000   ffff8fc561094c00                moc
 20557>20557 7  26   8020000   ffff8fc561094800                moc
 23674>23674 7   6   8060000   ffff8fc38a557000                moc
 27935>27935 7  25   8060000   ffff8faaa08c1000                moc
 19811>19811 7   7   8060000   ffff8fbdc9009800                moc
 14383>14383 7  18   8060000   ffff8fbac25c2000                moc
 19788>19788 7  19   8060000   ffff8fc08f854c00                moc
 11290>11290 7  24   8060000   ffff8fb0ea2d1800                moc
 28134>28134 7  30   8020000   ffff8fc2010df800                moc
 27882>27882 7   8   8060000   ffff8fbd4892ac00                moc
 21709>21709 7  12   8060000   ffff8fb7cd8e5400                moc
 29702>29702 7   1   8020000   ffff8fb90a69dc00                moc
 0    >  935 7  16     40200   ffff8fb6be107400           pgdaemon
 0    >    4 7   0       200   ffff8fc60b25f800          softbio/0
 
 We could try getting the stack traces for the moc processes, but
 they're probably in userland without any locks held...
 

