Current-Users archive


heartbeat panic by heavy traffic



Hi.

I see the following heartbeat panic when a machine is forwarding
a heavy stream of short packets:

[ 745.0068385] cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0068385] panic: cpu15: softints stuck for 16 seconds
[ 745.0168386] cpu15: Begin traceback...
[ 745.0168386] cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0268387] vpanic() at cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0268387] netbsd:vpanic+0x173
[ 745.0368390] cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0368390] panic() at cpu14: found cpu15 heart stopped beating after 16 seconds
[ 745.0468390] netbsd:panic+0x3c
[ 745.0468390] heartbeat() at netbsd:heartbeat+0x353
[ 745.0568392] hardclock() at netbsd:hardclock+0x8b
[ 745.0668393] Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
[ 745.0668393] --- interrupt ---
[ 745.0768393] psref_release() at netbsd:psref_release+0x83
[ 745.0768393] ipintr() at netbsd:ipintr+0xef
[ 745.0868396] softint_dispatch() at netbsd:softint_dispatch+0x103
[ 745.0868396] DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff8288589fc0f0
[ 745.0968395] Xsoftintr() at netbsd:Xsoftintr+0x4c
[ 745.0968395] --- interrupt ---
[ 745.1068397] f9faeac0f5baeac4:
[ 745.1068397] cpu15: End traceback...
[ 745.1068397] fatal breakpoint trap in supervisor mode
[ 745.1168399] trap type 1 code 0 rip 0xffffffff80235425 cs 0x8 rflags 0x202 cr2 0 ilevel 0x7 rsp 0xffff8288589fbc68
[ 745.1268401] curlwp 0xffffd8070facf6c0 pid 0.175 lowest kstack 0xffff8288589f72c0
Stopped in pid 0.175 (system) at        netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x173
panic() at netbsd:panic+0x3c
heartbeat() at netbsd:heartbeat+0x353
hardclock() at netbsd:hardclock+0x8b
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
--- interrupt ---
psref_release() at netbsd:psref_release+0x83
ipintr() at netbsd:ipintr+0xef
softint_dispatch() at netbsd:softint_dispatch+0x103
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff8288589fc0f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
(snip)

The wm and ixg drivers have a hw.{wm,ixg}N.txrx_workqueue sysctl.
Changing it from 0 to 1 avoids the panic. Many other drivers
have no way to avoid the problem.
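
For example, assuming the interface is wm0 (the instance number here is
only an illustration), the setting could be made persistent with a line
like this in /etc/sysctl.conf:

```
# Assumed interface: wm0. Use the matching hw.ixgN name for ixg.
hw.wm0.txrx_workqueue=1
```

The same assignment can be applied at run time with sysctl -w.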

I think it would be good to change the default behavior from
panic to something else, because the GENERIC kernel enables HEARTBEAT
by default. One idea is to print a warning message at sufficiently long intervals.
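
To make the warning idea concrete, here is a rough userland sketch of
the rate limiting I have in mind. It is not the kernel's heartbeat code:
struct warn_limiter, warn_allowed() and simulate_stall() are hypothetical
names made up for this illustration, and the time source is a plain
counter of seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical limiter state; not taken from the NetBSD sources. */
struct warn_limiter {
	uint64_t last_warn_sec;	/* time of the most recent warning */
	uint64_t min_interval;	/* minimum seconds between warnings */
	bool	 warned_once;	/* false until the first warning */
};

/* Return true if a warning may be printed at time now_sec. */
static bool
warn_allowed(struct warn_limiter *wl, uint64_t now_sec)
{
	if (wl->warned_once &&
	    now_sec - wl->last_warn_sec < wl->min_interval)
		return false;
	wl->warned_once = true;
	wl->last_warn_sec = now_sec;
	return true;
}

/*
 * Simulate a CPU that stays stuck for duration_sec seconds, checked
 * once per second, and return how many warnings were printed.
 */
static int
simulate_stall(uint64_t start_sec, uint64_t duration_sec,
    uint64_t min_interval)
{
	struct warn_limiter wl = { 0, min_interval, false };
	int warnings = 0;

	assert(min_interval > 0);
	for (uint64_t t = start_sec; t < start_sec + duration_sec; t++) {
		if (warn_allowed(&wl, t)) {
			printf("warning: softints stuck (t=%llu s)\n",
			    (unsigned long long)t);
			warnings++;
		}
	}
	return warnings;
}
```

With a 16 second interval, a stall lasting 60 seconds would produce four
warnings instead of a panic. In the kernel, something like ratecheck(9)
might serve the same purpose.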

 Regards.

-- 
-----------------------------------------------
                SAITOH Masanobu (msaitoh%execsw.org@localhost
                                 msaitoh%netbsd.org@localhost)

