NetBSD-Bugs archive
Re: port-amd64/53155: OS wedges after <12h uptime when >2 bnx network interfaces in use
The following reply was made to PR port-amd64/53155; it has been noted by GNATS.
From: Havard Eidnes <he%NetBSD.org@localhost>
To: ozaki-r%netbsd.org@localhost
Cc: gnats-bugs%netbsd.org@localhost, port-amd64-maintainer%netbsd.org@localhost
Subject: Re: port-amd64/53155: OS wedges after <12h uptime when >2 bnx network interfaces in use
Date: Thu, 17 May 2018 12:34:28 +0200 (CEST)
>> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
>> 7052 1 3 1 8020000 fffffe8220c58540 cron tstile
>> Wants fstrans_lock
>>
>> 9187 1 3 6 8020000 fffffe821faa60c0 expect xchicv
>> Holds fstrans_lock, in pserialize_perform, waits on condition variable
>> after doing xc_broadcast(XC_HIGHPRI, nullop)
>> Doing (roughly) pty_grant_slave -> genfs_revoke -> vfs_suspend ->
>> fstrans_setstate -> pserialize_perform -> xc_wait -> cv_wait
>
> This xcall requires that the softint of SOFTINT_SERIAL (softser/N)
> on all CPUs processes a callback of the xcall. If any of the softints
> gets stuck for some reason, the xcall never finishes.
>
> Could you show the stack trace of each softser/N? In particular, softser/0
> appears to be running and is a suspect.
Getting the tracebacks of these LWPs appears to be somewhat
problematic; I don't get much useful info from crash:
0 78 3 3 200 fffffe810f2e9740 mfi0 mfi0
0 69 3 5 200 fffffe810f20d6e0 atabus1 atath
0 68 3 4 200 fffffe810f20db00 atabus0 atath
0 66 3 0 200 fffffe810ed3b6c0 scsibus0 sccomp
0 65 3 0 200 fffffe810ed3bae0 usbtask-dr usbtsk
0 64 3 0 200 fffffe810ecaa280 usbtask-hc usbtsk
0 63 3 0 200 fffffe810ecaa6a0 bnx3 bnx3
0 62 3 0 200 fffffe810ecaaac0 bnx2 bnx2
0 61 3 0 200 fffffe810eb61260 bnx1 bnx1
0 60 3 7 200 fffffe810eb61680 bnx0 bnx0
0 > 59 7 2 200 fffffe810eb61aa0 ipmi
0 58 3 7 200 fffffe810ea7c240 xcall/7 xcall
0 57 1 7 200 fffffe810ea7c660 softser/7
0 56 1 7 200 fffffe810ea7ca80 softclk/7
0 55 1 7 200 fffffe810ea5d220 softbio/7
0 54 1 7 200 fffffe810ea5d640 softnet/7
0 > 53 7 7 201 fffffe810ea5da60 idle/7
0 52 3 6 200 fffffe810ea36200 xcall/6 xcall
0 51 1 6 200 fffffe810ea36620 softser/6
0 50 1 6 200 fffffe810ea36a40 softclk/6
0 49 1 6 200 fffffe810ea0f1e0 softbio/6
0 48 1 6 200 fffffe810ea0f600 softnet/6
0 47 1 6 201 fffffe810ea0fa20 idle/6
0 46 3 5 200 fffffe810e9e01c0 xcall/5 xcall
0 45 1 5 200 fffffe810e9e05e0 softser/5
0 44 1 5 200 fffffe810e9e0a00 softclk/5
0 43 1 5 200 fffffe810e9d11a0 softbio/5
0 42 1 5 200 fffffe810e9d15c0 softnet/5
0 > 41 7 5 201 fffffe810e9d19e0 idle/5
0 40 3 4 200 fffffe810e9a2180 xcall/4 xcall
0 39 1 4 200 fffffe810e9a25a0 softser/4
0 38 1 4 200 fffffe810e9a29c0 softclk/4
0 37 1 4 200 fffffe810e973160 softbio/4
0 36 1 4 200 fffffe810e973580 softnet/4
0 > 35 7 4 201 fffffe810e9739a0 idle/4
0 34 3 3 200 fffffe810e94c140 xcall/3 xcall
0 33 1 3 200 fffffe810e94c560 softser/3
0 32 1 3 200 fffffe810e94c980 softclk/3
0 31 1 3 200 fffffe810e93d120 softbio/3
0 30 1 3 200 fffffe810e93d540 softnet/3
0 > 29 7 3 201 fffffe810e93d960 idle/3
0 28 3 2 200 fffffe810e906100 xcall/2 xcall
0 27 1 2 200 fffffe810e906520 softser/2
0 26 1 2 200 fffffe810e906940 softclk/2
0 25 1 2 200 fffffe810e8ef0e0 softbio/2
0 24 1 2 200 fffffe810e8ef500 softnet/2
0 23 1 2 201 fffffe810e8ef920 idle/2
0 22 3 1 200 fffffe810e8c50c0 xcall/1 xcall
0 21 1 1 200 fffffe810e8c54e0 softser/1
0 20 1 1 200 fffffe810e8c5900 softclk/1
0 19 1 1 200 fffffe810e8b10a0 softbio/1
0 18 1 1 200 fffffe810e8b14c0 softnet/1
0 > 17 7 1 201 fffffe810e8b18e0 idle/1
0 16 3 0 200 fffffe822de92080 lnxsyswq lnxsyswq
0 15 3 0 200 fffffe822de924a0 sysmon smtaskq
0 14 3 2 200 fffffe822de928c0 pmfsuspend pmfsuspend
0 13 3 7 200 fffffe822e2b1060 pmfevent pmfevent
0 12 3 0 200 fffffe822e2b1480 sopendfree sopendfr
0 11 3 2 200 fffffe822e2b18a0 nfssilly nfssilly
0 10 3 2 200 fffffe822f6d0040 cachegc cachegc
0 9 3 3 200 fffffe822f6d0460 vdrain vdrain
0 8 3 0 200 fffffe822f6d0880 modunload mod_unld
0 7 3 0 200 fffffe822f6eb020 xcall/0 xcall
0 > 6 7 0 200 fffffe822f6eb440 softser/0
0 > 5 7 0 200 fffffe822f6eb860 softclk/0
0 4 1 0 200 fffffe822f707000 softbio/0
0 3 1 0 200 fffffe822f707420 softnet/0
0 2 1 0 201 fffffe822f707840 idle/0
0 1 3 4 200 ffffffff81481b20 swapper uvm
crash> trace/a fffffe810ea7c660
trace: pid 0 lid 57 at 0xffff80008f7e8e80
lockdebug_wantlock() at lockdebug_wantlock+0xf9
_KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES() at _KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES+0x173
crash> trace/a fffffe810ea36620
trace: pid 0 lid 51 at 0x0
crash> trace/a fffffe810e9e05e0
trace: pid 0 lid 45 at 0x0
0:
crash> trace/a fffffe810e9a25a0
trace: pid 0 lid 39 at 0x0
0:
crash> trace/a fffffe810e94c560
trace: pid 0 lid 33 at 0x0
0:
crash> trace/a fffffe810e906520
trace: pid 0 lid 27 at 0xffff80008f72ae80
lockdebug_wantlock() at lockdebug_wantlock+0xf9
_KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES() at _KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES+0x173
crash> trace/a fffffe810e8c54e0
trace: pid 0 lid 21 at 0x0
crash> trace/a fffffe822f6eb440
trace: pid 0 lid 6 at 0x0
crash>
GDB doesn't fare much better either:
(gdb) kvm proc 0xfffffe822f6eb440
0x0000000000000005 in ?? ()
(gdb) where
#0 0x0000000000000005 in ?? ()
#1 0xfffffe810ed5be08 in ?? ()
#2 0xffffffff809a2791 in percpu_getref (pc=0xffff80008f69bfe0)
at /usr/src/sys/kern/subr_percpu.c:293
#3 0x0000000000000000 in ?? ()
(gdb)
(gdb) x/i 0xfffffe810ed5be08
0xfffffe810ed5be08: (bad)
(gdb)
(gdb) up
#1 0xfffffe810ed5be08 in ?? ()
(gdb) up
#2 0xffffffff809a2791 in percpu_getref (pc=0xffff80008f69bfe0)
at /usr/src/sys/kern/subr_percpu.c:293
293 kpreempt_disable();
(gdb)
(gdb) list
288
289 void *
290 percpu_getref(percpu_t *pc)
291 {
292
293 kpreempt_disable();
294 return percpu_getptr_remote(pc, curcpu());
295 }
296
297 /*
(gdb)
Regards,
- Havard