NetBSD-Bugs archive


Re: port-amd64/53155: OS wedges after <12h uptime when >2 bnx network interfaces in use



The following reply was made to PR port-amd64/53155; it has been noted by GNATS.

From: Havard Eidnes <he%NetBSD.org@localhost>
To: ozaki-r%netbsd.org@localhost
Cc: gnats-bugs%netbsd.org@localhost, port-amd64-maintainer%netbsd.org@localhost
Subject: Re: port-amd64/53155: OS wedges after <12h uptime when >2 bnx
 network interfaces in use
Date: Thu, 17 May 2018 12:34:28 +0200 (CEST)

 >>    PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 >>    7052     1 3   1   8020000   fffffe8220c58540               cron tstile
 >>      Wants fstrans_lock
 >>
 >>    9187     1 3   6   8020000   fffffe821faa60c0             expect xchicv
 >>      Holds fstrans_lock, in pserialize_perform, waits on condition variable
 >>        after doing xc_broadcast(XC_HIGHPRI, nullop)
 >>      Doing (roughly) pty_grant_slave -> genfs_revoke -> vfs_suspend ->
 >>        fstrans_setstate -> pserialize_perform -> xc_wait -> cv_wait
 >
 > This xcall requires that the softint of SOFTINT_SERIAL (softser/N)
 > on all CPUs processes a callback of the xcall. If any of the softints
 > get stuck for some reason, the xcall never finishes.
 >
 > Could you show the stack trace of each softser/N? In particular, softser/0
 > appears to be running and is a suspect.
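The dependency described above can be modelled in user space. What follows is a hypothetical pthreads analogy, not NetBSD's xcall code. One worker thread per simulated CPU stands in for softser/N, and the initiator waits on a condition variable until every worker has run the no-op callback (cf. nullop). The names xc_sim_broadcast and softser_worker are made up for illustration. The timeout exists only so the hang can be observed; the cv_wait() reached from xc_wait() in the trace above has none. Marking one worker as stuck leaves the initiator one acknowledgement short, which is the position PID 9187 is in while it still holds fstrans_lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model: one thread per simulated CPU plays that CPU's
 * SOFTINT_SERIAL handler. This is an analogy, not the kernel code. */
#define MAX_CPUS 16

struct xc_sim {
	pthread_mutex_t lock;
	pthread_cond_t  done_cv;  /* initiator sleeps here (cf. "xchicv") */
	unsigned        acked;    /* CPUs that have run the callback */
};

struct worker_arg {
	struct xc_sim *xc;
	bool           stuck;     /* simulate a softint that never gets to run */
};

static void *
softser_worker(void *p)
{
	struct worker_arg *wa = p;

	if (wa->stuck)
		return NULL;      /* never runs the callback, never acknowledges */

	/* The callback is a no-op; only the acknowledgement matters. */
	pthread_mutex_lock(&wa->xc->lock);
	wa->xc->acked++;
	pthread_cond_broadcast(&wa->xc->done_cv);
	pthread_mutex_unlock(&wa->xc->lock);
	return NULL;
}

/*
 * Broadcast to ncpu simulated CPUs and wait until all have acknowledged.
 * stuck_cpu < 0 means no CPU is stuck. Gives up after timeout_sec so the
 * hang is observable. Returns the number of acknowledgements seen.
 */
static unsigned
xc_sim_broadcast(unsigned ncpu, int stuck_cpu, int timeout_sec)
{
	struct xc_sim xc;
	struct worker_arg args[MAX_CPUS];
	pthread_t tids[MAX_CPUS];
	struct timespec deadline;
	unsigned acked;

	assert(ncpu <= MAX_CPUS);
	pthread_mutex_init(&xc.lock, NULL);
	pthread_cond_init(&xc.done_cv, NULL);
	xc.acked = 0;

	for (unsigned i = 0; i < ncpu; i++) {
		args[i].xc = &xc;
		args[i].stuck = ((int)i == stuck_cpu);
		pthread_create(&tids[i], NULL, softser_worker, &args[i]);
	}

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&xc.lock);
	while (xc.acked < ncpu) {
		/* An untimed cv_wait() would block here forever. */
		if (pthread_cond_timedwait(&xc.done_cv, &xc.lock,
		    &deadline) == ETIMEDOUT)
			break;
	}
	acked = xc.acked;
	pthread_mutex_unlock(&xc.lock);

	for (unsigned i = 0; i < ncpu; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&xc.done_cv);
	pthread_mutex_destroy(&xc.lock);
	return acked;
}
```

For example, called from a main() and compiled with -pthread, xc_sim_broadcast(4, -1, 1) returns 4 almost at once, while xc_sim_broadcast(4, 0, 1) returns 3 after the one-second timeout because simulated CPU 0 never acknowledges.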
 
 Getting the tracebacks of these LWPs appears to be somewhat
 problematic; I don't get much useful info from crash:
 
 0       78 3   3       200   fffffe810f2e9740               mfi0 mfi0
 0       69 3   5       200   fffffe810f20d6e0            atabus1 atath
 0       68 3   4       200   fffffe810f20db00            atabus0 atath
 0       66 3   0       200   fffffe810ed3b6c0           scsibus0 sccomp
 0       65 3   0       200   fffffe810ed3bae0         usbtask-dr usbtsk
 0       64 3   0       200   fffffe810ecaa280         usbtask-hc usbtsk
 0       63 3   0       200   fffffe810ecaa6a0               bnx3 bnx3
 0       62 3   0       200   fffffe810ecaaac0               bnx2 bnx2
 0       61 3   0       200   fffffe810eb61260               bnx1 bnx1
 0       60 3   7       200   fffffe810eb61680               bnx0 bnx0
 0    >  59 7   2       200   fffffe810eb61aa0               ipmi
 0       58 3   7       200   fffffe810ea7c240            xcall/7 xcall
 0       57 1   7       200   fffffe810ea7c660          softser/7
 0       56 1   7       200   fffffe810ea7ca80          softclk/7
 0       55 1   7       200   fffffe810ea5d220          softbio/7
 0       54 1   7       200   fffffe810ea5d640          softnet/7
 0    >  53 7   7       201   fffffe810ea5da60             idle/7
 0       52 3   6       200   fffffe810ea36200            xcall/6 xcall
 0       51 1   6       200   fffffe810ea36620          softser/6
 0       50 1   6       200   fffffe810ea36a40          softclk/6
 0       49 1   6       200   fffffe810ea0f1e0          softbio/6
 0       48 1   6       200   fffffe810ea0f600          softnet/6
 0       47 1   6       201   fffffe810ea0fa20             idle/6
 0       46 3   5       200   fffffe810e9e01c0            xcall/5 xcall
 0       45 1   5       200   fffffe810e9e05e0          softser/5
 0       44 1   5       200   fffffe810e9e0a00          softclk/5
 0       43 1   5       200   fffffe810e9d11a0          softbio/5
 0       42 1   5       200   fffffe810e9d15c0          softnet/5
 0    >  41 7   5       201   fffffe810e9d19e0             idle/5
 0       40 3   4       200   fffffe810e9a2180            xcall/4 xcall
 0       39 1   4       200   fffffe810e9a25a0          softser/4
 0       38 1   4       200   fffffe810e9a29c0          softclk/4
 0       37 1   4       200   fffffe810e973160          softbio/4
 0       36 1   4       200   fffffe810e973580          softnet/4
 0    >  35 7   4       201   fffffe810e9739a0             idle/4
 0       34 3   3       200   fffffe810e94c140            xcall/3 xcall
 0       33 1   3       200   fffffe810e94c560          softser/3
 0       32 1   3       200   fffffe810e94c980          softclk/3
 0       31 1   3       200   fffffe810e93d120          softbio/3
 0       30 1   3       200   fffffe810e93d540          softnet/3
 0    >  29 7   3       201   fffffe810e93d960             idle/3
 0       28 3   2       200   fffffe810e906100            xcall/2 xcall
 0       27 1   2       200   fffffe810e906520          softser/2
 0       26 1   2       200   fffffe810e906940          softclk/2
 0       25 1   2       200   fffffe810e8ef0e0          softbio/2
 0       24 1   2       200   fffffe810e8ef500          softnet/2
 0       23 1   2       201   fffffe810e8ef920             idle/2
 0       22 3   1       200   fffffe810e8c50c0            xcall/1 xcall
 0       21 1   1       200   fffffe810e8c54e0          softser/1
 0       20 1   1       200   fffffe810e8c5900          softclk/1
 0       19 1   1       200   fffffe810e8b10a0          softbio/1
 0       18 1   1       200   fffffe810e8b14c0          softnet/1
 0    >  17 7   1       201   fffffe810e8b18e0             idle/1
 0       16 3   0       200   fffffe822de92080           lnxsyswq lnxsyswq
 0       15 3   0       200   fffffe822de924a0             sysmon smtaskq
 0       14 3   2       200   fffffe822de928c0         pmfsuspend pmfsuspend
 0       13 3   7       200   fffffe822e2b1060           pmfevent pmfevent
 0       12 3   0       200   fffffe822e2b1480         sopendfree sopendfr
 0       11 3   2       200   fffffe822e2b18a0           nfssilly nfssilly
 0       10 3   2       200   fffffe822f6d0040            cachegc cachegc
 0        9 3   3       200   fffffe822f6d0460             vdrain vdrain
 0        8 3   0       200   fffffe822f6d0880          modunload mod_unld
 0        7 3   0       200   fffffe822f6eb020            xcall/0 xcall
 0    >   6 7   0       200   fffffe822f6eb440          softser/0
 0    >   5 7   0       200   fffffe822f6eb860          softclk/0
 0        4 1   0       200   fffffe822f707000          softbio/0
 0        3 1   0       200   fffffe822f707420          softnet/0
 0        2 1   0       201   fffffe822f707840             idle/0
 0        1 3   4       200   ffffffff81481b20            swapper uvm
 crash> trace/a fffffe810ea7c660
 trace: pid 0 lid 57 at 0xffff80008f7e8e80
 lockdebug_wantlock() at lockdebug_wantlock+0xf9
 _KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES() at _KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES+0x173
 crash> trace/a fffffe810ea36620
 trace: pid 0 lid 51 at 0x0
 crash> trace/a fffffe810e9e05e0
 trace: pid 0 lid 45 at 0x0
 0:
 crash> trace/a fffffe810e9a25a0
 trace: pid 0 lid 39 at 0x0
 0:
 crash> trace/a fffffe810e94c560
 trace: pid 0 lid 33 at 0x0
 0:
 crash> trace/a fffffe810e906520
 trace: pid 0 lid 27 at 0xffff80008f72ae80
 lockdebug_wantlock() at lockdebug_wantlock+0xf9
 _KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES() at _KERNEL_OPT_WSDISPLAY_SCROLLBACK_LINES+0x173
 crash> trace/a fffffe810e8c54e0
 trace: pid 0 lid 21 at 0x0
 crash> trace/a fffffe822f6eb440
 trace: pid 0 lid 6 at 0x0
 crash> 
 
 GDB doesn't look a lot better either:
 
 (gdb) kvm proc 0xfffffe822f6eb440
 0x0000000000000005 in ?? ()
 (gdb) where
 #0  0x0000000000000005 in ?? ()
 #1  0xfffffe810ed5be08 in ?? ()
 #2  0xffffffff809a2791 in percpu_getref (pc=0xffff80008f69bfe0)
     at /usr/src/sys/kern/subr_percpu.c:293
 #3  0x0000000000000000 in ?? ()
 (gdb) 
 (gdb) x/i 0xfffffe810ed5be08
    0xfffffe810ed5be08:  (bad)  
 (gdb) 
 (gdb) up
 #1  0xfffffe810ed5be08 in ?? ()
 (gdb) up
 #2  0xffffffff809a2791 in percpu_getref (pc=0xffff80008f69bfe0)
     at /usr/src/sys/kern/subr_percpu.c:293
 293             kpreempt_disable();
 (gdb) 
 (gdb) list
 288
 289     void *
 290     percpu_getref(percpu_t *pc)
 291     {
 292
 293             kpreempt_disable();
 294             return percpu_getptr_remote(pc, curcpu());
 295     }
 296
 297     /*
 (gdb) 
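In the listing above, percpu_getref() disables kernel preemption and returns the current CPU's slot. In NetBSD the matching call, percpu_putref(), re-enables preemption, and the returned pointer is only valid between the two calls. The following is a hypothetical user-space model of that pairing: the model_ names, the counter standing in for kpreempt_disable()/kpreempt_enable(), and the fixed model_curcpu standing in for curcpu() are assumptions for illustration, not kernel code. It shows the invariant a caller has to keep: every getref is balanced by a putref, so the preemption-disable depth returns to zero.

```c
#include <assert.h>

/* Hypothetical model of the percpu_getref()/percpu_putref() pairing.
 * A plain counter stands in for the preemption-disable depth and a
 * fixed index stands in for curcpu(). Not kernel code. */
#define MODEL_NCPU 8

struct model_percpu {
	long slot[MODEL_NCPU];            /* one private slot per CPU */
};

static int model_preempt_depth;      /* > 0 means "preemption disabled" */
static int model_curcpu;             /* stands in for curcpu() */

static void
model_kpreempt_disable(void)
{
	model_preempt_depth++;
}

static void
model_kpreempt_enable(void)
{
	assert(model_preempt_depth > 0);  /* catches an unbalanced putref */
	model_preempt_depth--;
}

/* Mirrors the listing: disable preemption, then return this CPU's slot. */
static long *
model_percpu_getref(struct model_percpu *pc)
{
	model_kpreempt_disable();
	return &pc->slot[model_curcpu];
}

/* The pointer from model_percpu_getref() must not be used after this. */
static void
model_percpu_putref(struct model_percpu *pc)
{
	(void)pc;
	model_kpreempt_enable();
}

/* Typical use: bump a per-CPU counter without taking a lock. */
static void
model_count_event(struct model_percpu *pc)
{
	long *cnt = model_percpu_getref(pc);

	(*cnt)++;
	model_percpu_putref(pc);
}
```

Calling model_count_event() twice with model_curcpu set to 3 leaves slot[3] at 2, the other slots at 0, and model_preempt_depth back at 0. In the real kernel, an LWP that stalled between the two calls would keep preemption disabled on its CPU; whether softser/0 is doing that cannot be determined from this backtrace alone, since its innermost frames (0x5 and a "(bad)" instruction) look unreliable.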
 
 Regards,
 
 - Havard
 

