NetBSD-Bugs archive


Re: kern/53814: wm0 device timeout in netbsd 7.1



Hi Team,

Thanks for your patience and assistance on this case.
We have not removed "wm_pkt_stats" and "wm_print_stats" in NetBSD 7.1, and we use the WM_T_I354 chip type.
I've attached the output you requested. I took the live kernel core the last time we hit this issue, and the WM PHY was active.
wm_reset() does not recover the interface from the failed state, and device timeout messages keep piling up after wm_reset().
Please let me know if any further details are needed.

Regards,
Aravind M.

On Mon, Dec 31, 2018 at 2:40 PM Aravind M <aravind.ss1094%gmail.com@localhost> wrote:
The following reply was made to PR kern/53814; it has been noted by GNATS.

From: Aravind M <aravind.ss1094%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/53814: wm0 device timeout in netbsd 7.1
Date: Mon, 31 Dec 2018 14:35:05 +0530

 --000000000000331cd7057e4db69c
 Content-Type: text/plain; charset="UTF-8"

 Thanks for your assistance.
 I'll add the suggested options and get back to you on this.

 Regards,
 Aravind Mani

 On Thu 27 Dec, 2018, 9:10 AM Masanobu SAITOH <msaitoh%execsw.org@localhost> wrote:

 > The following reply was made to PR kern/53814; it has been noted by GNATS.
 >
 > From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
 > To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
 >  gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
 > Cc: msaitoh%execsw.org@localhost
 > Subject: Re: kern/53814: wm0 device timeout in netbsd 7.1
 > Date: Thu, 27 Dec 2018 12:38:45 +0900
 >
 >  On 2018/12/26 15:50, aravind.ss1094%gmail.com@localhost wrote:
 >  >> Number:         53814
 >  >> Category:       kern
 >  >> Synopsis:       wm0 device timeout in netbsd 7.1
 >  >> Confidential:   no
 >  >> Severity:       serious
 >  >> Priority:       medium
 >  >> Responsible:    kern-bug-people
 >  >> State:          open
 >  >> Class:          sw-bug
 >  >> Submitter-Id:   net
 >  >> Arrival-Date:   Wed Dec 26 06:50:00 +0000 2018
 >  >> Originator:     Aravind Mani
 >  >> Release:        netbsd 7.1
 >  >> Organization:
 >  > private organization
 >  >> Environment:
 >  > chip type: I354
 >  >> Description:
 >  > We use the WM_T_I354 chip type. When we reload the switch repeatedly,
 >  > we observe a device timeout on wm0. Neither wm_init() nor wm_reset()
 >  > recovers from the problem state; the only way to recover is to reload
 >  > the switch. There was no initialization error.
 >  > From wm_print_stats() and wm_pkt_stats(), I don't see any value in the
 >  > registers listed, and packets are not reaching the hardware.
 >  > wm_reset() also did not recover the interface.
 >  >
 >  > Do you need any other output to investigate further?
 >  > Is there any other way to recover from this issue?
 >  > Has any other fix been made in this area?
 >  >
 >  >
 >  >
 >  >> How-To-Repeat:
 >  > Repeatedly reload the switch, which runs NetBSD 7.1.
 >  >> Fix:
 >  >
 >
 >  Are you using a modified version of if_wm.c? It has neither
 >  wm_print_stats() nor wm_pkt_stats().
 >
 >  > Do you need any other output to investigate further?
 >
 >  wm(4) has a WM_EVENT_COUNTERS option, so it would be good to
 >  add "options WM_EVENT_COUNTERS" to your kernel configuration
 >  file and check the output of vmstat -e.
 >
 >  > Repeatedly reload the switch, which runs NetBSD 7.1.
 >
 >  It's a little hard to know what triggers the problem because
 >  I don't know what your switch implementation does during the reload.
 >
 >  I have SGMII-based C2000 machines. I've not tested on others
 >  (e.g. KX, PCIe SERDES or GMII). It would be good to check your
 >  PHY configuration and/or status if your system is not SGMII-based.
 >
 >
 >  --
 >  -----------------------------------------------
 >                   SAITOH Masanobu (msaitoh%execsw.org@localhost
 >                                    msaitoh%netbsd.org@localhost)
 >
 >

 --000000000000331cd7057e4db69c--



--

with Regards,
Aravind.

SStk-1 # vmstat -e
event                                         total     rate type
bus_dma loads                              95451577      319 misc
vmcmd kills                                     661        0 misc
vmcmd calls                                    3731        0 misc
vmem static_bt_inuse                            200        0 misc
vmem static_bt_count                            200        0 misc
TLB shootdown                                182842        0 intr
cpu0 runqueue pull                         16763601       56 misc
cpu0 runqueue push                           218455        0 misc
cpu0 runqueue stay                         29807214       99 misc
cpu0 runqueue localize                    199719304      669 misc
softint net/0                               1172158        3 misc
softint net block/0                           46424        0 misc
softint bio/0                                  6245        0 misc
softint bio block/0                               4        0 misc
softint clk/0                              29819349       99 misc
softint clk block/0                          145137        0 misc
softint ser/0                                 44794        0 misc
callout late/0                                38366        0 misc
crosscall unicast                                11        0 misc
crosscall broadcast                               4        0 misc
namecache entries collected                   13850        0 misc
namecache under scan target                  298154        0 misc
cpu0 timer                                 29826661       99 intr
cpu0 generic IPI                             548755        1 misc
cpu0 FPU synch IPI                             3116        0 misc
cpu0 kpreempt IPI                            235125        0 misc
cpu1 runqueue pull                         18640375       62 misc
cpu1 runqueue push                          2168053        7 misc
cpu1 runqueue stay                         30124219      100 misc
cpu1 runqueue localize                    158923916      532 misc
softint net/1                                   365        0 misc
softint net block/1                             360        0 misc
softint clk/1                              29817170       99 misc
softint clk block/1                           28745        0 misc
softint ser/1                                  8658        0 misc
callout late/1                                18516        0 misc
cpu1 timer                                 29826661       99 misc
cpu1 FPU synch IPI                             4340        0 misc
cpu1 kpreempt IPI                            173706        0 misc
ioapic0 pin 20                               172536        0 intr
wm0 txsstall                                   1088        0 misc
wm0 txdw                                     183747        0 intr
wm0 txseg0                                   255914        0 misc
ioapic0 pin 23                                   18        0 intr
ioapic0 pin 19                                 6797        0 intr
ioapic0 pin 4                                 33936        0 intr
kpreempt defer: critical section               7776        0 misc
kpreempt defer: kernel_lock                 2793374        9 misc
kpreempt immediate                           493760        1 misc
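
For anyone following the PR who wants only the wm0 rows from a listing like the one above, here is a minimal sketch of a filter. It is not part of the driver or of the report: the function name wm_counters and the SAMPLE text are made up for illustration, and the only thing taken from the output above is the column layout (event name, total, rate, type).

```python
def wm_counters(vmstat_text, ifname="wm0"):
    """Return {event: total} for rows of `vmstat -e` output that belong to ifname."""
    counters = {}
    for line in vmstat_text.splitlines():
        # Each row ends with three columns (total, rate, type); the event
        # name, which may itself contain spaces, is everything before them.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            continue
        name, total, _rate, _type = parts
        if name.startswith(ifname + " ") and total.isdigit():
            # Drop the "wm0 " prefix so the key is just the counter name.
            counters[name[len(ifname) + 1:]] = int(total)
    return counters


# A few rows copied from the listing above (hypothetical test input).
SAMPLE = """\
event                                         total     rate type
ioapic0 pin 20                               172536        0 intr
wm0 txsstall                                   1088        0 misc
wm0 txdw                                     183747        0 intr
wm0 txseg0                                   255914        0 misc
"""

print(wm_counters(SAMPLE))
```

Run against the full listing, this would return only txsstall, txdw and txseg0, since those are the only wm0 rows present; no receive-side counter is listed.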


SStk-1 # sysctl -w ddb.command="call wm_pkt_stats(0)"
Total Pkts Recv     =0
Missed Pkts Recv    =0
Good Pkts Recv      =0
No Buff Pkts Recv   =0
Mgmt Pkt Recv       =0
Mgmt Buff Drop Recv =0
Interrupt Assertion =80

wm_print_stats:

0x4000 : 0
0x4004 : 0
0x4008 : 0
0x400c : 0
0x4010 : 0
0x4014 : 0
0x4018 : 0
0x401c : 0
0x4020 : 0
0x4024 : 0
0x4028 : 0
0x402c : 0
0x4030 : 0
0x4034 : 0
0x4038 : 0
0x403c : 0
0x4040 : 0
0x4044 : 0
0x4048 : 0
0x404c : 0
0x4050 : 0
0x4054 : 0
0x4058 : 0
0x405c : 0
0x4060 : 0
0x4064 : 0
0x4068 : 0
0x406c : 0
0x4070 : 0
0x4074 : 0
0x4078 : 0
0x407c : 0
0x4080 : 0
0x4084 : 0
0x4088 : 0
0x408c : 0
0x4090 : 0
0x4094 : 0
0x4098 : 0
0x409c : 0
0x40a0 : 0
0x40a4 : 0
0x40a8 : 0
0x40ac : 0
0x40b0 : 0
0x40b4 : 0
0x40b8 : 0
0x40bc : 0
0x40c0 : 0
0x40c4 : 0
0x40c8 : 0
0x40cc : 0
0x40d0 : 0
0x40d4 : 0
0x40d8 : 0
0x40dc : 0
0x40e0 : 0
0x40e4 : 0
0x40e8 : 0
0x40ec : 0
0x40f0 : 0
0x40f4 : 0
0x40f8 : 0
0x40fc : 0
0x4100 : 0x24
0x4104 : 0
0x4108 : 0
0x410c : 0
0x4110 : 0
0x4114 : 0
0x4118 : 0
0x411c : 0
0x4120 : 0
0x4124 : 0
0x4128 : 0
0x412c : 0
0x4130 : 0
0x4134 : 0
0x4138 : 0
0x413c : 0
0x4140 : 0
0x4144 : 0
0x4148 : 0
0x414c : 0
0x4150 : 0
0x4154 : 0
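
Since almost every line of the dump above is zero, a short helper that keeps only the non-zero offsets makes it easier to compare dumps taken at different times. This is a sketch under assumptions: nonzero_registers is a hypothetical name, and the "offset : value" line format is inferred from the output of the locally added wm_print_stats shown above, not from the stock driver.

```python
def nonzero_registers(dump_text):
    """Return {offset: value} for every 'offset : value' line with a non-zero value."""
    regs = {}
    for line in dump_text.splitlines():
        offset, sep, value = line.partition(":")
        if not sep:
            continue  # not an "offset : value" line
        try:
            off = int(offset.strip(), 16)  # offsets are printed in hex
            val = int(value.strip(), 0)    # values appear as "0" or "0x.."
        except ValueError:
            continue  # e.g. the "wm_print_stats:" heading
        if val != 0:
            regs[off] = val
    return regs


# Three lines copied from the dump above (hypothetical test input).
DUMP = """\
wm_print_stats:

0x40fc : 0
0x4100 : 0x24
0x4104 : 0
"""

for off, val in sorted(nonzero_registers(DUMP).items()):
    print(f"{off:#06x} : {val:#x}")
```

For the dump in this message, the only offset it would report is 0x4100 with value 0x24.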





