The following reply was made to PR kern/53814; it has been noted by GNATS.
From: Aravind M <aravind.ss1094%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/53814: wm0 device timeout in netbsd 7.1
Date: Mon, 31 Dec 2018 14:35:05 +0530
--000000000000331cd7057e4db69c
Content-Type: text/plain; charset="UTF-8"
Thanks for your assistance.
I'll add the suggested options and get back to you on this.
Regards,
Aravind Mani
On Thu 27 Dec, 2018, 9:10 AM Masanobu SAITOH <msaitoh%execsw.org@localhost> wrote:
> The following reply was made to PR kern/53814; it has been noted by GNATS.
>
> From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
> To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
> gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
> Cc: msaitoh%execsw.org@localhost
> Subject: Re: kern/53814: wm0 device timeout in netbsd 7.1
> Date: Thu, 27 Dec 2018 12:38:45 +0900
>
> On 2018/12/26 15:50, aravind.ss1094%gmail.com@localhost wrote:
> >> Number: 53814
> >> Category: kern
> >> Synopsis: wm0 device timeout in netbsd 7.1
> >> Confidential: no
> >> Severity: serious
> >> Priority: medium
> >> Responsible: kern-bug-people
> >> State: open
> >> Class: sw-bug
> >> Submitter-Id: net
> >> Arrival-Date: Wed Dec 26 06:50:00 +0000 2018
> >> Originator: Aravind Mani
> >> Release: netbsd 7.1
> >> Organization:
> > private organization
> >> Environment:
> > chip type: I354
> >> Description:
> > We use the WM_T_I354 chip type. When we reload continuously, we
> > observe a device timeout. wm_init() and wm_reset() do not help to
> > recover from the problem state. The only way to recover is to reload
> > the switch. There was no initialization error.
> > From wm_print_stats() and wm_pkt_stats(), I don't see any value in the
> > registers listed, and the packets are not hitting the hardware.
> > wm_reset() also didn't help to recover from the issue.
> >
> > Do you need any other output to investigate further?
> > Is there any other way to recover from this issue?
> > Has any other fix been made in this area?
> >
> >
> >
> >> How-To-Repeat:
> > Continuously reload the switch that runs NetBSD 7.1.
> >> Fix:
> >
>
> Are you using a modified version of if_wm.c? The stock if_wm.c has
> neither wm_print_stats() nor wm_pkt_stats().
>
> > Do you need any other output to investigate further?
>
> wm(4) has a WM_EVENT_COUNTERS option, so it would be good to
> add "options WM_EVENT_COUNTERS" to your kernel configuration
> file and check the output of vmstat -e.
>
> > Continuously reload the switch that runs NetBSD 7.1.
>
> It's a little hard to know what triggers the problem because
> I don't know what your switch implementation does during the reload.
>
> I have SGMII based C2000 machines. I've not tested on others
> (e.g. KX, PCIe SERDES or GMII). It would be good to check your
> PHY configuration and/or status if your system is not SGMII based.
>
>
> --
> -----------------------------------------------
> SAITOH Masanobu (msaitoh%execsw.org@localhost
> msaitoh%netbsd.org@localhost)
>
>
--000000000000331cd7057e4db69c--
SStk-1 # vmstat -e
event                                  total  rate type
bus_dma loads                       95451577   319 misc
vmcmd kills                              661     0 misc
vmcmd calls                             3731     0 misc
vmem static_bt_inuse                     200     0 misc
vmem static_bt_count                     200     0 misc
TLB shootdown                         182842     0 intr
cpu0 runqueue pull                  16763601    56 misc
cpu0 runqueue push                    218455     0 misc
cpu0 runqueue stay                  29807214    99 misc
cpu0 runqueue localize             199719304   669 misc
softint net/0                        1172158     3 misc
softint net block/0                    46424     0 misc
softint bio/0                           6245     0 misc
softint bio block/0                        4     0 misc
softint clk/0                       29819349    99 misc
softint clk block/0                   145137     0 misc
softint ser/0                          44794     0 misc
callout late/0                         38366     0 misc
crosscall unicast                         11     0 misc
crosscall broadcast                        4     0 misc
namecache entries collected            13850     0 misc
namecache under scan target           298154     0 misc
cpu0 timer                          29826661    99 intr
cpu0 generic IPI                      548755     1 misc
cpu0 FPU synch IPI                      3116     0 misc
cpu0 kpreempt IPI                     235125     0 misc
cpu1 runqueue pull                  18640375    62 misc
cpu1 runqueue push                   2168053     7 misc
cpu1 runqueue stay                  30124219   100 misc
cpu1 runqueue localize             158923916   532 misc
softint net/1                            365     0 misc
softint net block/1                      360     0 misc
softint clk/1                       29817170    99 misc
softint clk block/1                    28745     0 misc
softint ser/1                           8658     0 misc
callout late/1                         18516     0 misc
cpu1 timer                          29826661    99 misc
cpu1 FPU synch IPI                      4340     0 misc
cpu1 kpreempt IPI                     173706     0 misc
ioapic0 pin 20                        172536     0 intr
wm0 txsstall                            1088     0 misc
wm0 txdw                              183747     0 intr
wm0 txseg0                            255914     0 misc
ioapic0 pin 23                            18     0 intr
ioapic0 pin 19                          6797     0 intr
ioapic0 pin 4                          33936     0 intr
kpreempt defer: critical section        7776     0 misc
kpreempt defer: kernel_lock          2793374     9 misc
kpreempt immediate                    493760     1 misc

SStk-1 # sysctl -w ddb.command="call wm_pkt_stats(0)"
Total Pkts Recv =0
Missed Pkts Recv =0
Good Pkts Recv =0
No Buff Pkts Recv =0
Mgmt Pkt Recv =0
Mgmt Buff Drop Recv =0
Interrupt Assertion =80

wm_print_stats:
0x4000 : 0       0x4004 : 0       0x4008 : 0       0x400c : 0
0x4010 : 0       0x4014 : 0       0x4018 : 0       0x401c : 0
0x4020 : 0       0x4024 : 0       0x4028 : 0       0x402c : 0
0x4030 : 0       0x4034 : 0       0x4038 : 0       0x403c : 0
0x4040 : 0       0x4044 : 0       0x4048 : 0       0x404c : 0
0x4050 : 0       0x4054 : 0       0x4058 : 0       0x405c : 0
0x4060 : 0       0x4064 : 0       0x4068 : 0       0x406c : 0
0x4070 : 0       0x4074 : 0       0x4078 : 0       0x407c : 0
0x4080 : 0       0x4084 : 0       0x4088 : 0       0x408c : 0
0x4090 : 0       0x4094 : 0       0x4098 : 0       0x409c : 0
0x40a0 : 0       0x40a4 : 0       0x40a8 : 0       0x40ac : 0
0x40b0 : 0       0x40b4 : 0       0x40b8 : 0       0x40bc : 0
0x40c0 : 0       0x40c4 : 0       0x40c8 : 0       0x40cc : 0
0x40d0 : 0       0x40d4 : 0       0x40d8 : 0       0x40dc : 0
0x40e0 : 0       0x40e4 : 0       0x40e8 : 0       0x40ec : 0
0x40f0 : 0       0x40f4 : 0       0x40f8 : 0       0x40fc : 0
0x4100 : 0x24    0x4104 : 0       0x4108 : 0       0x410c : 0
0x4110 : 0       0x4114 : 0       0x4118 : 0       0x411c : 0
0x4120 : 0       0x4124 : 0       0x4128 : 0       0x412c : 0
0x4130 : 0       0x4134 : 0       0x4138 : 0       0x413c : 0
0x4140 : 0       0x4144 : 0       0x4148 : 0       0x414c : 0
0x4150 : 0       0x4154 : 0
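
Since the register dump is almost entirely zero and the event list is long, a small filter can make the interesting rows easier to spot. The following is only a rough sketch in Python, not part of the driver or of NetBSD: the function names `nonzero_registers` and `device_events` are hypothetical, and the parsing assumes the "offset : value" and "name total rate type" layouts seen in the output above.

```python
import re

# Matches "0x4100 : 0x24" or "0x4000 : 0" pairs (layout assumed from the dump).
REG_RE = re.compile(r"(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+|\d+)")

# Matches one vmstat -e row: event name, total, rate, type (misc or intr).
EVENT_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s+(?P<type>misc|intr)$"
)


def nonzero_registers(dump):
    """Return {offset: value} for every register whose value is not zero."""
    result = {}
    for offset, value in REG_RE.findall(dump):
        # Values appear either as hex ("0x24") or as plain decimal ("0").
        if value.lower().startswith("0x"):
            number = int(value, 16)
        else:
            number = int(value)
        if number != 0:
            result[int(offset, 16)] = number
    return result


def device_events(vmstat_output, device):
    """Return {event name: total} for rows whose name starts with the device."""
    events = {}
    for line in vmstat_output.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match.group("name").startswith(device + " "):
            events[match.group("name")] = int(match.group("total"))
    return events


if __name__ == "__main__":
    # Short excerpts copied from the output above.
    sample_dump = "0x40fc : 0 0x4100 : 0x24 0x4104 : 0"
    for offset, value in nonzero_registers(sample_dump).items():
        print(f"{offset:#06x} = {value:#x}")

    sample_events = (
        "wm0 txsstall 1088 0 misc\n"
        "wm0 txdw 183747 0 intr\n"
        "cpu0 timer 29826661 99 intr"
    )
    print(device_events(sample_events, "wm0"))
```

Run against the full output, this would show 0x4100 as the only non-zero register and list only the wm0 transmit-side counters (txsstall, txdw, txseg0).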