NetBSD-Bugs archive
kern/55538: RPI usmsc link state handling flawed - loss of connectivity
>Number: 55538
>Category: kern
>Synopsis: RPI usmsc link state handling flawed - loss of connectivity
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Aug 04 07:45:00 +0000 2020
>Originator: Frank Kardel
>Release: NetBSD 9.0_STABLE
>Organization:
>Environment:
System: NetBSD rpi 9.0_STABLE NetBSD 9.0_STABLE (AHZ) #7: Thu Jul 30 14:12:49 CEST 2020 kardel@Andromeda:/src/NetBSD/n9/src/obj.evbarm/sys/arch/evbarm/compile/AHZ evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
I observed loss of connectivity on the network directly connected to usmsc0 while
a vlan attached to usmsc0 was still functional.
The interface state in this situation is:
usmsc0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=1<VLAN_MTU>
ec_enabled=1<VLAN_MTU>
address: yy:yy:yy:yy:yy:yy
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet6 fe80::xxxx:xxxx:xxxx:xxxx%usmsc0/64 flags 0x8<DETACHED> scopeid 0x1
inet6 xxxx:xx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/64 flags 0x8<DETACHED>
inet6 xxxx:xx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/128 flags 0x8<DETACHED>
inet 10.200.100.1/24 broadcast 10.200.100.255 flags 0x4<DETACHED>
No wonder that communication with 10.200.100.2 does not work (DETACHED) but
communication via vlan works (media status active).
I observed two issues here:
a) interface address state becomes desynchronized from actual link state
b) link changes are being posted though the switch connected to
usmsc0 does NOT observe any link status changes.
dtrace of the relevant entry points shows:
CPU ID FUNCTION:NAME
3 9278 mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
netbsd`ukphy_service+0x70
netbsd`mii_pollstat+0x10
netbsd`ether_mediastatus+0xc
netbsd`ifmedia_ioctl+0x10
netbsd`usbnet_ioctl+0xc
netbsd`doifioctl+0xc
netbsd`sys_ioctl+0xc
netbsd`syscall+0xc
0 9278 mii_phy_update:entry iface usmsc0: cmd = 1, media_active = 0x100026/0x100026, media_status = 0x1/0x3
netbsd`ukphy_service+0x70
netbsd`mii_tick+0xc
netbsd`usbnet_tick_task+0xc
netbsd`usb_task_thread+0xc
1 9278 mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
netbsd`ukphy_service+0x70
netbsd`mii_pollstat+0x10
netbsd`ether_mediastatus+0xc
netbsd`ifmedia_ioctl+0x10
netbsd`usbnet_ioctl+0xc
netbsd`doifioctl+0xc
netbsd`sys_ioctl+0xc
netbsd`syscall+0xc
0 9278 mii_phy_update:entry iface usmsc0: cmd = 1, media_active = 0x100026/0x100026, media_status = 0x1/0x3
netbsd`ukphy_service+0x70
netbsd`mii_tick+0xc
netbsd`usbnet_tick_task+0xc
netbsd`usb_task_thread+0xc
3 9278 mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
netbsd`ukphy_service+0x70
netbsd`mii_pollstat+0x10
netbsd`ether_mediastatus+0xc
netbsd`ifmedia_ioctl+0x10
netbsd`usbnet_ioctl+0xc
netbsd`doifioctl+0xc
netbsd`sys_ioctl+0xc
netbsd`syscall+0xc
0 9278 mii_phy_update:entry iface usmsc0: cmd = 1, media_active = 0x100026/0x100026, media_status = 0x1/0x3
netbsd`ukphy_service+0x70
netbsd`mii_tick+0xc
netbsd`usbnet_tick_task+0xc
netbsd`usb_task_thread+0xc
2 9278 mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
netbsd`ukphy_service+0x70
netbsd`mii_pollstat+0x10
netbsd`ether_mediastatus+0xc
netbsd`ifmedia_ioctl+0x10
netbsd`usbnet_ioctl+0xc
netbsd`doifioctl+0xc
netbsd`sys_ioctl+0xc
netbsd`syscall+0xc
2 9278 mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x1/0x3
netbsd`ukphy_service+0x70
netbsd`mii_pollstat+0x10
netbsd`ether_mediastatus+0xc
netbsd`ifmedia_ioctl+0x10
netbsd`usbnet_ioctl+0xc
netbsd`doifioctl+0xc
netbsd`sys_ioctl+0xc
netbsd`syscall+0xc
So this shows spurious media_status changes (several hours or days apart).
cmd==3 is a status poll by quagga; cmd==1 is MII_TICK (once per second).
The first value of each 0x???/0x??? pair is the mii_softc state, the second is the mii state.
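As a reading aid for the media_status values in the trace, here is a minimal sketch of how the two low bits can be decoded. It assumes the IFM_AVALID (0x1) and IFM_ACTIVE (0x2) bit values from <net/if_media.h>; the SK_ and sk_ names are hypothetical and exist only in this sketch.

```c
/* Bit values assumed to match IFM_AVALID / IFM_ACTIVE in <net/if_media.h>. */
#define SK_IFM_AVALID 0x00000001 /* status bits are valid */
#define SK_IFM_ACTIVE 0x00000002 /* link is up */

/* Hypothetical helper: map a media_status word to a link state name. */
static const char *
sk_link_state(int media_status)
{
	if ((media_status & SK_IFM_AVALID) == 0)
		return "unknown";
	return (media_status & SK_IFM_ACTIVE) ? "up" : "down";
}
```

Under these assumptions a pair such as 0x3/0x1 reads as "mii_softc still has link up, mii now reports link down", and 0x1/0x3 is the reverse transition.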
The state lossage (case a: status: active vs. addresses DETACHED) can be
found in sys/dev/mii/mii_physubr.c:mii_phy_update
void
mii_phy_update(struct mii_softc *sc, int cmd)
{
	struct mii_data *mii = sc->mii_pdata;

	if (sc->mii_media_active != mii->mii_media_active ||
	    sc->mii_media_status != mii->mii_media_status ||
	    cmd == MII_MEDIACHG) {
		mii_phy_statusmsg(sc);
		(*mii->mii_statchg)(mii->mii_ifp);
		sc->mii_media_active = mii->mii_media_active;
		sc->mii_media_status = mii->mii_media_status;
	}
}
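Here the mii_softc state is copied from the mii state after the call to
(*mii->mii_statchg)(mii->mii_ifp), and that state may have changed during the call.
Such a change is cached without ever being reported, so the comparison at the top
no longer detects it and the state change is lost.
If we fix that with the following patch: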
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.87.4.1
diff -u -r1.87.4.1 mii_physubr.c
--- sys/dev/mii/mii_physubr.c 21 Nov 2019 14:00:49 -0000 1.87.4.1
+++ sys/dev/mii/mii_physubr.c 4 Aug 2020 07:23:26 -0000
@@ -424,14 +424,16 @@
mii_phy_update(struct mii_softc *sc, int cmd)
{
struct mii_data *mii = sc->mii_pdata;
+ u_int mii_media_active = mii->mii_media_active;
+ int mii_media_status = mii->mii_media_status;
- if (sc->mii_media_active != mii->mii_media_active ||
- sc->mii_media_status != mii->mii_media_status ||
+ if (sc->mii_media_active != mii_media_active ||
+ sc->mii_media_status != mii_media_status ||
cmd == MII_MEDIACHG) {
- mii_phy_statusmsg(sc);
(*mii->mii_statchg)(mii->mii_ifp);
- sc->mii_media_active = mii->mii_media_active;
- sc->mii_media_status = mii->mii_media_status;
+ sc->mii_media_active = mii_media_active;
+ sc->mii_media_status = mii_media_status;
+ mii_phy_statusmsg(sc);
}
}
then the interface becomes stable again as no state changes with respect to
address validity are lost.
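The lost update can be illustrated in user space. The following is a minimal sketch, not kernel code: all sk_ names are hypothetical, the structures are reduced to a single status word, and the state change during the callback is injected by hand (whether the real change comes from a concurrent poll or from the callback itself is not established here).

```c
#define SK_AVALID 0x1 /* status is valid */
#define SK_ACTIVE 0x2 /* link is up */

struct sk_mii {
	int status;   /* shared, most recent media status */
	int reported; /* last status handed to the upper layer */
	int flip_to;  /* if >= 0, status injected during the next callback */
};

struct sk_phy {
	struct sk_mii *mii;
	int status;   /* cached copy used for change detection */
};

/* Stand-in for mii_statchg: report the state, then let it change underneath. */
static void
sk_statchg(struct sk_mii *mii)
{
	mii->reported = mii->status;
	if (mii->flip_to >= 0) {
		mii->status = mii->flip_to;
		mii->flip_to = -1;
	}
}

/* Original ordering: cache whatever the shared state is after the callback. */
static void
sk_update_original(struct sk_phy *sc)
{
	struct sk_mii *mii = sc->mii;

	if (sc->status != mii->status) {
		sk_statchg(mii);
		sc->status = mii->status;
	}
}

/* Patched ordering: cache the snapshot that was actually compared. */
static void
sk_update_patched(struct sk_phy *sc)
{
	struct sk_mii *mii = sc->mii;
	int snapshot = mii->status;

	if (sc->status != snapshot) {
		sk_statchg(mii);
		sc->status = snapshot;
	}
}

/* Link drops, then comes back while the callback runs; return what was last reported. */
static int
sk_run(void (*update)(struct sk_phy *))
{
	struct sk_mii mii = { SK_AVALID, SK_AVALID | SK_ACTIVE, SK_AVALID | SK_ACTIVE };
	struct sk_phy sc = { &mii, SK_AVALID | SK_ACTIVE };

	update(&sc); /* link drop is reported; link returns during the callback */
	update(&sc); /* next tick */
	return mii.reported;
}
```

In this model sk_run(sk_update_original) leaves the upper layer at "link down" although the shared status is active again, which mirrors the "status: active" with DETACHED addresses seen above, while sk_run(sk_update_patched) reports the link as up on the second call.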
For case b): Why the MII layer sees spurious link status changes which are not observed by the
switch remains to be analysed.
>How-To-Repeat:
Run an RPI2 connected to a switch and observe, after some time (possibly days), that the
addresses become DETACHED.
>Fix:
See the patch above for state consistency.
Spurious mii link status changes need to be examined.
linkstate dtrace script:
begin 644 link.d.gz
M'XL("-8.*5\``VQI;FLN9`#-4\MNPC`0/..OV")%"E)HZ14$/X(JRW(<6-&8
M*#$M"/CWKA\!!]P['")Y9A^S.PMJCA5J-'.E37MB9S9J6M2FRL=8-9!U\(UZ
M!YT11L$2LG)<T(,B-OLJ%^VF6\^^IBNL^%&+6DT*B+%2&/%.Y;FMP5V-R8)!
M^-%;[G("KHRAE>'##LW+2"GWO_H%Q#A[6F4.K28="8G_<+3)F*D1>;,]$4S]
MU6TN:BAI@J#/RI!UV0.?#J!,`CHY7;D:5KZ%[2*6EO0X/1?,P9#WL;4J47`A
M#?XH>(N"!\3E$H8?IMG%'+I$FB<F<'9I=U^$5.3,/`R0E04,VBQA=LR.'_;3
M,Z%%Q(P+-O(N!#-IK-A(SU*'XF;8LW+?L$B/VS=(S/J4$2;UQW$[#(#K@]7N
MAKC<"KV)G?4.1=9VL;$LM;M[N;#"](D_;`7Z=^*T*>OIGS70&QWI'WK41I*/
#!```
`
end
>Unformatted: