NetBSD-Bugs archive


kern/55538: RPI usmsc link state handling flawed - loss of connectivity



>Number:         55538
>Category:       kern
>Synopsis:       RPI usmsc link state handling flawed - loss of connectivity
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 04 07:45:00 +0000 2020
>Originator:     Frank Kardel
>Release:        NetBSD 9.0_STABLE
>Organization:
	
>Environment:
	
	
System: NetBSD rpi 9.0_STABLE NetBSD 9.0_STABLE (AHZ) #7: Thu Jul 30 14:12:49 CEST 2020 kardel@Andromeda:/src/NetBSD/n9/src/obj.evbarm/sys/arch/evbarm/compile/AHZ evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
	I observed loss of connectivity on a usmsc0 directly connected network while
	a vlan attached to usmsc0 was still functional.
	The interface state in this situation is:

	usmsc0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        	ec_capabilities=1<VLAN_MTU>
        	ec_enabled=1<VLAN_MTU>
        	address: yy:yy:yy:yy:yy:yy
        	media: Ethernet autoselect (100baseTX full-duplex)
        	status: active
        	inet6 fe80::xxxx:xxxx:xxxx:xxxx%usmsc0/64 flags 0x8<DETACHED> scopeid 0x1
        	inet6 xxxx:xx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/64 flags 0x8<DETACHED>
        	inet6 xxxx:xx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx/128 flags 0x8<DETACHED>
        	inet 10.200.100.1/24 broadcast 10.200.100.255 flags 0x4<DETACHED>
	It is no wonder that communication with 10.200.100.2 does not work (the addresses
	are DETACHED) while communication via the vlan works (media status is active).
	I observed two issues here:
		a) interface address state becomes desynchronized from the actual link state
		b) link changes are being posted although the switch connected to
		   usmsc0 does NOT observe any link status changes.

dtrace of the relevant entry points shows:
CPU     ID                    FUNCTION:NAME
  3   9278             mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
              netbsd`ukphy_service+0x70
              netbsd`mii_pollstat+0x10
              netbsd`ether_mediastatus+0xc
              netbsd`ifmedia_ioctl+0x10
              netbsd`usbnet_ioctl+0xc
              netbsd`doifioctl+0xc
              netbsd`sys_ioctl+0xc
              netbsd`syscall+0xc

  0   9278             mii_phy_update:entry iface usmsc0: cmd = 1, media_active = 0x100026/0x100026, media_status = 0x1/0x3
              netbsd`ukphy_service+0x70
              netbsd`mii_tick+0xc
              netbsd`usbnet_tick_task+0xc
              netbsd`usb_task_thread+0xc

  1   9278             mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
              netbsd`ukphy_service+0x70
              netbsd`mii_pollstat+0x10
              netbsd`ether_mediastatus+0xc
              netbsd`ifmedia_ioctl+0x10
              netbsd`usbnet_ioctl+0xc
              netbsd`doifioctl+0xc
              netbsd`sys_ioctl+0xc
              netbsd`syscall+0xc

  0   9278             mii_phy_update:entry iface usmsc0: cmd = 1, media_active = 0x100026/0x100026, media_status = 0x1/0x3
              netbsd`ukphy_service+0x70
              netbsd`mii_tick+0xc
              netbsd`usbnet_tick_task+0xc
              netbsd`usb_task_thread+0xc

  3   9278             mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
              netbsd`ukphy_service+0x70
              netbsd`mii_pollstat+0x10
              netbsd`ether_mediastatus+0xc
              netbsd`ifmedia_ioctl+0x10
              netbsd`usbnet_ioctl+0xc
              netbsd`doifioctl+0xc
              netbsd`sys_ioctl+0xc
              netbsd`syscall+0xc

  0   9278             mii_phy_update:entry iface usmsc0: cmd = 1, media_active = 0x100026/0x100026, media_status = 0x1/0x3
              netbsd`ukphy_service+0x70
              netbsd`mii_tick+0xc
              netbsd`usbnet_tick_task+0xc
              netbsd`usb_task_thread+0xc

  2   9278             mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x3/0x1
              netbsd`ukphy_service+0x70
              netbsd`mii_pollstat+0x10
              netbsd`ether_mediastatus+0xc
              netbsd`ifmedia_ioctl+0x10
              netbsd`usbnet_ioctl+0xc
              netbsd`doifioctl+0xc
              netbsd`sys_ioctl+0xc
              netbsd`syscall+0xc

  2   9278             mii_phy_update:entry iface usmsc0: cmd = 3, media_active = 0x100026/0x100026, media_status = 0x1/0x3
              netbsd`ukphy_service+0x70
              netbsd`mii_pollstat+0x10
              netbsd`ether_mediastatus+0xc
              netbsd`ifmedia_ioctl+0x10
              netbsd`usbnet_ioctl+0xc
              netbsd`doifioctl+0xc
              netbsd`sys_ioctl+0xc
              netbsd`syscall+0xc

	So this shows spurious (several hours/days apart) media_status changes. cmd==3 is a status poll
	(MII_POLLSTAT) triggered by quagga, cmd==1 is MII_TICK (once per second). The first value of each
	0x???/0x??? pair is the mii_softc state, the second is the mii state.
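
	To make the values in the trace easier to read, here is a minimal sketch of how the
	status and media words decode. The constants are copied here on the assumption that
	they match <net/if_media.h> in the tree in use; link_is_up() and describe_status() are
	hypothetical helpers for illustration only and are not part of the kernel.

```c
#include <stdio.h>

/* Assumed to mirror <net/if_media.h>; verify against your source tree. */
#define IFM_AVALID	0x00000001	/* status word: link state is valid */
#define IFM_ACTIVE	0x00000002	/* status word: link is up */
#define IFM_ETHER	0x00000020	/* media word: Ethernet */
#define IFM_100_TX	6		/* media word: 100baseTX subtype */
#define IFM_FDX		0x00100000	/* media word: full duplex */

/*
 * Hypothetical helper: returns 1 if the link is up, 0 if it is down,
 * and -1 if the status word carries no valid link information.
 */
int
link_is_up(int media_status)
{
	if ((media_status & IFM_AVALID) == 0)
		return -1;
	return (media_status & IFM_ACTIVE) != 0;
}

/* Hypothetical helper: print one side of a "softc/mii" pair from the trace. */
void
describe_status(const char *who, int media_status)
{
	int up = link_is_up(media_status);

	printf("%s: media_status = 0x%x (%s)\n", who, media_status,
	    up < 0 ? "not valid" : up ? "link up" : "link down");
}
```

	Under these assumptions media_status 0x3 means "valid and active" and 0x1 means
	"valid but no link", and media_active 0x100026 is IFM_ETHER|IFM_100_TX|IFM_FDX, which
	agrees with the "100baseTX full-duplex" shown by ifconfig. A pair such as 0x3/0x1 thus
	reads as: the mii_softc still believes the link is up while the mii state says it is down.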

	The state lossage (case a: status: active vs. addresses DETACHED) can be 
	found in sys/dev/mii/mii_physubr.c:mii_phy_update

mii_phy_update(struct mii_softc *sc, int cmd)
{
        struct mii_data *mii = sc->mii_pdata;

        if (sc->mii_media_active != mii->mii_media_active ||
            sc->mii_media_status != mii->mii_media_status ||
            cmd == MII_MEDIACHG) {
                mii_phy_statusmsg(sc);
                (*mii->mii_statchg)(mii->mii_ifp);
                sc->mii_media_active = mii->mii_media_active;
                sc->mii_media_status = mii->mii_media_status;
        }
}

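	Here the mii_softc state is copied from mii only after (*mii->mii_statchg)(mii->mii_ifp) has run,
	so it picks up the current state, which may have been changed in the meantime. The mii_softc then
	no longer holds the state that was actually compared and reported, the invariant behind the
	condition test is violated, and state changes can be lost.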

	We fix that with the following patch:
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.87.4.1
diff -u -r1.87.4.1 mii_physubr.c
--- sys/dev/mii/mii_physubr.c   21 Nov 2019 14:00:49 -0000      1.87.4.1
+++ sys/dev/mii/mii_physubr.c   4 Aug 2020 07:23:26 -0000
@@ -424,14 +424,16 @@
 mii_phy_update(struct mii_softc *sc, int cmd)
 {
        struct mii_data *mii = sc->mii_pdata;
+       u_int mii_media_active = mii->mii_media_active;
+       int   mii_media_status = mii->mii_media_status;
 
-       if (sc->mii_media_active != mii->mii_media_active ||
-           sc->mii_media_status != mii->mii_media_status ||
+       if (sc->mii_media_active != mii_media_active ||
+           sc->mii_media_status != mii_media_status ||
            cmd == MII_MEDIACHG) {
-               mii_phy_statusmsg(sc);
                (*mii->mii_statchg)(mii->mii_ifp);
-               sc->mii_media_active = mii->mii_media_active;
-               sc->mii_media_status = mii->mii_media_status;
+               sc->mii_media_active = mii_media_active;
+               sc->mii_media_status = mii_media_status;
+               mii_phy_statusmsg(sc);
        }
 }
 
	With this patch applied, the interface becomes stable again, as no state changes with
	respect to address validity are lost.
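
	The effect of taking the snapshot before the callback can be sketched with a small
	user-space model. This is only an illustration under stated assumptions: all names
	(model_mii, model_softc, model_statchg, update_original, update_patched, run_scenario)
	are hypothetical, media_active and the MII_MEDIACHG case are left out, and the model
	simply assumes that the mii state changes while the callback runs, which is the
	conjecture made above and not something the trace proves.

```c
#include <stdio.h>

#define IFM_AVALID	0x1	/* link state is valid */
#define IFM_ACTIVE	0x2	/* link is up */

/* Hypothetical stand-in for struct mii_data plus the upper layer's view. */
struct model_mii {
	int media_status;	/* current mii state */
	int pending_status;	/* state that "arrives" during the callback */
	int has_pending;	/* nonzero until the pending change is applied */
	int seen_by_consumer;	/* last state the statchg consumer was told */
};

/* Hypothetical stand-in for struct mii_softc. */
struct model_softc {
	struct model_mii *mii;
	int media_status;	/* state recorded at the last notification */
};

/* Models mii_statchg: report the state, then let it change underneath us. */
void
model_statchg(struct model_mii *mii)
{
	mii->seen_by_consumer = mii->media_status;
	if (mii->has_pending) {
		mii->media_status = mii->pending_status;
		mii->has_pending = 0;
	}
}

/* Original ordering: record whatever the state is after the callback. */
void
update_original(struct model_softc *sc)
{
	struct model_mii *mii = sc->mii;

	if (sc->media_status != mii->media_status) {
		model_statchg(mii);
		sc->media_status = mii->media_status;
	}
}

/* Patched ordering: record the state that was compared and reported. */
void
update_patched(struct model_softc *sc)
{
	struct model_mii *mii = sc->mii;
	int snapshot = mii->media_status;

	if (sc->media_status != snapshot) {
		model_statchg(mii);
		sc->media_status = snapshot;
	}
}

/* Returns 1 if the consumer's view matches the mii state after two updates. */
int
run_scenario(void (*update)(struct model_softc *))
{
	struct model_mii mii = {
		.media_status = IFM_AVALID,			/* link down */
		.pending_status = IFM_AVALID | IFM_ACTIVE,	/* comes back up */
		.has_pending = 1,
		.seen_by_consumer = IFM_AVALID | IFM_ACTIVE,
	};
	struct model_softc sc = {
		.mii = &mii,
		.media_status = IFM_AVALID | IFM_ACTIVE,
	};

	update(&sc);	/* reports link down; link returns during the callback */
	update(&sc);	/* next tick: should report link up again */
	return mii.seen_by_consumer == mii.media_status;
}
```

	In this model run_scenario(update_original) leaves the consumer believing the link is
	down while the mii state says active, which resembles the "status: active" plus DETACHED
	addresses seen above, whereas run_scenario(update_patched) notices the difference on the
	next call and reports it.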

	For case b): Why the MII layer sees spurious link status changes which are not observed by the
	switch remains to be analysed.

>How-To-Repeat:
	Run an RPI2 connected to a switch and observe that after some time (can be days) the
	addresses become DETACHED.

>Fix:
	See the patch above, which restores state consistency.
	The spurious MII link status changes still need to be examined.

linkstate dtrace script:
begin 644 link.d.gz
M'XL("-8.*5\``VQI;FLN9`#-4\MNPC`0/..OV")%"E)HZ14$/X(JRW(<6-&8
M*#$M"/CWKA\!!]P['")Y9A^S.PMJCA5J-'.E37MB9S9J6M2FRL=8-9!U\(UZ
M!YT11L$2LG)<T(,B-OLJ%^VF6\^^IBNL^%&+6DT*B+%2&/%.Y;FMP5V-R8)!
M^-%;[G("KHRAE>'##LW+2"GWO_H%Q#A[6F4.K28="8G_<+3)F*D1>;,]$4S]
MU6TN:BAI@J#/RI!UV0.?#J!,`CHY7;D:5KZ%[2*6EO0X/1?,P9#WL;4J47`A
M#?XH>(N"!\3E$H8?IMG%'+I$FB<F<'9I=U^$5.3,/`R0E04,VBQA=LR.'_;3
M,Z%%Q(P+-O(N!#-IK-A(SU*'XF;8LW+?L$B/VS=(S/J4$2;UQW$[#(#K@]7N
MAKC<"KV)G?4.1=9VL;$LM;M[N;#"](D_;`7Z=^*T*>OIGS70&QWI'WK41I*/
#!```
`
end

>Unformatted:
 	
 	

