NetBSD-Bugs archive

Re: kern/53265: panic in bnx_detach() on shutdown



On 2018/05/06 21:05, Andreas Gustafsson wrote:
Number:         53265
Category:       kern
Synopsis:       panic in bnx_detach() on shutdown
Confidential:   no
Severity:       non-critical
Priority:       low
Responsible:    kern-bug-people
State:          open
Class:          sw-bug
Submitter-Id:   net
Arrival-Date:   Sun May 06 12:05:00 +0000 2018
Originator:     Andreas Gustafsson
Release:        NetBSD-current, source date 2018.05.04.14.15.41
Organization:

Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
Description:

Seen on the serial console shutting down an 8-core amd64 machine
running a fresh -current:

[ 63426.6135600] syncing disks... done
[ 63428.0241184] cd0: detached
[ 63428.0608678] brgphy3: detached
[ 63428.0983665] brgphy2: detached
[ 63428.1358664] brgphy1: detached
[ 63428.1733649] brgphy0: detached
[ 63428.2108644] atapibus0: detached
[ 63428.2442055] uhub5: detached
[ 63428.2842215] uhub4: detached
[ 63428.3142328] uhub2: detached
[ 63428.3542487] uhub1: detached
[ 63428.3842605] uhub0: detached
[ 63428.4242764] com1: detached
[ 63428.4642922] bnx3: detached
[ 63428.5043087] bnx2: detached
[ 63428.5443239] Skipping crash dump on recursive panic
[ 63428.5943436] panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file "/tmp/bracket/build/2018.05.04.14.15\
.41-amd64-debug-baremetal/src/sys/kern/kern_timeout.c", line 318
[ 63428.8344384] cpu7: Begin traceback...
[ 63428.8744542] vpanic() at netbsd:vpanic+0x16f
[ 63428.9344780] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
[ 63429.0045057] callout_destroy() at netbsd:callout_destroy+0x75
[ 63429.0745334] bnx_detach() at netbsd:bnx_detach+0xbb
[ 63429.1345572] config_detach() at netbsd:config_detach+0x121
[ 63429.2045849] config_detach_all() at netbsd:config_detach_all+0x97
[ 63429.2746126] cpu_reboot() at netbsd:cpu_reboot+0x19a
[ 63429.3346364] sys_reboot() at netbsd:sys_reboot+0x85
[ 63429.3946602] syscall() at netbsd:syscall+0x208
[ 63429.4446800] --- syscall (number 208) ---
[ 63429.4946998] 74ed2443ebda:
[ 63429.5347157] cpu7: End traceback...
[ 63429.5847356] rebooting...

No harm done, but it's a bug nonetheless.

 Even with "shutdown -h", the machine doesn't halt; it reboots instead.


How-To-Repeat:

Only happened once so far.

Fix:


 How often does it panic on shutdown? Could you test the following patch
to verify the problem is fixed?

---------------------------
- Fix a bug that caused bnx(4) to panic on shutdown. Reported by Andreas
  Gustafsson in PR#53265.
- Make sure not to re-arm the callout when we are about to detach. Same as
  if_bge.c rev. 1.292.
- Use pci_intr_establish_xname().
---------------------------
Index: if_bnxvar.h
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_bnxvar.h,v
retrieving revision 1.6
diff -u -p -r1.6 if_bnxvar.h
--- if_bnxvar.h	1 Jul 2014 17:11:35 -0000	1.6
+++ if_bnxvar.h	7 May 2018 10:03:56 -0000
@@ -210,6 +210,7 @@ struct bnx_softc
 	uint32_t		tx_prod_bseq;	/* Counts the bytes used.  */
 
 	struct callout bnx_timeout;
+	int			bnx_detaching;
 
 	/* Frame size and mbuf allocation size for RX frames. */
 	uint32_t		max_frame_size;
Index: if_bnx.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.63
diff -u -p -r1.63 if_bnx.c
--- if_bnx.c	8 Feb 2018 09:05:19 -0000	1.63
+++ if_bnx.c	7 May 2018 10:03:59 -0000
@@ -792,7 +792,8 @@ bnx_attach(device_t parent, device_t sel
 	    IFCAP_CSUM_UDPv4_Tx | IFCAP_CSUM_UDPv4_Rx;
 
 	/* Hookup IRQ last. */
-	sc->bnx_intrhand = pci_intr_establish(pc, ih, IPL_NET, bnx_intr, sc);
+	sc->bnx_intrhand = pci_intr_establish_xname(pc, ih, IPL_NET, bnx_intr,
+	    sc, device_xname(self));
 	if (sc->bnx_intrhand == NULL) {
 		aprint_error_dev(self, "couldn't establish interrupt");
 		if (intrstr != NULL)
@@ -890,17 +891,7 @@ bnx_detach(device_t dev, int flags)
 
 	/* Stop and reset the controller. */
 	s = splnet();
-	if (ifp->if_flags & IFF_RUNNING)
-		bnx_stop(ifp, 1);
-	else {
-		/* Disable the transmit/receive blocks. */
-		REG_WR(sc, BNX_MISC_ENABLE_CLR_BITS, 0x5ffffff);
-		REG_RD(sc, BNX_MISC_ENABLE_CLR_BITS);
-		DELAY(20);
-		bnx_disable_intr(sc);
-		bnx_reset(sc, BNX_DRV_MSG_CODE_RESET);
-	}
-
+	bnx_stop(ifp, 1);
 	splx(s);
 
 	pmf_device_deregister(dev);
@@ -3371,10 +3362,11 @@ bnx_stop(struct ifnet *ifp, int disable)
 
 	DBPRINT(sc, BNX_VERBOSE_RESET, "Entering %s()\n", __func__);
 
-	if ((ifp->if_flags & IFF_RUNNING) == 0)
-		return;
-
-	callout_stop(&sc->bnx_timeout);
+	if (disable) {
+		sc->bnx_detaching = 1;
+		callout_halt(&sc->bnx_timeout, NULL);
+	} else
+		callout_stop(&sc->bnx_timeout);
 
 	mii_down(&sc->bnx_mii);
 
@@ -5694,9 +5686,6 @@ bnx_tick(void *xsc)
 	/* Update the statistics from the hardware statistics block. */
 	bnx_stats_update(sc);
 
-	/* Schedule the next tick. */
-	callout_reset(&sc->bnx_timeout, hz, bnx_tick, sc);
-
 	mii = &sc->bnx_mii;
 	mii_tick(mii);
 
@@ -5707,6 +5696,11 @@ bnx_tick(void *xsc)
 	bnx_get_buf(sc, &prod, &chain_prod, &prod_bseq);
 	sc->rx_prod = prod;
 	sc->rx_prod_bseq = prod_bseq;
+
+	/* Schedule the next tick. */
+	if (!sc->bnx_detaching)
+		callout_reset(&sc->bnx_timeout, hz, bnx_tick, sc);
+
 	splx(s);
 	return;
 }


The same diff is at:

	http://www.netbsd.org/~msaitoh/bnx-20180507-0.dif
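
For illustration only (this is not part of the patch): the "do not re-arm
the callout when detaching" idea can be modelled in userspace. The sketch
below uses hypothetical fake_* names in place of the real callout(9) API
and of the bnx(4) softc, and the honor_flag field exists only so that both
behaviours can be compared. It simulates a tick handler that has already
been dequeued when the stop routine runs, then checks whether the handler
re-arms itself afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel callout: one pending handler at most. */
struct fake_callout {
	bool pending;
	void (*func)(void *);
	void *arg;
};

/* Hypothetical stand-in for the driver softc. */
struct fake_softc {
	struct fake_callout timeout;
	int detaching;		/* plays the role of bnx_detaching */
	int honor_flag;		/* 1 = patched tick, 0 = unpatched tick */
	int ticks;
};

static void
fake_callout_reset(struct fake_callout *c, void (*func)(void *), void *arg)
{
	c->func = func;
	c->arg = arg;
	c->pending = true;
}

/* Periodic handler: does its work, then re-arms unless detaching. */
static void
fake_tick(void *xsc)
{
	struct fake_softc *sc = xsc;

	sc->ticks++;
	if (!sc->honor_flag || !sc->detaching)
		fake_callout_reset(&sc->timeout, fake_tick, sc);
}

/* Stop routine: mark the device as detaching and cancel a pending tick. */
static void
fake_stop(struct fake_softc *sc, int disable)
{
	if (disable)
		sc->detaching = 1;
	sc->timeout.pending = false;
}

/*
 * Simulate the race: the handler was already dequeued when fake_stop()
 * ran, so it still executes once afterwards. Returns 1 if the callout
 * ends up pending again, i.e. it would still be armed at destroy time.
 */
static int
pending_after_late_tick(int honor_flag)
{
	struct fake_softc sc = { .honor_flag = honor_flag };

	fake_callout_reset(&sc.timeout, fake_tick, &sc);
	sc.timeout.pending = false;	/* handler dequeued, about to run */
	fake_stop(&sc, 1);		/* detach path runs first */
	fake_tick(&sc);			/* late handler body runs */
	return sc.timeout.pending ? 1 : 0;
}
```

Calling pending_after_late_tick(1) models the patched bnx_tick() and
pending_after_late_tick(0) the unpatched one. In the real driver the
patch additionally uses callout_halt(), which waits for a running handler
to finish; the single-threaded model above does not attempt to show that.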


--
-----------------------------------------------
                SAITOH Masanobu (msaitoh%execsw.org@localhost
                                 msaitoh%netbsd.org@localhost)

