NetBSD-Bugs archive


Re: kern/53265: panic in bnx_detach() on shutdown



The following reply was made to PR kern/53265; it has been noted by GNATS.

From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: msaitoh%execsw.org@localhost
Subject: Re: kern/53265: panic in bnx_detach() on shutdown
Date: Mon, 7 May 2018 19:09:01 +0900

 On 2018/05/06 21:05, Andreas Gustafsson wrote:
 >> Number:         53265
 >> Category:       kern
 >> Synopsis:       panic in bnx_detach() on shutdown
 >> Confidential:   no
 >> Severity:       non-critical
 >> Priority:       low
 >> Responsible:    kern-bug-people
 >> State:          open
 >> Class:          sw-bug
 >> Submitter-Id:   net
 >> Arrival-Date:   Sun May 06 12:05:00 +0000 2018
 >> Originator:     Andreas Gustafsson
 >> Release:        NetBSD-current, source date 2018.05.04.14.15.41
 >> Organization:
 > 
 >> Environment:
 > System: NetBSD
 > Architecture: x86_64
 > Machine: amd64
 >> Description:
 > 
 > Seen on the serial console shutting down an 8-core amd64 machine
 > running a fresh -current:
 > 
 > [ 63426.6135600] syncing disks... done
 > [ 63428.0241184] cd0: detached
 > [ 63428.0608678] brgphy3: detached
 > [ 63428.0983665] brgphy2: detached
 > [ 63428.1358664] brgphy1: detached
 > [ 63428.1733649] brgphy0: detached
 > [ 63428.2108644] atapibus0: detached
 > [ 63428.2442055] uhub5: detached
 > [ 63428.2842215] uhub4: detached
 > [ 63428.3142328] uhub2: detached
 > [ 63428.3542487] uhub1: detached
 > [ 63428.3842605] uhub0: detached
 > [ 63428.4242764] com1: detached
 > [ 63428.4642922] bnx3: detached
 > [ 63428.5043087] bnx2: detached
 > [ 63428.5443239] Skipping crash dump on recursive panic
 > [ 63428.5943436] panic: kernel diagnostic assertion "c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file "/tmp/bracket/build/2018.05.04.14.15\
 > .41-amd64-debug-baremetal/src/sys/kern/kern_timeout.c", line 318
 > [ 63428.8344384] cpu7: Begin traceback...
 > [ 63428.8744542] vpanic() at netbsd:vpanic+0x16f
 > [ 63428.9344780] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
 > [ 63429.0045057] callout_destroy() at netbsd:callout_destroy+0x75
 > [ 63429.0745334] bnx_detach() at netbsd:bnx_detach+0xbb
 > [ 63429.1345572] config_detach() at netbsd:config_detach+0x121
 > [ 63429.2045849] config_detach_all() at netbsd:config_detach_all+0x97
 > [ 63429.2746126] cpu_reboot() at netbsd:cpu_reboot+0x19a
 > [ 63429.3346364] sys_reboot() at netbsd:sys_reboot+0x85
 > [ 63429.3946602] syscall() at netbsd:syscall+0x208
 > [ 63429.4446800] --- syscall (number 208) ---
 > [ 63429.4946998] 74ed2443ebda:
 > [ 63429.5347157] cpu7: End traceback...
 > [ 63429.5847356] rebooting...
 > 
 > No harm done, but it's a bug nonetheless.
 
   Even with "shutdown -h", the machine reboots instead of halting.
 
 
 >> How-To-Repeat:
 > 
 > Only happened once so far.
 > 
 >> Fix:
 > 
 
   How often does it panic on shutdown? Could you test the following patch
 to verify the problem is fixed?
 
 ---------------------------
 - Fix a bug where bnx(4) panics on shutdown. Reported by Andreas Gustafsson in
    PR#53265.
 - Make sure not to re-arm the callout when we are about to detach. Same as
    if_bge.c rev. 1.292.
 - Use pci_intr_establish_xname().
 ---------------------------
 Index: if_bnxvar.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/if_bnxvar.h,v
 retrieving revision 1.6
 diff -u -p -r1.6 if_bnxvar.h
 --- if_bnxvar.h	1 Jul 2014 17:11:35 -0000	1.6
 +++ if_bnxvar.h	7 May 2018 10:03:56 -0000
 @@ -210,6 +210,7 @@ struct bnx_softc
   	uint32_t		tx_prod_bseq;	/* Counts the bytes used.  */
   
   	struct callout		bnx_timeout;
 +	int			bnx_detaching;
   
   	/* Frame size and mbuf allocation size for RX frames. */
   	uint32_t		max_frame_size;
 Index: if_bnx.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/if_bnx.c,v
 retrieving revision 1.63
 diff -u -p -r1.63 if_bnx.c
 --- if_bnx.c	8 Feb 2018 09:05:19 -0000	1.63
 +++ if_bnx.c	7 May 2018 10:03:59 -0000
 @@ -792,7 +792,8 @@ bnx_attach(device_t parent, device_t sel
   	    IFCAP_CSUM_UDPv4_Tx | IFCAP_CSUM_UDPv4_Rx;
   
   	/* Hookup IRQ last. */
 -	sc->bnx_intrhand = pci_intr_establish(pc, ih, IPL_NET, bnx_intr, sc);
 +	sc->bnx_intrhand = pci_intr_establish_xname(pc, ih, IPL_NET, bnx_intr,
 +	    sc, device_xname(self));
   	if (sc->bnx_intrhand == NULL) {
   		aprint_error_dev(self, "couldn't establish interrupt");
   		if (intrstr != NULL)
 @@ -890,17 +891,7 @@ bnx_detach(device_t dev, int flags)
   
   	/* Stop and reset the controller. */
   	s = splnet();
 -	if (ifp->if_flags & IFF_RUNNING)
 -		bnx_stop(ifp, 1);
 -	else {
 -		/* Disable the transmit/receive blocks. */
 -		REG_WR(sc, BNX_MISC_ENABLE_CLR_BITS, 0x5ffffff);
 -		REG_RD(sc, BNX_MISC_ENABLE_CLR_BITS);
 -		DELAY(20);
 -		bnx_disable_intr(sc);
 -		bnx_reset(sc, BNX_DRV_MSG_CODE_RESET);
 -	}
 -
 +	bnx_stop(ifp, 1);
   	splx(s);
   
   	pmf_device_deregister(dev);
 @@ -3371,10 +3362,11 @@ bnx_stop(struct ifnet *ifp, int disable)
   
   	DBPRINT(sc, BNX_VERBOSE_RESET, "Entering %s()\n", __func__);
   
 -	if ((ifp->if_flags & IFF_RUNNING) == 0)
 -		return;
 -
 -	callout_stop(&sc->bnx_timeout);
 +	if (disable) {
 +		sc->bnx_detaching = 1;
 +		callout_halt(&sc->bnx_timeout, NULL);
 +	} else
 +		callout_stop(&sc->bnx_timeout);
   
   	mii_down(&sc->bnx_mii);
   
 @@ -5694,9 +5686,6 @@ bnx_tick(void *xsc)
   	/* Update the statistics from the hardware statistics block. */
   	bnx_stats_update(sc);
   
 -	/* Schedule the next tick. */
 -	callout_reset(&sc->bnx_timeout, hz, bnx_tick, sc);
 -
   	mii = &sc->bnx_mii;
   	mii_tick(mii);
   
 @@ -5707,6 +5696,11 @@ bnx_tick(void *xsc)
   	bnx_get_buf(sc, &prod, &chain_prod, &prod_bseq);
   	sc->rx_prod = prod;
   	sc->rx_prod_bseq = prod_bseq;
 +
 +	/* Schedule the next tick. */
 +	if (!sc->bnx_detaching)
 +		callout_reset(&sc->bnx_timeout, hz, bnx_tick, sc);
 +
   	splx(s);
   	return;
   }
 
 
 The same diff is at:
 
 	http://www.netbsd.org/~msaitoh/bnx-20180507-0.dif
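
To illustrate why the patch adds the bnx_detaching flag, here is a minimal
userspace sketch of the "do not re-arm while detaching" pattern. It is only a
single-threaded model, not kernel code: every model_* name is hypothetical, and
the structures stand in for struct callout and struct bnx_softc without using
the real callout(9) API. A tick handler that was already in flight when the
stop was requested is approximated by calling the handler once more after the
stop.

```c
#include <stdbool.h>

/* Userspace stand-in for a kernel callout; NOT the callout(9) API. */
struct model_callout {
	bool pending;			/* armed and waiting to fire */
	void (*func)(void *);
	void *arg;
};

/* Hypothetical softc holding only what the pattern needs. */
struct model_softc {
	struct model_callout timeout;
	int detaching;			/* plays the role of bnx_detaching */
	int ticks;
};

/* Arm (or re-arm) the model callout. */
static void
model_callout_reset(struct model_callout *c, void (*func)(void *), void *arg)
{
	c->func = func;
	c->arg = arg;
	c->pending = true;
}

/* Run the handler once if the callout is armed. */
static bool
model_callout_fire(struct model_callout *c)
{
	if (!c->pending)
		return false;
	c->pending = false;
	c->func(c->arg);
	return true;
}

/* Periodic handler: re-arms itself unless a detach is in progress. */
static void
model_tick(void *xsc)
{
	struct model_softc *sc = xsc;

	sc->ticks++;
	if (!sc->detaching)
		model_callout_reset(&sc->timeout, model_tick, sc);
}

/* Cancel the callout; with disable set, also forbid any further re-arm. */
static void
model_stop(struct model_softc *sc, int disable)
{
	if (disable)
		sc->detaching = 1;
	sc->timeout.pending = false;
}

/*
 * Returns true if the callout is armed again after a stop that was
 * followed by one late handler run.
 */
static bool
model_late_tick_rearms(int disable)
{
	struct model_softc sc = { .detaching = 0, .ticks = 0 };

	model_callout_reset(&sc.timeout, model_tick, &sc);
	model_callout_fire(&sc.timeout);	/* normal tick, re-arms itself */
	model_stop(&sc, disable);
	model_tick(&sc);			/* late run of an in-flight handler */
	return sc.timeout.pending;
}
```

In this model, model_late_tick_rearms(0) leaves the callout armed after the
stop, while model_late_tick_rearms(1) leaves it idle, so tearing the callout
down afterwards would be safe. The real driver also relies on callout_halt()
to wait for a running handler, which this sketch does not attempt to model.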
 
 
 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh%execsw.org@localhost
                                   msaitoh%netbsd.org@localhost)
 

