NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/57644: ixl(4) may panic or hang



The following reply was made to PR kern/57644; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Manuel.Bouyer%lip6.fr@localhost
Cc: gnats-bugs%NetBSD.org@localhost, knakahara%NetBSD.org@localhost, yamaguchi%NetBSD.org@localhost,
	rin%NetBSD.org@localhost
Subject: Re: kern/57644: ixl(4) may panic or hang
Date: Fri, 6 Oct 2023 09:43:27 +0000

 bouyer: Can you try a current kernel and reproduce the hang?  This
 should trigger a heartbeat panic in current, which should give us more
 diagnostic information.
 
 
 knakahara, yamaguchi, rin: This can't be right:
 
 	ixl_atq_set(&sc->sc_link_state_atq, ixl_get_link_status_done);
 
 The function passed to ixl_atq_set is only ever called from interrupt
 context via ixl_intr/ixl_other_intr -> ixl_atq_done ->
 ixl_atq_done_locked.
 
 ixl_get_link_status_done is wrong here for two reasons:
 
 1. It releases and reacquires sc->sc_atq_lock, but ixl_atq_done_locked
    does not appear to be prepared for iatq_fn to release and reacquire
    the lock -- it may silently use stale cached values of
    sc->sc_atq_prod/cons.
 
 2. It calls ixl_link_state_update which takes sc->sc_cfg_lock which is
    an adaptive lock, forbidden in interrupt context -- that's what led
    to this crash.
 
 ixl_atq_exec_locked is also wrong because it doesn't wait for any
 particular condition with cv_timedwait; it may spuriously wake up
 early, or it may report timeout even though the interrupt was
 delivered.  (cv_timedwait is a bad API, very hard to use correctly.)
 
 If ixl_get_link_status_done has to be done asynchronously, perhaps we
 can resolve both (1) and (2) by creating a new lock, say sc_mii_lock,
 with the following lock order:
 
 sc_cfg_lock => sc_atq_lock => sc_mii_lock
 
 This sc_mii_lock would be used for all the mii/ifmedia logic, such as
 ifmedia_init_with_lock.  ixl_get_link_status_done would no longer need
 to release and reacquire sc_atq_lock, and ixl_link_status_update would
 no longer need to take sc_cfg_lock.
 
 It doesn't resolve the issue with ixl_wakeup and cv_timedwait, though.
 Perhaps a single bit of state inside the iatq entry or something would
 be enough -- something like this:
 
 /* ixl_wakeup */
 	iatq->iatq_done = 1;
 	cv_broadcast(&sc->sc_atq_cv);
 
 /* ixl_atq_exec_locked */
 	const unsigned start = getticks();
 	iatq->iatq_done = 0;
 	ixl_atq_set(iatq, ixl_wakeup);
 	error = ixl_atq_post_locked(sc, iatq);
 	if (error)
 		return error;
 	while (!iatq->iatq_done) {
 		const unsigned slept = getticks() - start;
 		if (slept >= IXL_ATQ_EXEC_TIMEOUT)
 			return ETIMEDOUT;
 		(void)cv_timedwait(&sc->sc_atq_cv, &sc->sc_atq_lock,
 		    IXL_ATQ_EXEC_TIMEOUT - slept);
 	}
 	return 0;
 


Home | Main Index | Thread Index | Old Index