Re: kern/49462: if_slowtimo callout mangled?

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,martin%NetBSD.org@localhost
Subject: Re: kern/49462: if_slowtimo callout mangled?
From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
Date: Thu, 11 Dec 2014 01:55:01 +0000 (UTC)

The following reply was made to PR kern/49462; it has been noted by GNATS.

From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/49462: if_slowtimo callout mangled?
Date: Thu, 11 Dec 2014 10:51:06 +0900

 On Thu, Dec 11, 2014 at 2:35 AM,  <martin%netbsd.org@localhost> wrote:
 >>Number:         49462
 >>Category:       kern
 >>Synopsis:       if_slowtimo callout mangled?
 >>Confidential:   no
 >>Severity:       critical
 >>Priority:       high
 >>Responsible:    kern-bug-people
 >>State:          open
 >>Class:          sw-bug
 >>Submitter-Id:   net
 >>Arrival-Date:   Wed Dec 10 17:35:00 +0000 2014
 >>Originator:     Martin Husemann
 >>Release:        NetBSD 7.99.2
 >>Organization:
 > The NetBSD Foundation, Inc.
 >>Environment:
 > System: NetBSD thirdstage.duskware.de 7.99.2 NetBSD 7.99.2 (MODULAR) #237: Wed Dec 10 17:53:26 CET 2014 martin%thirdstage.duskware.de@localhost:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
 > Architecture: sparc64
 > Machine: sparc64
 >>Description:
 >
 > I get a reproducible (not every time, but pretty often) KASSERT firing on
 > reboot on an SMP sparc64 machine with four bge interfaces:
 >
 > panic: kernel diagnostic assertion "(c->c_flags & CALLOUT_PENDING) == 0" failed: file "../../../../kern/kern_timeout.c", line 314 callout 0x103b29aa8: c_func (0x119f100) c_flags (0x3) destroyed from 0x11a0e40
 >
 > The function is if_slowtimo and the caller of callout_destroy() is
 > if_detach.
 >
 > However, this is basically impossible:
 >
 >          if (ifp->if_slowtimo != NULL) {
 >                  callout_halt(ifp->if_slowtimo_ch, NULL);
 >                  callout_destroy(ifp->if_slowtimo_ch);
 >                  kmem_free(ifp->if_slowtimo_ch, sizeof(*ifp->if_slowtimo_ch));
 >          }
 >
 > and callout_halt() certainly kills bit 0 in flags (CALLOUT_BOUND), and should
 > also not return before CALLOUT_PENDING has cleared.
 >
 > So something(tm) is wrong, but it is not obvious to me.

 It can happen. See PR 47881 and this thread:
 http://mail-index.netbsd.org/netbsd-bugs/2014/11/12/msg039065.html .

 The problem is that callout_halt waits for a running callout handler
 to complete, but doesn't prevent that handler from scheduling the
 callout again.

 I fixed that problem by using an existing field of the callout's user
 (in6m->in6m_timer) as a "don't callout_schedule anymore" flag for the
 callout. I think the same fix can be applied to this case.
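
 A minimal userland sketch of that caller-side pattern follows. It is a
 toy model, not kernel code: struct toy_callout, struct toy_owner, the
 "stopping" field and all toy_* functions are hypothetical names made up
 for illustration, and the overlap between the halt and a running
 handler is simulated sequentially in a single thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel callout; NOT the NetBSD callout_t. */
struct toy_callout {
	bool pending;			/* plays the role of CALLOUT_PENDING */
	void (*func)(void *);
	void *arg;
};

/* Hypothetical owner of the callout (think: an interface or in6_multi). */
struct toy_owner {
	struct toy_callout ch;
	bool stopping;			/* "don't callout_schedule anymore" */
};

void
toy_schedule(struct toy_callout *c)
{
	c->pending = true;
}

/* Periodic handler: rearms itself unless the owner is being torn down. */
void
toy_timo(void *arg)
{
	struct toy_owner *o = arg;

	if (!o->stopping)
		toy_schedule(&o->ch);
}

void
toy_owner_init(struct toy_owner *o)
{
	assert(o != NULL);
	o->ch.func = toy_timo;
	o->ch.arg = o;
	o->ch.pending = true;
	o->stopping = false;
}

/*
 * Models a halt that overlaps with an in-flight handler: the pending
 * bit is cleared, then the running handler finishes before we return.
 */
void
toy_halt(struct toy_callout *c)
{
	c->pending = false;
	c->func(c->arg);
}

/* Returns true if destroying the callout afterwards would be safe. */
bool
toy_detach(struct toy_owner *o, bool use_flag)
{
	if (use_flag)
		o->stopping = true;	/* set before halting */
	toy_halt(&o->ch);
	return !o->ch.pending;
}
```

 In this model, toy_detach(o, false) leaves the callout pending again
 after the halt (the situation the KASSERT catches), while
 toy_detach(o, true) does not, because the handler sees the flag and
 skips the reschedule.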

 Nonetheless, I think we should perhaps do this in callout_halt itself.
 For example, we could introduce a CALLOUT_HALTING flag and set it before
 waiting for the callout handler to finish, while callout_schedule first
 checks the flag and does nothing if it is set. That way, we can prevent
 a new callout from being scheduled during callout_halt.
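
 A rough sketch of that idea, again as a single-threaded userland toy
 and not a patch against kern_timeout.c. SIM_HALTING stands for the
 proposed (not existing) CALLOUT_HALTING flag; the flag values, struct
 sim_callout and the sim_* functions are assumptions chosen only for
 this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PENDING	0x02u	/* plays the role of CALLOUT_PENDING */
#define SIM_HALTING	0x10u	/* hypothetical flag; value is arbitrary */

/* Toy stand-in for a kernel callout; NOT the NetBSD callout_t. */
struct sim_callout {
	unsigned flags;
	void (*func)(struct sim_callout *);
};

/* Proposed behaviour: scheduling is a no-op while a halt is in progress. */
void
sim_schedule(struct sim_callout *c)
{
	assert(c != NULL);
	if (c->flags & SIM_HALTING)
		return;
	c->flags |= SIM_PENDING;
}

/* A handler that unconditionally rearms itself, like a periodic timer. */
void
sim_rearming_handler(struct sim_callout *c)
{
	sim_schedule(c);
}

/*
 * Models a halt that overlaps with an in-flight handler.  With
 * set_halting == false this mimics the behaviour described above;
 * with true, the flag is raised before "waiting" for the handler.
 */
void
sim_halt(struct sim_callout *c, bool set_halting)
{
	if (set_halting)
		c->flags |= SIM_HALTING;
	c->flags &= ~SIM_PENDING;
	c->func(c);	/* stands in for waiting on the running handler */
}

/* True if a destroy-time "not pending" assertion would fire. */
bool
sim_destroy_would_assert(const struct sim_callout *c)
{
	return (c->flags & SIM_PENDING) != 0;
}
```

 With the flag check inside sim_schedule, the handler itself needs no
 knowledge of the teardown, which is the attraction of doing it in the
 callout layer instead of in every user.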

 Off topic: callout_schedule_locked takes a (held) mutex, but only
 releases it just before returning. We could release the mutex in the
 callers instead, so that we don't need to pass it at all.

   ozaki-r

 >
 >>How-To-Repeat:
 > Reboot an MP sparc64 machine with a -current DIAGNOSTIC kernel, best from a
 > ssh login.
 >
 >>Fix:
 > n/a
 >
