NetBSD-Bugs archive
Re: kern/49462: if_slowtimo callout mangled?
The following reply was made to PR kern/49462; it has been noted by GNATS.
From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/49462: if_slowtimo callout mangled?
Date: Thu, 11 Dec 2014 10:51:06 +0900
On Thu, Dec 11, 2014 at 2:35 AM, <martin%netbsd.org@localhost> wrote:
>>Number: 49462
>>Category: kern
>>Synopsis: if_slowtimo callout mangled?
>>Confidential: no
>>Severity: critical
>>Priority: high
>>Responsible: kern-bug-people
>>State: open
>>Class: sw-bug
>>Submitter-Id: net
>>Arrival-Date: Wed Dec 10 17:35:00 +0000 2014
>>Originator: Martin Husemann
>>Release: NetBSD 7.99.2
>>Organization:
> The NetBSD Foundation, Inc.
>>Environment:
> System: NetBSD thirdstage.duskware.de 7.99.2 NetBSD 7.99.2 (MODULAR) #237: Wed Dec 10 17:53:26 CET 2014 martin%thirdstage.duskware.de@localhost:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
> Architecture: sparc64
> Machine: sparc64
>>Description:
>
> I get a (not quite, but pretty often) reproducible KASSERT firing on reboot
> on a SMP sparc64 machine with four bge interfaces:
>
> panic: kernel diagnostic assertion "(c->c_flags & CALLOUT_PENDING) == 0" failed: file "../../../../kern/kern_timeout.c", line 314 callout 0x103b29aa8: c_func (0x119f100) c_flags (0x3) destroyed from 0x11a0e40
>
> The function is if_slowtimo and the caller of callout_destroy() is
> if_detach.
>
> However, this is basically impossible:
>
>         if (ifp->if_slowtimo != NULL) {
>                 callout_halt(ifp->if_slowtimo_ch, NULL);
>                 callout_destroy(ifp->if_slowtimo_ch);
>                 kmem_free(ifp->if_slowtimo_ch, sizeof(*ifp->if_slowtimo_ch));
>         }
>
> and callout_halt() certainly kills bit 0 in flags (CALLOUT_BOUND), and should
> also not return before CALLOUT_PENDING has cleared.
>
> So something(tm) is wrong, but it is not obvious to me.
It can happen. See PR 47881 and this thread:
http://mail-index.netbsd.org/netbsd-bugs/2014/11/12/msg039065.html .
The problem is that callout_halt waits for the callout handler to
complete but doesn't prevent the handler from scheduling
a new callout.
I fixed the problem by using an existing flag of the user (in6m->in6m_timer)
as a "don't callout_schedule anymore" flag for the callout. I think the
same fix can be applied to this case.
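
To make the race and the user-side fix concrete, here is a minimal
userspace sketch. It is a single-threaded model, not kernel code: the
model_* names, the "stopping" field and the "honor_stop" switch are all
hypothetical, and model_halt only imitates the one ordering that matters
here (the handler is still running while callout_halt waits for it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PENDING 0x0002u   /* stands in for CALLOUT_PENDING */

struct model_callout {
	unsigned flags;
	void (*func)(void *);
	void *arg;
};

/* Hypothetical owner of the callout, loosely modelled on struct ifnet. */
struct model_ifnet {
	struct model_callout ch;
	bool stopping;      /* hypothetical "don't reschedule anymore" flag */
	bool honor_stop;    /* false reproduces the unguarded handler */
};

/* Models callout_schedule: mark the callout as pending. */
static void
model_schedule(struct model_callout *c)
{
	c->flags |= MODEL_PENDING;
}

/* Handler in the style of if_slowtimo: it re-arms itself when done. */
static void
model_slowtimo(void *arg)
{
	struct model_ifnet *ifp = arg;

	if (ifp->honor_stop && ifp->stopping)
		return;         /* owner is detaching: do not re-arm */
	model_schedule(&ifp->ch);
}

/*
 * Models callout_halt racing with a running handler: the pending bit is
 * cleared first, then the in-flight handler finishes while we "wait".
 */
static void
model_halt(struct model_callout *c)
{
	assert(c != NULL && c->func != NULL);
	c->flags &= ~MODEL_PENDING;
	c->func(c->arg);
}

/*
 * Runs a detach-style teardown and reports whether the callout is still
 * pending afterwards, i.e. whether the KASSERT in callout_destroy would
 * fire.
 */
static bool
model_detach_leaves_pending(bool honor_stop)
{
	struct model_ifnet ifp = { .stopping = false, .honor_stop = honor_stop };

	ifp.ch.flags = 0;
	ifp.ch.func = model_slowtimo;
	ifp.ch.arg = &ifp;

	model_schedule(&ifp.ch);    /* timer armed during normal operation */
	ifp.stopping = true;        /* set before halting */
	model_halt(&ifp.ch);
	return (ifp.ch.flags & MODEL_PENDING) != 0;
}
```

In this model, model_detach_leaves_pending(false) reports a pending
callout after the halt, while model_detach_leaves_pending(true) does not.
A real fix would also have to set and test the flag under a lock that the
handler takes, which the sketch leaves out.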
Nonetheless, I think maybe we should do it in callout_halt itself.
For example, introduce a CALLOUT_HALTING flag and set it before waiting
for the callout handler to finish, while callout_schedule first checks the
flag and does nothing if it is set. By doing so, we can prevent a new
callout from being scheduled during callout_halt.
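
A rough userspace sketch of that idea, again as a single-threaded model
rather than a patch against kern_timeout.c. CALLOUT_HALTING does not
exist today; the sketch_* names and the flag value below are made up for
illustration, and sketch_schedule returns a bool only so the effect is
easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PENDING 0x0002u  /* stands in for CALLOUT_PENDING */
#define SKETCH_HALTING 0x0100u  /* hypothetical flag; value is arbitrary */

struct sketch_callout {
	unsigned flags;
	void (*func)(struct sketch_callout *);
};

/* Models callout_schedule: does nothing once halting has begun. */
static bool
sketch_schedule(struct sketch_callout *c)
{
	if ((c->flags & SKETCH_HALTING) != 0)
		return false;
	c->flags |= SKETCH_PENDING;
	return true;
}

/* A handler that re-arms itself with no owner-side guard at all. */
static void
sketch_rearming_handler(struct sketch_callout *c)
{
	(void)sketch_schedule(c);
}

/*
 * Models the proposed callout_halt: set HALTING before waiting, so a
 * handler that is still running cannot queue the callout again.
 */
static void
sketch_halt(struct sketch_callout *c)
{
	assert(c != NULL && c->func != NULL);
	c->flags |= SKETCH_HALTING;
	c->flags &= ~SKETCH_PENDING;
	c->func(c);     /* stands in for waiting on the in-flight handler */
}

/* Arms the callout, halts it, and returns the resulting flags. */
static unsigned
sketch_flags_after_halt(void)
{
	struct sketch_callout c = { .flags = 0, .func = sketch_rearming_handler };

	(void)sketch_schedule(&c);
	sketch_halt(&c);
	return c.flags;
}
```

With the check in sketch_schedule, the flags returned by
sketch_flags_after_halt() have SKETCH_PENDING clear even though the
handler tried to re-arm. When and where the halting flag would be cleared
again (for a callout that is reused after callout_halt) is left open here.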
Off topic: callout_schedule_locked takes a (held) mutex but only
releases it just before returning. We could release the mutex outside
callout_schedule_locked so that we don't need to pass it at all.
ozaki-r
>
>>How-To-Repeat:
> Reboot an MP sparc64 machine with a -current DIAGNOSTIC kernel, best from a
> ssh login.
>
>>Fix:
> n/a
>