tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: callout_stop => callout_halt



On Mon, Nov 17, 2014 at 2:28 PM, Taylor R Campbell
<campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>    Date: Mon, 17 Nov 2014 12:52:05 +0900
>    From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
>
>    On Mon, Nov 17, 2014 at 3:00 AM, Taylor R Campbell
>    <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>    > My inclination is to nix sme_callout_mtx altogether and replace every
>    > use of it by sme_mtx (there are obviously some races where one thread
>    > holds one and another thread holds another and they try to touch the
>    > same data structure), but I don't know this code well enough to say
>    > for sure.
>
>    Sounds reasonable for me. One concern is that the callout function
>    calls workqueue_enqueue that enqueues a work of IPL_SOFTCLOCK and
>    the work also tries to acquire sme_mtx. IIUC, we need to splsoftclock
>    in the callout function to avoid deadlocks. Is it right?
>
> Not quite.  There are a few things going on here...
>
> 1. The IPL of the workqueue is the highest IPL from which we are going
> to queue work, so that the workqueue thread can block interrupts at
> that level while it grabs work from its queue data structure.  It's
> safe to call workqueue_enqueue at any IPL below that too.  When we're
> in the callout, we're at IPL_SOFTCLOCK, so it's OK.

Oh, I didn't know that (callout functions run at IPL_SOFTCLOCK)!
So I'm convinced to the following discussion.

>
> 2. When an interrupt handler needs to take a lock that threads might
> want to use, the threads need to block the interrupt so that it
> doesn't come in on the same CPU while the thread holds the lock --->
> deadlock.  It's not the interrupt handler that needs to spl*; it's
> everyone who *else* takes the lock.  That's why we pass an IPL to
> mutex_init: so mutex_enter can raise to it and mutex_exit can restore
> the IPL.  (It is ~never the case that you need to write explicit calls
> to spl* in modern APIs.)

Yes.

>
> 3. Presumably that's why sme_callout_mtx exists -- I hadn't noticed
> before that it was at IPL_SOFTCLOCK while sme_mtx is at IPL_NONE.
> (That's not enough, though: the locking looks broken to me because not
> all the envsys code is consistent for each data structure about
> whether you need sme_mtx or sme_callout_mtx.)
>
> 4. However, for the purposes of mutex(9), IPL_NONE and all the
> IPL_SOFT* are equivalent.  So in this case it almost certainly doesn't
> matter in practice, and it's probably no worse than the rest of the
> system to just use sme_mtx with IPL_NONE for now...

I see, though I'm not sure I understand it completely

> although on
> discussion with mrg@, it sounds like there are some pretty deep bugs
> in the way that softints/callouts and blocking work, which can be the
> subject of a separate thread.  Argh!

...so I'm looking forward to the new thread :)

>
>    So the patch is updated: http://www.netbsd.org/~ozaki-r/callout_halt.diff
>
> I know you didn't introduce this, but while I'm looking at that I
> realize that some of the code you're touching is calling *_destroy
> routines under a lock, which is usually not OK.  callout_destroy is an
> exception that happens to be OK, but, e.g., workqueue_destroy may wait
> for work in progress to complete, and kmem_free may call into uvm and
> wait for things inside that.

Okay, I'm trying to get rid of them that require some rearrangements.

Thanks,
  ozaki-r


Home | Main Index | Thread Index | Old Index