Re: callout_stop => callout_halt

To: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
Subject: Re: callout_stop => callout_halt
From: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Date: Mon, 17 Nov 2014 05:28:32 +0000

   Date: Mon, 17 Nov 2014 12:52:05 +0900
   From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>

   On Mon, Nov 17, 2014 at 3:00 AM, Taylor R Campbell
   <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
   > My inclination is to nix sme_callout_mtx altogether and replace every
   > use of it by sme_mtx (there are obviously some races where one thread
   > holds one and another thread holds another and they try to touch the
   > same data structure), but I don't know this code well enough to say
   > for sure.

   Sounds reasonable for me. One concern is that the callout function
   calls workqueue_enqueue that enqueues a work of IPL_SOFTCLOCK and
   the work also tries to acquire sme_mtx. IIUC, we need to splsoftclock
   in the callout function to avoid deadlocks. Is it right?

Not quite.  There are a few things going on here...

1. The IPL of the workqueue is the highest IPL from which we are going
to queue work, so that the workqueue thread can block interrupts at
that level while it grabs work from its queue data structure.  It's
safe to call workqueue_enqueue at any IPL below that too.  When we're
in the callout, we're at IPL_SOFTCLOCK, so it's OK.

2. When an interrupt handler needs to take a lock that threads might
want to use, the threads need to block the interrupt so that it
doesn't come in on the same CPU while the thread holds the lock --->
deadlock.  It's not the interrupt handler that needs to spl*; it's
everyone who *else* takes the lock.  That's why we pass an IPL to
mutex_init: so mutex_enter can raise to it and mutex_exit can restore
the IPL.  (It is ~never the case that you need to write explicit calls
to spl* in modern APIs.)

3. Presumably that's why sme_callout_mtx exists -- I hadn't noticed
before that it was at IPL_SOFTCLOCK while sme_mtx is at IPL_NONE.
(That's not enough, though: the locking looks broken to me because not
all the envsys code is consistent for each data structure about
whether you need sme_mtx or sme_callout_mtx.)

4. However, for the purposes of mutex(9), IPL_NONE and all the
IPL_SOFT* are equivalent.  So in this case it almost certainly doesn't
matter in practice, and it's probably no worse than the rest of the
system to just use sme_mtx with IPL_NONE for now...although on
discussion with mrg@, it sounds like there are some pretty deep bugs
in the way that softints/callouts and blocking work, which can be the
subject of a separate thread.  Argh!

   So the patch is updated: http://www.netbsd.org/~ozaki-r/callout_halt.diff

I know you didn't introduce this, but while I'm looking at that I
realize that some of the code you're touching is calling *_destroy
routines under a lock, which is usually not OK.  callout_destroy is an
exception that happens to be OK, but, e.g., workqueue_destroy may wait
for work in progress to complete, and kmem_free may call into uvm and
wait for things inside that.

Follow-Ups:
- Re: callout_stop => callout_halt
  - From: Ryota Ozaki

References:
- Re: callout_stop => callout_halt
  - From: Ryota Ozaki

Prev by Date: Re: callout_stop => callout_halt
Next by Date: Re: callout_stop => callout_halt
Previous by Thread: Re: callout_stop => callout_halt
Next by Thread: Re: callout_stop => callout_halt
Indexes:

Home | Main Index | Thread Index | Old Index