tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: percpu_foreach() does not execute remotely



[Cc'ing the wizards at IIJ because a diagnostic measure we're
discussing may require adapting the network stack.]

> Date: Wed, 29 Jan 2020 05:43:39 -0800
> From: Jason Thorpe <thorpej%me.com@localhost>
> 
> > On Jan 28, 2020, at 9:46 PM, Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> > 
> > Normally sleeping under percpu_getref -- even on an adaptive lock --
> > is a no-no for precisely this reason.  (percpu_getref, while very nice
> > in some ways, is an extremely delicate abstraction!  It would be nice
> > if violating this rule triggered an immediate panic.)
> 
> My point was about the percpu_swap_lock ... percpu_foreach() is
> protected from this scenario by its use of that lock,
> percpu_getref() is not.  Maybe we want to consider adding panics in
> the cases where a thread goes to sleep while preemption is disabled?

Sounds reasonable, although it's not immediately clear to me that
kpreempt_disable is the right criterion -- e.g., kpreempt_disable is
mandatory to call mi_switch, so this might cause trouble in the
scheduler, and there may be other logic out there that relies on
preventing _preemption_ but deliberately _sleeps_.

Perhaps we should have an LWP or CPU flag saying we're currently in
the middle of percpu_getref/putref; we can then add another case to
assert_sleepable in kern/kern_lock.c and an assertion in mi_switch.

However, there's some things we might have to adapt to avoid
spuriously triggering this (some of which already look a little
questionable to me):

net/if.h if_tunnel_get_ro:
  percpu_getref then mutex_enter on an adaptive lock.  I think this is
  not a problem right now because we never use the percpu pointer
  after mutex_enter.

  I think it should be enough to move percpu_putref out of
  if_tunnel_put_ro and into if_tunnel_get_ro -- holding the lock
  should be enough to serialize access to the struct route.

net/if_l2tp.c l2tp_ifq_percpu_getref:
  Held across if_tunnel_get_ro, via in6_l2tp_output, which might sleep
  on an adaptive lock.

  I think it should be enough to move percpu_putref out of
  l2tp_ifq_percpu_putref and into l2tp_ifq_percpu_getref (perhaps
  change the name for clarity), and to use curlwp_bind/bindx instead.

net/route.h rtcache_percpu_getref:
  Held across large subsystems, too many for me to examine -- e.g.,
  ip_output, which might acquire the kernel lock which might sleep.
  Since the percpu pointer is never used after it returns, though, I
  think only curlwp_bind/bindx is needed here.

netinet/wqinput.c wqinput_percpu_getref:
  Held across (a) softnet_lock, which is adaptive, in wqinput_work;
  and across (b) pool_get on an IPL_NONE pool, which also takes an
  adaptive lock, in wqinput_input.

  I think (a) is technically OK now because wqinput_work is bound to a
  CPU anyway, so access to wwl should be serialized on that CPU (no
  sleeps in wqinput_work_get), but (b) might be a problem unless we
  can prove wqinput_input is bound to a CPU, which I ran out of juice
  to do.

  In any case, I think the percpu_putref can again be moved into
  wqinput_percpu_getref (and perhaps the name should be changed for
  clarity), since we never use the percpu pointer again -- but we
  should either add curlwp_bind/bindx or comment on a proof that
  wqinput_input is already bound to the CPU.

> IF we did that, then we should also disable preemption after
> acquiring percpu_swap_lock to catch bad behavior by the
> percpu_foreach() callback.

I don't think so, because it's OK for the percpu_foreach callback to
mutex_enter/exit on an adaptive lock even though that might sleep.
What's not OK is sleeping for allocation or other long durations.
Merely holding percpu_swap_lock during percpu_foreach is enough to
prevent concurrent percpu_cpu_swap.


Home | Main Index | Thread Index | Old Index