tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Race condition between an LWP migration and curlwp_bind



On Thu, Feb 15, 2018 at 5:30 AM, Mateusz Guzik <mjguzik%gmail.com@localhost> wrote:
> On Tue, Feb 13, 2018 at 05:14:38PM +0900, Ryota Ozaki wrote:
>>   panic: kernel diagnostic assertion "(psref->psref_cpu == curcpu())"
>> failed: file "/(hidden)/sys/kern/subr_psref.c", line 317 passive
>> reference transferred from CPU 0 to CPU 3
>>
>> I first thought that something went wrong in an ioctl handler
>> for example curlwp_bindx was called doubly and LP_BOUND was unset
>> then the LWP was migrated to another CPU. However, this kind of
>> assumptions was denied by KASSERTs in psref_release. So I doubted
>> of LP_BOUND and found there is a race condition on LWP migrations.
>>
>> curlwp_bind sets LP_BOUND to l_pflags of curlwp and that prevents
>> curlwp from migrating to another CPU until curlwp_bindx is called.
>
> The bug you found (and I trimmed) looks like the culprit, but there is
> an extra problem which probably happens to not manifest itself in terms
> of code generation: the bind/unbind inlines lack compiler barriers. See
> KPREEMPT_* inlines for comparison. The diff is definitely trivial:
>
> diff --git a/sys/sys/lwp.h b/sys/sys/lwp.h
> index 47d162271f9c..f18b76b984e4 100644
> --- a/sys/sys/lwp.h
> +++ b/sys/sys/lwp.h
> @@ -536,6 +536,7 @@ curlwp_bind(void)
>
>         bound = curlwp->l_pflag & LP_BOUND;
>         curlwp->l_pflag |= LP_BOUND;
> +       __insn_barrier();
>
>         return bound;
>  }
> @@ -545,6 +546,7 @@ curlwp_bindx(int bound)
>  {
>
>         KASSERT(curlwp->l_pflag & LP_BOUND);
> +       __insn_barrier();
>         curlwp->l_pflag ^= bound ^ LP_BOUND;
>  }
>
> --
> Mateusz Guzik <mjguzik gmail.com>

Right, the barriers are basically required as you say.

Note that for the current usage of curlwp_bind, i.e., preventing LWP
migrations between psref_acquire and psref_release, the barriers are
not must, IIUC. Because psref_acquire and psref_release are functions
and no reordering probably happen between psref_* and curlwp_*. Even
if a compiler optimization tries to reorder, curlwp_* won't cross
psref_* because psref_* uses percpu that uses kpreempt APIs that call
__insn_barrier.

Anyway I'll add the barriers.

Thanks,
  ozaki-r


Home | Main Index | Thread Index | Old Index