Re: Anomalies while handling p_nstopchild count

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: Anomalies while handling p_nstopchild count
From: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Date: Tue, 13 Oct 2015 02:27:47 +0000

   Date: Tue, 13 Oct 2015 08:42:55 +0700
   From: Robert Elz <kre%munnari.OZ.AU@localhost>

   The code is (pre fix) ...

                   p->p_waited = 0;
                   membar_producer();
                   p->p_stat = SSTOP;

   My assumption is that the objective was to ensure that no-one would
   ever see p_stat == SSTOP; before the p_waited = 0 is also visible.

   That can be important, as p_lock (nor any other relevant lock) is not
   acquired before inspecting p_stat (see for example find_stopped_child())
   so there's nothing preventing another CPU from racing against this pair of
   assignments, which without a memory barrier of some kind could be
   reordered by the hardware, and detecting p_stat == SSTOP and p_waited != 0.

   I don't believe that membar_consumer() is relevant at all, no-one
   (for this purpose) cares what order reads from memory actually happen.

In this case, and generally[*], you need a membar_consumer matching
the membar_producer in order for the latter to have any useful effect.
Otherwise, if another CPU executes code that looks like

if (p->p_stat == SSTOP && p->p_waited == 0 && ...) ...;

such as in find_stopped_child, then the following sequence of events
might happen, due, for example, to CPU instruction reordering:

CPU 0                   CPU 1
                        tmp0 = p->p_waited  // gets 1
p->p_waited = 0
                        tmp1 = p->p_stat
p->p_stat = SSTOP
                        if (tmp1 == SSTOP && tmp0 == 0 && ...)

The membar_consumer forces CPU 1 not to reorder loads like that for
any reason.

On closer inspection, it looks like:

(a) The protocol is that under proc_lock but *not* p->p_lock, p_waited
must be written before p_stat, and p_stat must be read before
p_waited, as an optimization to avoid mutex/atomics.

(b) There's a missing membar_consumer in find_stopped_child and in
proc_reparent (but maybe it doesn't matter on x86, if all current
implementations will not in practice reorder those loads).

(c) The write to p->p_waited in exit1 looks fishy: it holds p->p_lock,
but not proc_lock.

That said, there is no missing membar_producer in fork1, kpsignal2,
proc_stop_done, proc_unstop, or sys_ptrace, because they all hold both
proc_lock and p->p_lock.

[*] One exception: data-dependent reads, as in q = *p; v = *q, need
only membar_datadep_consumer, which is a noop on most platforms,
unlike membar_consumer.

   Paul's question really amounted to, I think, whether either the fact that
   proc_lock will be held while p_waited is set to 0 would make a difference
   (I can't see why that would really matter) or more likely, since the code
   will become

                   p->p_waited = 0;
                   p->p_pptr->p_nstopchild++;
                   mutex_exit(proc_lock);
                   membar_producer();
                   p->p_stat = SSTOP;

   whether the mutex_exit() makes the membar_producer() redundant.

And the answer to that is yes.

Follow-Ups:
- Re: Anomalies while handling p_nstopchild count
  - From: Robert Elz
- Re: Anomalies while handling p_nstopchild count
  - From: Robert Elz

References:
- Re: Anomalies while handling p_nstopchild count
  - From: Robert Elz

Prev by Date: Re: Anomalies while handling p_nstopchild count
Next by Date: Re: Anomalies while handling p_nstopchild count
Previous by Thread: Re: Anomalies while handling p_nstopchild count
Next by Thread: Re: Anomalies while handling p_nstopchild count
Indexes:

Home | Main Index | Thread Index | Old Index