Re: pserialize(9) vs. TAILQ

To: Masao Uebayashi <uebayasi%gmail.com@localhost>
Subject: Re: pserialize(9) vs. TAILQ
From: Dennis Ferguson <dennis.c.ferguson%gmail.com@localhost>
Date: Thu, 20 Nov 2014 15:45:22 +0800

On 20 Nov, 2014, at 11:07 , Masao Uebayashi <uebayasi%gmail.com@localhost> wrote:
> On Wed, Nov 19, 2014 at 2:49 PM, Dennis Ferguson
> <dennis.c.ferguson%gmail.com@localhost> wrote:
>> That's very true, but the code is only correct if the writes occur
>> in the order the C code has them written (i.e. tqe_next in the
>> new element is written before writing the tqe_next of the existing
>> list element to point at it).  To ensure the compiler or the
>> cache hardware doesn't reorder those writes it needs a write barrier
>> inserted to enforce the order as written.
> 
> Yes, in the pserialize(9) + TAILQ() case, everything except tqe_next
> is only written by writers, and protected with a mutex, which ensures
> (implies) ordering.  What is ordered by what has to be clearly
> documented.

We might be on the same page, but your response is confusing me enough
that it might be useful to make this explicit.  The guts of the current
TAILQ_INSERT_TAIL() macro look like this:

        (elm)->field.tqe_next = TAILQ_END(head);
        (elm)->field.tqe_prev = (head)->tqh_last;
        *(head)->tqh_last = (elm);
        (head)->tqh_last = &(elm)->field.tqe_next;

In standard C (assuming no volatile variable declarations) this replacement
sequence is exactly equivalent to the above

        *(head)->tqh_last = (elm);
        (elm)->field.tqe_prev = (head)->tqh_last;
        (head)->tqh_last = &(elm)->field.tqe_next;
        (elm)->field.tqe_next = TAILQ_END(head);

because the end result of executing the 4 lines in either order is identical
and, in a single-threaded process, that is all that matters.  Since the
sequences are equivalent the compiler's optimizer and the machine's cache
hardware are free to turn one sequence into the other if there is a
performance advantage gained by doing so.

If (and only if) there may be a concurrent reader of the structure it is
pretty clear that the second sequence can't work; a concurrent reader may
see (elm)->field.tqe_next uninitialized.  What is also true, but not so
clear, is that the while the first sequence looks right it is equally incorrect
since it is equivalent to the second.  The code is only correct if the required
ordering is enforced by a barrier, e.g.

        (elm)->field.tqe_next = TAILQ_END(head);
        (elm)->field.tqe_prev = (head)->tqh_last;
	membar_producer();
        *(head)->tqh_last = (elm);
        (head)->tqh_last = &(elm)->field.tqe_next;

That tells the compiler to avoid reordering the stores done in code before
the barrier to after the barrier, and vice versa, and to do what is needed
(if anything) to tell the cache hardware to do the same (on an x86 only
the compiler barrier is necessary).  It makes the ordering you need
explicit.

The conclusion is that while the current TAILQ_REMOVE() macro will work
unchanged to maintain a list with concurrent readers (given no requirement
to run on an MP DEC Alpha) the TAILQ_INSERT_*() macros will not, even
though they look correct.  The mutex and/or the use of pserialize() does
nothing to change this.  If you want to insert things into a TAILQ list
without blocking concurrent readers you must use alternative macros that
have membar_producer() calls in the right spots.

Are we on the same page?

Dennis Ferguson

Follow-Ups:
- Re: pserialize(9) vs. TAILQ
  - From: Masao Uebayashi

References:
- pserialize(9) vs. TAILQ
  - From: Masao Uebayashi
- Re: pserialize(9) vs. TAILQ
  - From: Dennis Ferguson
- Re: pserialize(9) vs. TAILQ
  - From: Masao Uebayashi
- Re: pserialize(9) vs. TAILQ
  - From: Dennis Ferguson
- Re: pserialize(9) vs. TAILQ
  - From: Masao Uebayashi

Prev by Date: Re: pserialize(9) vs. TAILQ
Next by Date: Re: struct ifnet and ifaddr handling [was: Re: Making global variables of if.c MPSAFE]
Previous by Thread: Re: pserialize(9) vs. TAILQ
Next by Thread: Re: pserialize(9) vs. TAILQ
Indexes:

Home | Main Index | Thread Index | Old Index