tech-kern archive


Re: pserialize(9) vs. TAILQ



On Thu, Nov 20, 2014 at 4:45 PM, Dennis Ferguson
<dennis.c.ferguson%gmail.com@localhost> wrote:
>
> On 20 Nov, 2014, at 11:07 , Masao Uebayashi <uebayasi%gmail.com@localhost> wrote:
>> On Wed, Nov 19, 2014 at 2:49 PM, Dennis Ferguson
>> <dennis.c.ferguson%gmail.com@localhost> wrote:
>>> That's very true, but the code is only correct if the writes occur
>>> in the order the C code has them written (i.e. tqe_next in the
>>> new element is written before writing the tqe_next of the existing
>>> list element to point at it).  To ensure the compiler or the
>>> cache hardware doesn't reorder those writes it needs a write barrier
>>> inserted to enforce the order as written.
>>
>> Yes, in the pserialize(9) + TAILQ() case, everything except tqe_next
>> is only written by writers, and protected with a mutex, which ensures
>> (implies) ordering.  What is ordered by what has to be clearly
>> documented.
>
> We might be on the same page, but your response is confusing me enough
> that it might be useful to make this explicit.  The guts of the current
> TAILQ_INSERT_TAIL() macro look like this:
>
>         (elm)->field.tqe_next = TAILQ_END(head);
>         (elm)->field.tqe_prev = (head)->tqh_last;
>         *(head)->tqh_last = (elm);
>         (head)->tqh_last = &(elm)->field.tqe_next;
>
> In standard C (assuming no volatile variable declarations) this replacement
> sequence is exactly equivalent to the above
>
>         *(head)->tqh_last = (elm);
>         (elm)->field.tqe_prev = (head)->tqh_last;
>         (head)->tqh_last = &(elm)->field.tqe_next;
>         (elm)->field.tqe_next = TAILQ_END(head);
>
> because the end result of executing the 4 lines in either order is identical
> and, in a single-threaded process, that is all that matters.  Since the
> sequences are equivalent the compiler's optimizer and the machine's cache
> hardware are free to turn one sequence into the other if there is a
> performance advantage gained by doing so.
>
> If (and only if) there may be a concurrent reader of the structure it is
> pretty clear that the second sequence can't work; a concurrent reader may
> see (elm)->field.tqe_next uninitialized.  What is also true, but not so
> clear, is that while the first sequence looks right it is equally incorrect
> since it is equivalent to the second.  The code is only correct if the required
> ordering is enforced by a barrier, e.g.
>
>         (elm)->field.tqe_next = TAILQ_END(head);
>         (elm)->field.tqe_prev = (head)->tqh_last;
>         membar_producer();
>         *(head)->tqh_last = (elm);
>         (head)->tqh_last = &(elm)->field.tqe_next;
>
> That tells the compiler to avoid reordering the stores done in code before
> the barrier to after the barrier, and vice versa, and to do what is needed
> (if anything) to tell the cache hardware to do the same (on an x86 only
> the compiler barrier is necessary).  It makes the ordering you need
> explicit.

OK, I overlooked the obvious "new elm's next" -> "prev elm's next" ordering.

> The conclusion is that while the current TAILQ_REMOVE() macro will work
> unchanged to maintain a list with concurrent readers (given no requirement
> to run on an MP DEC Alpha) the TAILQ_INSERT_*() macros will not, even
> though they look correct.  The mutex and/or the use of pserialize() does
> nothing to change this.  If you want to insert things into a TAILQ list
> without blocking concurrent readers you must use alternative macros that
> have membar_producer() calls in the right spots.

I completely disagree with your conclusion.

Your basic idea seems to be that ``membar_{producer,consumer}() are cheaper
than membar_{enter,exit}(), so let's use the cheaper ones''.  That way you
end up concluding that you need a memory barrier in every iteration.  That
is contradictory to what you previously said.

The point is, pserialize(9) is totally different from pcq(9).

In pcq(9), like pktqueue(9), the data structure is updated all the
time by producers and consumers.  In pserialize(9), like ifnet_list,
the data structure is updated extremely rarely.  (How many times is
ifnet_list updated on your machine a day?)

The problem of TAILQ_INSERT_*() macros with respect to pserialize(9)
is that, as you are pointing out, they update new elm's next pointer
and prev elm's next pointer almost simultaneously.  So you have to
ensure that ordering in the reader's code path.  But consider that
this is pserialize(9), where the target elements (E_p and E_q) are
unlikely to move while inserting a new element (E_new).  There is no
point in updating E_new.next in the "update" section.  If E_new.next has
been updated *long enough before* E_p.next is updated, that ordering
is not a problem at all.

You are also ignoring that mutex_{enter,exit}() imply membar_{enter,exit}().

This pseudo code is what I have come up with.  Still WIP, but it at least
solves the E_new.next -> E_p.next ordering.

int
insert(E *E_new, ...)
{
	/* Insert E_new between E_p and E_q. */
	E *E_p, *E_q;

	/* Generation. */
	int gen;

retry:
	E_p = E_q = NULL;

	/*
	 * Find the target entries.
	 */
	pserialize_read_enter();
	TAILQ_FOREACH(...) {
		if (...) {
			/* Ensure that E_p.next and generation are sync'ed. */
			mutex_spin_enter();

			/* Remember the found target. */
			E_p = ...;
			E_q = E_p.next;

			/* Remember the generation. */
			gen = TAILQ_GENERATION();

			mutex_spin_exit();

			break;
		}
	}
	pserialize_read_exit();

	if (E_p == NULL)
		/* No target found; insertion failed. */
		return 1;

	/* Prepare E_new.  Not globally visible, no lock required yet. */
	E_new.prev = E_p;
	E_new.next = E_q;

	/*
	 * Enter the writer section.
	 *
	 * This implies membar_enter(); the assignment to E_new.next is
	 * ensured to be globally visible before subsequent reads/writes.
	 *
	 * In particular, E_new.next is updated before E_p.next is updated.
	 */
	mutex_enter();
	{
		/*
		 * Check if the generation has changed.
		 */
		if (gen != TAILQ_GENERATION()) {
			/* If changed, retry. */
			mutex_exit();
			goto retry;
		}

		/*
		 * Atomically update the pointer link.
		 *
		 * E_new.next is already ensured to be globally visible.
		 */
		E_p.next = E_new;

		/*
		 * Ensure that all readers see the updated E_p.next.
		 */
		pserialize_perform();

		/*
		 * Lazily update prev pointer protected with mutex.
		 */
		E_q.prev = E_new;

		/*
		 * Update the generation.
		 */
		TAILQ_GENERATION()++;
	}
	mutex_exit();

	return 0;
}

