tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: pserialize(9) vs. TAILQ



On 22 Nov, 2014, at 00:03 , Masao Uebayashi <uebayasi%gmail.com@localhost> wrote:

> On Thu, Nov 20, 2014 at 4:45 PM, Dennis Ferguson
> <dennis.c.ferguson%gmail.com@localhost> wrote:
> OK, I overlooked the obvious "new elm's next" -> "prev elm's next" ordering.
> 
>> The conclusion is that while the current TAILQ_REMOVE() macro will work
>> unchanged to maintain a list with concurrent readers (given no requirement
>> to run on an MP DEC Alpha) the TAILQ_INSERT_*() macros will not, even
>> though they look correct.  The mutex and/or the use of pserialize() does
>> nothing to change this.  If you want to insert things into a TAILQ list
>> without blocking concurrent readers you must use alternative macros that
>> have membar_producer() calls in the right spots.
> 
> I completely disagree with your conclusion.
> 
> Your basic idea seems that ``member_{prroducer,consumer}() are cheaper
> than membar_{enter,leave}, let's use cheaper ones''.  That way you end
> up concluding that you need memory barrier in every iteration.  Thant
> is contradictory to what you previously said.

You need to fix the store ordering in TAILQ_INSERT_*().  If you want
to use a different barrier to do it that's fine with me.

I was happy with no barrier in the reader when I thought providing
support for the DEC Alpha architecture's unique cache quirk wasn't
necessary.  When it turned out support for the Alpha was necessary
the reader needed to have a barrier inserted that looks like it gets
executed after every read pointer read.  Note, however, that this is
a barrier which does nothing at all on any machine other than an
Alpha. Nothing has changed, nothing has been contradicted, no cost
has been added to readers.  Taylor might like to see barrier
operations paired up in C code but there is no pairing at the
hardware instruction level.  The reader barrier has no cost on any
machine other than the Alpha, the only machine which has an
architectural requirement to generate an instruction there.

I understand why you are unhappy with the barrier.  It seems
stupid to have to execute a not-inexpensive hardware instruction
over and over and over again in the reader to deal with something
that a writer is almost never going to do, but since you have to do
this on an Alpha (and only an Alpha) to make this code correct,
why can't the conclusion be that it sucks to be a DEC Alpha,
and just get on with it?  This is probably a perfect example
of why no processor other than the Alpha has had this quirk in
its cache architecture, nor is there likely to be another in
future; this quirk has a cost that is annoying.  Indeed, judged
from the fact that people think NetBSD runs on an MP Alpha
okay without this barrier, it may be that even later DEC Alpha
implementations omitted this misfeature.

> point to update E_new.next in the "update" section.  If E_new.next has
> been updated *long enough before* E_p.next is updated, that ordering
> is not a problem at all.

"long enough" is not a specific length of time.  If you can
find a spec for the amount of time which is "long enough" you
can wait that long instead.  Just calling pserialize_perform()
when inserting entries (where no pseralize_perform() is required)
because pserialize_perform() is a fantastically expensive function
and takes a long time, doesn't do anything if you don't know
how long is "long enough", not to mention that you'd be executing
this extra-expensive DELAY() on all archectures to eliminate a cost
elsewhere that is unique to the DEC Alpha.

This is only a problem for the Alpha.  The barrier fixes it for
the Alpha without costing other architectures anything.  That seems
like the right thing to do.

Dennis Ferguson



Home | Main Index | Thread Index | Old Index