Subject: Re: bus_space and barriers
From: Chris Torek <torek@BSDI.COM>
List: tech-kern
Date: 10/24/2000 03:28:44

One should consider that, in modern systems, one of the biggest
"time-wasters" inside any given device driver can be the actual
reads from and writes to the hardware.  That is, if:

	p->x = 3;
	p->y = 4;

has to write out to some legacy bus that takes 400 ns per write,
and the CPU runs at 1 instruction per 4 ns on average, each of these
two writes costs about 100 "instructions" of time.  The quickest way
to speed these up is to let the writes sit in a write buffer in any
case where delaying them is not a problem.

In other words, in this case, optimization directly interferes with
"cheap safety": if each bus "read" or "write" op implies full barriers,
programmers will not get surprised, but there is no way to optimize
the driver.
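
To put rough numbers on that trade-off, here is a toy cost model.
It is only a sketch: the function names are made up for illustration,
and it assumes a write buffer deep enough to hold every posted write
and a bus that drains them back to back.

```c
#include <assert.h>

/* Instruction slots that fit in the time of one bus write. */
static unsigned
insns_per_write(unsigned bus_ns, unsigned ns_per_insn)
{
	assert(ns_per_insn != 0);
	return bus_ns / ns_per_insn;
}

/*
 * Every write implies a full barrier: the CPU waits out each write,
 * so the stall grows linearly with the number of writes.
 * Uses the 400 ns / 4 ns figures from the example above.
 */
static unsigned
stall_full_barriers(unsigned nwrites)
{
	return nwrites * insns_per_write(400, 4);
}

/*
 * Writes are posted to a write buffer and the CPU keeps running.
 * It stalls only at the final barrier, and only for whatever part
 * of the drain time was not covered by other useful work.
 */
static unsigned
stall_write_buffer(unsigned nwrites, unsigned other_work_insns)
{
	unsigned drain = nwrites * insns_per_write(400, 4);

	return drain > other_work_insns ? drain - other_work_insns : 0;
}
```

With the two writes above, stall_full_barriers(2) is 200 slots,
while stall_write_buffer(2, 150) is 50: most of the bus time is
hidden behind the 150 instructions of other work.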

Unfortunately, the "most optimal" solution might in some cases be
overly complicated.  For instance, imagine hardware in which the
order of five writes and one read is irrelevant *except* that the
fourth write must occur last of all the writes, and the one read
must occur after the second write.  The "easy" way to express this
is to put barriers around the read:

	x = read_item();

but this is overly constraining.  The "read" could have been deferred
to happen after "write E", but the "barrier" semantic is too big
a hammer, and the "write E" will wait for the "read" instead.
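
The two hardware rules in this example are small enough to write
down and check directly.  The sketch below does that for a proposed
completion order.  The op names (WR_A through WR_E for the five
writes, RD_X for the read) and the function names are invented for
illustration; nothing here talks to real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: five writes (A..E, D is the fourth), one read. */
enum op { WR_A, WR_B, WR_C, WR_D, WR_E, RD_X, NOPS };

/* Position of op `o` in `order`, or n if it does not appear. */
static size_t
pos_of(const enum op *order, size_t n, enum op o)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (order[i] == o)
			return i;
	return n;
}

/*
 * Return 1 if `order` contains each op once and obeys both rules:
 * write D completes after every other write, and the read
 * completes after write B.  Return 0 otherwise.
 */
static int
order_ok(const enum op *order, size_t n)
{
	static const enum op others[] = { WR_A, WR_B, WR_C, WR_E };
	size_t i, d;

	assert(order != NULL);
	if (n != NOPS)
		return 0;
	for (i = 0; i < NOPS; i++)
		if (pos_of(order, n, (enum op)i) == n)
			return 0;	/* op missing, so another is repeated */
	d = pos_of(order, n, WR_D);
	for (i = 0; i < sizeof(others) / sizeof(others[0]); i++)
		if (pos_of(order, n, others[i]) > d)
			return 0;	/* some write landed after D */
	return pos_of(order, n, WR_B) < pos_of(order, n, RD_X);
}
```

Both "A B read C E D" and "A B C E read D" pass this check, which
is the point: the hardware does not care whether the read lands
before or after write E, even though a full barrier forces a choice.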

I think the only really sane way to express this all is with a
"topological map" and a compiler that goes from the map to the
native instruction set.  The "default map" could then be the 100%
safe "all read and write ops must be in sequence", and if you wanted
to optimize any given driver, you would write up a real map for
it:  "reads of X may be deferred until Y occurs", etc.  The
"map-optimizing compiler" would translate those into wmb()s, or
membar("#StoreLoad"), or whatever the target machine uses, as
appropriate.  This is a fairly tall order, though.
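
As a very rough sketch of what the simplest possible "map
compiler" might do, the code below takes a map (a list of "X must
complete before Y" edges) and a program order, and decides where
full barriers have to go.  Everything in it is an assumption made
for illustration: the op names, the edge representation, and the
greedy placement rule.  It knows only one kind of barrier (a full
one), and it assumes the program order already lists each "before"
op ahead of its "after" op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: five writes (A..E, D is the fourth), one read. */
enum op { WR_A, WR_B, WR_C, WR_D, WR_E, RD_X, NOPS };

/* One edge of the map: `before` must complete before `after`. */
struct edge { enum op before, after; };

/* Program order used in the examples below. */
static const enum op prog[] = { WR_A, WR_B, WR_C, WR_E, RD_X, WR_D };

/* Default map: 100% safe, every op follows the previous one. */
static const struct edge default_map[] = {
	{ WR_A, WR_B }, { WR_B, WR_C }, { WR_C, WR_E },
	{ WR_E, RD_X }, { RD_X, WR_D },
};

/* Real map: read after write B; write D after all other writes. */
static const struct edge real_map[] = {
	{ WR_B, RD_X },
	{ WR_A, WR_D }, { WR_B, WR_D }, { WR_C, WR_D }, { WR_E, WR_D },
};

/*
 * Walk `p` in program order and return the number of full barriers
 * needed to honor `map`.  An op is "pending" until a barrier is
 * issued after it; a barrier goes in front of any op that must
 * follow a pending op.  If `out` is non-NULL, out[i] is set to 1
 * when a barrier precedes p[i], else 0.
 */
static size_t
place_barriers(const enum op *p, size_t n, const struct edge *map,
    size_t nedges, int *out)
{
	int pending[NOPS] = { 0 };
	size_t i, e, k, count = 0;

	for (i = 0; i < n; i++) {
		int need = 0;

		assert(p[i] < NOPS);
		for (e = 0; e < nedges; e++)
			if (map[e].after == p[i] && pending[map[e].before])
				need = 1;
		if (need) {
			for (k = 0; k < NOPS; k++)
				pending[k] = 0;	/* barrier orders all earlier ops */
			count++;
		}
		if (out != NULL)
			out[i] = need;
		pending[p[i]] = 1;
	}
	return count;
}
```

For the program order above, the default map needs a barrier
before every op but the first (five in all), while the real map
needs just one, in front of the read.  A real tool would also have
to pick the program order and choose among cheaper barrier flavors,
which is where the "tall order" part comes in.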