Subject: Re: Machine-independent bus DMA interface proposal
To: Dennis Ferguson <dennis@jnx.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 09/23/1996 18:03:54
On Mon, 23 Sep 1996 14:53:40 -0700
Dennis Ferguson <dennis@jnx.com> wrote:
> What I think Justin is complaining about is that the machine-independent
> format used for parameter passing becomes "much more" expensive (relatively)
> for buses which would otherwise require no bus-dependent state be kept
> at all, as it requires you to allocate something when you could get
> by with nothing (since allocating something is always many times more
> expensive than allocating nothing). Since buses which require no
> bus-dependent mapping state are quite common, this does cost.
The basis of Justin's arguments is that it's somehow significantly
more expensive. However, he has _not_ provided any evidence that
the additional overhead of copying the bus_dma_segment_t's address
and length parameters into a device's scatter/gather list is anything
more than negligible.
As an exercise, I spent some time actually measuring the expense of
doing it "my way" (a) vs. "his way" (b).
Basically, I wrote 2 small C programs ... Each of them calls a function
to fill an array of segments. (a) then loops through the array of segments
copying them manually to a new array of segments.
Both programs were compiled with -O0 -pg, and gmon run on the resulting
profile data.
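For illustration only, here is a minimal sketch of what the two test
programs might have looked like. The actual sources were not posted, so
every name here (struct seg, struct dev_sg, copy_segs, NSEGS) is a
hypothetical stand-in; only foo and the 1024-segment count come from the
text above. Program (b) would stop after foo(); program (a) would also
call copy_segs().

```c
#include <assert.h>
#include <stddef.h>

#define NSEGS 1024	/* corner-case segment count used in the post */

/* Hypothetical stand-in for bus_dma_segment_t: address and length only. */
struct seg {
	unsigned long addr;
	unsigned long len;
};

/* Hypothetical device scatter/gather entry with the same two fields. */
struct dev_sg {
	unsigned long addr;
	unsigned long len;
};

/* Stand-in for the mapping code: fill n segments with dummy values. */
void
foo(struct seg *segs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		segs[i].addr = 0x1000UL * i;
		segs[i].len = 0x1000UL;
	}
}

/* The extra step in (a): copy each segment into the device's own list. */
void
copy_segs(struct dev_sg *sg, const struct seg *segs, size_t n)
{
	size_t i;

	assert(sg != NULL && segs != NULL);
	for (i = 0; i < n; i++) {
		sg[i].addr = segs[i].addr;
		sg[i].len = segs[i].len;
	}
}
```

The only difference between the two variants in this sketch is one
additional pass of two word-sized stores per segment.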
For a 1024 segment loop (which is a corner case, but addresses the
concern Justin has about large DMAs), the amount of time spent in (a)
was not distinguishable from the amount of time spent in (b). That
is, it was so small that gmon could not measure it.
a:
  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
  0.0       0.00     0.00        1     0.00     0.00  _foo [10]
  0.0       0.00     0.00        1     0.00     0.00  _main [11]
b:
  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
  0.0       0.00     0.00        1     0.00     0.00  _foo [10]
  0.0       0.00     0.00        1     0.00     0.00  _main [11]
So, to simulate running over time, I wrapped the 1024 segment loop
inside another loop that ran 10,000 times. Here, you see the two
diverge a little:
a:
  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 56.2       6.78     6.78        1  6780.00 12000.00  _main [2]
 43.3      12.00     5.22    10000     0.52     0.52  _foo [3]
b:
  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 99.4       5.35     5.35    10000     0.54     0.54  _foo [3]
  0.2       5.36     0.01        1    10.00  5360.00  _main [2]
While "my way" is clearly a bit more expensive (6.64 extra seconds for
10,000 corner-case calls, or about 0.66 ms per call), I don't see the
relative cheapness of "his way"
as a compelling argument to implement machine-dependent portions of
otherwise totally machine-independent drivers, while the need for
machine-independent bus DMA mapping still does not go away. The alpha
needs it, the ARC needs it, the pmax needs it, and even the i386 needs it.
> Given that the machine-independent state is only needed for parameter
> passing, it seems to me that there are other possibilities which
> eliminate the machine-independent state altogether. For example,
> in bus_dmamap_load() allow the driver to specify two additional
> arguments, an opaque buffer into which the driver-dependent data
> will be formatted and a procedure handle (into the driver code)
> which is called with each address/length generated by the s/g code.
This is actually somewhat expensive ... you're talking about jumping
through a function pointer every time you translate a kva to a bus
physical address. That also creates some unneeded spaghetti.
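For illustration only, here is a minimal sketch of the callback style
being discussed. It is not the proposed bus_dmamap_load() interface:
seg_cb_t, sketch_dmamap_load, drv_add_seg, struct dev_sg_list, the
4096-byte page size, and the pretense that the bus address equals the
virtual address are all assumptions made for the example. It shows where
the per-segment jump through a function pointer would occur.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 0x1000UL	/* assumed page size */
#define SKETCH_MAXSEGS   16		/* assumed device s/g list limit */

/* Hypothetical callback type: invoked once per address/length pair. */
typedef void (*seg_cb_t)(void *arg, unsigned long addr, unsigned long len);

/* Hypothetical device-specific s/g list, passed as the opaque buffer. */
struct dev_sg_list {
	size_t n;
	struct {
		unsigned long addr;
		unsigned long len;
	} ent[SKETCH_MAXSEGS];
};

/* Driver-side callback: format one segment into the device's list. */
void
drv_add_seg(void *arg, unsigned long addr, unsigned long len)
{
	struct dev_sg_list *l = arg;

	assert(l->n < SKETCH_MAXSEGS);
	l->ent[l->n].addr = addr;
	l->ent[l->n].len = len;
	l->n++;
}

/*
 * Stand-in for the load routine: split [va, va + len) at page
 * boundaries and hand each piece to the driver's callback.  A real
 * implementation would translate each piece to a bus address here.
 */
void
sketch_dmamap_load(unsigned long va, unsigned long len, seg_cb_t cb,
    void *arg)
{
	while (len > 0) {
		unsigned long chunk =
		    SKETCH_PAGE_SIZE - (va % SKETCH_PAGE_SIZE);

		if (chunk > len)
			chunk = len;
		cb(arg, va, chunk);	/* one indirect call per segment */
		va += chunk;
		len -= chunk;
	}
}
```

In this sketch, a buffer of 0x2000 bytes starting at 0x1800 produces
three callback invocations, one per page touched.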
-- save the ancient forests - http://www.bayarea.net/~thorpej/forest/ --
Jason R. Thorpe thorpej@nas.nasa.gov
NASA Ames Research Center Home: 408.866.1912
NAS: M/S 258-6 Work: 415.604.0935
Moffett Field, CA 94035 Pager: 415.428.6939