Subject: Re: But why?
To: None <torek@BSDI.COM>
From: David S. Miller <firstname.lastname@example.org>
Date: 10/23/1996 22:04:49
Date: Wed, 23 Oct 1996 19:10:22 -0600 (MDT)
From: Chris Torek <torek@BSDI.COM>
Benchmarks are useful because they give you a consistent measure.
Benchmarks are harmful, however, when the measure they give you is
not a measure of `real' performance on `real' applications.
Unfortunately, `real' applications (a) vary from one person to the
next and (b) rarely work well as benchmarks.
I think lmbench certainly measures 'real' performance, at least for
the application types it was geared for. (I guess Larry can tell us
what this "application type" was when he initially wrote it all.)
One problem with optimizing system calls in general is that only
benchmarks spend a large fraction of time making repeated getpid()
calls, and speeding up such a benchmark is not useful.
lmbench uses read() on /dev/null if you didn't know.
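
For anyone who wants to see what such a "null syscall" measurement looks
like, here is a minimal sketch in C. It is not lmbench's own code; the
function name null_read_ns() and the use of clock_gettime() are
assumptions made purely for illustration. It times a loop of 1-byte
read() calls on /dev/null and reports the average cost per call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from lmbench): average nanoseconds spent per
 * 1-byte read() on /dev/null.  Returns a negative value on error. */
double null_read_ns(long iters)
{
    struct timespec t0, t1;
    char c;
    long i;
    int fd;

    assert(iters > 0);
    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++) {
        /* Reading /dev/null always reports end-of-file (0 bytes), so the
         * loop measures little beyond kernel entry and exit. */
        if (read(fd, &c, 1) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Calling null_read_ns(1000000) from a small main() and printing the result
gives a rough per-call figure; the number will vary with the machine and
the kernel.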
On the other hand, applications that are important to someone *do*
spend a lot of time making, say, read() or write() calls -- and
making getpid() faster also makes those faster. The question (for
which I do not have the answer) is, how *much* faster, and should
the effort be put into the syscall stub, or into the path within
the file system read() call? The time for a read() may turn out to
be dominated by byte copies that could be eliminated entirely via
page-mapping (e.g., replace the user's buffer pages with COW pages
that alias the buffer cache).
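
The in-kernel page-replacement trick described above cannot be shown from
user space, but the closest user-level analogue is mmap(), which lets a
program look at file data without first having it copied into a private
buffer. The sketch below is only an illustration under that assumption:
sum_read() and sum_mmap() are hypothetical names, and whether the mapped
version actually avoids a copy depends on the operating system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of a file using read(): each chunk is copied into buf
 * before the program can look at it.  Returns -1 on error. */
long sum_read(const char *path)
{
    unsigned char buf[4096];
    long sum = 0;
    ssize_t n, i;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (i = 0; i < n; i++)
            sum += buf[i];
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Same sum, but the file is mapped into the address space and scanned
 * in place instead of being read into a buffer.  Returns -1 on error. */
long sum_mmap(const char *path)
{
    struct stat st;
    unsigned char *p;
    long sum = 0;
    off_t i;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* a zero-length mapping is an error */
        close(fd);
        return 0;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    for (i = 0; i < st.st_size; i++)
        sum += p[i];
    munmap(p, (size_t)st.st_size);
    return sum;
}
```

Timing the two functions on the same large, already-cached file gives a
rough idea of how much of a read() is spent copying bytes.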
I'd say many applications sit around doing:
a) reading and writing small "protocol control" information
over TCP between client and server
b) doing bulk transfers over TCP
c) mmap()'ing a file and scanning over large tracts of it
d) read()'ing from a heavily accessed file, which most likely
is sitting in the buffer cache already
e) fork()'ing and exec()'ing new tasks
f) switch()'ing from client to server
g) transferring data via a pipe (see 'f')
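
Items (f) and (g) in the list above can be sketched together as a pipe
"ping-pong": a parent and a child bounce one byte back and forth, so each
round trip costs two pipe transfers and, on a single CPU, at least two
context switches. This is a minimal sketch, not lmbench's implementation;
the name pipe_pingpong_ns() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average nanoseconds per round trip of one byte
 * between a parent and a child over two pipes.  Returns a negative
 * value on error. */
double pipe_pingpong_ns(long rounds)
{
    int p2c[2], c2p[2];             /* parent-to-child, child-to-parent */
    struct timespec t0, t1;
    char c = 'x';
    pid_t pid;
    long i;

    assert(rounds > 0);
    if (pipe(p2c) < 0)
        return -1.0;
    if (pipe(c2p) < 0) {
        close(p2c[0]); close(p2c[1]);
        return -1.0;
    }

    pid = fork();
    if (pid < 0) {
        close(p2c[0]); close(p2c[1]);
        close(c2p[0]); close(c2p[1]);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(p2c[1]); close(c2p[0]);
        while (read(p2c[0], &c, 1) == 1)
            if (write(c2p[1], &c, 1) != 1)
                break;
        _exit(0);
    }

    /* Parent: send a byte, wait for the echo, repeat. */
    close(p2c[0]); close(c2p[1]);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < rounds; i++)
        if (write(p2c[1], &c, 1) != 1 || read(c2p[0], &c, 1) != 1)
            break;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(p2c[1]);                  /* child sees EOF and exits */
    close(c2p[0]);
    waitpid(pid, NULL, 0);
    if (i != rounds)
        return -1.0;

    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / rounds;
}
```

Calling pipe_pingpong_ns(100000) gives a rough combined figure for pipe
latency and context-switch cost on a given system.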
I could go on and on, and lmbench measures everything I have mentioned
thus far. As do some other benchmarks, to different degrees of
>Alan Cox just devised a way for Linux/SPARC to avoid packet copying on
>our networking stack ...
This is not a micro-optimization. (Neither, for that matter, is
the `system calls via normal subroutine calls' trick, although this
is probably not the place to *start* optimizing.)
If you lack a firm foundation (i.e., one where you thought about the trick
early on, when you first put the pieces of the system together), it is much
more difficult to go back and "do it later". At least, that has been my
painful experience most of the time when I messed up an interface and
had to completely redo it later to get it "right".
In particular, for applications that spend all their time sending
bulk network data, eliminating these copies eliminates the place
they spend most of their time -- a network send is, or should be,
dominated by the time spent copying those bytes.
``If the performance ain't crankin', you're just yankin'.''
- Steve Alexander
David S. Miller