Subject: Re: mmap (was Re: bin/10625: /usr/bin/cmp)
To: None <tech-userlevel@netbsd.org>
From: Chris Torek <torek@BSDI.COM>
List: tech-userlevel
Date: 07/29/2000 12:05:06

>> Theoretical exercises for you: :-)
>> 
>> - When and why would read() necessarily be any different from
>>   mmap()-and-access?  

>In what sense?  :-)

In any sense -- hence "theoretical". :-)

Seriously, read() and write() actually convey more information,
and convey it far more directly, than mmap() does.  In particular,
when dealing with "real world" I/O devices, you often need the input
or output operation to be "atomic", and/or to be done in sizes other
than "one page", and so on.

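One concrete place where the two interfaces differ in what they can express: read() (here pread()) takes any file offset and any byte count, while mmap() only accepts page-aligned file offsets and behaves differently at end of file. The C sketch below fetches the same byte range both ways. range_via_read() and range_via_mmap() are hypothetical helper names, and the sketch assumes a POSIX system and a byte range that lies entirely within the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch len bytes at offset off with pread().
 * Any offset and any count are accepted; at EOF it returns a short count. */
static ssize_t range_via_read(int fd, off_t off, void *buf, size_t len)
{
    return pread(fd, buf, len, off);
}

/* Hypothetical helper: fetch the same bytes by mapping and touching them.
 * mmap() needs a page-aligned file offset, so the range is widened first.
 * Assumes [off, off+len) lies inside the file: touching mapped pages
 * wholly past EOF raises SIGBUS instead of returning a short count. */
static int range_via_mmap(int fd, off_t off, void *buf, size_t len)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    if (len == 0)
        return 0;                            /* mmap() rejects length 0 */

    off_t aligned = off - (off % pg);        /* round down to a page boundary */
    size_t slack = (size_t)(off - aligned);  /* bytes skipped at the front */
    size_t maplen = slack + len;

    void *base = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    memcpy(buf, (unsigned char *)base + slack, len);  /* the "access" */
    munmap(base, maplen);
    return 0;
}
```

For in-range data the two paths yield identical bytes; the differences show up in alignment, in how EOF is reported, and in what the kernel learns about the intended transfer size.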
Still:

>If read() were implemented in such a way that the kernel could map the
>user-supplied buffer into its own space so that the device driver
>(perhaps even through DMA) could fill it directly then there would be
>literally no difference in the underlying behaviour between read() and
>mmap().

Right.  In particular, a block-sized through-the-file-system read,
starting on a block boundary, into a page-aligned buffer of a whole
number of pages "ought" to result in having the buffer cache page(s)
themselves mapped copy-on-write into the user's address space.
(Only in some cases, though: efficiency becomes a big issue,
especially with virtually indexed caches that impose penalties for
wrong page colorings.)  If the reads are behaving in FIFO fashion,
the page(s) can be given directly to the user, and dropped from the
buffer cache, rather than handed out COW, especially if the target
of the read has been madvise()d appropriately.  In that case there
is indeed no difference at all.
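The shape conditions in the paragraph above can be written down as a predicate. This is a minimal sketch, not how any particular kernel decides: read_is_page_mappable() and alloc_page_aligned() are hypothetical names, and treating a read of several whole blocks like a single block-sized one is an assumption. The user-side half of the conditions (a page-aligned buffer covering whole pages) is something a program can arrange itself with posix_memalign().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical predicate: could read(fd, buf, len) at file offset off be
 * satisfied by mapping buffer-cache pages into buf (COW, or handed over
 * outright) instead of copying?  Only the shape requirements are checked;
 * a real kernel would also weigh cache coloring and other costs. */
static bool read_is_page_mappable(const void *buf, off_t off, size_t len,
                                  size_t blksize, size_t pgsize)
{
    assert(blksize > 0 && pgsize > 0);
    uintptr_t addr = (uintptr_t)buf;

    return len > 0
        && off % (off_t)blksize == 0   /* starts on a block boundary */
        && len % blksize == 0          /* whole blocks (assumed: one or more) */
        && off % (off_t)pgsize == 0    /* file offset is page-aligned too */
        && addr % pgsize == 0          /* user buffer starts on a page */
        && len % pgsize == 0;          /* and covers whole pages only */
}

/* Hypothetical helper: allocate a buffer that meets the user-side
 * conditions (page-aligned start).  Free it with free(). */
static void *alloc_page_aligned(size_t len)
{
    void *p = NULL;
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0 || posix_memalign(&p, (size_t)pg, len) != 0)
        return NULL;
    return p;
}
```

Passing the predicate only makes the zero-copy path possible; whether it is worth taking is the efficiency question the text raises.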

In general memory and I/O traffic analyses, writes matter less than
reads.  (Reads tend to account for about 80% of the work, and writes
tend to be done write-behind, so they do not affect performance as
much.)  There are lots of specific counterexamples, of course, but
most of the payoff lies in reads.

Note that mmap() followed by "access" might sometimes need to
populate the buffer cache (!), because some library may be using
mmap+access to simulate read, in which case you want read-ahead to
work properly for you.  (Imagine a library that mmaps the file in
64K chunks, while you happen to be using a RAID where I/O is best
done in 32 MB stripes, on a machine with 400 GB of physical RAM, or
something.)
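A library of the kind imagined above might look roughly like the following C sketch, which copies a file out through 64K mmap() windows. copy_file_via_windows() is a hypothetical name; the code uses posix_madvise() with POSIX_MADV_SEQUENTIAL, the POSIX spelling of the madvise() hint, and it assumes the page size divides 64K. Each window shows the kernel at most 64K of sequential access, so for read-ahead to help, the kernel would have to recognize the pattern across windows on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define WINDOW ((size_t)64 * 1024)   /* the 64K chunk size from the text */

/* Hypothetical "read via mmap" routine: copy up to dstlen bytes of the
 * file into dst, one 64K mapped window at a time.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_file_via_windows(int fd, unsigned char *dst, size_t dstlen)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0 && WINDOW % (size_t)pg == 0);  /* window offsets page-aligned */

    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;

    size_t size = (size_t)st.st_size;
    if (size > dstlen)
        size = dstlen;

    size_t done = 0;
    while (done < size) {
        size_t n = size - done < WINDOW ? size - done : WINDOW;
        void *w = mmap(NULL, n, PROT_READ, MAP_SHARED, fd, (off_t)done);
        if (w == MAP_FAILED)
            return -1;
        /* Advisory only: whether this becomes useful read-ahead
         * is entirely up to the kernel. */
        (void)posix_madvise(w, n, POSIX_MADV_SEQUENTIAL);
        memcpy(dst + done, w, n);   /* the "access" that faults pages in */
        munmap(w, n);
        done += n;
    }
    return (ssize_t)done;
}
```

With the numbers from the text, one 32 MB stripe corresponds to 32 MB / 64K = 512 separate windows, each mapped, faulted in, and unmapped on its own.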

Chris