tech-userlevel: Re: mmap (was Re: bin/10625: /usr/bin/cmp)

Subject: Re: mmap (was Re: bin/10625: /usr/bin/cmp)
To: Wolfgang Rupprecht <wolfgang@wsrcc.com>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-userlevel
Date: 07/30/2000 11:12:36
    Date:        28 Jul 2000 16:32:42 -0700
    From:        Wolfgang Rupprecht <wolfgang@wsrcc.com>
    Message-ID:  <x7snstltk5.fsf@capsicum.wsrcc.com>

  | It would simplify the code greatly if I could just map my DB into
  | memory, have the OS suck in only the pages the code touched and
  | have the pages that changed automatically flushed back out at some
  | later time.  In one case the code is simple enough to inspect for
  | correctness.  In the other case I basically re-implement a VM in user
  | space and hope that I didn't pooch the logic.  ;-)

That sounds like the kind of application for which mmap() (or perhaps
an mmap that worked slightly differently) would be well suited.  If there
were no such applications, the interface wouldn't have been invented in
the first place.
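
For what it's worth, a minimal sketch of that kind of use might look like
the following.  Everything specific in it is an assumption made for
illustration: the fixed-size struct record layout, the bump_record() name,
and the choice to call msync() with MS_SYNC rather than leave the
write-back entirely to the kernel.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the layout is an assumption. */
struct record {
    uint32_t id;
    uint32_t value;
};

/*
 * Map the whole file MAP_SHARED, increment the `value` field of record
 * `index`, and write the change back.  Returns 0 on success, -1 on error.
 */
int bump_record(const char *path, size_t index)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (index + 1) * sizeof(struct record) > (size_t)st.st_size) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close() */
    if (base == MAP_FAILED)
        return -1;

    struct record *db = base;
    db[index].value++;          /* touches only the page holding this record */

    /*
     * The kernel would eventually write the dirty page back by itself;
     * MS_SYNC is used here only so the sketch has a definite point at
     * which the data has been written.
     */
    int rc = msync(base, len, MS_SYNC);
    munmap(base, len);
    return rc;
}
```

There is no buffer management in there at all, which is the attraction:
the paging logic stays in the kernel instead of being redone in user space.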

The point Chris Torek was making (I believe) is that replacing read()
in processes that simply want to read a file sequentially (like cmp
or grep) just isn't what mmap is good at, or should be good at.

While it might, today, be more efficient for the system to use mmap() for
i/o, vastly more progress could be made if, rather than spending time
finding applications and converting them, that time were spent implementing
a better read() model, and then making sure that stdio is able to use
it effectively (which it most probably already is).   That way all the
common applications benefit, rather than just those few that have had
someone painstakingly rip out the simple interface and replace it with
a complicated one (it needs to be complicated, because of the limitations
on mmap() that have been pointed out).
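
To make "the simple interface" concrete, here is a rough sketch of a
cmp-like comparison written against stdio alone.  The files_differ() name
and the 8192 byte buffer size are assumptions for illustration; this is
not the actual cmp source.

```c
#include <stdio.h>
#include <string.h>

/*
 * Compare two files sequentially through stdio.
 * Returns 0 if identical, 1 if they differ, -1 on open or read error.
 */
int files_differ(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    if (a == NULL)
        return -1;
    FILE *b = fopen(path_b, "rb");
    if (b == NULL) {
        fclose(a);
        return -1;
    }

    char buf_a[8192], buf_b[8192];
    int result = 0;

    for (;;) {
        /* fread() returns a short count only at end of file or on error. */
        size_t na = fread(buf_a, 1, sizeof buf_a, a);
        size_t nb = fread(buf_b, 1, sizeof buf_b, b);

        if (ferror(a) || ferror(b)) {
            result = -1;
            break;
        }
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 1;         /* different length or different bytes */
            break;
        }
        if (na == 0)
            break;              /* both files ended together */
    }

    fclose(a);
    fclose(b);
    return result;
}
```

Code like this needs no changes to benefit from anything done underneath
read(): no file size limits to check, no mapping windows to manage, and it
works on pipes as well as on regular files.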

Watching read behaviour is comparatively simple for the OS - all unix
systems (that I have seen anyway, since 6th edition or before) notice
that a process is doing sequential reads, and implement read ahead, to
reduce the latency for the next read the process is almost certain to do.
In the late 70's, at the University of Sydney, there was a bunch of work
done on extending that from just one read ahead block to many, to find
the optimum number for the process, so processes essentially never needed
to wait for filesystem data - it would be there when needed (other than for
no-op dummy programs that don't actually use the data being read, for which
no drive built has ever been fast enough).   That kind of optimisation would
probably still be useful - but it relies upon detecting sequential read
behaviour.  mmap() is a random access interface, and while it would be
possible to trace the page faults in the mmap area (while excluding others)
and infer a sequential access pattern, that's a lot harder than just
watching the file offset sequentially increasing...
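
As a toy model of how little it takes to watch the file offset, consider
the sketch below.  It is not the Sydney code or any real kernel's
read-ahead logic; struct ra_state, ra_on_read(), the doubling policy and
the cap of 16 blocks are all assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical per-open-file state; names and limits are illustrative. */
struct ra_state {
    long long next_offset;      /* where a sequential reader would read next */
    unsigned  window;           /* blocks to read ahead; 0 means none */
};

#define RA_MAX_WINDOW 16u       /* arbitrary cap, an assumption */

/*
 * Called once per read(offset, length).  Returns how many blocks the
 * system might choose to fetch ahead of the reader.
 */
unsigned ra_on_read(struct ra_state *st, long long offset, size_t length)
{
    if (offset == st->next_offset) {
        /* Read starts where the last one ended: grow the window. */
        if (st->window == 0)
            st->window = 1;
        else if (st->window < RA_MAX_WINDOW)
            st->window *= 2;
    } else {
        /* The process seeked elsewhere: stop reading ahead. */
        st->window = 0;
    }
    st->next_offset = offset + (long long)length;
    return st->window;
}
```

One comparison per read() is all the detection costs.  Getting the same
information for an mmap region would mean recording fault addresses per
mapping and guessing at a pattern from them.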

kre