Subject: Re: RAW access to files
To: None <>
From: David Laight <>
List: tech-kern
Date: 12/12/2001 13:26:31
IMHO 'direct io' seems the wrong solution, and will generate its own
set of problems...

mmap() ought to be more efficient than any direct i/o scheme since
it guarantees adequate alignment of the disk transfer.

The actual problem is persuading the kernel to discard the correct pages
(ie those you have just used, rather than the working set of your interactive
session) when doing (say) a large cp.

I suspect that pages holding data from an application mmap() that has
since been unmapped should be near the top of the list of pages to reuse
(though not pages mapped by temporary mappings for standard io, and maybe
not code).  Certainly so if certain flags are set - eg MADV_SEQUENTIAL.
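
As a rough sketch of the application side of this (it says nothing about
the kernel's page-reuse policy, which is the actual problem): a copy loop
that maps the source one window at a time, marks the window
MADV_SEQUENTIAL, and unmaps it as soon as it has been written out.  The
names copy_mmap_windowed and write_all, and the window parameter, are
made up for illustration; window is assumed to be a non-zero multiple of
the page size.

```c
#define _DEFAULT_SOURCE         /* glibc: expose madvise() */
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/*
 * Hypothetical helper: copy src to dst through a sliding read-only mapping.
 * 'window' is assumed to be a non-zero multiple of the page size, so that
 * every mmap() offset stays page aligned.  Returns 0 on success, -1 on error.
 */
int copy_mmap_windowed(const char *src, const char *dst, size_t window)
{
    struct stat st;
    int rc = 0;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0 || fstat(in, &st) != 0) {
        if (out >= 0)
            close(out);
        close(in);
        return -1;
    }
    for (off_t off = 0; rc == 0 && off < st.st_size; ) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window ? (size_t)left : window;
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, in, off);
        if (p == MAP_FAILED) {
            rc = -1;
            break;
        }
        /* Advisory only: the kernel is free to ignore the hint. */
        (void)madvise(p, len, MADV_SEQUENTIAL);
        rc = write_all(out, p, len);
        /* Unmapped pages are the ones one would like reused first. */
        munmap(p, len);
        off += (off_t)len;
    }
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```

Something like copy_mmap_windowed("big.img", "copy.img", 64 * 1024 * 1024)
costs a handful of syscalls per window; whether the unmapped pages really
do get reused ahead of everyone else's working set is up to the kernel.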

The other 'trick' is that page aligned write() calls can be done by
'stealing' the memory page from the application (and using copy-on-write).
(anyone want to work out how to push a memory page through a pipe?)

For large databases, it might be appropriate to define some way of telling
the system that it has finished with a part of an mmapped file.
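
madvise() with MADV_DONTNEED is one existing candidate for such a hint.
A minimal sketch, assuming base was returned by mmap() of a file with
MAP_SHARED (on some systems MADV_DONTNEED throws away the contents of
private or anonymous pages, so that case is deliberately excluded).
release_range is a hypothetical helper, and what the kernel does with
the hint is entirely its own policy.

```c
#define _DEFAULT_SOURCE         /* glibc: expose madvise() */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: tell the kernel we are done with bytes
 * [off, off + len) of the shared file mapping that starts at 'base'.
 * The range is shrunk inwards to whole pages, so a page that is only
 * partly covered (and may still be in use) is left alone.
 * Returns the number of bytes advised, or -1 if madvise() failed.
 */
long release_range(void *base, size_t off, size_t len)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = (off + pg - 1) / pg * pg;    /* round start up */
    size_t end = (off + len) / pg * pg;         /* round end down */

    if (end <= start)
        return 0;                               /* no whole page inside */
    if (madvise((char *)base + start, end - start, MADV_DONTNEED) != 0)
        return -1;
    return (long)(end - start);
}
```

A database could call this on each extent once it has finished scanning
it; the data stays valid and is simply faulted back in from the file if
it is touched again.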


----- Original Message ----- 
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
To: <>
Sent: Wednesday, December 12, 2001 7:24 AM
Subject: Re: RAW access to files

> >> Because that's what benchmarks and real production use of some
> >> applications with and without direct io show :-)  I don't think "cp"
> >> is such a good example though, since you lose the readahead.
> > yes it is.  option -D<size> should do copy with direct I/O and given
> > buffer size (1MB should be ok for modern disk)
> So?  You still lose readahead.  If the disk does readahead (which most
> disks do, but not necessarily to the extent you want) and no other I/O
> to it happens in between (which may or may not be true), you may avoid
> the overhead of seek and rotate, but you'll still pay the penalty of
> various bus arbitrations and the data transfer.
> For this paradigm, what you really want is a call that says "start
> doing this I/O, let me continue to run, and give me some way to, later,
> block until it's finished".  Or perhaps "block until one of various
> such has finished".
> Given that, cp could do the readahead.  But do you really want it to?
> I don't see why it should have to.
> What you really want is a way to tell the kernel that buffers filled
> solely to satisfy this I/O should be aged into oblivion once the I/O
> completes, so that they are _first_ to be re-used.  Then something like
> cp of a file bigger than RAM doesn't push everything else out.
> ISTM that mmap with MADV_SEQUENTIAL and possibly judicious use of
> MADV_DONTNEED/MADV_FREE should do the right thing.  You'd still have to
> take care to make cp handle files bigger than it can mmap(), but I
> don't see it as unreasonable to (say) have to do a half-dozen syscalls
> for every 64MB copied....
> >> For some (typically database-like) apps it can be a pretty big boost
> > and: gimp (tempfile), movies, audio editors, everything operating on
> > LARGE files
> Depends on what it does with them, of course.  And, especially these
> days with RAM so cheap, "LARGE" in the here-relevant sense is a lot
> larger than it used to be.  (For example, it's not that unusual to find
> a machine with enough RAM to hold a whole CD's contents.)
> /~\ The ASCII der Mouse
> \ / Ribbon Campaign
>  X  Against HTML
> / \ Email!      7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B