Subject: Re: RAW access to files
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Wojciech Puchar <wojtek@chylonia.3miasto.net>
List: tech-kern
Date: 12/12/2001 22:50:22
> disks do, but not necessarily to the extent you want) and no other I/O
> to it happens in between (which may or may not be true), you may avoid
> the overhead of seek and rotate, but you'll still pay the penalty of
> various bus arbitrations and the data transfer.

with such large blocks this isn't noticeable.

> For this paradigm, what you really want is a call that says "start
> doing this I/O, let me continue to run, and give me some way to, later,
> block until it's finished".  Or perhaps "block until one of various
> such has finished".

O_DIRECT and O_ASYNC should give you that (assuming O_DIRECT gets
implemented)
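
The "start this I/O, keep running, block later" model quoted above is close
to what the POSIX AIO interface offers. The following is only a minimal
sketch, assuming a system that provides <aio.h>. It is a different mechanism
from the O_DIRECT/O_ASYNC flags mentioned above, and read_overlapped is a
hypothetical helper name, not something from this thread.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX AIO declarations */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from the start of path,
 * leaving room for other work while the read is in flight.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_overlapped(const char *path, void *buf, size_t len)
{
    struct aiocb cb;
    const struct aiocb *list[1] = { &cb };
    ssize_t n;
    int err;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    /* "start doing this I/O" */
    if (aio_read(&cb) == -1) {
        close(fd);
        return -1;
    }

    /* "let me continue to run": unrelated work would go here */

    /* "block until it's finished"; retry if a signal interrupts the wait */
    while ((err = aio_error(&cb)) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    n = aio_return(&cb);   /* collect the result and release the request */
    close(fd);
    if (err != 0) {
        if (err > 0)
            errno = err;   /* report the error of the read itself */
        return -1;
    }
    return n;
}
```

Passing several control blocks to aio_suspend() would cover the "block until
one of various such has finished" case. On older glibc systems this may need
to be linked with -lrt.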

>
> Given that, cp could do the readahead.  But do you really want it to?
> I don't see why it should have to.
me neither
>
> What you really want is a way to tell the kernel that buffers filled
> solely to satisfy this I/O should be aged into oblivion once the I/O
> completes, so that they are _first_ to be re-used.  Then something like
> cp of a file bigger than RAM doesn't push everything else out.

this is one thing, and doing what slowlaris does (marking buffers this
way when more than X bytes are read sequentially from a file) will solve
it. BUT direct I/O means NO buffers at all = data is read directly into
userspace (NO MEMCPY)
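
For the first half of this (telling the kernel that buffers filled by a
large sequential read can be reused first), one interface that exists on
some systems is posix_fadvise(). The sketch below assumes the platform
provides it and actually acts on the hints, which are advisory only. It
still goes through the buffer cache, so it is not direct I/O.
read_and_discard is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose posix_fadvise() */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file in 1MB chunks and, after each
 * chunk, hint that the cached pages just consumed will not be needed
 * again. Returns the total number of bytes read, or -1 on error. */
long long read_and_discard(const char *path)
{
    static char buf[1 << 20];
    long long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* hint: the file will be read front to back (len 0 = to end of file) */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* hint: the range just read may be dropped from the cache */
        posix_fadvise(fd, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
        total += n;
    }

    close(fd);
    return n < 0 ? -1 : total;
}
```

The return values of posix_fadvise() are ignored here on purpose: a kernel
that does not implement a hint can simply do nothing.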
>
> ISTM that mmap with MADV_SEQUENTIAL and possibly judicious use of
> MADV_DONTNEED/MADV_FREE should do the right thing.  You'd still have to
> take care to make cp handle files bigger than it can mmap(), but I
> don't see it as unreasonable to (say) have to do a half-dozen syscalls
> for every 64MB copied....
>
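
The mmap approach quoted above could look roughly like the following. This
is a sketch under assumptions, not how any cp is actually implemented:
mmap_copy and WINDOW are hypothetical names, the 64MB window is taken from
the figure in the quote, and whether MADV_SEQUENTIAL and MADV_DONTNEED have
the desired effect depends on the kernel.

```c
#define _DEFAULT_SOURCE 1  /* glibc: expose madvise() and MADV_* */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define WINDOW ((off_t)64 << 20)  /* 64MB mapped at a time */

/* Hypothetical helper: copy src to dst through a sliding read-only
 * mapping, so files larger than one mapping are handled window by
 * window. Returns 0 on success, -1 on error. */
int mmap_copy(const char *src, const char *dst)
{
    struct stat st;
    int rc = -1;
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (in < 0 || out < 0 || fstat(in, &st) < 0)
        goto done;

    for (off_t off = 0; off < st.st_size; off += WINDOW) {
        off_t left = st.st_size - off;
        size_t len = (size_t)(left < WINDOW ? left : WINDOW);
        size_t written = 0;
        char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, in, off);

        if (p == MAP_FAILED)
            goto done;

        /* hint: this window will be read front to back */
        madvise(p, len, MADV_SEQUENTIAL);

        while (written < len) {
            ssize_t n = write(out, p + written, len - written);
            if (n <= 0) {
                munmap(p, len);
                goto done;
            }
            written += (size_t)n;
        }

        /* hint: the pages of this window are finished with */
        madvise(p, len, MADV_DONTNEED);
        munmap(p, len);
    }
    rc = 0;

done:
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    return rc;
}
```

Per window this is mmap, two madvise calls, one or more writes and munmap,
which is in line with the "half-dozen syscalls for every 64MB" estimate.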
> >> For some (typically database-like) apps it can be a pretty big boost
> > and: gimp (tempfile), movies, audio editors, everything operating on
> > LARGE files
>
> Depends on what it does with them, of course.  And, especially these
> days with RAM so cheap, "LARGE" in the here-relevant sense is a lot
> larger than it used to be.  (For example, it's not that unusual to find

"large" should depend of machine's resources. but it always can be defined