Subject: Re: Alternative to memory maps for large (> 10GB) files on ix86.
To: Alicia da Conceicao <alicia@cyberstation.ca>
From: Steven M. Bellovin <smb@research.att.com>
List: port-i386
Date: 10/16/2000 09:51:03
In message <200010160518.BAA11321@pikachu.cyberstation.ca>, Alicia da Conceicao
writes:
>Writing alternatives to tail & more/less that work using seek & read
>instead of memory maps would be quite easy to do. But for applications
>that need to read large amounts of data in random order from multiple
>huge (> 10 GB) files, using many, many seeks & reads would be too
>inefficient.
Actually, that's not clear to me. The I/O to the disk has to take
place in any event, and learning what blocks to read via a seek() call
is probably cheaper than fielding a page fault interrupt. And if
you're doing a lot of this type of I/O, having the files memory-mapped
would also drive the VM system crazy; it's not the sort of behavior
it's tuned for.
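
To make the seek-and-read idea concrete, something along these lines is
what I have in mind.  Rough sketch only: the file name and record size
are placeholders, and I'm assuming pread(2) is available so the seek and
the read collapse into one system call; plain lseek()+read() works the
same way.

/*
 * Sketch: fetch one record at an arbitrary offset with pread(2).
 * "bigfile.dat" and RECSIZE are stand-ins, not anything from the
 * original problem.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define RECSIZE 8192                            /* hypothetical record size */

int
main(void)
{
        char buf[RECSIZE];
        off_t off = (off_t)1234 * RECSIZE;      /* pick some record */
        int fd = open("bigfile.dat", O_RDONLY);
        ssize_t n;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        n = pread(fd, buf, sizeof buf, off);    /* seek+read in one call */
        if (n < 0) {
                perror("pread");
                return 1;
        }
        printf("read %ld bytes at offset %lld\n", (long)n, (long long)off);
        close(fd);
        return 0;
}
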
There are two caveats here. First, *if* there will be moderate locality
of reference, a memory-mapped solution might win, because the necessary
pages may still be resident and mapped. You can deal with that by
maintaining your own cache. Second, if the chunks of data being
processed are moderately large, there is considerable expense in
copying the data to/from user space. If that's your problem, the right
sort of solution might just be lots of mmaps. I suggest
experimentation; it's not at all clear to me that there's an efficiency
issue. Mapping entire files is much more convenient for the
programmer, but it's rarely faster.
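
If the copy overhead does turn out to dominate, the "lots of mmaps"
variant would look roughly like this.  Again just a sketch, with a
made-up window size and file name: the point is that you map a
page-aligned window around the region you want rather than the whole
file, which won't fit in a 32-bit address space anyway.

/*
 * Sketch: map a 16MB window of a huge file around the byte we want,
 * instead of mapping the whole file.
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define WINDOW  (16 * 1024 * 1024)      /* window size, multiple of page size */

int
main(void)
{
        int fd = open("bigfile.dat", O_RDONLY);
        off_t want = (off_t)12 << 30;           /* byte 12G into the file, say */
        off_t base = want - (want % WINDOW);    /* align window start */
        char *p;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        p = mmap(NULL, WINDOW, PROT_READ, MAP_SHARED, fd, base);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* touch the byte we wanted; no copy into a user buffer needed */
        printf("byte at 12G: %d\n", p[want - base]);
        munmap(p, WINDOW);
        close(fd);
        return 0;
}

Keep a handful of such windows around and reuse them while the access
pattern shows locality; unmap the least recently used one when you need
a new window.  That's the "maintain your own cache" idea above.
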
--Steve Bellovin