Subject: Re: Alternative to memory maps for large (> 10GB) files on ix86.
To: Alicia da Conceicao <alicia@cyberstation.ca>
From: Steven M. Bellovin <smb@research.att.com>
List: port-i386
Date: 10/16/2000 09:51:03
In message <200010160518.BAA11321@pikachu.cyberstation.ca>, Alicia da Conceicao
 writes:

>Writing alternatives to tail & more/less that work using seek & read
>instead of memory maps would be quite easy to do.  But for applications
>that need to read large amounts of data in random order from multiple
>huge (> 10 GB) files, using many, many seeks & reads would be too
>inefficient.

Actually, that's not clear to me.  The I/O to the disk has to take 
place in any event, and learning what blocks to read via a seek() call 
is probably cheaper than fielding a page fault interrupt.  And if 
you're doing a lot of this type of I/O, having the files memory-mapped 
would also drive the VM system crazy; it's not the sort of behavior 
it's tuned for.
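
To make the comparison concrete, here is the sort of seek-and-read
access I have in mind; the file name and record size are made up, and a
real program would obviously structure this differently:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define RECSIZE 4096                    /* made-up record size */

    int
    main(void)
    {
            char buf[RECSIZE];
            off_t recno = 123456;           /* some "random" record */
            int fd = open("records.dat", O_RDONLY);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* One seek and one read per random access; the kernel
             * does the block lookup, and no page fault is fielded. */
            if (lseek(fd, recno * RECSIZE, SEEK_SET) == (off_t)-1 ||
                read(fd, buf, RECSIZE) != RECSIZE) {
                    perror("read");
                    return 1;
            }
            /* ... process buf ... */
            close(fd);
            return 0;
    }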

There are two caveats here.  First, *if* there will be moderate locality 
of reference, a memory-mapped solution might win, because the necessary 
pages may still be resident and mapped.  You can deal with that by 
maintaining your own cache.  Second, if the chunks of data being 
processed are moderately large, there is considerable expense in 
copying the data to/from user space.  If that's your problem, the right 
sort of solution might just be lots of mmaps.  I suggest 
experimentation; it's not at all clear to me that there's an efficiency 
issue.  Mapping entire files is much more convenient for the 
programmer, but it's rarely faster.
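
By "lots of mmaps" I mean mapping just a window around the data you
need rather than the whole file, roughly like this (again, the file
name and chunk size are only for illustration):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK   (1024 * 1024)           /* made-up 1 MB window */

    int
    main(void)
    {
            int fd = open("records.dat", O_RDONLY);
            off_t where = (off_t)12345 * CHUNK; /* some "random" chunk;
                                                 * must be page-aligned */
            char *p;

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* Map only this chunk: no copy into user space, but each
             * page still costs a fault the first time it's touched. */
            p = mmap(NULL, CHUNK, PROT_READ, MAP_SHARED, fd, where);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            /* ... read the data through p ... */
            munmap(p, CHUNK);
            close(fd);
            return 0;
    }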

		--Steve Bellovin