Subject: Re: Alternative to memory maps for large (> 10GB) files on ix86.
To: Alicia da Conceicao <alicia@cyberstation.ca>
From: Andreas Persson <pap@garen.net>
List: port-i386
Date: 10/16/2000 14:10:40
On Mon, Oct 16, 2000 at 01:18:45AM -0400, Alicia da Conceicao <alicia@cyberstation.ca> wrote:
>Memory maps provide the most efficient means of randomly accessing data
>within a file.  However they cannot be used for a file that exceeds the
>available address space for a process (2 GB - 1B), used by signed 32 bit
>pointers.  In fact the available address space has to be shared by all
>memory mapped files loaded by the process.
The main advantages of mmap() are zero-copy access and lower syscall
overhead. Your application sounds like it would be mostly I/O bound, so
those savings may not buy you much.

>Is there an efficient alternative to memory maps for huge (> 10 GB) files
>on NetBSD ix86, that can be used for many random database accesses?  I am
>hoping for a solution which doesn't involve many, many seeks & reads, or
>constant loading & unloading of memory maps containing small sections of
>each file.
Well, to halve the number of syscalls, you can use pread(), which
combines the seek and the read into a single call. Also, how "random"
is random? Most databases tend to have local working sets.
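
A rough, untested sketch of what I mean (the file name is just a
placeholder); since off_t is 64 bits on NetBSD, pread() can address
offsets well past 2GB even on i386:

#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        char buf[8192];
        off_t offset;
        ssize_t n;
        int fd;

        /* "bigfile.db" is just a placeholder name */
        if ((fd = open("bigfile.db", O_RDONLY)) == -1)
                err(1, "open");

        /* read 8K starting 10GB into the file, no separate lseek() */
        offset = (off_t)10 * 1024 * 1024 * 1024;
        n = pread(fd, buf, sizeof(buf), offset);
        if (n == -1)
                err(1, "pread");

        printf("read %ld bytes\n", (long)n);
        close(fd);
        return 0;
}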

>Yes I know that 64 bit hardware like Sparc64, Alpha, & Mac can memory map
>huge files, but I need a solution for 32 bit ix86 hardware, since these
>are to be deployed on the client side with cheap, existing PC computers.
>
>Writing alternatives to tail & more/less that work using seek & read
>instead of memory maps would be quite easy to do.  But for applications
>that need to read large amounts of data in random order from multiple
>huge (> 10 GB) files, using many, many seeks & reads would be too
>inefficient.
> 
>Any advice on an efficient NetBSD ix86 memory map alternative for huge
>files would be appreciated.  Maybe combining some type of elaborate
>caching with seek & reads, like that which is done at the kernel level
>with mmap's "SYS_mmap" system call?
I would try the mmap() slice approach first and see what kind of
performance you get. It's simple and quick to implement. If that doesn't
work, there are probably more efficient caching schemes, but they depend
on the database and the application.
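
By the slice approach I mean something along these lines (untested; the
window size and names are made up): keep a fixed-size window of the file
mapped, and remap whenever an access falls outside the current slice.

#include <sys/types.h>
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

#define WINDOW  (64 * 1024 * 1024)      /* 64MB slice, tune to taste */

static void *win = MAP_FAILED;
static off_t win_base;

/*
 * Return a pointer to byte 'off' of fd, remapping the window if the
 * offset falls outside the currently mapped slice.  Accesses that
 * straddle a slice boundary need extra care (not handled here).
 */
static char *
file_ptr(int fd, off_t off)
{
        off_t base = off - (off % WINDOW);

        if (win == MAP_FAILED || base != win_base) {
                if (win != MAP_FAILED)
                        munmap(win, WINDOW);
                win = mmap(NULL, WINDOW, PROT_READ, MAP_SHARED, fd, base);
                if (win == MAP_FAILED)
                        err(1, "mmap");
                win_base = base;
        }
        return (char *)win + (off - base);
}

Since mmap()'s offset argument is an off_t, the slices can sit anywhere
in a >10GB file, and if the working set turns out to be local you could
keep a few windows around (LRU style) instead of just one.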

>Thanks in advance. 
>Alicia.

-- 
Andreas Persson
pap@garen.net

"I could be bounded in a nutshell and count myself a king of infinite space."
	--Shakespeare