Subject: Re: RAW access to files
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Chuck Silvers <>
List: tech-kern
Date: 12/12/2001 22:40:59
On Wed, Dec 12, 2001 at 12:12:53PM -0800, Jonathan Stone wrote:
> In message <>Chuck Silvers writes
> [  access patterns with poor (or negatively correlated) locality
>   where readahead is obviously not a win, and may do positive harm]
> > [...] for applications that do large runs of sequential i/o, the same logic
> >applies as well, if the application isn't going to access the data multiple
> >times and it uses i/os of at least 64k.  read ahead doesn't gain you all
> >that much when you're doing large i/os, especially on modern disks that
> >do read ahead into the cache in the disk.  
> Yes, that is a conventional wisdom.
> I have some experience with very large apps, processing much more
> than 2^32 bits of data (think a linear pass over a terabyte or so,
> updating a one-gigabyte array as it goes). There, we found a huge
> performance win from doing reads in 1 or 2 Mbyte blocks -- the size of
> the disk buffer -- and using POSIX aio() or real threads to schedule
> read-ahead of those 2 Mbyte chunks. (At the time, this design forced a
> non-BSD solution.)

er, yeah, I was over-generalizing there.  64k isn't "large" for some
applications.  the larger your disk throughput, the larger "large" becomes.
and I'm sure there are contexts where read ahead helps no matter what
the i/o size as well.
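
to make the application-level read-ahead idea concrete, here is a rough
sketch of the "real threads" variant: one helper thread reads large chunks
ahead of the consumer through a bounded queue.  this is only an
illustration, not the code from the application described above.  the
names read_ahead, CHUNK and DEPTH are made up for this example, and the
2 Mbyte chunk size and depth of 2 are assumptions taken from the figures
quoted above.

```python
import queue
import threading

CHUNK = 2 * 1024 * 1024  # assumed chunk size (the "2 Mbyte" figure quoted above)
DEPTH = 2                # assumed number of chunks allowed to wait in the queue


def read_ahead(path, chunk=CHUNK, depth=DEPTH):
    """Yield successive chunks of `path` while a helper thread reads ahead."""
    q = queue.Queue(maxsize=depth)

    def producer():
        try:
            # buffering=0 gives raw, unbuffered reads of up to `chunk` bytes.
            with open(path, "rb", buffering=0) as f:
                while True:
                    buf = f.read(chunk)
                    if not buf:      # empty read means end of file
                        break
                    q.put(buf)       # blocks while `depth` chunks are waiting
        finally:
            # None marks the end of the stream.  Note that an I/O error in
            # this thread also ends the stream early in this sketch.
            q.put(None)

    threading.Thread(target=producer, daemon=True).start()

    # The consumer works on one chunk while the helper fetches the next ones.
    for buf in iter(q.get, None):
        yield buf
```

a caller would just loop over it, e.g. `for buf in read_ahead("/some/file"):`
followed by whatever per-chunk processing it needs.  the bounded queue is
what keeps the helper from running arbitrarily far ahead of the consumer.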

> Mmap() was a nonstarter; the app didn't fit in physical memory anyway,
> never mind the linear once-only pass.
> I didn't dare try an ffs with 1 Mbyte blocks and 128k fragments and
> 2-block readahead: would that have worked?

FFS isn't especially well-defined for frag sizes larger than 8k.
the superblock has to be 8k into the device, but some of the code
wants the superblock size to be a multiple of the fragment size,
so that ends up with the superblock starting (and possibly ending)
in the middle of a fragment, which is a bit funky.  I'm sure someone
could make this work, but neither of the commercial unices I have
at hand just now (Solaris and HP-UX) allows UFS frag sizes larger
than 8k, and NetBSD just panics if you try to mount a 16k-frag fs.
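
the alignment problem can be shown with a little arithmetic.  this is a
sketch, not kernel code: the 8k offset is from the paragraph above, while
the 8k superblock size and the names SBOFF, SBSIZE and superblock_alignment
are assumptions made for the example.

```python
SBOFF = 8192   # superblock starts 8k into the device (from the text above)
SBSIZE = 8192  # assumed superblock size for this illustration


def superblock_alignment(fsize):
    """Return (start_aligned, end_aligned) for a fragment size in bytes."""
    start_aligned = SBOFF % fsize == 0
    end_aligned = (SBOFF + SBSIZE) % fsize == 0
    return start_aligned, end_aligned


for fsize in (8192, 16384, 32768):
    start_ok, end_ok = superblock_alignment(fsize)
    print(f"frag {fsize // 1024}k: start aligned={start_ok}, end aligned={end_ok}")
```

under those assumptions an 8k frag size lines up at both ends, a 16k frag
size puts the start of the superblock in the middle of a fragment, and a
32k frag size puts both the start and the end in the middle of one.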