tech-kern archive


Re: Using mmap(2) in sort(1) instead of temp files



On Wed, Apr 03, 2024 at 05:40:47PM +0000, Ice Cream wrote:
 > I'm trying to speed up sort(1) by using mmap(2) instead of temp
 > files.
 > 
 > ftmp() (see code below) is called in the sort functions to create and
 > return a temp file. mkstemp() is used to create the temp file, then
 > the file pointer (returned by fdopen) is returned to the sort functions
 > for use. I'm trying to understand where and how mmap should come
 > into the picture here, and how to implement this feature.

I expect the intent was to mmap the temporary files to avoid the
overhead incurred by stdio. (As opposed to just allocating memory,
which as Mouse points out carries problems.)

I'm not sure this is actually a good idea. Using raw file handles
instead of stdio FILEs might provide some speedup (depending on the
write patterns and how big the blocks written are) but it's never been
entirely clear that mmap is actually substantively faster than using
raw file handles. Meanwhile, there are several disadvantages:
   - as Mouse pointed out, you need to know the size in advance;
   - read or write errors on memory mapped files result in SIGSEGV,
     which is annoying to deal with and does actually turn up in the
     field sometimes (*);
   - even if you apply MADV_SEQUENTIAL with madvise(2) the mmap
     interface can't really do as good a job of prefetching;
   - on 32-bit platforms the size is limited.
 
FWIW.

 > PS: It was mentioned in the TODO file
 > > speed up sort(1) by using mmap(2) rather than temp files
 
I can't find this reference :-(

-- 
David A. Holland
dholland%netbsd.org@localhost

