tech-userlevel archive


Re: Using mmap(2) in sort(1) instead of temp files



On Thu, Apr 04, 2024 at 12:02:30PM +0000, Ice Cream wrote:
> Given the issues about using mmap, can anybody suggest how
> I should proceed with the implementation, or if I should at all?

There are two potential ways in which mmap(2) could help improve the speed
of sort:

 - If you know the input file name, use a read-only mmap() of that file
   and avoid all buffering. Downside: you cannot store \0 at the end of
   a line anymore and need to deal with char*/size_t pairs for strings.

 - You use "swap space" instead of a temporary file by doing

	mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);

   and then use the returned pointer for temporary stuff. Obviously you
   cannot do that for arbitrary sizes, so maybe you have to keep
   the old code and only do this trick if the file size is small enough,
   or you process the file in pieces or whatever.
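
A rough sketch of the first variant, to show what the char*/size_t
bookkeeping looks like. This is not taken from sort(1); struct line,
line_cmp() and sort_mapped() are hypothetical names made up for the
example. It assumes a regular, non-empty input file and plain byte-wise
comparison.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical type: a line is a pointer into the mapping plus a length. */
struct line {
	const char *ptr;
	size_t len;
};

/* Byte-wise comparison; on a common prefix the shorter line sorts first. */
static int
line_cmp(const void *a, const void *b)
{
	const struct line *x = a, *y = b;
	size_t n = x->len < y->len ? x->len : y->len;
	int r = memcmp(x->ptr, y->ptr, n);

	if (r != 0)
		return r;
	return (x->len > y->len) - (x->len < y->len);
}

/*
 * Map path read-only, split it into lines (without the '\n') and sort
 * them.  Returns a malloc'ed array of *nlines entries, or NULL on error
 * or for an empty file.  *basep and *sizep describe the mapping so the
 * caller can munmap() it once the lines are no longer needed.
 */
static struct line *
sort_mapped(const char *path, size_t *nlines, void **basep, size_t *sizep)
{
	struct stat st;
	struct line *v = NULL;
	size_t n = 0, cap = 0;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return NULL;
	if (fstat(fd, &st) == -1 || st.st_size == 0) {
		close(fd);
		return NULL;
	}
	size_t size = (size_t)st.st_size;
	char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (base == MAP_FAILED)
		return NULL;

	const char *p = base, *end = base + size;
	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		const char *eol = nl != NULL ? nl : end;

		if (n == cap) {
			/* Grow the line table geometrically. */
			size_t ncap = cap ? cap * 2 : 64;
			struct line *nv = realloc(v, ncap * sizeof(*v));
			if (nv == NULL) {
				free(v);
				munmap(base, size);
				return NULL;
			}
			v = nv;
			cap = ncap;
		}
		v[n].ptr = p;
		v[n].len = (size_t)(eol - p);
		n++;
		p = nl != NULL ? nl + 1 : end;
	}
	qsort(v, n, sizeof(*v), line_cmp);
	*nlines = n;
	*basep = base;
	*sizep = size;
	return v;
}
```

Output would then be something like fwrite(v[i].ptr, 1, v[i].len, stdout)
followed by a newline for each entry, since the lines are not
\0-terminated.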

Since the original comment hints at "instead of temp files", it is pretty
clear that the second variant is meant. This avoids all file system operations,
and if the machine you run on has enough free memory, it might not even
touch swap space.

Martin

