Subject: Re: Bug in mmap() for files with holes
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/14/2007 18:36:52
> Actually, there's a difference in behavior here that still really
> bothers me.

> If I write one byte of 0 *at offset 0* then lseek to 1024, then mmap
> the file with a length of 1024, memory-reads from the entire mapping
> return 0.

> If I just create the file and do the lseek, memory-reads from any
> part of the mapping cause a SEGV.

> This seems wrong.

Abstractly, it is.  But it is hard and expensive to make mmap() draw
the line between working and segfaulting other than at a page boundary;
on architectures with fixed-size pages (which AIUI is most of them),
doing so requires taking a trap to the kernel for every access to that
page.  I recall a recent discussion of basically this issue;
specifically, about how data written to a mapped region past EOF can
still be there when the file is mapped again later.

I suspect that if you create a file that's getpagesize() bytes long and
mmap 2*getpagesize() bytes of it, you'll find that touching the second
page segfaults.  Make the file getpagesize()+1 bytes long and I suspect
you'll find that the second page "works", much as the first (only) page
of your test file here "works".  mmap() is just rounding the file size
up to the next page boundary; your 0-byte vs 1-byte difference is just
a special case of that rounding up.
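
A minimal sketch of that experiment, for anyone who wants to try it.
Everything here is an assumption for illustration rather than code from
this thread: probe_page() is a hypothetical helper name, the scratch
file lives in /tmp, and sysconf(_SC_PAGESIZE) stands in for
getpagesize().  The read is done in a forked child so the parent can
report the signal instead of dying from it.  Which signal arrives
(SIGSEGV or SIGBUS) varies between systems, so the helper just returns
whatever killed the child.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a scratch file of file_len bytes, map two pages of it, and
 * read the first byte of page number `page` (0 or 1) in a child.
 * Returns 0 if the read succeeded, the signal number if the child was
 * killed by a signal, or -1 if the setup itself failed.
 */
int probe_page(off_t file_len, int page)
{
    long ps = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/mmap-probe-XXXXXX";   /* hypothetical location */
    int fd, status, result = -1;
    void *m;
    pid_t pid;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file disappears once fd is closed */

    if (ftruncate(fd, file_len) < 0) {
        close(fd);
        return -1;
    }

    m = mmap(NULL, (size_t)(2 * ps), PROT_READ, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: make sure a fault kills us, then touch the page. */
        volatile const char *p = m;
        signal(SIGSEGV, SIG_DFL);
        signal(SIGBUS, SIG_DFL);
        (void)p[(long)page * ps];
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid) {
        if (WIFSIGNALED(status))
            result = WTERMSIG(status);
        else if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 0;
    }

    munmap(m, (size_t)(2 * ps));
    close(fd);
    return result;
}
```

If the rounding explanation holds, probe_page(pagesize, 1) should
return a signal number and probe_page(pagesize + 1, 1) should return 0;
likewise probe_page(0, 0) versus probe_page(1, 0) for the original
0-byte vs 1-byte case.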

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B