Subject: Re: lseek() extension for sparse files
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 09/21/2006 22:01:45

> So, it looks like these new whence arguments are essentially designed
> to enumerate the holes or filled regions of the file.

That was basically my initial reaction too.

On reflection, I think that was a rather shallow reaction.  I now think
they are a very smart choice.  Enumerating holes in a file could have
been done many other ways, and this mechanism supports a lot more than
just hole enumeration.  (I'd actually like to see versions that look
for the last hole/data *before* the offset, too....)
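
To make the enumeration case concrete, here is a minimal sketch.  It
assumes the new whence values carry the Solaris-style names SEEK_DATA
and SEEK_HOLE, and that the platform exposes them (on glibc that needs
_GNU_SOURCE).  The helper names next_data_extent() and
print_data_extents() are made up for illustration; they are not part of
any proposed patch.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the first data extent at or after `from`.
 * Returns 1 and fills [*start, *end) on success, 0 if only a hole (or
 * EOF) remains, -1 on error.  Note that lseek() moves the file offset.
 */
static int next_data_extent(int fd, off_t from, off_t *start, off_t *end)
{
    assert(start != NULL && end != NULL);

    off_t data = lseek(fd, from, SEEK_DATA);
    if (data == (off_t)-1)
        return errno == ENXIO ? 0 : -1;     /* ENXIO: no data past `from` */

    /* Every file has an implicit hole at EOF, so this finds an end. */
    off_t hole = lseek(fd, data, SEEK_HOLE);
    if (hole == (off_t)-1)
        return -1;

    *start = data;
    *end = hole;
    return 1;
}

/* Print every data extent; returns the extent count, or -1 on error. */
static int print_data_extents(int fd)
{
    off_t pos = 0, s, e;
    int r, n = 0;

    while ((r = next_data_extent(fd, pos, &s, &e)) == 1) {
        printf("data: [%lld, %lld)\n", (long long)s, (long long)e);
        pos = e;                            /* resume after this extent */
        n++;
    }
    return r < 0 ? -1 : n;
}
```

Because the caller supplies the starting offset, the same primitive also
answers "is there any data after offset X?" without walking the whole
file, which a dedicated enumerate-all-holes call would not.  On a
filesystem that does not track holes, the whole file would typically
show up as a single data extent.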

This does set a precedent of exposing holes in files through the API,
but it does so only halfway.  I think that if we're going to do that, we
should also add a way to say "discard this byte-range, creating holes
where possible".  The range would read back as zeros afterwards, exactly
as if zeros had been written over it; the only difference is that the
underlying blocks may be freed instead of allocated.
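
A rough sketch of what such a "discard" call could look like from user
space.  The name discard_range() is hypothetical.  As one possible
backend it uses Linux's fallocate(2) with FALLOC_FL_PUNCH_HOLE, an
interface that postdates this thread, and it falls back to literally
writing zeros where that is unavailable or fails.  It assumes the range
lies within the current file size, so both paths leave the size alone.

```c
#define _GNU_SOURCE             /* fallocate() and FALLOC_FL_* on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Portable fallback: overwrite [off, off+len) with explicit zeros. */
static int zero_range(int fd, off_t off, off_t len)
{
    static const char zeros[65536];         /* static storage is zeroed */

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len
                                                 : sizeof zeros;
        ssize_t w = pwrite(fd, zeros, chunk, off);
        if (w < 0 && errno == EINTR)
            continue;                       /* interrupted: retry */
        if (w <= 0)
            return -1;
        off += w;
        len -= w;
    }
    return 0;
}

/*
 * Hypothetical helper: make [off, off+len) read back as zeros, freeing
 * the underlying blocks where the platform allows it.
 * Assumption: the range lies inside the current file size.
 */
static int discard_range(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len >= 0);
    if (len == 0)
        return 0;
#if defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) == 0)
        return 0;
    /* Unsupported on this filesystem or failed: write zeros instead. */
#endif
    return zero_range(fd, off, len);
}
```

Either way a reader sees the same bytes afterwards; whether holes were
actually created shows up only in the block count and in what a
hole-seeking lseek() reports.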

> I don't see anything particularly wrong with having it (note I have
> not yet reviewed the actual patch), but what's the application usage?

Dealing with large, very sparse files with reasonable efficiency?
Without something like that I can create a file that fits on a floppy
but is many, many terabytes long, and nobody can find the six bytes in
it that aren't zeros with anything less than ruinously expensive
searches through huge quantities of zeros - or going under the hood of
the filesystem.
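
As an illustration of that search, the sketch below looks for nonzero
bytes while skipping holes wholesale.  It assumes the Solaris-style name
SEEK_DATA for the new whence value; next_nonzero() is a made-up helper
name.  If the kernel rejects SEEK_DATA with EINVAL, the sketch falls
back to reading everything, which is exactly the expensive case
described above.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the offset of the first nonzero byte at
 * or after `from`, -1 if there is none, or -2 on I/O error.
 */
static off_t next_nonzero(int fd, off_t from)
{
    char buf[65536];
    off_t pos = from;

    assert(from >= 0);
    for (;;) {
        /* Jump over any hole: a hole can only ever read as zeros. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == (off_t)-1) {
            if (errno == ENXIO)
                return -1;                  /* only a hole or EOF remains */
            if (errno != EINVAL)
                return -2;
            data = pos;                     /* unsupported: scan linearly */
        }

        ssize_t n = pread(fd, buf, sizeof buf, data);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted: retry */
        if (n < 0)
            return -2;
        if (n == 0)
            return -1;                      /* reached EOF */

        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0)
                return data + i;
        pos = data + n;                     /* all zeros so far; go on */
    }
}
```

Calling it repeatedly, each time starting one byte past the previous
hit, visits every nonzero byte in the file.  The cost is proportional to
the allocated data, not to the apparent file length.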

It also would be nice to be able to fill in the holes in a file without
unnecessary rewriting of non-hole regions.  I know of only two things
I'd want to do this for - kernels and vnd backing files - but they're
useful enough it would be nice to have.
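
Hole filling can be sketched the same way.  This again assumes the
Solaris-style names SEEK_HOLE and SEEK_DATA; fill_holes() is a
hypothetical helper.  It also assumes the filesystem allocates blocks
for explicitly written zeros, which a compressing filesystem may not do.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write zeros into every hole of fd, leaving data
 * regions untouched.  Returns the number of bytes written, -1 on error.
 */
static off_t fill_holes(int fd)
{
    static const char zeros[65536];         /* static storage is zeroed */
    struct stat st;
    off_t pos = 0, written = 0;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return -1;

    while (pos < st.st_size) {
        off_t hole = lseek(fd, pos, SEEK_HOLE);
        if (hole == (off_t)-1)
            return -1;
        if (hole >= st.st_size)
            break;                          /* just the implicit EOF hole */

        off_t end = lseek(fd, hole, SEEK_DATA);
        if (end == (off_t)-1) {
            if (errno != ENXIO)
                return -1;
            end = st.st_size;               /* hole runs to end of file */
        }
        if (end > st.st_size)
            end = st.st_size;               /* never grow the file */

        for (off_t off = hole; off < end; ) {
            off_t left = end - off;
            size_t chunk = left < (off_t)sizeof zeros ? (size_t)left
                                                      : sizeof zeros;
            ssize_t w = pwrite(fd, zeros, chunk, off);
            if (w < 0 && errno == EINTR)
                continue;                   /* interrupted: retry */
            if (w <= 0)
                return -1;
            off += w;
            written += w;
        }
        pos = end;
    }
    return written;
}
```

Without hole information the only portable alternative is to read and
rewrite the entire file, including the regions that were already
allocated.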

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B