tech-kern archive


Re: posix_fallocate

    Date:        Sun, 17 Nov 2013 03:18:56 +0000 (UTC)
    From: (Christos Zoulas)
    Message-ID:  <l69cj0$f0v$>

  | In article <>,
  | Emmanuel Dreyfus <> wrote:
  | >NetBSD-current seems to lack posix_fallocate(2)
  | FreeBSD has it as a system call. It should be easy to dup.

I would suggest avoiding it.   While the objective for it looks clear,
and perhaps even useful, to me it doesn't seem to be implementable safely.

To me there appear to be just two ways to implement this.   The safer would
be a complex reservation scheme, which would account for blocks reserved to
a file as if they were actually allocated, thereby reducing the space
available for other allocations on the filesystem.   To me that looks like
an accounting nightmare to implement correctly in all cases (there are so
many weird situations that would need solutions).
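To make the reservation idea concrete, here is a deliberately tiny model of the accounting in C. The structure and function names (struct fs_space, fs_reserve, fs_alloc_unreserved, fs_alloc_reserved) are hypothetical and belong to no real kernel. The point it shows is that a reservation has to count against free space immediately, so ordinary allocations can fail even though the reserved blocks are not yet in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filesystem counters; invariant: reserved <= free. */
struct fs_space {
    uint64_t free_blocks;     /* blocks not allocated to any file */
    uint64_t reserved_blocks; /* free blocks already promised to some file */
};

/* Promise n blocks to a file without allocating them yet. */
static bool fs_reserve(struct fs_space *fs, uint64_t n)
{
    if (fs->free_blocks - fs->reserved_blocks < n)
        return false;               /* would overcommit: report ENOSPC */
    fs->reserved_blocks += n;
    return true;
}

/* Ordinary allocation: may only use space nobody has reserved. */
static bool fs_alloc_unreserved(struct fs_space *fs, uint64_t n)
{
    if (fs->free_blocks - fs->reserved_blocks < n)
        return false;
    fs->free_blocks -= n;
    return true;
}

/* Allocation that consumes blocks previously reserved for this file. */
static void fs_alloc_reserved(struct fs_space *fs, uint64_t n)
{
    assert(n <= fs->reserved_blocks);
    fs->reserved_blocks -= n;
    fs->free_blocks -= n;
}
```

Even this toy hints at where the trouble starts: every path that can end a reservation early (truncating or unlinking the file, for instance) would have to release reserved_blocks correctly, and it is not obvious what should happen to reservations across a crash and reboot.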

Alternatively, the system could actually allocate all required blocks at
the time of the posix_fallocate() call, effectively filling in any holes
in the designated region of the file.   The spec doesn't say what data is
to be put in the blocks allocated to fill the holes.  (A well behaved
application wouldn't care: it would normally write to the file before
reading it, and would be using posix_fallocate() to guarantee that the
entire set of write sys calls it needed to make would succeed, or that the
posix_fallocate() call would fail, so the system could not run out of
space half way through.)
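The usage pattern described above (reserve first, then write) might look like this in C on a system that already provides posix_fallocate(), such as FreeBSD or Linux with glibc. The helper name write_all_preallocated is hypothetical. Note that posix_fallocate() reports failure through its return value, not through errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve [0, total) with posix_fallocate() before
 * writing anything, so a lack of space shows up as one up-front failure
 * instead of a half-written file.  Returns 0 or an errno value.
 */
static int write_all_preallocated(int fd, const char *buf, size_t total)
{
    int err;
    size_t done = 0;

    assert(buf != NULL || total == 0);
    if (total == 0)
        return 0;               /* posix_fallocate() rejects len == 0 */

    /* posix_fallocate() returns an error number; it does not set errno. */
    err = posix_fallocate(fd, (off_t)0, (off_t)total);
    if (err != 0)
        return err;             /* e.g. ENOSPC, before any data is written */

    while (done < total) {
        ssize_t n = pwrite(fd, buf + done, total - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        done += (size_t)n;
    }
    return 0;
}
```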

There would seem to be just two viable choices - fill the blocks with 0's,
or leave random data there.

The latter isn't really a choice; it is a security hole a mile wide, so
filling with 0's would be the only real option.  The problem is that this
opens a trivial DoS attack like ...

        for (;;) {
                posix_fallocate(fd, (off_t)0, huge);
        }

where huge is however large an (off_t) value the application can get away
with without the call failing.

For a sys call that is merely advisory (one that implementations are not
required to provide), this all seems like a poor idea to me.

Any application that really needs the function can duplicate it in user
space (just a loop of read/write sys calls over the required range), which
then costs user space resources rather than kernel ones (or at least, not
just kernel ones).

ps: I have not examined the FreeBSD implementation.  If they've done it the
hard, safe way, and worked out all the potential kinks, and if it doesn't
depend too much upon other aspects of their I/O system implementation (like
whatever they have to make softdeps work), then perhaps copying that might
be feasible, if the demand for this really exists and it isn't being
requested just because it is in the spec and NetBSD lacks it.
