tech-kern archive


Re: posix_fallocate



    Date:        Sun, 17 Nov 2013 10:33:43 +0100
    From:        manu%netbsd.org@localhost (Emmanuel Dreyfus)
    Message-ID:  <1lch6me.jn7y3m16232ejM%manu%netbsd.org@localhost>

  | We could fork a kernel thread that would go to userspace to do the work
  | with a write() loop, with appropriate credentials. Does it makes sense?

It would need to be a read/write loop: nothing says that there cannot already
be blocks allocated in the space being fallocated, and their content must
not change.

But yes, if implemented that way it would be much less of a problem.

But if implemented that way, why bother at all?  Why not just put the
code in a user space libc posix_fallocate() function, and be done with
it?  It should not require any kernel support at all.

That's not true of the inverse function that David Holland referred to
(though like Rhialto, I can't see what relationship that has with the
posix_fallocate() call that we were asked about) for making holes in files.
That one is not a problem, and needs to be in the kernel to be implemented
(as the physical structure of a file is deliberately not exposed to userland.)
Implementing that (assuming there's some standard interface definition for
it) might be sensible, but I still see no use at all for a (kernel)
posix_fallocate().

kre

ps: another reason that a userland process is less of a problem than the
kernel interface described in the opengroup posix_fallocate() spec, is
that a user process must either do multiple sys calls (and is subject to
being signalled, and hence terminated, between sys calls) or malloc (or
brk(2)) enough space for a buffer as big as the write call - that is
typically going to limit a single sys call to no more than a few tens of GBs
(on today's systems) as that's generally as big as a process can grow.

On the other hand, posix_fallocate() could allocate petabytes in a single
invocation of the sys call, assuming that the filesystem had that much
space available.   I haven't looked recently, but last time I did,
preemptible sys calls still didn't mean that userland signals would be
delivered in the middle of the operation of a single sys call, nor does
anything suggest that signals are supposed to interrupt the operation of
posix_fallocate() half way through - which suggests to me, that as designed,
it should continue until it is finished once invoked, whatever anyone tries
to do to the process that invoked it.


