tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: DIOCGDISCARDINFO and DIOCDISCARD



On Sat, Jun 15, 2013 at 11:56:55PM +0000, David Holland wrote:
 > I had almost forgotten about this; but a few months back when I came
 > into contact with the wd TRIM support in current I wanted to change
 > the interface around before it appears in a release.

Ok, as of this writing I have preliminary patches for about half of
the following:

   - add a d_discard op to struct bdevsw
   - add VOP_FALLOCATE and VOP_FDISCARD
   - implement spec_fdiscard that calls d_discard
   - add posix_fallocate, fallocate, and fdiscard syscalls that
     call VOP_FALLOCATE and VOP_FDISCARD
   - add posix_fallocate, fallocate, and fdiscard to libc
   - change existing device DIOCDISCARD implementations to d_discard
     implementations
   - update the ffs -odiscard code to use the new interface
   - fix existing discard implementations to not require the caller to
     check the maximum range that can be discarded at once
   - remove DIOCGDISCARDPARAMS
   - remove DIOCDISCARD
   - implement d_discard on dk, closing PR 47940
   - fix PR 47435, where wd's discard lets you zoom off the end of a
     partition or (I think) even the whole disk
   - add compat_linux support for linux's native fallocate and
     linux's posix_fallocate

Notes and comments on this:

 - I hope adding an op to struct bdevsw isn't going to cause a ruckus
or open a can of worms. It seems like the right thing to do. However,
there are other disk ioctls that I would think probably ought to be
first-class bdevsw ops and I'm not sure what the conventional wisdom
is.

 - I'm not adding either fallocate or fdiscard support for regular
files to any fs yet; this batch of changes is all plumbing and for
access to existing TRIM support. Doing this shouldn't be that hard.

 - I have defined all the interfaces (including the bdevsw one) to use
off_t and byte counts and offsets. I don't think proliferation of
DEV_BSIZE is beneficial; plus if everything is bytes it reduces the
chance of using the wrong shift macro and messing everything up. For
an operation that erases data in a bulk fashion at a low level, I
think this is fairly important.

 - I don't think we should adopt Linux's native fallocate interface.
It has the ugly misfeature that discard is, rather than a separate
call, an extra flag passed to fallocate -- that is, you delete blocks
by allocating.

 - I haven't decided if our native fallocate and fdiscard interface
should expose the Linux option to leave the file size unchanged. (In
Linux this allows allocating blocks past EOF, which is wrong on many
levels...) If not, our native fallocate will end up the same as
posix_fallocate. Another option is to allow this for fdiscard (where
it makes more sense) but not fallocate.

        int fallocate(int fd, off_t pos, off_t len, int keepsize);
        int fdiscard(int fd, off_t pos, off_t len, int keepsize);
   or
        int fallocate(int fd, off_t pos, off_t len);
        int fdiscard(int fd, off_t pos, off_t len);
   or
        int fallocate(int fd, off_t pos, off_t len);
        int fdiscard(int fd, off_t pos, off_t len, int keepsize);

 - I think it's ok to have a fallocate that's incompatible with the
linux fallocate, as long as we also have posix_fallocate.

 - The vnode-level interface supports the keep-size option because
it's supposed to be able to support the linux fallocate.

 - Someone(TM) should add discard support to vnd and raidframe, and
probably also ld and in a few other places; I might do this afterwards
but I'm not (I think) going to do it as part of the main patch set.

 - This patch set is all interconnected so it looks unlikely that it
can be committed incrementally.

Thoughts and objections? I'll post a link to the patches when they're
more fully baked.

-- 
David A. Holland
dholland%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index