Re: partial failures in write(2) (and read(2))

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: partial failures in write(2) (and read(2))
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Fri, 19 Feb 2021 00:06:52 +0000

On Tue, Feb 16, 2021 at 05:29:00PM +0700, Robert Elz wrote:
 > We could, of course, invent new interfaces (a write variant with an
 > extra pointer to length written arg perhaps, or where the length arg
 > is a pointer to a size_t and that is read and then written with either
 > the amount written, or the amount not written).
 > 
 > But I don't believe that any of this is needed, or desirable.

Right, I think succeeding with a short count is preferable in all
cases where anyone actually cares what happened.

 > We should first make sure that we do what POSIX requires, and simply
 > return a short write count (and no error) in the cases where that
 > should happen (out of space, over quota, exceeding file size limit,
 > and writing any more would block and O_NONBLOCK is set, more?).

As far as I can tell, these errors are not currently handled this
way, except maybe the EWOULDBLOCK case.

(And there's one other: signal delivery after writing some data to a
slow device. But that already works correctly.)

 > In the other error cases we should simply leave things alone and
 > accept it - it is the way unix always has been, and we have survived.
 > If we have a drive returning I/O errors (on writes), do we really
 > expect that earlier data written will have been written correctly?

Since writes to regular files always go into the cache and are not
written to disk directly (I think never, absent O_DIRECT), I don't
think that case actually arises. Instead, such errors will filter
through the completely broken fsync error-reporting chain. (More on
that elsewhere.)

However, reads are different. If you read part of a file and then get
EIO because the disk is going bad, it's reasonably likely that the
part you did get is OK. Moreover, if what you're trying to do is
rescue data from a dying disk, chances are you _do_ want it, even if
there's a moderate chance of it being corrupted. So I kind of think
the EIO case should succeed with a short count too.

As for EFAULT, I was testing with it because it's easy to trigger,
but I agree that continuing after it isn't particularly useful. The
one thing I'm not sure about is possible interactions with
generational garbage collectors.

 > So, let's all forget fanciful interface redesigns, fix whatever we
 > need to fix to make things work the way they are supposed to work
 > (if there is anything) and leave the rest as "the world just broke"
 > type territory.

I'm pretty sure the only on-the-fly error that _does_ currently work
this way (that is, gets converted to success with a short count) is
EINTR.

-- 
David A. Holland
dholland%netbsd.org@localhost

References:
- Re: partial failures in write(2) (and read(2))
  - From: Rhialto
- partial failures in write(2) (and read(2))
  - From: David Holland
- Re: partial failures in write(2) (and read(2))
  - From: Mouse
- Re: partial failures in write(2) (and read(2))
  - From: Thor Lancelot Simon
- Re: partial failures in write(2) (and read(2))
  - From: John Franklin
- Re: partial failures in write(2) (and read(2))
  - From: Robert Elz
