tech-kern archive


Re: partial failures in write(2) (and read(2))



> It is possible for write() calls to fail partway through, after
> already having written some data.

It is.  As you note later, it's also possible for read().

The most correct thing to do, it seems to me, would be to return the
error indication along with how much was successfully written (or
read).  But that, of course, requires a completely new API, which I
gather is more intrusive than you want to get into here.
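Without a new system call, the idea can only be approximated in userland.  Below is a minimal sketch that assumes nothing beyond POSIX write(2).  `struct io_result` and `write_report()` are hypothetical names, not an existing API.  The function retries short writes and returns both the byte count and the errno of the call that failed.  It cannot see data written by a single write(2) that then returned -1, and that is exactly the gap a kernel-level call would have to close.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: how much got through, plus the error (0 if none). */
struct io_result {
    size_t done;   /* bytes successfully written by completed write(2) calls */
    int    error;  /* errno from the write(2) that failed, or 0 */
};

/*
 * Hypothetical helper: keep calling write(2) until everything is written
 * or a call fails, then report both the count and the error.  Bytes that
 * a single failing write(2) may have written before returning -1 are not
 * visible here.
 */
struct io_result write_report(int fd, const void *buf, size_t len)
{
    struct io_result r = { 0, 0 };
    const char *p = buf;

    while (r.done < len) {
        ssize_t n = write(fd, p + r.done, len - r.done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing anything; retry */
            r.error = errno;     /* keep the count we have and the error */
            break;
        }
        if (n == 0)
            break;               /* no progress and no error: stop rather than spin */
        r.done += (size_t)n;
    }
    return r;
}
```

A caller then gets "wrote 3000 bytes, then EIO" in one result, rather than having to choose between a short count and -1.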

> Basically, it is not feasible to check for and report all possible
> errors ahead of time,

In some cases - such as EIO - it is not possible even in theory.

> nor in general is it possible or even desirable to unwind portions of
> a write that have already been completed,

Agreed.  In some cases, by the time the error is detected, the bits may
not even exist on the local machine any longer.

> which means that if a failure occurs partway through a write there
> are two reasonable choices for proceeding:
>    (a) return success with a short count reporting how much data has
>        already been written;
>    (b) return failure.

Right.

My own preference is for (a), with the error remembered and returned on
the next write (resp. read) even if there is nothing (else) erroneous
about that next operation.
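A minimal userland model of that deferred-error behaviour is sketched below.  It assumes a wrapper object per descriptor; `struct sticky_fd` and `sticky_write()` are hypothetical names.  When a failure follows some successful output, the caller gets the short count and the error is stashed.  The next call reports the stashed error without attempting any I/O.  A real implementation would live in the kernel, per open file, and would also cover partial progress inside a single write(2), which this model cannot.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: a descriptor plus an error deferred from an earlier call. */
struct sticky_fd {
    int fd;
    int pending;   /* errno to report on the next call, or 0 */
};

ssize_t sticky_write(struct sticky_fd *s, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    if (s->pending != 0) {       /* report the remembered error first, */
        errno = s->pending;      /* even though this call did nothing wrong */
        s->pending = 0;
        return -1;
    }
    while (done < len) {
        ssize_t n = write(s->fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (done == 0)
                return -1;       /* nothing written yet: fail now, errno intact */
            s->pending = errno;  /* partial success: defer the error */
            break;
        }
        if (n == 0)
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```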

> It seems to me that for most errors (a) is preferable, since
> correctly written user software will detect the short count, retry
> with the rest of the data, and hit the error case directly, but it
> seems not everyone agrees with me.

Well, if it _will_ "hit the error case directly", maybe.  It is not
clear to me that it will.  Except for EPIPE (which will rarely be
returned, since most writers will die on SIGPIPE instead), none of the
errors you list is guaranteed to recur on the next write.  Admittedly,
some are more likely to recur than others, and some (eg, EFAULT)
definitely will unless something in the writing process intervenes.
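For the EPIPE case specifically, here is a sketch of how a writer can see the error instead of dying, assuming POSIX sigaction().  `write_nosigpipe()` is a hypothetical helper.  Once the reader is gone, the error recurs on every subsequent write, unlike most of the others.  Because it changes a process-wide signal disposition around the call, it is not suitable for multithreaded programs as written.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose sigaction() under strict -std modes */
#endif
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write(2) with SIGPIPE ignored for the duration of
 * the call, so a closed reader shows up as -1/EPIPE instead of killing
 * the process.  The previous disposition is restored afterwards.
 */
ssize_t write_nosigpipe(int fd, const void *buf, size_t len)
{
    struct sigaction ign, old;
    ssize_t n;
    int saved;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    if (sigaction(SIGPIPE, &ign, &old) != 0)
        return -1;

    n = write(fd, buf, len);
    saved = errno;                        /* sigaction() may clobber errno */
    (void)sigaction(SIGPIPE, &old, NULL);
    errno = saved;
    return n;
}
```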

> [test with deliberately mprotect()ed part of buffer]
>    - for regular files on ffs and probably most things that use
>      uiomove_ubc, the data in the accessible part of the buffer is
>      written, the call fails with EFAULT, and the size of the file is
>      reverted to what it was at the start.

!!  That, I would say, strongly violates POLA (the principle of least
astonishment).  It is not behaviour I would have been likely to guess.
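A minimal reproduction of the quoted experiment follows.  It assumes a POSIX system with anonymous mmap() and a writable /tmp; `probe_partial_fault()` and `struct fault_probe` are hypothetical names.  It writes two pages, the second of which is PROT_NONE, and records the return value, errno and resulting file size.  Going by the report quoted above, ffs would be expected to give -1/EFAULT with the size reverted to 0.  A system that takes option (a) would instead return a short count of one page and leave a file one page long.  Results depend on the OS and filesystem.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose mkstemp() and MAP_ANONYMOUS under strict -std */
#endif
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical record of what one faulting write(2) did. */
struct fault_probe {
    size_t  len;   /* bytes requested (two pages) */
    ssize_t ret;   /* return value of write(2) */
    int     err;   /* errno if ret == -1, else 0 */
    off_t   size;  /* file size afterwards, or -1 if fstat failed */
};

/* Returns 0 if the probe ran, -1 if setup failed. */
int probe_partial_fault(struct fault_probe *out)
{
    long pg = sysconf(_SC_PAGESIZE);
    char tmpl[] = "/tmp/faultprobeXXXXXX";
    struct stat st;
    size_t len;
    char *buf;
    int fd;

    if (pg <= 0)
        return -1;
    len = 2 * (size_t)pg;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 'A', (size_t)pg);                        /* first page: readable data */
    if (mprotect(buf + pg, (size_t)pg, PROT_NONE) != 0) { /* second page: faults */
        munmap(buf, len);
        return -1;
    }
    fd = mkstemp(tmpl);
    if (fd < 0) {
        munmap(buf, len);
        return -1;
    }
    unlink(tmpl);                                        /* file goes away on close */

    out->len = len;
    errno = 0;
    out->ret = write(fd, buf, len);
    out->err = out->ret < 0 ? errno : 0;
    out->size = fstat(fd, &st) == 0 ? st.st_size : -1;

    close(fd);
    munmap(buf, len);
    return 0;
}
```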

> Anyhow, if you've made it this far, the actual question is: is the
> current behavior really what we want?

It is not what _I_ would prefer.  If we _had_ a more elaborate API, one
that could return partial success followed by an error, then I'd say we
could ignore the question of what write() and read() do on the grounds
that code that really cares can always use the more detailed call.

If adding that is an option, great.  If not, well, I think returning a
short count and remembering the error for the next call is about the
best option available.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

