Subject: Re: Not beer, or why is the pipe so small?
To: Andrew Brown <atatat@atatdot.net>
From: Viktor Dukhovni <viktor@dukhovni.org>
List: tech-kern
Date: 02/26/2003 02:44:57
On Tue, 25 Feb 2003, Andrew Brown wrote:

> >POSIX does not say it explicitly, but it does if you read "between the
> >lines", since writes of PIPE_BUF bytes are atomic with pipes, and since a
> >writable pipe should absorb *some* output when a non-blocking writer
> >writes to it (the sender should not see EWOULDBLOCK), a writable pipe must
> >have at least PIPE_BUF bytes free. Otherwise a non-blocking writer would
> >wake up in a tight loop waiting for PIPE_BUF bytes to become available.
>
> just to play the devil's advocate...
>
> i can easily interpret what you say above:
>
> 	writes of PIPE_BUF bytes are atomic with pipes
>
> to mean that "only writes of up to PIPE_BUF bytes are atomic with
> pipes, and that larger writes may not be atomic".
>
> that would allow both a write of one byte to succeed and be perfectly
> correct, and for select()/poll() to return when room for one byte to
> be written was available.
>

Think harder. Select() does not know (Thor's point) how much data the
caller intends to write. Select *must not* claim a descriptor is writable
if a non-blocking writer might (with no intervening I/O events from other
sources) encounter EAGAIN (EWOULDBLOCK) when trying to write *any* amount
of data to the pipe.

If this is true, select() must ensure that at least PIPE_BUF bytes are free.

If it is false, the writer can go into a tight select() loop with the
descriptor always ready but every write failing. No select()
implementation should put a writer into a tight select()/write() loop.
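The two cases can be made concrete with a toy model (a sketch, not kernel code). It tracks only a pipe's free space, applies the POSIX O_NONBLOCK write rules (a write of at most PIPE_BUF bytes is all-or-nothing, a larger one transfers what fits), and counts how often a writer sees "writable" followed by EAGAIN when no reader drains the pipe. The names `toy_pipe`, `toy_select_writable`, `toy_write`, `toy_spin_rounds` and the value 512 for `TOY_PIPE_BUF` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PIPE_BUF; 512 is an assumed value. */
#define TOY_PIPE_BUF 512

/* Toy pipe: only the amount of queued data matters for this model. */
struct toy_pipe {
    size_t capacity; /* total buffer size in bytes */
    size_t used;     /* bytes currently queued */
};

/* Hypothetical select() policy: report the pipe writable once at
 * least `threshold` bytes are free. */
static bool toy_select_writable(const struct toy_pipe *p, size_t threshold)
{
    return p->capacity - p->used >= threshold;
}

/* Non-blocking write modelled on the POSIX O_NONBLOCK pipe rules:
 * n <= PIPE_BUF is all-or-nothing, n > PIPE_BUF transfers what fits.
 * Returns bytes written, or -1 where the real call would give EAGAIN. */
static long toy_write(struct toy_pipe *p, size_t n)
{
    assert(p->used <= p->capacity);
    size_t free_bytes = p->capacity - p->used;
    if (n <= TOY_PIPE_BUF) {
        if (free_bytes < n)
            return -1;
        p->used += n;
        return (long)n;
    }
    if (free_bytes == 0)
        return -1;
    size_t k = n < free_bytes ? n : free_bytes;
    p->used += k;
    return (long)k;
}

/* Count select()/write() rounds in which select() says "ready" but the
 * write fails. No reader drains the pipe, so one such failure repeats
 * forever; max_rounds caps the count. The pipe is passed by value, so
 * the caller's copy is left untouched. */
static int toy_spin_rounds(struct toy_pipe p, size_t threshold, size_t n,
                           int max_rounds)
{
    int spins = 0;
    for (int i = 0; i < max_rounds; i++) {
        if (!toy_select_writable(&p, threshold))
            return spins; /* writer would sleep in select(): no spin */
        if (toy_write(&p, n) >= 0)
            return spins; /* the write made progress */
        spins++;          /* "ready" yet EAGAIN: a busy loop */
    }
    return spins;
}
```

With 10 bytes free, a policy that reports "writable" at one free byte leaves a 512-byte writer spinning until the cap, while a PIPE_BUF threshold never produces a wasted round for any write size.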

It would be decent of POSIX to say so, but it is not necessary. This is
just logic; the conclusion is inescapable, so I would rather not wrestle
over the point.

Again, I hope (wisely or otherwise) that someone will consider raising the
PIPE_BUF value, or perhaps will even be in a position to run a system
benchmark profiling a large collection of diverse shell scripts (a
system build?) with a 512-byte and a 4K PIPE_BUF.

The results for single-CPU and multi-CPU systems might differ in
magnitude or even in the direction of the effect. It would sure be
interesting to have some numbers in hand, though getting them is likely
a lot of work.

A USENIX paper?

-- 
	Viktor.