Subject: Re: Not beer, or why is the pipe so small?
To: Viktor Dukhovni <viktor@dukhovni.org>
From: Jaromir Dolecek <jdolecek@netbsd.org>
List: tech-kern
Date: 02/25/2003 19:26:42
Obviously the start of the discussion is missing, but let me point
out the parts I know.

Viktor Dukhovni wrote:
> If on the other hand a selected for write
> pipe can have only 512 bytes free, I consider waking up the writer with so
> little space a (legal) misfeature.

POSIX requires that at least PIPE_BUF bytes can be written to a pipe
'atomically', meaning 'without blocking'. Anything more than PIPE_BUF
is subject to flow control. As long as PIPE_BUF bytes can be written,
it's correct to flag the pipe as 'writable'.

Now, for non-blocking I/O, write() writes as much as possible
without blocking, and returns to the caller at the point where it
would be necessary to block. If no data can be written without
blocking, write(2) returns -1 with errno set to EWOULDBLOCK. If a
partial write happened, it returns successfully with a short write.
Applications MUST check for this and handle it correctly.
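
To make the handling above concrete, here is a minimal sketch of a
write loop for a non-blocking descriptor. The helper name
write_all_nonblock() is hypothetical, and using poll(2) to wait for
writability is my assumption; select(2) would serve equally well.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: push all len bytes to a (possibly non-blocking)
 * descriptor. Returns 0 on success, -1 on failure.
 */
static int
write_all_nonblock(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n > 0) {
			/* Possibly a short write: advance and retry the rest. */
			p += n;
			len -= (size_t)n;
			continue;
		}
		if (n == -1 && errno == EINTR)
			continue;
		if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
			/* Nothing fit; sleep until the descriptor is writable. */
			struct pollfd pfd = { .fd = fd, .events = POLLOUT };

			if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
				return -1;
			continue;
		}
		return -1;	/* real error (e.g. EPIPE) or a zero-length write */
	}
	return 0;
}
```

The caller never needs to know how much room the pipe had; the loop
simply absorbs short writes and EWOULDBLOCK until everything is out.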

> The old code did not bother with non-blocking mode, it assumed that 4K
> will always fit. That assumption has been fixed, so I merely raise the
> question as to why the fix is necessary. It would seem that if the OS
> assured the writer of adequate space, even the select before write
> blocking code would have worked.

The point is that the OS didn't assure the writer of 'adequate' space.
It merely assured the writer that it's possible to write _some_
data. In the case of pipes, the POSIX semantics mean that the writer
can write at least PIPE_BUF bytes. For other objects (such as sockets),
this might be less or more (depending on SO_SNDLOWAT settings).
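
As a sketch of what a select-before-write caller can rely on under
this reading, the code below waits for the pipe to be flagged writable
and then writes no more than PIPE_BUF bytes. The helper name
write_chunk_when_ready() is hypothetical; querying the limit with
fpathconf(2) and falling back to 512 (the POSIX minimum for PIPE_BUF)
is my choice, not something the old code did.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: block in select(2) until fd is writable, then
 * write at most PIPE_BUF bytes. Returns the write(2) result, or -1 if
 * select(2) failed.
 */
static ssize_t
write_chunk_when_ready(int fd, const char *buf, size_t len)
{
	fd_set wfds;
	long limit = fpathconf(fd, _PC_PIPE_BUF);
	size_t chunk;

	if (limit <= 0)
		limit = 512;	/* assumed fallback: POSIX minimum PIPE_BUF */
	chunk = len < (size_t)limit ? len : (size_t)limit;

	FD_ZERO(&wfds);
	FD_SET(fd, &wfds);
	if (select(fd + 1, NULL, &wfds, NULL, NULL) == -1)
		return -1;

	/* Writable: a write of up to PIPE_BUF bytes is what is promised. */
	return write(fd, buf, chunk);
}
```

A caller with more data than one chunk calls this repeatedly, advancing
the buffer by the returned count each time.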

Any application assuming it could write more than PIPE_BUF bytes
to a pipe without blocking is Just Plain Broken and needs to be fixed.
The 4K write happened to work in Linux, since the default kernel
pipe buffer is 4K on Linux; this is probably how the incorrect
assumption arose. With 'NEW_PIPE' in NetBSD 1.6 and later,
this happens to work too, since the memory loan code kicks in
only for transfers bigger than PIPE_MINDIRECT (8k currently),
and the pipe kernel buffer is 16k.

So, to summarize:
1. only transfers <= PIPE_BUF are guaranteed not to block
   if the pipe descriptor is 'writable' (as flagged by select(2))
2. non-blocking write(2) may return successfully with a short write
   count, or fail with an EWOULDBLOCK error

Applications making other assumptions about pipe behaviour
are broken and need to be fixed.

Jaromir
-- 
Jaromir Dolecek <jdolecek@NetBSD.org>            http://www.NetBSD.org/
-=- We should be mindful of the potential goal, but as the tantric    -=-
-=- Buddhist masters say, ``You may notice during meditation that you -=-
-=- sometimes levitate or glow.   Do not let this distract you.''     -=-