Subject: Re: Not beer, or why is the pipe so small?
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Viktor Dukhovni <viktor@dukhovni.org>
List: tech-kern
Date: 02/25/2003 11:42:58
On Tue, 25 Feb 2003, Thor Lancelot Simon wrote:

> It is *not* the case that I/O to pipes under NetBSD occurs in 512
> byte "chunks".  It *is* the case that there is no guarantee that a
> write of more than 512 bytes will be available in one read to the
> recipient at the other end of the pipe.  This is even *more* true
> with the NEWPIPE code that is also present in FreeBSD and OpenBSD.
>

I do not claim that it does occur in such chunks, nor do I care about the
I/O units seen by the reader. So you misunderstand: I am most definitely
not looking for datagram-equivalent service. With Postfix there is only
one writer and one reader per pipe, so the atomicity issues come up
only indirectly, in the following way.

The old timed-write-to-a-pipe code (<= 2.0.3) would loop selecting for the
pipe to become writable. If the select timed out, Postfix bailed on
delivery to the command; if the select found the pipe writable, Postfix
attempted a blocking write of 4K to the pipe.
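
For concreteness, a minimal sketch of that pattern (names and structure
are mine, not Postfix's actual code):

    /*
     * Sketch of the old (<= 2.0.3) style: select with a timeout, then a
     * blocking write of up to 4K, assuming the readiness report means
     * the whole chunk will fit.
     */
    #include <sys/select.h>
    #include <unistd.h>
    #include <errno.h>

    #define CHUNK 4096

    static ssize_t timed_pipe_write(int fd, const char *buf, size_t len,
                                    int timeout)
    {
        size_t done = 0;

        while (done < len) {
            fd_set wfds;
            struct timeval tv = { timeout, 0 };

            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);

            /* Wait for the pipe to become writable. */
            if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
                return -1;              /* timeout or error: bail on delivery */

            /* Blocking write of up to 4K. */
            size_t todo = len - done;
            if (todo > CHUNK)
                todo = CHUNK;
            ssize_t n = write(fd, buf + done, todo);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += n;
        }
        return (ssize_t) done;
    }

The weak spot is the blocking write: if select reports the pipe writable
with far less than 4K free, the write can block long past the intended
timeout, which is the point of the next paragraph.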

The problem with a small PIPE_BUF is not any reader-side behaviour, but
rather that a pipe with as few as 512 bytes free may select ready for
writing. Now perhaps the moral equivalent of the low-water mark for the
new pipes is 4K or more, and the commands not timing out was instead due
to Postfix trying to kill the child process with the wrong credentials, a
bug discovered while analyzing the I/O issues. If that is so, I withdraw
my complaint. If, on the other hand, a pipe selected for write can have
only 512 bytes free, I consider waking up the writer with so little space
a (legal) misfeature.
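
To make the question testable, here is a rough probe (mine, not anything
from Postfix) that fills a pipe, drains roughly 512 bytes, and asks
select() whether the write side is already considered ready:

    #include <sys/select.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2];
        char buf[512] = { 0 };

        if (pipe(p) < 0)
            return 1;

        /* Fill the pipe completely using non-blocking writes. */
        fcntl(p[1], F_SETFL, O_NONBLOCK);
        while (write(p[1], buf, sizeof(buf)) > 0)
            ;

        /* Drain about 512 bytes, leaving only that much room. */
        read(p[0], buf, sizeof(buf));

        /* Does select() already consider the write side ready? */
        fd_set wfds;
        struct timeval tv = { 0, 0 };
        FD_ZERO(&wfds);
        FD_SET(p[1], &wfds);
        if (select(p[1] + 1, NULL, &wfds, NULL, &tv) > 0)
            printf("writable with only ~512 bytes free: a blocking 4K write "
                   "would stall until the reader drains more\n");
        else
            printf("not writable yet: the low-water mark appears larger\n");
        return 0;
    }

If the kernel's answer is the first one, the old select-then-blocking-write
code can hang well past its timeout; if it is the second, my complaint is
moot, as noted above.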

> Fundamentally, trying to abuse PIPE_BUF to get datagram semantics
> from the *stream* transport that is a Unix pipe is as broken as
> trying to use knowledge of the TCP flow control algorithm to get
> datagram semantics from a TCP socket.  If datagram semantics -- a
> whole message read or no message read, or any close approximation
> thereof -- are desired, you need to either A)

This is not what is desired; the reader can read 1 byte at a time if it
pleases.

> Code that puts a stream descriptor in non-blocking mode and then
> doesn't bother to handle EWOULDBLOCK is Just Plain Broken.  Code
> that reads from a stream descriptor and expects to always get
> some "atomic" unit that was written by the other end is _also_
> Just Plain Broken -- indeed, I have always regarded the coddling
> of this approach exemplified by the treatment of PIPE_BUF in POSIX
> as a mistake, but if you break even _that_ rule, all bets are off.

The old code did not bother with non-blocking mode; it assumed that 4K
would always fit. That assumption has been fixed, so I merely raise the
question of why the fix is necessary. It would seem that if the OS
assured the writer of adequate space, even the select-before-blocking-write
code would have worked.
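
A sketch of the style of fix alluded to above (illustrative only, not the
actual Postfix change): put the descriptor in non-blocking mode and accept
partial writes, so a premature "writable" indication can no longer turn
into an unbounded blocking write.

    #include <sys/select.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    static ssize_t timed_pipe_write_nb(int fd, const char *buf, size_t len,
                                       int timeout)
    {
        size_t done = 0;

        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

        while (done < len) {
            fd_set wfds;
            struct timeval tv = { timeout, 0 };

            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
                return -1;              /* timeout or error */

            ssize_t n = write(fd, buf + done, len - done);
            if (n > 0) {
                done += n;              /* partial writes are fine */
            } else if (n < 0 && errno != EWOULDBLOCK && errno != EINTR) {
                return -1;              /* real error */
            }
            /* EWOULDBLOCK: woken with too little room; just retry. */
        }
        return (ssize_t) done;
    }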

> It might be worth taking this up on tech-kern@netbsd.org if we
> really need to discuss it further...
>

Feel free to set Reply-To: to tech-kern@netbsd.org

-- 
	Viktor.