Subject: re: CVS commit: src/sys/kern
To: None <christos@zoulas.com>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: tech-net
Date: 07/03/2006 09:42:36
> | maybe it depends on the definition of the right thing. :-)
> | (apparently the configure script doesn't think it the right thing.)
> | to me, it doesn't seem so strange.
> 
> What is strange to me is that now we have a situation whereby by
> default we have select() returning that the socket is not writable,
> but a write() to it will succeed and not block. This does not seem
> right to me. What's worse, is that now, we'll never return that
> the socket is writable, unless the buffer is entirely empty.

writeability wrt select() is defined with sb_lowat in posix.

    If a descriptor refers to a socket, the implied output function is the
    sendmsg() function supplying an amount of normal data equal to the current
    value of the SO_SNDLOWAT option for the socket.

so it's normal for select not to consider it writable.

> How about:
> 
> Index: uipc_socket2.c
> ===================================================================
> RCS file: /cvsroot/src/sys/kern/uipc_socket2.c,v
> retrieving revision 1.73
> diff -u -u -r1.73 uipc_socket2.c
> --- uipc_socket2.c	1 Jul 2006 15:38:28 -0000	1.73
> +++ uipc_socket2.c	3 Jul 2006 00:12:35 -0000
> @@ -377,7 +377,11 @@
>  int
>  soreserve(struct socket *so, u_long sndcc, u_long rcvcc)
>  {
> +	u_long  lowat = MAX(sock_loan_thresh, MCLBYTES);
> +	u_long  hiwat = lowat + PIPE_BUF;
>  
> +	if (sndcc < hiwat)
> +		sndcc = hiwat;
>  	if (sbreserve(&so->so_snd, sndcc, so) == 0)
>  		goto bad;
>  	if (sbreserve(&so->so_rcv, rcvcc, so) == 0)
> @@ -385,7 +389,7 @@
>  	if (so->so_rcv.sb_lowat == 0)
>  		so->so_rcv.sb_lowat = 1;
>  	if (so->so_snd.sb_lowat == 0)
> -		so->so_snd.sb_lowat = MCLBYTES;
> +		so->so_snd.sb_lowat = lowat;
>  	if (so->so_snd.sb_lowat > so->so_snd.sb_hiwat)
>  		so->so_snd.sb_lowat = so->so_snd.sb_hiwat;
>  	return (0);

seems fine to me.  please put a comment.

	/*
	 * there's at least one application (a configure script of screen)
	 * which expects a fifo is writable even if it has "some" bytes
	 * in its buffer.
	 * so we want to make sure (hiwat - lowat) >= (some bytes).
	 *
	 * PIPE_BUF here is an arbitrary value chosen as (some bytes) above.
	 * we expect it's large enough for such applications.
	 */

> | > It could also be a POSIX violation, because we are using sockets
> | > internally to implement pipes, there is expected behavior for pipes
> | > and operations < PIPE_BUF.
> | 
> | i think <= PIPE_BUF writes are atomic as far as lowat >= PIPE_BUF.
> | i'm not aware of any posix pipe buffer requirements other than
> | PIPE_BUF atomicity.
> 
> This is the text I am referring to:
> 
>     * Write requests to a pipe or FIFO will be handled the same as a
>       regular file with the following exceptions:
> 
>     * Write requests of {PIPE_BUF} bytes or less will not be interleaved
>       with data from other processes doing writes on the same pipe. Writes
>       of greater than {PIPE_BUF} bytes may have data interleaved, on
>       arbitrary boundaries, with writes by other processes, whether or
>       not the O_NONBLOCK flag of the file status flags is set.
> 
> I am worried that because of the current FIFO socket-based implementation 
> setting lowat and hiwat inappropriately will break the above behavior.

it's what i meant by "PIPE_BUF atomicity".  i don't think it's broken.

> BTW, did you see my posting on port-sparc regarding the vmem quantum
> size on the sparc?

no.  i'll take a look later.

YAMAMOTO Takashi