Subject: re: CVS commit: src/sys/kern
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Christos Zoulas <christos@zoulas.com>
List: tech-net
Date: 07/02/2006 20:13:33
On Jul 3,  8:48am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
-- Subject: re: CVS commit: src/sys/kern

| maybe it depends on the definition of the right thing. :-)
| (apparently the configure script doesn't think it the right thing.)
| to me, it doesn't seem so strange.

What is strange to me is that now we have a situation whereby by
default we have select() returning that the socket is not writable,
but a write() to it will succeed and not block. This does not seem
right to me. What's worse, is that now, we'll never return that
the socket is writable, unless the buffer is entirely empty.

| "+ MCLBYTES" in your suggestion seems rather strange to me.
| i don't understand why MCLBYTES is used here.
| for the configure script in question, 4 is enough.
| as i'm not aware of any other affected application,
| i'm not sure how many bytes are expected.
| if you want to keep current (hiwat - lowat), it's
| MAX(0, PIPSIZ - MCLBYTES), not MCLBYTES.

How about:

Index: uipc_socket2.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.73
diff -u -u -r1.73 uipc_socket2.c
--- uipc_socket2.c	1 Jul 2006 15:38:28 -0000	1.73
+++ uipc_socket2.c	3 Jul 2006 00:12:35 -0000
@@ -377,7 +377,11 @@
 int
 soreserve(struct socket *so, u_long sndcc, u_long rcvcc)
 {
+	u_long  lowat = MAX(sock_loan_thresh, MCLBYTES);
+	u_long  hiwat = lowat + PIPE_BUF;
 
+	if (sndcc < hiwat)
+		sndcc = hiwat;
 	if (sbreserve(&so->so_snd, sndcc, so) == 0)
 		goto bad;
 	if (sbreserve(&so->so_rcv, rcvcc, so) == 0)
@@ -385,7 +389,7 @@
 	if (so->so_rcv.sb_lowat == 0)
 		so->so_rcv.sb_lowat = 1;
 	if (so->so_snd.sb_lowat == 0)
-		so->so_snd.sb_lowat = MCLBYTES;
+		so->so_snd.sb_lowat = lowat;
 	if (so->so_snd.sb_lowat > so->so_snd.sb_hiwat)
 		so->so_snd.sb_lowat = so->so_snd.sb_hiwat;
 	return (0);

| > It could also be a POSIX violation, because we are using sockets
| > internally to implement pipes, there is expected behavior for pipes
| > and operations < PIPE_BUF.
| 
| i think <= PIPE_BUF writes are atomic as far as lowat >= PIPE_BUF.
| i'm not aware of any posix pipe buffer requirements other than
| PIPE_BUF atomicity.

This is the text I am referring to:

    * Write requests to a pipe or FIFO will be handled the same as a
      regular file with the following exceptions:

    * Write requests of {PIPE_BUF} bytes or less will not be interleaved
      with data from other processes doing writes on the same pipe. Writes
      of greater than {PIPE_BUF} bytes may have data interleaved, on
      arbitrary boundaries, with writes by other processes, whether or
      not the O_NONBLOCK flag of the file status flags is set.

I am worried that because of the current FIFO socket-based implementation 
setting lowat and hiwat inappropriately will break the above behavior.

BTW, did you see my posting on port-sparc regarding the vmem quantum
size on the sparc?

christos