Subject: Re: race in select() ?
To: David Laight <david@l8s.co.uk>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 10/09/2003 21:43:58
--LQksG6bCIzRHxTLp
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Oct 09, 2003 at 06:01:53PM +0100, David Laight wrote:
> On Thu, Oct 09, 2003 at 05:20:10PM +0200, Manuel Bouyer wrote:
> > On Thu, Oct 09, 2003 at 02:13:51PM +0100, David Laight wrote:
> > > 
> > > I presume inetd takes the fd out of its select list until the rpc.rstatd
> > > process exits.  Otherwise there would be a nasty loop.
> > 
> > Yes it does.
> 
> There is a timing bug in inetd itself.  rev 1.76 reads:
> 
> 	readable = allsock;
> 	if ((n = select(maxsock + 1, &readable, (fd_set *)0,
> 		...
> 
> and reapchild does:
> 	FD_SET(sep->se_fd, &allsock);
> 
> So if a child exits after the copy is made, but before the system call
> the select won't be looking for the correct fdset.
> If reapchild set the bit in readable then there could be a false
> positive if the SIGCHLD happened in the system call return path,
> that might be easier to check though (eg detect it happening, and
> call select again).
> 
> Of course, you may not be hitting this one...

Yes, I'm almost sur this is it. I may have got it off by one when I checked
the fd_sets values in the kernel (the rpc.statd socket is the last one).
I'll try to check this when the bug shows up again (it happens once in a
few days, but I couldn't reproduce it at will).

About this race: I think the attached patch should fix it.
To avoid the race before select, the sigchld handler adds the bit in allsock
and the fd_set we will select on (allsock_select).
To avoid the false positive on select return, we check both the fd_set
we got from select, and the one we had before copying allsock to
allsock_select.
I'm not sure if allsock should be declared volatile; could the compiler
optimise this with registers load/store and keep values in registers
between the 2 copies here:
           readable = allsock;
           allsock_select = allsock;
This would of course require an architecture with enouth registers, or
a smaller value for FD_SETSIZE.
There is still a race left between the copy to readable and the copy to
allsock_select, but it's harmless: in this case, we won't handle the descriptor
after select returns, and handle it at the next loop.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 24 ans d'experience feront toujours la difference
--

--LQksG6bCIzRHxTLp
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff

? .gdbinit
? inetd
? inetd.cat8
Index: inetd.c
===================================================================
RCS file: /cvsroot/src/usr.sbin/inetd/inetd.c,v
retrieving revision 1.76
diff -u -r1.76 inetd.c
--- inetd.c	2002/01/21 14:42:28	1.76
+++ inetd.c	2003/10/09 19:39:02
@@ -272,7 +272,7 @@
 int	lflag;
 #endif
 int	nsock, maxsock;
-fd_set	allsock;
+fd_set	allsock, allsock_select;
 int	options;
 int	timingout;
 struct	servent *sp;
@@ -532,7 +532,8 @@
 		(void) sigsetmask(0L);
 	    }
 	    readable = allsock;
-	    if ((n = select(maxsock + 1, &readable, (fd_set *)0,
+	    allsock_select = allsock;
+	    if ((n = select(maxsock + 1, &allsock_select, (fd_set *)0,
 		(fd_set *)0, (struct timeval *)0)) <= 0) {
 		    if (n == -1 && errno != EINTR) {
 			syslog(LOG_WARNING, "select: %m");
@@ -542,7 +543,8 @@
 	    }
 	    for (sep = servtab; n && sep; sep = nsep) {
 	    nsep = sep->se_next;
-	    if (sep->se_fd != -1 && FD_ISSET(sep->se_fd, &readable)) {
+	    if (sep->se_fd != -1 && FD_ISSET(sep->se_fd, &readable) &&
+		FD_ISSET(sep->se_fd, &allsock_select)) {
 		n--;
 		if (debug)
 			fprintf(stderr, "someone wants %s\n", sep->se_service);
@@ -770,6 +772,7 @@
 					    sep->se_server, WTERMSIG(status));
 				sep->se_wait = 1;
 				FD_SET(sep->se_fd, &allsock);
+				FD_SET(sep->se_fd, &allsock_select);
 				nsock++;
 				if (debug)
 					fprintf(stderr, "restored %s, fd %d\n",

--LQksG6bCIzRHxTLp--