Subject: Re: race in select() ?
To: Charles M. Hannum <abuse@spamalicious.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 10/11/2003 16:51:23
On Thu, Oct 09, 2003 at 10:07:44PM +0000, Charles M. Hannum wrote:
> On Thursday 09 October 2003 09:55 pm, David Laight wrote:
> > > I have a different, somewhat wackier suggestion...
> > >
> > > Pass an actual timeval to select(), with a Really Large Value.  In the
> > > SIGCHLD handler, when we restore a socket in the mask, set the timeout to
> > > 0.  Handle EWOULDBLOCK by resetting the timeval to the Really Large Value
> > > and looping around again.
> >
> > That isn't the problem. select should (surely) fail EINTR?
> > The existing code can only work if the select fails.
> 
> Which is exactly my point.  Setting the timeval to 0 forces select(2) to 
> return immediately.
> 
> > The problems happen when the signal handler runs while the process
> > is in the system call shim.
> 
> Why is that a problem?  The system call is fetching the timeout inside the 
> kernel.  Once you're inside the kernel, it doesn't matter -- the signal would 
> force it to return EINTR immediately.  My hack handles all of the cases where 
> we get the signal before we enter the system call.
> 
> Yet another solution is to use a sig_atomic_t flag to signal that we need to 
> recopy the mask.  I.e. (using Manuel's variable names):
> 
> volatile sig_atomic_t whoops;
> volatile fd_set readable;
> 
> ...
> 	do {
> 		whoops = 0;
> 		readable = allsock_select = allsock;
> 	} while (whoops);
> ...
> 	select(..., &allsock_select, ...);
> 
> ...
> 	FD_SET(new_descriptor, &allsock);
> 	FD_SET(new_descriptor, &readable);
> 	whoops = 1;
> 
> This makes the copy atomic WRT the signal handler, without having to mask 
> signals.

I like this. It's cleaner than the timeval hack.
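
For the archives, here is a minimal, self-contained sketch of the copy-and-retry
loop as I read it. It is not the actual code: the helper names
restore_descriptor() and snapshot_allsock() are made up for illustration, and I
use C11 atomic_signal_fence() in place of the volatile fd_set from the quoted
fragment to keep the compiler from moving the copy across the flag accesses.
It only covers making the copy consistent with respect to the handler; the
window between the copy and the select() call is a separate matter.

```c
#include <signal.h>
#include <stdatomic.h>
#include <sys/select.h>

/* Master descriptor set; the signal handler may add descriptors to it. */
static fd_set allsock;

/* Set by the handler whenever it has modified allsock. */
static volatile sig_atomic_t whoops;

/*
 * Hypothetical helper, meant to be called from the SIGCHLD handler when
 * a descriptor is put back into the mask.
 */
static void
restore_descriptor(int fd)
{
	FD_SET(fd, &allsock);
	whoops = 1;
}

/*
 * Hypothetical helper: copy allsock into *dst. If the handler ran while
 * the copy was in progress, whoops is set again and the copy is redone,
 * so *dst never keeps a half-updated snapshot.
 */
static void
snapshot_allsock(fd_set *dst)
{
	do {
		whoops = 0;
		atomic_signal_fence(memory_order_seq_cst);
		*dst = allsock;
		atomic_signal_fence(memory_order_seq_cst);
	} while (whoops);
}
```

The main loop would call snapshot_allsock(&allsock_select) just before
select(), and the handler would call restore_descriptor(new_descriptor).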

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 24 years of experience will always make the difference
--