Subject: Re: race in select() ?
To: David Laight <david@l8s.co.uk>
From: Charles M. Hannum <abuse@spamalicious.com>
List: tech-kern
Date: 10/09/2003 22:07:44
On Thursday 09 October 2003 09:55 pm, David Laight wrote:
> > I have a different, somewhat wackier suggestion...
> >
> > Pass an actual timeval to select(), with a Really Large Value.  In the
> > SIGCHLD handler, when we restore a socket in the mask, set the timeout to
> > 0.  Handle the resulting timeout (select returning 0) by resetting the
> > timeval to the Really Large Value and looping around again.
>
> That isn't the problem. select should (surely) fail with EINTR?
> The existing code can only work if the select fails.

Which is exactly my point.  If the signal arrives before the call, select(2)
won't fail, but setting the timeval to 0 in the handler forces it to return
immediately anyway.

> The problems happen when the signal handler runs while the process
> is in the system call shim.

Why is that a problem?  The timeout is fetched by the kernel, inside the
system call.  Once you're inside the kernel it doesn't matter: the signal
would force select(2) to return EINTR immediately.  My hack covers every case
where the signal arrives before we enter the system call, because the handler
has already zeroed the timeval that the kernel is about to copy in.
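A minimal sketch of the zero-timeout trick, assuming POSIX sigaction(2).  The names allsock, timeout, pending_fd, REALLY_LARGE_SEC, prepare_select() and wait_ready() are hypothetical; the main loop is split into two calls only so the race window between the mask copy and select(2) is visible.  This sketch resets the timeout at the top of every iteration, before the copy: a signal that lands after the reset either has its FD_SET picked up by the copy or leaves a zero timeout for select(2).  Strictly, ISO C only blesses writes to volatile sig_atomic_t from a handler; like the approach above, this relies on ordinary POSIX behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical stand-in for the "Really Large Value" (seconds). */
#define REALLY_LARGE_SEC 1000000L

fd_set allsock;                        /* master mask, updated by the handler */
struct timeval timeout;                /* handed to select(2); zeroed by the handler */
volatile sig_atomic_t pending_fd = -1; /* hypothetical: fd the handler re-enables */

/* SIGCHLD handler: put the descriptor back in the master mask and zero
 * the timeout, so a select(2) we have not entered yet returns at once. */
static void
on_sigchld(int sig)
{
	(void)sig;
	if (pending_fd >= 0)
		FD_SET(pending_fd, &allsock);
	timeout.tv_sec = 0;
	timeout.tv_usec = 0;
}

int
install_sigchld_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigchld;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGCHLD, &sa, NULL);
}

/* Top of the main loop: reset the timeout *before* copying the mask. */
void
prepare_select(fd_set *ready)
{
	timeout.tv_sec = REALLY_LARGE_SEC;
	timeout.tv_usec = 0;
	*ready = allsock;
}

/* 0: the (possibly zeroed) timeout expired; -1 with errno == EINTR: the
 * signal arrived inside the kernel.  Either way, loop back to
 * prepare_select(). */
int
wait_ready(int maxfd, fd_set *ready)
{
	return select(maxfd + 1, ready, NULL, NULL, &timeout);
}
```

In the main loop this is just `prepare_select(&ready); n = wait_ready(maxfd, &ready);`, with a result of 0 or EINTR going back around.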

Yet another solution is to use a sig_atomic_t flag that tells the main loop
it needs to recopy the mask.  For example (using Manuel's variable names):

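volatile sig_atomic_t whoops;
volatile fd_set readable;

...
	/* main loop: copy the master mask, and redo the copy if the
	   SIGCHLD handler ran in the middle of it */
	do {
		whoops = 0;
		readable = allsock_select = allsock;
	} while (whoops);
...
	select(..., &allsock_select, ...);

...
	/* in the SIGCHLD handler: re-enable the descriptor, then flag
	   that any copy in progress is stale */
	FD_SET(new_descriptor, &allsock);
	FD_SET(new_descriptor, &readable);
	whoops = 1;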

This makes the copy effectively atomic with respect to the signal handler (a
copy the handler interrupts is simply redone), without having to block
signals.
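The retry loop can be tried in isolation with a small self-contained sketch.  The names new_descriptor, copy_mask() and install_handler() are hypothetical.  It adds C11 atomic_signal_fence() calls that are not in the original: volatile only orders volatile accesses, so without a fence the compiler may move the non-volatile copy of allsock across the accesses to whoops.  Like the original loop, it protects only the copy itself; a signal arriving after the loop but before select(2) is a separate window.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/select.h>

fd_set allsock;                            /* master mask; the handler adds to it */
volatile sig_atomic_t whoops;              /* set once the handler touches allsock */
volatile sig_atomic_t new_descriptor = -1; /* hypothetical: fd to re-enable */

/* SIGCHLD handler: re-enable the descriptor, then flag the change. */
static void
on_sigchld(int sig)
{
	(void)sig;
	if (new_descriptor >= 0)
		FD_SET(new_descriptor, &allsock);
	atomic_signal_fence(memory_order_seq_cst);
	whoops = 1;
}

int
install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigchld;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGCHLD, &sa, NULL);
}

/* Copy allsock into *dst.  If the handler ran at any point during the
 * copy, whoops is set again and the copy is redone, so the result is
 * never half old mask, half new mask. */
void
copy_mask(fd_set *dst)
{
	do {
		whoops = 0;
		/* keep the compiler from moving the copy across the flag accesses */
		atomic_signal_fence(memory_order_seq_cst);
		*dst = allsock;
		atomic_signal_fence(memory_order_seq_cst);
	} while (whoops);
}
```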