Subject: Re: race in select() ?
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: David Laight <david@l8s.co.uk>
List: tech-kern
Date: 10/09/2003 14:13:51
On Thu, Oct 09, 2003 at 01:46:49PM +0200, Manuel Bouyer wrote:
> Hi,
> I think there is a race condition somewhere in select() when multiple
> processes work on the same file descriptor, but I've not been able to
> find where yet. I see it with UDP sockets.
> 
> I run rpc.rstatd from inetd on my servers. It gets polled every few seconds
> from xmeter running on my workstations, and also once a minute by cricket.
> 
> Occasionally, rpc.rstatd stops responding on a server. When in this
> state, inetd is blocked on select(), there is no rpc.rstatd running,
> and there are some data in the socket queue as reported by netstat.
> Today I played a bit with gdb and ddb on a box in this state.
> I can confirm that inetd passed the proper arguments to select(),
> and in the kernel sys_select() is waiting on the proper file descriptors.
> A kill -HUP on inetd makes it go again.
> To me it looks like the tsleep() failed to get awakened; after a HUP inetd
> will call select again; as there are data in the socket queue it doesn't
> sleep at all and processes the data waiting in the socket queue.
> 
> rpc.rstatd, when started from inetd, will also do a select on the socket
> in svc_run(). It will then process the first request in the socket queue and
> exit.

I presume inetd takes the fd out of its select list until the rpc.rstatd
process exits.  Otherwise there would be a nasty loop.
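
A minimal userland sketch of that pattern, not inetd's actual source: a pipe
stands in for the UDP socket, and poll_watched() and demo_wait_service() are
hypothetical names used only for illustration. It shows the parent dropping
the fd from its select set while the service "runs" and restoring it
afterwards.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if select() reports fd readable when
 * polling the set 'watch', 0 otherwise. */
static int
poll_watched(const fd_set *watch, int maxfd, int fd)
{
	fd_set rd = *watch;		/* select() overwrites its argument */
	struct timeval tv = { 0, 0 };	/* zero timeout: poll, do not block */
	int n = select(maxfd + 1, &rd, NULL, NULL, &tv);

	assert(n >= 0);
	return n > 0 && FD_ISSET(fd, &rd);
}

/* Returns 1 if the fd is seen before and after, but not while removed. */
static int
demo_wait_service(void)
{
	int p[2];
	int before, during, after;
	fd_set watch;

	if (pipe(p) == -1)
		return 0;
	FD_ZERO(&watch);
	FD_SET(p[0], &watch);

	/* A "request" arrives and stays queued; nobody reads it here. */
	if (write(p[1], "x", 1) != 1)
		return 0;

	before = poll_watched(&watch, p[0], p[0]);

	/* Service started: stop watching the fd.  If it stayed in the set,
	 * every select() would return at once while the data is unread. */
	FD_CLR(p[0], &watch);
	during = poll_watched(&watch, p[0], p[0]);

	/* Service exited: watch the fd again. */
	FD_SET(p[0], &watch);
	after = poll_watched(&watch, p[0], p[0]);

	close(p[0]);
	close(p[1]);
	return before == 1 && during == 0 && after == 1;
}
```

Because the queued byte is never read in this sketch, the fd is readable
again as soon as it is put back in the set, which is the behaviour described
above after the HUP.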

> I suspect a race condition in the kernel, but don't have much idea about it.
> Any ideas welcome.

Is this 'current'? And on a single CPU?
There was a bug when more than 2 processes select on the same fd.

OTOH this does rather look like a 'data arriving during setup' bug.
Did you find out what sel_pid and sel_collision were set to?

	David

-- 
David Laight: david@l8s.co.uk