Subject: race in select() ?
To: None <tech-kern@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 10/09/2003 13:46:49
Hi,
I think there is a race condition somewhere in select() when multiple
processes work on the same file descriptor, but I haven't been able to
find where yet. I see it with UDP sockets.

I run rpc.rstatd from inetd on my servers. It gets polled every few seconds
by xmeter running on my workstations, and also once a minute by cricket.

Occasionally, rpc.rstatd stops responding on a server. When in this
state, inetd is blocked in select(), no rpc.rstatd is running,
and netstat reports some data in the socket queue.
Today I played a bit with gdb and ddb on a box in this state.
I can confirm that inetd passed the proper arguments to select(),
and in the kernel sys_select() is waiting on the proper file descriptors.
A kill -HUP on inetd gets it going again.
To me it looks like the tsleep() was never woken up: after a HUP, inetd
calls select() again, and as there is data in the socket queue it doesn't
sleep at all and processes the data waiting there.
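To make the "lost wakeup" idea concrete, here is a small user-space model loosely based on the 4.4BSD selrecord()/selwakeup() design, in which each descriptor remembers a single selecting process and sets a collision flag when a second one shows up. This is a simplified sketch under stated assumptions, not a diagnosis of the NetBSD code: all names (`struct msel`, `model_selrecord()`, `model_selwakeup()`, `model_a_is_woken()`) are hypothetical, "asleep" stands in for "sleeping in tsleep() on the select wait channel", and locking, spl levels and any rescan-before-sleep logic the real sys_select() may have are left out. It only shows how a one-owner design can forget a sleeper if a second process records itself after the first has scanned the descriptor but before it has gone to sleep.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3	/* slot 0 unused; "pids" 1 (A) and 2 (B) */

/* Per-process state; "asleep" stands for "sleeping on the select channel". */
struct mproc {
	bool asleep;
	bool woken;
};

/* Per-descriptor record (hypothetical): one owner pid plus a collision flag. */
struct msel {
	int pid;	/* 0 means nobody recorded */
	bool collision;
};

static struct mproc procs[NPROC];

/* selrecord()-like: remember the process that is about to wait on this fd. */
static void model_selrecord(struct msel *sip, int pid)
{
	assert(pid > 0 && pid < NPROC);
	if (sip->pid == pid)
		return;
	if (sip->pid != 0 && procs[sip->pid].asleep)
		sip->collision = true;	/* owner already sleeping: wake all later */
	else
		sip->pid = pid;		/* previous owner is silently replaced */
}

/* selwakeup()-like: wake the recorded owner, or every sleeper on collision. */
static void model_selwakeup(struct msel *sip)
{
	int i;

	if (sip->pid == 0)
		return;
	if (sip->collision) {
		sip->collision = false;
		for (i = 1; i < NPROC; i++)
			if (procs[i].asleep) {
				procs[i].asleep = false;
				procs[i].woken = true;
			}
	}
	if (procs[sip->pid].asleep) {
		procs[sip->pid].asleep = false;
		procs[sip->pid].woken = true;
	}
	sip->pid = 0;
}

/*
 * One interleaving: A (pid 1) and B (pid 2) select on the same fd, both end
 * up asleep, then one wakeup is delivered.  Returns true if A was woken.
 */
bool model_a_is_woken(bool a_sleeps_before_b_records)
{
	struct msel sel = { 0, false };
	int i;

	for (i = 0; i < NPROC; i++)
		procs[i].asleep = procs[i].woken = false;

	model_selrecord(&sel, 1);	/* A scans the fd */
	if (a_sleeps_before_b_records)
		procs[1].asleep = true;	/* A reaches its sleep first */
	model_selrecord(&sel, 2);	/* B scans the same fd */
	procs[1].asleep = true;		/* by now both are asleep */
	procs[2].asleep = true;
	model_selwakeup(&sel);		/* data arrives on the socket */
	return procs[1].woken;
}
```

In this model `model_a_is_woken(true)` returns true and `model_a_is_woken(false)` returns false: when B records itself in the window between A's scan and A's sleep, A is forgotten and only B is woken, which would leave A asleep with data queued.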

rpc.rstatd, when started from inetd, will also do a select on the socket
in svc_run(). It will then process the first request in the socket queue and
exit.
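As a possible starting point for a reproducer, the following user-space sketch puts two processes to sleep in select() on the same UDP socket, the situation described above where more than one process selects on one descriptor, and checks that a single datagram wakes both. It is an assumption-laden probe, not the inetd or svc_run() code: `select_two_waiters()` and `wait_readable()` are hypothetical names, nobody reads the datagram (unlike rpc.rstatd), and the 200 ms delay only makes it likely, not certain, that both processes are already asleep when the datagram arrives.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Block in select() until fd is readable; 1 if it is, 0 on timeout or error. */
static int wait_readable(int fd, int timeout_sec)
{
	fd_set rfds;
	struct timeval tv;

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;
	return select(fd + 1, &rfds, NULL, NULL, &tv) == 1 && FD_ISSET(fd, &rfds);
}

/*
 * Hypothetical probe: the parent and one child both sleep in select() on
 * the same UDP socket, and a second child sends it one datagram.  Nobody
 * reads the datagram, so both selectors should wake.  Returns 1 if both
 * did, 0 if either timed out, -1 if setup failed.
 */
int select_two_waiters(int timeout_sec)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s, status, parent_ok, child_ok;
	pid_t waiter, sender;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the kernel choose a free port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return -1;
	}

	if ((waiter = fork()) == -1) {
		close(s);
		return -1;
	}
	if (waiter == 0)	/* second selector, standing in for rpc.rstatd */
		_exit(wait_readable(s, timeout_sec) ? 0 : 1);

	if ((sender = fork()) == -1) {
		waitpid(waiter, &status, 0);	/* waiter will time out */
		close(s);
		return -1;
	}
	if (sender == 0) {
		struct timespec delay = { 0, 200000000L };	/* 200 ms */
		int out = socket(AF_INET, SOCK_DGRAM, 0);

		nanosleep(&delay, NULL);	/* let both selectors go to sleep */
		_exit(out != -1 && sendto(out, "x", 1, 0,
		    (struct sockaddr *)&sin, sizeof(sin)) == 1 ? 0 : 1);
	}

	parent_ok = wait_readable(s, timeout_sec);	/* first selector */
	child_ok = waitpid(waiter, &status, 0) == waiter &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0;
	waitpid(sender, &status, 0);
	close(s);
	return parent_ok && child_ok;
}
```

Since the hang is intermittent, a single passing run proves little; calling `select_two_waiters(5)` in a loop and stopping on the first 0 would turn it into a crude stress test.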

I suspect a race condition in the kernel, but I don't have much idea where.
Any ideas welcome.

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
     NetBSD: 24 years of experience will always make the difference
--