Subject: poll vs recvfrom
To: None <tech-net@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-net
Date: 11/15/2005 13:50:19
I'm trying to write some SNMP code on a machine at work.  It's running
NetBSD 1.4.1 (it's an infrastructure machine and has been left alone for
a long time on "if it ain't broke, don't fix it" grounds).

My code creates a half-dozen UDP sockets (AF_INET/SOCK_DGRAM), fires
off SNMP queries on them, and drops into a loop, doing poll() to await
responses and recvfrom() to collect them.
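
For concreteness, a loop of that general shape might look like the
sketch below.  This is a simplified illustration, not the actual code:
the names set_nonblocking(), collect_responses() and MAX_SOCKS are made
up for the example, and the single shared buffer is an assumption (a
real program would parse each response before reading the next).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MAX_SOCKS 16	/* arbitrary upper bound, chosen for the sketch */

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: wait up to timeout_ms for any of the nfds sockets
 * to become readable, then read one datagram from each socket poll()
 * flagged with POLLIN.  Returns the number of datagrams read, or -1 if
 * poll() fails or recvfrom() fails with anything other than
 * EWOULDBLOCK/EAGAIN.
 */
static int collect_responses(const int *fds, int nfds, int timeout_ms,
    char *buf, size_t buflen)
{
	struct pollfd pfd[MAX_SOCKS];
	int i, got = 0;

	assert(nfds > 0 && nfds <= MAX_SOCKS);
	memset(pfd, 0, sizeof(pfd));
	for (i = 0; i < nfds; i++) {
		pfd[i].fd = fds[i];
		pfd[i].events = POLLIN;
	}
	if (poll(pfd, nfds, timeout_ms) < 0)
		return -1;
	for (i = 0; i < nfds; i++) {
		struct sockaddr_in from;
		socklen_t fromlen = sizeof(from);
		ssize_t n;

		if (!(pfd[i].revents & POLLIN))
			continue;
		n = recvfrom(pfd[i].fd, buf, buflen, 0,
		    (struct sockaddr *)&from, &fromlen);
		if (n >= 0)
			got++;		/* a real program would parse buf here */
		else if (errno != EWOULDBLOCK && errno != EAGAIN)
			return -1;	/* genuine error */
		/* else: nothing there after all; try again on the next poll() */
	}
	return got;
}
```

The caller would invoke collect_responses() repeatedly until every
query has been answered or has timed out.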

However, one of the devices being queried is not accessible at the
moment (I don't know why, and it doesn't matter for the purposes of
this message).  I saw the code lock up; on investigation, it proved to
be hanging in recvfrom() on the socket that was trying to talk to the
dead device.  I made the code set O_NONBLOCK, and now I find it getting
into a livelock wherein poll() returns showing that socket readable
but recvfrom() on the socket fails with EWOULDBLOCK.  As I
understand it, this should not be possible, or at least should be
possible only very transiently.
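
One way to at least detect the condition (a detection aid only, not a
diagnosis or a fix) is to count consecutive occasions on which poll()
reported a socket readable but recvfrom() then failed with EWOULDBLOCK.
The sketch below assumes the socket is already non-blocking; the names
recv_after_poll(), looks_livelocked() and SPURIOUS_LIMIT, and the
threshold value, are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SPURIOUS_LIMIT 5	/* arbitrary threshold, an assumption */

/*
 * Hypothetical helper, meant to be called only after poll() has flagged
 * fd readable.  Returns the datagram length on success and resets
 * *spurious.  On failure returns -1; if the failure was
 * EWOULDBLOCK/EAGAIN it also increments *spurious.
 */
static ssize_t recv_after_poll(int fd, char *buf, size_t buflen,
    int *spurious)
{
	ssize_t n;

	assert(spurious != NULL);
	n = recvfrom(fd, buf, buflen, 0, NULL, NULL);
	if (n >= 0) {
		*spurious = 0;
		return n;
	}
	if (errno == EWOULDBLOCK || errno == EAGAIN)
		(*spurious)++;
	return -1;
}

/*
 * Nonzero once a socket has produced SPURIOUS_LIMIT or more consecutive
 * "readable but nothing to read" wakeups.
 */
static int looks_livelocked(int spurious)
{
	return spurious >= SPURIOUS_LIMIT;
}
```

Once looks_livelocked() fires for a socket, the caller could sleep
briefly or stop polling that socket, which keeps the loop from spinning
while the underlying cause is investigated.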

So, my question: what can cause this, and is there anything to be done
about it short of switching OS revs?  (I realize that asking about
something this old is quite possibly pointless, but it also seems not
too unlikely to me that someone still remembers enough about 1.4.1 to
be able to help.)

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B