Subject: Re: poll vs recvfrom
To: None <tech-net@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-net
Date: 11/15/2005 15:59:23
>> However, one of the devices being queried is not accessible at the
>> moment (I don't know why, and it doesn't matter for the purposes of
>> this message).  I saw the code lock up; on investigation, it proved
>> to be hanging in recvfrom() on the socket that was trying to talk to
>> the dead device.  I made the code set O_NONBLOCK, and now I find it
>> getting into a livelock loop wherein poll returns showing that
>> socket readable but recvfrom() on the socket returns showing
>> EWOULDBLOCK.  As I understand it, this should not be possible, or at
>> least should be possible only very transiently.

> Anything else set in revents?

No.  I made it print out the .revents value whenever recvfrom returns
EWOULDBLOCK, and it's always 0x41 (which is the same POLLIN|POLLRDNORM
I put in .events going in).

> It could be that poll() is trying to tell you that the host is
> unreachable (due to an ICMP message or arp timeout).

This is plausible; it's unpingable, and the next-upstream gateway could
well be sending back some kind of unreachable.  But it doesn't appear
to be getting pushed all the way back to the poll interface.  (Nor to
recvfrom; if it were, I'd expect poll to show readable and recvfrom to
return some kind of more permanent error.)

I dug out the 1.4.1 kernel code, and it appears that poll() uses
soreadable() but the recvfrom() path does more complicated stuff, and I
suspect there's a mismatch between the two tests.

I'm going to try to reproduce it on a 1.6.1 system, but if I can't, so
many other things will be different that I'm not sure I'll be able to
tell whether the OS rev is what made the difference....

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B