tech-kern archive

Re: checking for a closed socket

On Tue, Feb 02, 2021 at 19:20:22 +0100, Manuel Bouyer wrote:

> I've been debugging an issue with Xen, where xenstored loops at 100%
> CPU on poll(2).
> After code analysis, it's looping on closed Unix socket descriptors.
> From what I understood the code expects poll(2) to return something
> different from POLLIN when the remote end of the socket is
> closed (it checks for (~(POLLOUT|POLLIN)) so it could be either
> POLLERR or POLLHUP I guess - or eventually POLLRDHUP which we don't have).
> Who is right here, Linux or NetBSD (Linux claims to be posix, while
> our man page doesn't mention it) ?
> Is there a way to check if a connection has been closed without a read() ?

You have to be careful what you read into "claim to be posix",
especially when connection creation and termination are concerned.
Termination is extra fun because there are half-closed sockets.

My experience is that the only thing you can rely on is that if
POLLFOO is reported for an fd then the "foo" action on that fd will
not block - which is, essentially, poll's principal raison d'etre.
The details can vary wildly from system to system, so you might need
some strategic planning and experimentation.
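
As a rough illustration of leaning only on the "will not block"
promise: the sketch below ignores which bits poll(2) reported and
lets a non-consuming recv(2) say whether the peer is gone.  The name
peer_closed() is made up for this example, and MSG_PEEK with
MSG_DONTWAIT is assumed to be available (it is on the BSDs and Linux,
as far as I know).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if the peer has closed (EOF seen),
 * 0 if the socket is still open as far as we can tell, -1 on error.
 * It does not consume any pending data.
 */
int
peer_closed(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char c;
	ssize_t n;
	int r;

	assert(fd >= 0);

	r = poll(&pfd, 1, 0);		/* zero timeout: just a probe */
	if (r < 0)
		return -1;
	if (r == 0)
		return 0;		/* nothing reported, nothing to learn */

	/*
	 * Something was reported.  Which bits are set varies between
	 * systems, so do not interpret them; peek instead.
	 */
	n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
	if (n == 0)
		return 1;		/* EOF: remote end closed */
	if (n > 0)
		return 0;		/* data pending, still usable */
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 0;
	return -1;
}
```

This still issues a read-like call, but it leaves any queued data in
place for the real reader.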

I don't have all my relevant notes handy, but as an example, consider
a failed connect(2) that you poll for POLLOUT (posix: "A file
descriptor for a socket that is connecting asynchronously shall
indicate that it is ready for writing, once a connection has been
established.").  On failed connect(2) you will get:

- NetBSD, Solaris: POLLOUT
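
Since the reported bits differ, one way to avoid depending on them
is to ask the socket itself once poll(2) returns with anything set.
A minimal sketch, assuming a non-blocking connect(2) is already in
progress on fd; connect_outcome() is a name invented for this
example:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: call after poll(2) has reported any event on
 * a socket with a connect(2) in progress.  Returns 0 if the
 * connection was established, otherwise an errno value describing
 * the failure (or the getsockopt(2) failure itself).
 */
int
connect_outcome(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	assert(fd >= 0);

	/* SO_ERROR fetches and clears the pending socket error. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return errno;
	return err;
}
```

A caller would then branch on the return value instead of on
POLLOUT versus POLLERR versus POLLHUP.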

POLLHUP on "close" is even more fun because of half-closed
connections.  NetBSD and Solaris never report POLLHUP for sockets,
macOS reports POLLHUP when the remote end closes, and Linux reports
POLLHUP when both directions are closed.  Note that getting POLLHUP
doesn't mean that you can immediately "give up" on that socket: you
still have to read it because there may still be unread data.  E.g.
consider sending a request, half-closing your side, and getting a
reply from the server that ends up in the kernel's socket buffer,
followed by the server half-closing its end and thus completely
closing the connection.  At this point you haven't read anything yet
in the application, but you will get POLLHUP (and POLLIN for the
data, iirc).  So that POLLHUP is not really telling you much.

All of the above is strictly "IIRC" and might have changed since the
last time I checked.

To reiterate, my point is that 1) you can assume very little about
specific events reported for boundary conditions - different systems
report them differently; 2) you have to remember that the main promise
of poll(2) is that the corresponding operation will not block.

PS: Sorry if that was a bit on the rambling side.

