Re: New class of receive error

To: Michael van Elst <mlelstv%serpens.de@localhost>, tech-net%netbsd.org@localhost
Subject: Re: New class of receive error
From: Roy Marples <roy%marples.name@localhost>
Date: Sun, 13 May 2018 15:39:07 +0100

On 13/05/2018 10:16, Michael van Elst wrote:


Recently the recv() system call (and variants) were changed to
return the ENOBUFS error for datagram sockets to indicate that
the sender failed to send a packet.

Apparently this only works for local communication like UNIX domain
sockets or the routing socket, as you don't know about sender errors
of a remote system if e.g. UDP is used.
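From the application side, this change means recv() on a datagram socket can now fail with ENOBUFS to say that messages were dropped. A reader should handle that differently from a hard error. Below is a minimal sketch, assuming POSIX recv() semantics. The `recv_status` enum and the `recv_datagram()` helper are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes for one datagram read. */
enum recv_status {
    RECV_OK,    /* a datagram was read into buf */
    RECV_LOST,  /* ENOBUFS: one or more datagrams were dropped */
    RECV_ERROR  /* any other failure; errno is left as recv() set it */
};

/*
 * Hypothetical helper: read one datagram from fd, retrying on EINTR,
 * and report ENOBUFS separately from other errors.
 */
enum recv_status
recv_datagram(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    assert(nread != NULL);
    do {
        n = recv(fd, buf, len, 0);
    } while (n == -1 && errno == EINTR);

    if (n == -1) {
        *nread = 0;
        return errno == ENOBUFS ? RECV_LOST : RECV_ERROR;
    }
    *nread = n;
    return RECV_OK;
}
```

On RECV_LOST, the caller knows its view is incomplete and can resynchronize. On RECV_ERROR, it can inspect errno as before.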


No, this isn't entirely the case.

If any mechanism tries to send data to a socket, regardless of its source (could be UDP, could be a local domain socket), and there isn't enough buffer space, it will now be reported.

I believe that once it's in the system it shouldn't be silently discarded. Comments that have existed in our code as far back as I've checked seem to agree, with stuff like this:

/* XXX notify userland of overflow */
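The idea in that comment is that an overflow should be remembered and surfaced to the reader rather than dropped silently. What follows is a simplified userland model of that idea. It is not NetBSD's actual socket-buffer code. The `msgq` type, its capacity, and the choice to report the overflow before any queued data are all assumptions made for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QCAP   4   /* hypothetical capacity, in messages */
#define MSGLEN 32  /* hypothetical maximum message length, including NUL */

/* Simplified model of a datagram receive buffer (not NetBSD's sockbuf). */
struct msgq {
    char   msgs[QCAP][MSGLEN];
    size_t head;        /* index of the oldest queued message */
    size_t count;       /* number of queued messages */
    bool   overflowed;  /* a message was dropped since the last read */
};

/* Queue a message; if full, drop it but record that a drop happened. */
void
msgq_put(struct msgq *q, const char *msg)
{
    size_t tail;

    assert(q->count <= QCAP);
    if (q->count == QCAP) {
        q->overflowed = true;   /* rather than a silent discard */
        return;
    }
    tail = (q->head + q->count) % QCAP;
    strncpy(q->msgs[tail], msg, MSGLEN - 1);
    q->msgs[tail][MSGLEN - 1] = '\0';
    q->count++;
}

/*
 * Read the oldest message into out. Returns -ENOBUFS once after an
 * overflow (clearing the flag), -EAGAIN if empty, or 0 on success.
 */
int
msgq_get(struct msgq *q, char out[MSGLEN])
{
    if (q->overflowed) {
        q->overflowed = false;
        return -ENOBUFS;
    }
    if (q->count == 0)
        return -EAGAIN;
    memcpy(out, q->msgs[q->head], MSGLEN);
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}
```

After the -ENOBUFS, the reader still gets the messages that did fit. It also knows that at least one message is missing, so it can resynchronize.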

The intention was to make data sent to the routing socket more
reliable by informing the receiver that it lost information and
that it has to resynchronize. So far the only user that needs
that kind of information reliably is dhcpcd.

It's true that dhcpcd is the only entity that does something useful with the error and re-loads state using getifaddrs and ioctl.

I would imagine that other applications we have in the base system should really care about this as well. Here's a list from a trivial search and basic examination:

	rarpd, racoon, ifwatchd, route (monitor), rtadvd
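The reload that dhcpcd performs starts from a full walk of the interface list, and the daemons listed above would need the same. Below is a minimal sketch of the getifaddrs(3) half, assuming a BSD or glibc system that provides <ifaddrs.h>. `resync_from_getifaddrs()` is a hypothetical name, and this is not dhcpcd's actual code:

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stddef.h>

/*
 * Hypothetical full resync: walk every interface address reported by
 * getifaddrs(3), as a daemon might after ENOBUFS on its routing socket.
 * This sketch only counts the entries; a real daemon would rebuild its
 * interface and address tables here (and query flags with ioctl).
 * Returns the number of entries, or -1 if getifaddrs() fails.
 */
int
resync_from_getifaddrs(void)
{
    struct ifaddrs *ifap, *ifa;
    int n = 0;

    if (getifaddrs(&ifap) == -1)
        return -1;
    for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        assert(ifa->ifa_name != NULL);  /* every entry names an interface */
        n++;
    }
    freeifaddrs(ifap);
    return n;
}
```

The ioctl half of the resync, such as per-interface queries, is left out here.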

A side effect is that programs using UNIX domain datagram sockets
such as syslogd now fail when they can't keep up with messages
(but only for local messages).

I, for one, would be interested in knowing that the logging mechanism is failing. We don't know the importance of the message that would otherwise have been silently discarded. But at least we now know, so we can do something about it.

I don't think that's the right approach. In particular, it shouldn't
be pushed to netbsd-8 close to the release.

If you want reliable transport of messages, then you should use
a reliable transport protocol.

If you want complete routing information use synchronous queries
and use routing socket messages only as an optimization.


I believe that's what I've done?
Open PF_ROUTE socket and listen to it.

In the normal path everything works, and kqueue (or whatever select mechanism is in play) lets me know that either there is an error or something needs reading.

In the case of ENOBUFS, dhcpcd will make no assumptions about what was lost and will re-sync its state using getifaddrs and ioctl. This is an expensive operation that monitoring the route socket avoids.
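The loop described above works like this: wait for the routing socket to become readable, apply each message, and fall back to a full resync on ENOBUFS. It can be sketched roughly as follows. The sketch uses poll() in place of kqueue to stay portable. It assumes MSG_DONTWAIT is available and that the kernel clears the pending ENOBUFS once it has been returned. `monitor_ops`, `monitor_drain()`, and the counting hooks are hypothetical names, not dhcpcd's code:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical hooks a monitoring daemon would supply. */
struct monitor_ops {
    void (*full_resync)(void *ctx);  /* e.g. reload via getifaddrs/ioctl */
    void (*apply_msg)(void *ctx, const void *msg, size_t len);
    void *ctx;
};

/*
 * Read every message currently queued on fd without blocking.
 * Returns 0 once the socket is empty, -1 on any other error.
 * ENOBUFS is remembered and answered with one resync after draining.
 */
int
monitor_drain(int fd, const struct monitor_ops *ops)
{
    char buf[2048];
    bool lost = false;

    assert(ops != NULL);
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);

        if (n >= 0) {
            ops->apply_msg(ops->ctx, buf, (size_t)n);
        } else if (errno == ENOBUFS) {
            lost = true;    /* our view is stale; keep draining */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;          /* nothing left to read */
        } else if (errno != EINTR) {
            return -1;
        }
    }
    if (lost)
        ops->full_resync(ops->ctx);
    return 0;
}

/* Wait up to timeout_ms for fd to become readable, then drain it. */
int
monitor_poll_once(int fd, const struct monitor_ops *ops, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);

    if (r <= 0)
        return r;           /* 0: timed out, -1: poll() failed */
    return monitor_drain(fd, ops);
}

/* Placeholder hooks that only count calls, for trying the loop out. */
struct counters { int msgs, resyncs; };

static void
count_resync(void *ctx)
{
    ((struct counters *)ctx)->resyncs++;
}

static void
count_msg(void *ctx, const void *msg, size_t len)
{
    (void)msg;
    (void)len;
    ((struct counters *)ctx)->msgs++;
}
```

A daemon would call full_resync once at startup, before the first poll. That gives the split Michael suggests: synchronous queries for completeness, and routing socket messages only as an optimization. The resync is deferred until the socket has been drained, so older queued updates are not applied on top of the fresh snapshot.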

The origin of this change seems to be the special case of Linux
NETLINK sockets. NETLINK sockets serve similar purposes as the
BSD routing socket for passing kernel information to userland
daemons. Reading from them may yield ENOBUFS errors to inform
userland that messages got lost.  This behaviour is also controlled
by the NETLINK_NO_ENOBUFS socket flag, so it can be turned off
again.
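For comparison with the Linux behaviour described above, here is a Linux-only sketch of opting out. NETLINK_NO_ENOBUFS is a real Linux socket option at level SOL_NETLINK. The helper name is hypothetical, and none of this applies to NetBSD's PF_ROUTE socket:

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* value from the Linux headers, if not exposed */
#endif

/*
 * Linux-only: open an rtnetlink socket and ask the kernel not to
 * report ENOBUFS on it. Returns the fd, or -1 on failure.
 */
int
open_rtnetlink_no_enobufs(void)
{
    int one = 1;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd == -1)
        return -1;
    if (setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
        &one, sizeof one) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the flag set, overruns are no longer reported to the reader, which is exactly the silent-loss behaviour that dhcpcd is trying to avoid.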


I see no reason to single out Linux here, nor a specific type of socket.

From consulting other OS documentation, it seems that only the BSD family does not raise ENOBUFS for a recv call. It's certainly documented by POSIX:

http://pubs.opengroup.org/onlinepubs/000095399/functions/recv.html

So any portable application can rightly expect to see ENOBUFS when calling recv.

Roy
