tech-net archive


Re: New class of receive error





On 13/05/2018 16:30, Robert Elz wrote:
     Date:        Sun, 13 May 2018 15:01:42 +0100
     From:        Roy Marples <roy%marples.name@localhost>
     Message-ID:  <12e84d93-9f83-359c-3fd4-17f359f289de%marples.name@localhost>

   | Other OS's document ENOBUFS on recv calls.

That other OS's (by which I assume you mean linux) are broken is no reason
that we should be too.

By other OS's I mean AIX, HP-UX, Solaris and the POSIX specification.
I stopped checking others at this point, but it's clearly not just Linux.

Datagram protocols inherently lose packets - aside from in this one very
special case, there's no way at the network level to inform the receiver of
a lost packet, as there's no way to know one was ever sent.

I'll stop right here.
The network isn't involved in a few cases: AF_LOCAL and PF_ROUTE don't go over it. Also, even when the network is involved, once a packet has made it into the kernel we should be dealing with it on a best-effort basis, including noting that whatever arrived cannot be delivered.


The application level needs to recover using its own mechanism - which could
be using sequence numbers in the packets as Christos suggested - but here I do
not think that can work, as the messages come from a variety of sources
(including other processes) and there's no way to synchronise the (current)
sequence numbers (rtm_seq).

But we could add a kernel generated seq number - set whenever a routing
packet is generated, and delivered to the receiver - that at least would be an
application layer recovery mechanism.
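
As a rough sketch of how a receiver might use such a number: the code below assumes a hypothetical per-message, kernel-stamped 32-bit counter that increases by one for every routing message queued to the socket. No such field exists in the routing message header today, and the names seq_tracker and seq_tracker_update are made up for illustration. The receiver only has to remember the next value it expects; any gap is the number of messages that were dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Receiver-side state for a HYPOTHETICAL kernel-generated sequence number. */
struct seq_tracker {
	bool     primed;	/* true once the first message has been seen */
	uint32_t next;		/* sequence number expected on the next message */
};

/*
 * Record the sequence number of a message that was just received.
 * Returns how many messages were missed immediately before it
 * (0 when nothing was lost). Unsigned arithmetic keeps the result
 * correct when the counter wraps around.
 */
static uint32_t
seq_tracker_update(struct seq_tracker *t, uint32_t seq)
{
	uint32_t missed;

	assert(t != NULL);

	if (!t->primed) {
		/* First message: nothing to compare against yet. */
		t->primed = true;
		t->next = seq + 1;
		return 0;
	}

	missed = seq - t->next;
	t->next = seq + 1;
	return missed;
}
```

A non-zero return would be the cue for the application to re-read whatever kernel state it mirrors. Note this only detects a loss once a later message arrives, which is one way it differs from an error on the receive call itself.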

Aside from dhcpcd every instance of "handling" this error is to (at most)
log it and ignore it - it really is pointless.

The routing socket is something special - it arguably should not be using the
socket interface at all - as it is purely a local host communication mechanism,
so the local host OS knows when a packet is sent, and when one is received,
and when one is lost, and so can (reliably) inform the receiver that a packet
has been lost - but that one is a very special case (along with the mobile IP
socket, which is just the routing socket with a different name (and purpose)).

When I originally worked out what the issue was that some people were having on NetBSD, I proposed (not by email) that we adopt RTM_DESYNC, which OpenBSD implemented for route(4). That however was shot down in flames (oddly enough by one person now complaining on this list) as "we don't want another magical message on route(4)". After thinking about it more, I agreed, as it makes the situation worse by trying to send more data.

joerg then suggested "What about a KNOTE in kqueue(2)?"
I did actually post a working implementation here:
https://mail-index.netbsd.org/tech-net/2018/03/15/msg006749.html

But it got no feedback.
And as you pointed out in another email it's a different way of doing something just to be different.

Then I thought - route(4) behaviour shouldn't be anything special. What should any socket do when its internal buffer overflows? A quick search shows that this error case is documented by POSIX, implemented by other OS's (plural), and our code's XXX commentary says we should be doing something about it but currently weren't. Now we are.

In my view, the correct solution is to use ENOBUFS.
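
To make the application side concrete, here is a minimal sketch of a receive path that treats ENOBUFS as "messages were lost, resynchronise" rather than as a fatal error. The enum rx_action and the helpers rx_classify and rx_once are illustrative names only; what "resync" means (for example re-reading the routing table) is up to the application and is not shown.

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* What the caller should do after one receive attempt (illustrative). */
enum rx_action {
	RX_DELIVER,	/* a message was read; process it */
	RX_RETRY,	/* transient condition; try again later */
	RX_RESYNC,	/* the socket buffer overflowed; state may be stale */
	RX_FAIL		/* any other error; give up on this socket */
};

/* Map a recv(2) return value and the errno captured after it to an action. */
static enum rx_action
rx_classify(ssize_t n, int err)
{
	if (n >= 0)
		return RX_DELIVER;
	if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
		return RX_RETRY;
	if (err == ENOBUFS)
		return RX_RESYNC;
	return RX_FAIL;
}

/* Perform one recv(2) on fd, store its result in *out, and classify it. */
static enum rx_action
rx_once(int fd, void *buf, size_t len, ssize_t *out)
{
	ssize_t n;
	int err;

	assert(buf != NULL && out != NULL);

	n = recv(fd, buf, len, 0);
	err = errno;	/* only consulted by rx_classify() when n < 0 */
	*out = n;
	return rx_classify(n, err);
}
```

On RX_RESYNC the socket itself is still usable; the caller would rebuild its view of kernel state and then carry on reading.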

Roy

