tech-net archive
Re: Plan for improving IP_PKTINFO socket option handling
Just to add to the general approval - nice analysis, would be great to
have this!
Thanks,
Alistair
On 28 December 2017 at 18:29, John Nemeth <jnemeth%cue.bc.ca@localhost> wrote:
> On Dec 28, 4:27pm, Christos Zoulas wrote:
> } Subject: Re: Plan for improving IP_PKTINFO socket option handling
> } In article <m2bmii25i8.fsf%thuvia.hamartun.priv.no@localhost>,
> } Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost> wrote:
> } >I'd like to make some changes to the IPv4 socket option handling.
> } >Specifically, I want to change how the IP_PKTINFO options are handled.
> } >Before I attempt to change any code, I'd like input on the plan.
> } >
> } >First, a bit of background.
> } >
> } >I've been looking at getting the PowerDNS applications (authoritative
> } >name server, recursive name server, and DNS load balancer/firewall) to
> } >compile cleanly on NetBSD, and while I've been able to do so, it took
> } >some ugly workarounds. Digging into the standards, the source code,
> } >and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
> } >doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
> } >as a partner to the traditional IP_RECVDSTADDR), I find that there are
> } >a number of differences, some for no good reason at all. In a couple
> } >of cases, our code is just wrong. Also, our documentation of these
> } >options is unclear, and contains errors.
> } >
> } >The reason these things exist at all is to enable the owner of a
> } >wildcard bound socket to find out which interface and address an
> } >incoming connection was actually received by, and, in the case of a
> } >UDP socket, to set the source address of an outgoing packet, typically
> } >so that the sender of a UDP request can recognize the response. For
> } >ease of use, recvmsg() delivers the extra information as a control
> } >message which may then be supplied unchanged to sendmsg() when sending
> } >the response, setting the source address to the original destination.
> } >
> } >The IPv4 implementation of the *PKTINFO options is not standardized.
> } >It has been implemented several times, modeled, with varying degrees
> } >of accuracy, on the IPv6 version, standardized by RFC3542.
> } >
> } >Here's a summary of the IPv6 functionality:
> } >
> } >Option IPV6_RECVPKTINFO on socket:
> } > recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets
> } >
> } >Option IPV6_PKTINFO on socket:
> } > sets the default source address to be used when sending packets
> } >
> } >Control message IPV6_PKTINFO from recvmsg():
> } > contains an in6_pktinfo structure with the specific destination address
> } >
> } >Control message IPV6_PKTINFO to sendmsg():
> } > supply an in6_pktinfo structure with the source address to be used
> } >
> } >All of these work the same way on BSD, Solaris, and Linux (as per
> } >RFC3542). The in6_pktinfo structure holds the address (in ipi6_addr),
> } >and the interface index (ipi6_ifindex).
> } >
> } >Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
> } >control messages with incoming packets, while the IPV6_PKTINFO option
> } >sets a default source address for the socket, and the IPV6_PKTINFO
> } >control message on an outgoing packet sets the source address for that
> } >particular packet.
> } >
> } >Now to the IPv4 implementation. In Solaris, this was done as a direct
> } >translation of the IPv6 option set:
> } >
> } >Option IP_RECVPKTINFO on socket:
> } > recvmsg() will supply IP_PKTINFO cmsgs for incoming packets
> } >
> } >Option IP_PKTINFO on socket:
> } > sets the default source address to be used when sending packets
> } >
> } >Control message IP_PKTINFO from recvmsg():
> } > contains an in_pktinfo structure with the specific destination address
> } >
> } >Control message IP_PKTINFO to sendmsg():
> } > supply an in_pktinfo structure with the source address to be used
> } >
> } >Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
> } >instead using the IP_PKTINFO option to control the delivery of
> } >IP_PKTINFO control messages with incoming packets. In doing so, they
> } >lost the ability to set a default outgoing source address. This is
> } >arguably not a great loss, but it does break compatibility with
> } >Solaris, and it gratuitously breaks orthogonality with IPv6.
> } >
> } >Next, while Solaris and Linux still have the ipi_ifindex and ipi_addr
> } >fields, they decided to add a new field, ipi_spec_dst. The name is
> } >supposed to refer to the "specific destination" described in RFCs 1122
> } >and 1123. They chose to differentiate between the destination address
> } >as supplied in the incoming IP packet itself, and the local address
> } >the packet was, in fact, delivered to (specifically, ipi_spec_dst is
> } >said to be "the destination address of the routing table entry"). For
> } >outgoing packets, the IP_PKTINFO option's ipi_spec_dst field will be
> } >used as the source address.
> } >
> } >The only case I can think of where the two would differ is when you
> } >listen on 0/0 and receive a packet on the loopback interface, addressed
> } >not to 127.0.0.1, but, say, 127.1.2.3. By the documentation, this should
> } >give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
> } >ipi_spec_dst 127.0.0.1. That's not how Linux works, though: it will
> } >set both to 127.1.2.3. Sending a response, if you pass that control
> } >message unchanged to sendmsg(), you'll be sending from 127.1.2.3
> } >(instead of the documented 127.0.0.1, which wouldn't work), and this
> } >may be a hint to why Linux puts the packet header destination in both
> } >fields. On NetBSD, sending to 127.1.2.3 doesn't work at all.
> } >
> } >(This is a general difference in the handling of the loopback
> } >interface: if you 'ping 127.1.2.3' on Linux, you get responses from
> } >127.1.2.3. On NetBSD, you get a 'network unreachable' instead.)
> } >
> } >Now, on to NetBSD.
> } >
> } >We've mostly copied the way things work in Solaris and Linux, but with
> } >a couple of little twists that break source compatibility with both.
> } >
> } >First, we don't have the ipi_spec_dst field at all. Since a lot of
> } >source code out there is written with Solaris and/or Linux in mind,
> } >this breaks compatibility at the source level. I don't have a Solaris
> } >system handy for testing, but from what I observe on Linux, and how
> } >its loopback handling differs from NetBSD, as described above, we
> } >could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
> } >
> } >Next, we do something really silly with the name IP_RECVPKTINFO.
> } >Remember that this is the option to turn on the generation of
> } >IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
> } >changing the IP_PKTINFO option to do this instead of setting the
> } >default source address for outgoing packets? Well, we've reinstated
> } >the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
> } >control messages containing the *source* addresses of the incoming
> } >packets. This is completely meaningless, as we have that information
> } >in the standard message header from recvmsg() already, so it'll never
> } >be used for this purpose.
> } >
> } >What it does do, though, is trick source code that supports the
> } >Solaris IP_RECVPKTINFO option into thinking we work the same way. See
> } >external/bsd/dhcp/dist/common/socket.c for an example of functionality
> } >we're missing. Note how they test for the presence of both symbols
> } >IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
> } >of Solaris is present. Other code I've read checks for IP_PKTINFO
> } >first, and then uses IP_RECVPKTINFO to decide whether to do things the
> } >Solaris or the Linux way. Our use of the latter symbol breaks this.
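
The second detection pattern described above (check for IP_PKTINFO first, then use IP_RECVPKTINFO to choose between the Solaris and the Linux way) looks roughly like the sketch below. The function name enable_dstaddr_cmsg() is hypothetical, and the IP_RECVDSTADDR fallback is an assumption about what portable code typically does, not something taken from any particular package. On a NetBSD with the current IP_RECVPKTINFO semantics this would take the first branch and silently get source-address control messages instead, which is exactly the breakage being discussed.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: ask the kernel to attach destination-address
 * control messages to packets received on socket s.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
enable_dstaddr_cmsg(int s)
{
    int on = 1;

#if defined(IP_PKTINFO) && defined(IP_RECVPKTINFO)
    /* Solaris style: a separate option requests IP_PKTINFO cmsgs. */
    return setsockopt(s, IPPROTO_IP, IP_RECVPKTINFO, &on, sizeof(on));
#elif defined(IP_PKTINFO)
    /* Linux style: the IP_PKTINFO option itself requests the cmsgs. */
    return setsockopt(s, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
#elif defined(IP_RECVDSTADDR)
    /* Traditional BSD style: address only, no interface index. */
    return setsockopt(s, IPPROTO_IP, IP_RECVDSTADDR, &on, sizeof(on));
#else
    (void)s;
    (void)on;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

The choice is made entirely at compile time from which symbols the headers define, so a header that defines IP_RECVPKTINFO with different semantics cannot be detected by code written this way.
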
> } >
> } >Finally, here's what I'd like to change:
> } >
> } >1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
> } >
> } >2) Change the IP_RECVPKTINFO option to control the generation of
> } > IP_PKTINFO control messages, the way it's done in Solaris.
> } >
> } >3) Remove the superfluous IP_RECVPKTINFO control message.
> } >
> } >4) Change the IP_PKTINFO option to do different things depending on
> } > the parameter it's supplied with:
> } > - If it's sizeof(int), assume it's being used as in Linux:
> } > - If it's non-zero, turn on the IP_RECVPKTINFO option.
> } > - If it's zero, turn off the IP_RECVPKTINFO option.
> } > - If it's sizeof(struct in_pktinfo), assume it's being used as in
> } > Solaris, to set a default for the source interface and/or
> } > source address for outgoing packets on the socket.
> } >
> } >5) Fix our documentation. Both ip(4) and ip6(4) contain errors in
> } > their descriptions of these particular options and control messages.
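
Item 4 in the list above amounts to dispatching on the option length. Here is a minimal userland model of that logic, not kernel code: struct sketch_pktinfo, struct sketch_sockstate and sketch_set_pktinfo() are all hypothetical names, and the real change would live in the kernel's IPv4 socket option handling. The scheme only works because sizeof(int) differs from the size of the pktinfo structure, which holds for this stand-in (8 bytes against 4 on common platforms).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo. */
struct sketch_pktinfo {
    uint32_t spi_addr;     /* IPv4 address, network byte order */
    uint32_t spi_ifindex;  /* interface index */
};

/* Hypothetical per-socket state affected by the two options. */
struct sketch_sockstate {
    int recvpktinfo;               /* IP_RECVPKTINFO on/off */
    int have_default;              /* default source has been set */
    struct sketch_pktinfo defsrc;  /* default source for sends */
};

/*
 * Model of setsockopt(IP_PKTINFO): the length of the argument decides
 * whether the caller means the Linux or the Solaris semantics.
 * Returns 0 on success or an errno value.
 */
static int
sketch_set_pktinfo(struct sketch_sockstate *st, const void *optval,
    size_t optlen)
{
    assert(st != NULL);
    if (optval == NULL)
        return EINVAL;
    if (optlen == sizeof(int)) {
        /* Linux usage: boolean that toggles IP_RECVPKTINFO. */
        int v;

        memcpy(&v, optval, sizeof(v));
        st->recvpktinfo = (v != 0);
        return 0;
    }
    if (optlen == sizeof(struct sketch_pktinfo)) {
        /* Solaris usage: default source interface and/or address. */
        memcpy(&st->defsrc, optval, sizeof(st->defsrc));
        st->have_default = 1;
        return 0;
    }
    return EINVAL;
}
```

Any other length is rejected with EINVAL in this sketch. The proposal itself does not say what should happen in that case.
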
> } >
> } >With this, we should have automatic source code compatibility with
> } >pretty much everything, and orthogonality between IPv6 and IPv4.
> }
> } I like and I support this proposal.
>
> For what it's worth, me too. :-) The lack of source code
> compatibility has really been annoying me when working on some
> packages. Also, tftpd not sending packets from the correct source
> address has been a problem (this may have been fixed in the
> meantime). Also, good work Tom with the research!
>
> }-- End of excerpt from Christos Zoulas
>