tech-net archive
Re: Plan for improving IP_PKTINFO socket option handling
In article <m2bmii25i8.fsf%thuvia.hamartun.priv.no@localhost>,
Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost> wrote:
>I'd like to make some changes to the IPv4 socket option handling.
>Specifically, I want to change how the IP_PKTINFO options are handled.
>Before I attempt to change any code, I'd like input on the plan.
>
>First, a bit of background.
>
>I've been looking at getting the PowerDNS applications (authoritative
>name server, recursive name server, and DNS load balancer/firewall) to
>compile cleanly on NetBSD, and while I've been able to do so, it took
>some ugly workarounds. Digging into the standards, the source code,
>and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
>doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
>as a partner to the traditional IP_RECVDSTADDR), I find that there are
>a number of differences, some for no good reason at all. In a couple
>of cases, our code is just wrong. Also, our documentation of these
>options is unclear, and contains errors.
>
>The reason these things exist at all is to enable the owner of a
>wildcard-bound socket to find out which interface and address an
>incoming packet was actually received on, and, in the case of a
>UDP socket, to set the source address of an outgoing packet, typically
>so that the sender of a UDP request can recognize the response. For
>ease of use, recvmsg() delivers the extra information as a control
>message which may then be supplied unchanged to sendmsg() when sending
>the response, setting the source address to the original destination.
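A minimal sketch of the receive side described above, assuming a system that provides IP_PKTINFO and struct in_pktinfo with at least the ipi_addr and ipi_ifindex fields. The helper name find_pktinfo is hypothetical, chosen for illustration only. It scans the control messages filled in by recvmsg() and copies out the first IP_PKTINFO payload:

```c
#define _GNU_SOURCE /* exposes struct in_pktinfo on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: walk the control messages attached to msg
 * (as filled in by recvmsg()) and copy the first IP_PKTINFO payload
 * into *out. Returns 1 if one was found, 0 otherwise.
 */
static int
find_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
	struct cmsghdr *cm;

	assert(msg != NULL && out != NULL);
	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_PKTINFO &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(*out))) {
			/* CMSG_DATA may be unaligned for the struct, so copy. */
			memcpy(out, CMSG_DATA(cm), sizeof(*out));
			return 1;
		}
	}
	return 0;
}
```

To reply from the address the request was sent to, the same msg_control buffer can be handed back to sendmsg(), subject to the per-platform differences discussed below.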
>
>The IPv4 implementation of the *PKTINFO options is not standardized.
>It has been implemented several times, modeled, with varying degrees
>of accuracy, on the IPv6 version, standardized by RFC3542.
>
>Here's a summary of the IPv6 functionality:
>
>Option IPV6_RECVPKTINFO on socket:
> recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets
>
>Option IPV6_PKTINFO on socket:
> sets the default source address to be used when sending packets
>
>Control message IPV6_PKTINFO from recvmsg():
> contains an in6_pktinfo structure with the specific destination address
>
>Control message IPV6_PKTINFO to sendmsg():
> supply an in6_pktinfo structure with the source address to be used
>
>All of these work the same way on BSD, Solaris, and Linux (as per
>RFC3542). The in6_pktinfo structure holds the address (in ipi6_addr),
>and the interface index (ipi6_ifindex).
>
>Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
>control messages with incoming packets, while the IPV6_PKTINFO option
>sets a default source address for the socket, and the IPV6_PKTINFO
>control message on an outgoing packet sets the source address for that
>particular packet.
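The per-packet case from the IPv6 summary above could be sketched as follows. This assumes the RFC 3542 definitions (struct in6_pktinfo, IPV6_PKTINFO) are visible from <netinet/in.h>. The helper name set_src6 and its buffer-passing convention are hypothetical. It only prepares the control message; the caller still fills in the rest of the msghdr and calls sendmsg().

```c
#define _GNU_SOURCE /* exposes struct in6_pktinfo on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: attach an IPV6_PKTINFO control message to msg so
 * that sendmsg() uses src (and, if non-zero, interface ifindex) for this
 * one packet. buf must be suitably aligned for struct cmsghdr and hold
 * at least CMSG_SPACE(sizeof(struct in6_pktinfo)) bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
set_src6(struct msghdr *msg, void *buf, size_t buflen,
    const struct in6_addr *src, unsigned int ifindex)
{
	struct in6_pktinfo pi;
	struct cmsghdr *cm;

	assert(msg != NULL && buf != NULL && src != NULL);
	if (buflen < CMSG_SPACE(sizeof(pi)))
		return -1;

	memset(buf, 0, CMSG_SPACE(sizeof(pi)));
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	memset(&pi, 0, sizeof(pi));
	pi.ipi6_addr = *src;
	pi.ipi6_ifindex = ifindex;

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_IPV6;
	cm->cmsg_type = IPV6_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(pi));
	memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
	return 0;
}
```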
>
>Now to the IPv4 implementation. In Solaris, this was done as a direct
>translation of the IPv6 option set:
>
>Option IP_RECVPKTINFO on socket:
> recvmsg() will supply IP_PKTINFO cmsgs for incoming packets
>
>Option IP_PKTINFO on socket:
> sets the default source address to be used when sending packets
>
>Control message IP_PKTINFO from recvmsg():
> contains an in_pktinfo structure with the specific destination address
>
>Control message IP_PKTINFO to sendmsg():
> supply an in_pktinfo structure with the source address to be used
>
>Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
>instead using the IP_PKTINFO option to control the delivery of
>IP_PKTINFO control messages with incoming packets. In doing so, they
>lost the ability to set a default outgoing source address. This is
>arguably not a great loss, but it does break compatibility with
>Solaris, and it gratuitously breaks orthogonality with IPv6.
>
>Next, Solaris and Linux kept the ipi_ifindex and ipi_addr fields, but
>decided to add a new field, ipi_spec_dst. The name is
>supposed to refer to the "specific destination" described in RFCs 1122
>and 1123. They chose to differentiate between the destination address
>as supplied in the incoming IP packet itself, and the local address
>the packet was, in fact, delivered to (specifically, ipi_spec_dst is
>said to be "the destination address of the routing table entry"). For
>outgoing packets, the ipi_spec_dst field of the IP_PKTINFO control
>message will be used as the source address.
>
>The only real example I can think of where the two fields could differ
>is when you listen on 0/0 and receive a packet on the loopback
>interface, addressed not to
>127.0.0.1, but, say, 127.1.2.3. By the documentation, this should
>give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
>ipi_spec_dst 127.0.0.1. That's not how Linux works, though: it will
>set both to 127.1.2.3. Sending a response, if you pass that control
>message unchanged to sendmsg(), you'll be sending from 127.1.2.3
>(instead of the documented 127.0.0.1, which wouldn't work), and this
>may be a hint to why Linux puts the packet header destination in both
>fields. On NetBSD, sending to 127.1.2.3 doesn't work at all.
>
>(This is a general difference in the handling of the loopback
>interface: if you 'ping 127.1.2.3' on Linux, you get responses from
>127.1.2.3. On NetBSD, you get a 'network unreachable' instead.)
>
>Now, on to NetBSD.
>
>We've mostly copied the way things work in Solaris and Linux, but with
>a couple of little twists that break source compatibility with both.
>
>First, we don't have the ipi_spec_dst field at all. Since a lot of
>source code out there is written with Solaris and/or Linux in mind,
>this breaks compatibility at the source level. I don't have a Solaris
>system handy for testing, but from what I observe on Linux, and how
>its loopback handling differs from NetBSD, as described above, we
>could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
>
>Next, we do something really silly with the name IP_RECVPKTINFO.
>Remember that this is the option to turn on the generation of
>IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
>changing the IP_PKTINFO option to do this instead of setting the
>default source address for outgoing packets? Well, we've reinstated
>the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
>control messages containing the *source* addresses of the incoming
>packets. This is completely meaningless, as we have that information
>in the standard message header from recvmsg() already, so it'll never
>be used for this purpose.
>
>What it does do, though, is trick source code that supports the
>Solaris IP_RECVPKTINFO option into thinking we work the same way. See
>external/bsd/dhcp/dist/common/socket.c for an example of functionality
>we're missing. Note how they test for the presence of both symbols
>IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
>of Solaris is present. Other code I've read checks for IP_PKTINFO
>first, and then uses IP_RECVPKTINFO to decide whether to do things the
>Solaris or the Linux way. Our use of the latter symbol breaks this.
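The second detection pattern described above might look roughly like this. It is a sketch of the general idiom, not code taken from any of the programs mentioned, and the helper names pktinfo_recv_option and enable_pktinfo are hypothetical. Because it assumes that a defined IP_RECVPKTINFO implies Solaris semantics, this is exactly the kind of code that the current NetBSD definition misleads.

```c
#define _GNU_SOURCE /* exposes IP_PKTINFO on glibc; harmless elsewhere */
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: pick the socket option that requests IP_PKTINFO
 * control messages on receive. Returns -1 if neither symbol exists.
 */
static int
pktinfo_recv_option(void)
{
#if defined(IP_PKTINFO) && defined(IP_RECVPKTINFO)
	return IP_RECVPKTINFO;	/* assumed Solaris semantics */
#elif defined(IP_PKTINFO)
	return IP_PKTINFO;	/* Linux semantics */
#else
	return -1;		/* no pktinfo; a caller might try IP_RECVDSTADDR */
#endif
}

/* Ask for destination-address cmsgs on fd; 0 on success, -1 on failure. */
static int
enable_pktinfo(int fd)
{
	int on = 1;
	int opt = pktinfo_recv_option();

	assert(fd >= 0);
	if (opt == -1)
		return -1;
	return setsockopt(fd, IPPROTO_IP, opt, &on, sizeof(on));
}
```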
>
>Finally, here's what I'd like to change:
>
>1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
>
>2) Change the IP_RECVPKTINFO option to control the generation of
> IP_PKTINFO control messages, the way it's done in Solaris.
>
>3) Remove the superfluous IP_RECVPKTINFO control message.
>
>4) Change the IP_PKTINFO option to do different things depending on
>   the parameter it's supplied with:
>   - If it's sizeof(int), assume it's being used as in Linux:
>     - If it's non-zero, turn on the IP_RECVPKTINFO option.
>     - If it's zero, turn off the IP_RECVPKTINFO option.
>   - If it's sizeof(struct in_pktinfo), assume it's being used as in
>     Solaris, to set a default for the source interface and/or
>     source address for outgoing packets on the socket.
>
>5) Fix our documentation. Both ip(4) and ip6(4) contain errors in
> their descriptions of these particular options and control messages.
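The length-based dispatch in item 4 can be modelled in user space as below. This is only an illustration of the decision rule, not kernel code. The enum and the function name classify_pktinfo_opt are hypothetical. It relies on sizeof(int) and sizeof(struct in_pktinfo) being different, which holds on the common platforms.

```c
#define _GNU_SOURCE /* exposes struct in_pktinfo on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical classification of an IP_PKTINFO setsockopt() argument. */
enum pktinfo_use {
	PKTINFO_INVALID,	/* unrecognised option length */
	PKTINFO_RECV_ON,	/* Linux style: turn on receive cmsgs */
	PKTINFO_RECV_OFF,	/* Linux style: turn off receive cmsgs */
	PKTINFO_SET_DEFAULT	/* Solaris style: default source for sends */
};

static enum pktinfo_use
classify_pktinfo_opt(const void *optval, size_t optlen)
{
	assert(optval != NULL);
	if (optlen == sizeof(int)) {
		int v;

		memcpy(&v, optval, sizeof(v));
		return v != 0 ? PKTINFO_RECV_ON : PKTINFO_RECV_OFF;
	}
	if (optlen == sizeof(struct in_pktinfo))
		return PKTINFO_SET_DEFAULT;
	return PKTINFO_INVALID;
}
```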
>
>With this, we should have automatic source code compatibility with
>pretty much everything, and orthogonality between IPv6 and IPv4.
I like and I support this proposal.
christos