tech-net archive


Re: Plan for improving IP_PKTINFO socket option handling



Just to add to the general approval - nice analysis, would be great to
have this!

Thanks,
Alistair

On 28 December 2017 at 18:29, John Nemeth <jnemeth%cue.bc.ca@localhost> wrote:
> On Dec 28,  4:27pm, Christos Zoulas wrote:
> } Subject: Re: Plan for improving IP_PKTINFO socket option handling
> } In article <m2bmii25i8.fsf%thuvia.hamartun.priv.no@localhost>,
> } Tom Ivar Helbekkmo  <tih%hamartun.priv.no@localhost> wrote:
> } >I'd like to make some changes to the IPv4 socket option handling.
> } >Specifically, I want to change how the IP_PKTINFO options are handled.
> } >Before I attempt to change any code, I'd like input on the plan.
> } >
> } >First, a bit of background.
> } >
> } >I've been looking at getting the PowerDNS applications (authoritative
> } >name server, recursive name server, and DNS load balancer/firewall) to
> } >compile cleanly on NetBSD, and while I've been able to do so, it took
> } >some ugly workarounds.  Digging into the standards, the source code,
> } >and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
> } >doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
> } >as a partner to the traditional IP_RECVDSTADDR), I find that there are
> } >a number of differences, some for no good reason at all.  In a couple
> } >of cases, our code is just wrong.  Also, our documentation of these
> } >options is unclear, and contains errors.
> } >
> } >The reason these things exist at all is to enable the owner of a
> } >wildcard bound socket to find out which interface and address an
> } >incoming connection was actually received by, and, in the case of a
> } >UDP socket, to set the source address of an outgoing packet, typically
> } >so that the sender of a UDP request can recognize the response.  For
> } >ease of use, recvmsg() delivers the extra information as a control
> } >message which may then be supplied unchanged to sendmsg() when sending
> } >the response, setting the source address to the original destination.
> } >
> } >The IPv4 implementation of the *PKTINFO options is not standardized.
> } >It has been implemented several times, modeled, with varying degrees
> } >of accuracy, on the IPv6 version, standardized by RFC3542.
> } >
> } >Here's a summary of the IPv6 functionality:
> } >
> } >Option IPV6_RECVPKTINFO on socket:
> } >   recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets
> } >
> } >Option IPV6_PKTINFO on socket:
> } >   sets the default source address to be used when sending packets
> } >
> } >Control message IPV6_PKTINFO from recvmsg():
> } >   contains an in6_pktinfo structure with the specific destination address
> } >
> } >Control message IPV6_PKTINFO to sendmsg():
> } >   supply an in6_pktinfo structure with the source address to be used
> } >
> } >All of these work the same way on BSD, Solaris, and Linux (as per
> } >RFC3542).  The in6_pktinfo structure holds the address (in ipi6_addr),
> } >and the interface index (ipi6_ifindex).
> } >
> } >Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
> } >control messages with incoming packets, while the IPV6_PKTINFO option
> } >sets a default source address for the socket, and the IPV6_PKTINFO
> } >control message on an outgoing packet sets the source address for that
> } >particular packet.
> } >
> } >Now to the IPv4 implementation.  In Solaris, this was done as a direct
> } >translation of the IPv6 option set:
> } >
> } >Option IP_RECVPKTINFO on socket:
> } >   recvmsg() will supply IP_PKTINFO cmsgs for incoming packets
> } >
> } >Option IP_PKTINFO on socket:
> } >   sets the default source address to be used when sending packets
> } >
> } >Control message IP_PKTINFO from recvmsg():
> } >   contains an in_pktinfo structure with the specific destination address
> } >
> } >Control message IP_PKTINFO to sendmsg():
> } >   supply an in_pktinfo structure with the source address to be used
> } >
> } >Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
> } >instead using the IP_PKTINFO option to control the delivery of
> } >IP_PKTINFO control messages with incoming packets.  In doing so, they
> } >lost the ability to set a default outgoing source address.  This is
> } >arguably not a great loss, but it does break compatibility with
> } >Solaris, and it gratuitously breaks orthogonality with IPv6.
> } >
> } >Next, while Solaris and Linux still have the ipi_ifindex and ipi_addr
> } >fields, they decided to add a new field, ipi_spec_dst.  The name is
> } >supposed to refer to the "specific destination" described in RFCs 1122
> } >and 1123.  They chose to differentiate between the destination address
> } >as supplied in the incoming IP packet itself, and the local address
> } >the packet was, in fact, delivered to (specifically, ipi_spec_dst is
> } >said to be "the destination address of the routing table entry").  For
> } >outgoing packets, the IP_PKTINFO option's ipi_spec_dst field will be
> } >used as the source address.
> } >
> } >The only real example I can think of is where you listen on 0/0, and
> } >receive a packet on the loopback interface, addressed not to
> } >127.0.0.1, but, say, 127.1.2.3.  By the documentation, this should
> } >give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
> } >ipi_spec_dst 127.0.0.1.  That's not how Linux works, though: it will
> } >set both to 127.1.2.3.  Sending a response, if you pass that control
> } >message unchanged to sendmsg(), you'll be sending from 127.1.2.3
> } >(instead of the documented 127.0.0.1, which wouldn't work), and this
> } >may be a hint as to why Linux puts the packet header destination in both
> } >fields.  On NetBSD, sending to 127.1.2.3 doesn't work at all.
> } >
> } >(This is a general difference in the handling of the loopback
> } >interface: if you 'ping 127.1.2.3' on Linux, you get responses from
> } >127.1.2.3.  On NetBSD, you get a 'network unreachable' instead.)
> } >
> } >Now, on to NetBSD.
> } >
> } >We've mostly copied the way things work in Solaris and Linux, but with
> } >a couple of little twists that break source compatibility with both.
> } >
> } >First, we don't have the ipi_spec_dst field at all.  Since a lot of
> } >source code out there is written with Solaris and/or Linux in mind,
> } >this breaks compatibility at the source level.  I don't have a Solaris
> } >system handy for testing, but from what I observe on Linux, and how
> } >its loopback handling differs from NetBSD, as described above, we
> } >could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
> } >
> } >Next, we do something really silly with the name IP_RECVPKTINFO.
> } >Remember that this is the option to turn on the generation of
> } >IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
> } >changing the IP_PKTINFO option to do this instead of setting the
> } >default source address for outgoing packets?  Well, we've reinstated
> } >the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
> } >control messages containing the *source* addresses of the incoming
> } >packets.  This is completely meaningless, as we have that information
> } >in the standard message header from recvmsg() already, so it'll never
> } >be used for this purpose.
> } >
> } >What it does do, though, is trick source code that supports the
> } >Solaris IP_RECVPKTINFO option into thinking we work the same way.  See
> } >external/bsd/dhcp/dist/common/socket.c for an example of functionality
> } >we're missing.  Note how they test for the presence of both symbols
> } >IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
> } >of Solaris is present.  Other code I've read checks for IP_PKTINFO
> } >first, and then uses IP_RECVPKTINFO to decide whether to do things the
> } >Solaris or the Linux way.  Our use of the latter symbol breaks this.
> } >
> } >Finally, here's what I'd like to change:
> } >
> } >1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
> } >
> } >2) Change the IP_RECVPKTINFO option to control the generation of
> } >   IP_PKTINFO control messages, the way it's done in Solaris.
> } >
> } >3) Remove the superfluous IP_RECVPKTINFO control message.
> } >
> } >4) Change the IP_PKTINFO option to do different things depending on
> } >   the parameter it's supplied with:
> } >   - If it's sizeof(int), assume it's being used as in Linux:
> } >     - If it's non-zero, turn on the IP_RECVPKTINFO option.
> } >     - If it's zero, turn off the IP_RECVPKTINFO option.
> } >   - If it's sizeof(struct in_pktinfo), assume it's being used as in
> } >     Solaris, to set a default for the source interface and/or
> } >     source address for outgoing packets on the socket.
> } >
> } >5) Fix our documentation.  Both ip(4) and ip6(4) contain errors in
> } >   their descriptions of these particular options and control messages.
> } >
> } >With this, we should have automatic source code compatibility with
> } >pretty much everything, and orthogonality between IPv6 and IPv4.
> }
> } I like and I support this proposal.
>
>      For what it's worth, me too.  :-)  The lack of source code
> compatibility has really been annoying me when working on some
> packages.  Also, tftpd not sending packets from the correct source
> address has been a problem (this may have been fixed in the
> meantime).  Also, good work Tom with the research!
>
> }-- End of excerpt from Christos Zoulas
>
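
For readers following along, item 4 of the quoted plan (the IP_PKTINFO
option behaving differently depending on the size of its argument) could
be modelled roughly as below.  This is only a sketch of the decision
logic, not kernel code: struct model_pktinfo, struct model_socket and
model_set_ip_pktinfo() are hypothetical names invented for illustration,
and the struct is a stand-in for in_pktinfo that is assumed to differ in
size from an int.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo (address + interface index).
 * Assumed to be larger than an int, so the two cases can be told apart. */
struct model_pktinfo {
    unsigned char addr[4];   /* IPv4 address bytes */
    unsigned int  ifindex;   /* interface index, 0 = unspecified */
};

/* Hypothetical per-socket state affected by the option. */
struct model_socket {
    int recv_pktinfo;                  /* deliver IP_PKTINFO cmsgs on receive? */
    int has_default_src;               /* has a default source been set? */
    struct model_pktinfo default_src;  /* default source for outgoing packets */
};

/* Model of setsockopt(IP_PKTINFO) as proposed in item 4.
 * Returns 0 on success, -1 if the option length matches neither form. */
int model_set_ip_pktinfo(struct model_socket *so, const void *optval,
                         size_t optlen)
{
    if (optlen == sizeof(int)) {
        /* Linux-style use: a boolean that toggles IP_RECVPKTINFO. */
        int on;
        memcpy(&on, optval, sizeof(on));
        so->recv_pktinfo = (on != 0);
        return 0;
    }
    if (optlen == sizeof(struct model_pktinfo)) {
        /* Solaris-style use: set the default source for outgoing packets. */
        memcpy(&so->default_src, optval, sizeof(so->default_src));
        so->has_default_src = 1;
        return 0;
    }
    return -1;
}
```

Passing an int toggles only the receive flag, while passing the struct
sets only the default source, so the two calling conventions do not
interfere with each other in this model.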

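The compile-time check that the quoted message attributes to some
portable code (test for IP_PKTINFO first, then use IP_RECVPKTINFO to
choose between the Solaris and Linux conventions) might look something
like the following.  The enum and function names are hypothetical, and
the sketch assumes both option names are visible as preprocessor macros
after including <netinet/in.h>.  Per the quoted analysis, a check of
this kind would wrongly pick the Solaris branch on NetBSD as it stood at
the time of the thread.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical labels for the three cases described in the thread. */
enum pktinfo_style {
    PKTINFO_NONE,     /* no IP_PKTINFO at all */
    PKTINFO_LINUX,    /* IP_PKTINFO option enables the receive cmsg */
    PKTINFO_SOLARIS   /* IP_RECVPKTINFO enables the receive cmsg */
};

/* Report which convention the headers appear to advertise. */
enum pktinfo_style detect_pktinfo_style(void)
{
#if defined(IP_PKTINFO) && defined(IP_RECVPKTINFO)
    return PKTINFO_SOLARIS;
#elif defined(IP_PKTINFO)
    return PKTINFO_LINUX;
#else
    return PKTINFO_NONE;
#endif
}
```

The check only looks at which symbols exist, which is exactly why a
platform that defines IP_RECVPKTINFO with different semantics defeats it.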
