tech-net archive
Re: Plan for improving IP_PKTINFO socket option handling
On Dec 28, 4:27pm, Christos Zoulas wrote:
} Subject: Re: Plan for improving IP_PKTINFO socket option handling
} In article <m2bmii25i8.fsf%thuvia.hamartun.priv.no@localhost>,
} Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost> wrote:
} >I'd like to make some changes to the IPv4 socket option handling.
} >Specifically, I want to change how the IP_PKTINFO options are handled.
} >Before I attempt to change any code, I'd like input on the plan.
} >
} >First, a bit of background.
} >
} >I've been looking at getting the PowerDNS applications (authoritative
} >name server, recursive name server, and DNS load balancer/firewall) to
} >compile cleanly on NetBSD, and while I've been able to do so, it took
} >some ugly workarounds. Digging into the standards, the source code,
} >and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
} >doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
} >as a partner to the traditional IP_RECVDSTADDR), I find that there are
} >a number of differences, some for no good reason at all. In a couple
} >of cases, our code is just wrong. Also, our documentation of these
} >options is unclear, and contains errors.
} >
} >The reason these things exist at all is to enable the owner of a
} >wildcard bound socket to find out which interface and address an
} >incoming connection was actually received by, and, in the case of a
} >UDP socket, to set the source address of an outgoing packet, typically
} >so that the sender of a UDP request can recognize the response. For
} >ease of use, recvmsg() delivers the extra information as a control
} >message which may then be supplied unchanged to sendmsg() when sending
} >the response, setting the source address to the original destination.
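
A minimal sketch of the receive side of that round trip, assuming a platform that declares struct in_pktinfo in <netinet/in.h>. The helper name find_pktinfo is made up for illustration. It walks the control messages that recvmsg() left in a msghdr and copies out the first IP_PKTINFO payload:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* glibc hides struct in_pktinfo in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: scan the control messages of a msghdr filled in
 * by recvmsg() and copy the first IP_PKTINFO payload into *out.
 * Returns 0 if one was found, -1 otherwise.
 */
static int
find_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP &&
            cm->cmsg_type == IP_PKTINFO &&
            cm->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* CMSG_DATA() is not guaranteed to be aligned for the
             * struct, so copy instead of casting the pointer. */
            memcpy(out, CMSG_DATA(cm), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

To reply from the original destination, the same msg_control buffer and msg_controllen can be handed back to sendmsg() unchanged, which is the convenience the paragraph above describes.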
} >
} >The IPv4 implementation of the *PKTINFO options is not standardized.
} >It has been implemented several times, modeled, with varying degrees
} >of accuracy, on the IPv6 version, standardized by RFC3542.
} >
} >Here's a summary of the IPv6 functionality:
} >
} >Option IPV6_RECVPKTINFO on socket:
} > recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets
} >
} >Option IPV6_PKTINFO on socket:
} > sets the default source address to be used when sending packets
} >
} >Control message IPV6_PKTINFO from recvmsg():
} > contains an in6_pktinfo structure with the specific destination address
} >
} >Control message IPV6_PKTINFO to sendmsg():
} > supply an in6_pktinfo structure with the source address to be used
} >
} >All of these work the same way on BSD, Solaris, and Linux (as per
} >RFC3542). The in6_pktinfo structure holds the address (in ipi6_addr),
} >and the interface index (ipi6_ifindex).
} >
} >Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
} >control messages with incoming packets, while the IPV6_PKTINFO option
} >sets a default source address for the socket, and the IPV6_PKTINFO
} >control message on an outgoing packet sets the source address for that
} >particular packet.
} >
} >Now to the IPv4 implementation. In Solaris, this was done as a direct
} >translation of the IPv6 option set:
} >
} >Option IP_RECVPKTINFO on socket:
} > recvmsg() will supply IP_PKTINFO cmsgs for incoming packets
} >
} >Option IP_PKTINFO on socket:
} > sets the default source address to be used when sending packets
} >
} >Control message IP_PKTINFO from recvmsg():
} > contains an in_pktinfo structure with the specific destination address
} >
} >Control message IP_PKTINFO to sendmsg():
} > supply an in_pktinfo structure with the source address to be used
} >
} >Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
} >instead using the IP_PKTINFO option to control the delivery of
} >IP_PKTINFO control messages with incoming packets. In doing so, they
} >lost the ability to set a default outgoing source address. This is
} >arguably not a great loss, but it does break compatibility with
} >Solaris, and it gratuitously breaks orthogonality with IPv6.
} >
} >Next, while Solaris and Linux still have the ipi_ifindex and ipi_addr
} >fields, they decided to add a new field, ipi_spec_dst. The name is
} >supposed to refer to the "specific destination" described in RFCs 1122
} >and 1123. They chose to differentiate between the destination address
} >as supplied in the incoming IP packet itself, and the local address
} >the packet was, in fact, delivered to (specifically, ipi_spec_dst is
} >said to be "the destination address of the routing table entry"). For
} >outgoing packets, the IP_PKTINFO option's ipi_spec_dst field will be
} >used as the source address.
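
For the outgoing direction, here is a hedged sketch of building an IP_PKTINFO control message by hand. The function name set_source_addr is invented for this example. The #if reflects the field layout described above: ipi_spec_dst where it exists (Linux, Solaris), ipi_addr otherwise. That fallback is an assumption about systems that lack the field, not documented behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* glibc hides struct in_pktinfo in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: attach one IP_PKTINFO control message to msg,
 * asking the stack to use src as the source address.  buf must be
 * aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
set_source_addr(struct msghdr *msg, void *buf, size_t buflen,
    struct in_addr src)
{
    struct in_pktinfo pi;
    struct cmsghdr *cm;

    if (buflen < CMSG_SPACE(sizeof(pi)))
        return -1;

    memset(buf, 0, CMSG_SPACE(sizeof(pi)));
    memset(&pi, 0, sizeof(pi));     /* ipi_ifindex 0: no interface forced */
#if defined(__linux__) || defined(__sun)
    pi.ipi_spec_dst = src;          /* source address field on these systems */
#else
    pi.ipi_addr = src;              /* assumption: no ipi_spec_dst member */
#endif

    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(pi));
    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof(pi));
    memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
    return 0;
}
```

The caller would then fill in msg_name and msg_iov as usual and pass the msghdr to sendmsg().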
} >
} >The only real example I can think of is where you listen on 0/0, and
} >receive a packet on the loopback interface, addressed not to
} >127.0.0.1, but, say, 127.1.2.3. By the documentation, this should
} >give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
} >ipi_spec_dst 127.0.0.1. That's not how Linux works, though: it will
} >set both to 127.1.2.3. Sending a response, if you pass that control
} >message unchanged to sendmsg(), you'll be sending from 127.1.2.3
} >(instead of the documented 127.0.0.1, which wouldn't work), and this
} >may be a hint to why Linux puts the packet header destination in both
} >fields. On NetBSD, sending to 127.1.2.3 doesn't work at all.
} >
} >(This is a general difference in the handling of the loopback
} >interface: if you 'ping 127.1.2.3' on Linux, you get responses from
} >127.1.2.3. On NetBSD, you get a 'network unreachable' instead.)
} >
} >Now, on to NetBSD.
} >
} >We've mostly copied the way things work in Solaris and Linux, but with
} >a couple of little twists that break source compatibility with both.
} >
} >First, we don't have the ipi_spec_dst field at all. Since a lot of
} >source code out there is written with Solaris and/or Linux in mind,
} >this breaks compatibility at the source level. I don't have a Solaris
} >system handy for testing, but from what I observe on Linux, and how
} >its loopback handling differs from NetBSD, as described above, we
} >could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
} >
} >Next, we do something really silly with the name IP_RECVPKTINFO.
} >Remember that this is the option to turn on the generation of
} >IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
} >changing the IP_PKTINFO option to do this instead of setting the
} >default source address for outgoing packets? Well, we've reinstated
} >the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
} >control messages containing the *source* addresses of the incoming
} >packets. This is completely meaningless, as we have that information
} >in the standard message header from recvmsg() already, so it'll never
} >be used for this purpose.
} >
} >What it does do, though, is trick source code that supports the
} >Solaris IP_RECVPKTINFO option into thinking we work the same way. See
} >external/bsd/dhcp/dist/common/socket.c for an example of functionality
} >we're missing. Note how they test for the presence of both symbols
} >IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
} >of Solaris is present. Other code I've read checks for IP_PKTINFO
} >first, and then uses IP_RECVPKTINFO to decide whether to do things the
} >Solaris or the Linux way. Our use of the latter symbol breaks this.
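
The second detection pattern mentioned above looks roughly like the sketch below. This is not taken from any particular package, and the function name pktinfo_recv_optname is made up. It shows why a system that defines IP_RECVPKTINFO with different semantics gets sent down the Solaris branch:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* keep glibc's extension declarations visible */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: return the socket option a portable program
 * would enable to receive IP_PKTINFO control messages, or -1 if the
 * platform offers neither symbol.
 */
static int
pktinfo_recv_optname(void)
{
#if defined(IP_PKTINFO) && defined(IP_RECVPKTINFO)
    /* Both symbols present: assume Solaris semantics. */
    return IP_RECVPKTINFO;
#elif defined(IP_PKTINFO)
    /* Only IP_PKTINFO: assume Linux semantics (int on/off flag). */
    return IP_PKTINFO;
#else
    return -1;
#endif
}
```

The result would be passed to setsockopt(s, IPPROTO_IP, optname, &on, sizeof(on)) with an int flag. The choice is made purely from which symbols exist, so the symbols have to mean what the program expects.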
} >
} >Finally, here's what I'd like to change:
} >
} >1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
} >
} >2) Change the IP_RECVPKTINFO option to control the generation of
} >   IP_PKTINFO control messages, the way it's done in Solaris.
} >
} >3) Remove the superfluous IP_RECVPKTINFO control message.
} >
} >4) Change the IP_PKTINFO option to do different things depending on
} >   the parameter it's supplied with:
} >   - If it's sizeof(int), assume it's being used as in Linux:
} >     - If it's non-zero, turn on the IP_RECVPKTINFO option.
} >     - If it's zero, turn off the IP_RECVPKTINFO option.
} >   - If it's sizeof(struct in_pktinfo), assume it's being used as in
} >     Solaris, to set a default for the source interface and/or
} >     source address for outgoing packets on the socket.
} >
} >5) Fix our documentation.  Both ip(4) and ip6(4) contain errors in
} >   their descriptions of these particular options and control messages.
} >
} >With this, we should have automatic source code compatibility with
} >pretty much everything, and orthogonality between IPv6 and IPv4.
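
Item 4 of the plan dispatches on the size of the option value. The following is a user-space model of that rule only, not kernel code. The struct pktinfo_state and the function set_ip_pktinfo are hypothetical names invented for the illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* glibc hides struct in_pktinfo in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical per-socket state touched by the IP_PKTINFO option. */
struct pktinfo_state {
    int recv_pktinfo;               /* deliver IP_PKTINFO cmsgs on receive */
    int have_default;               /* default_src has been set */
    struct in_pktinfo default_src;  /* default source for outgoing packets */
};

/*
 * Model of the proposed rule: an int-sized value toggles reception
 * (Linux usage), an in_pktinfo-sized value sets the default source
 * (Solaris usage).  Returns 0 on success, -1 for any other length
 * (a real implementation would presumably fail with EINVAL).
 */
static int
set_ip_pktinfo(struct pktinfo_state *st, const void *optval, size_t optlen)
{
    if (optlen == sizeof(int)) {
        int on;

        memcpy(&on, optval, sizeof(on));
        st->recv_pktinfo = (on != 0);
        return 0;
    }
    if (optlen == sizeof(struct in_pktinfo)) {
        memcpy(&st->default_src, optval, sizeof(st->default_src));
        st->have_default = 1;
        return 0;
    }
    return -1;
}
```

The scheme relies on sizeof(int) and sizeof(struct in_pktinfo) being different, which holds because the structure contains at least an address and an interface index.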
}
} I like and I support this proposal.
For what it's worth, me too. :-) The lack of source code
compatibility has really been annoying me when working on some
packages. Also, tftpd not sending packets from the correct source
address has been a problem (this may have been fixed in the
meantime). Also, good work on the research, Tom!
}-- End of excerpt from Christos Zoulas