tech-net archive


Re: Plan for improving IP_PKTINFO socket option handling



On Dec 28,  4:27pm, Christos Zoulas wrote:
} Subject: Re: Plan for improving IP_PKTINFO socket option handling
} In article <m2bmii25i8.fsf%thuvia.hamartun.priv.no@localhost>,
} Tom Ivar Helbekkmo  <tih%hamartun.priv.no@localhost> wrote:
} >I'd like to make some changes to the IPv4 socket option handling.
} >Specifically, I want to change how the IP_PKTINFO options are handled.
} >Before I attempt to change any code, I'd like input on the plan.
} >
} >First, a bit of background.
} >
} >I've been looking at getting the PowerDNS applications (authoritative
} >name server, recursive name server, and DNS load balancer/firewall) to
} >compile cleanly on NetBSD, and while I've been able to do so, it took
} >some ugly workarounds.  Digging into the standards, the source code,
} >and the documentation from Solaris, Linux, and our own NetBSD (FreeBSD
} >doesn't do IP_PKTINFO, having instead created an IP_SENDSRCADDR option
} >as a partner to the traditional IP_RECVDSTADDR), I find that there are
} >a number of differences, some for no good reason at all.  In a couple
} >of cases, our code is just wrong.  Also, our documentation of these
} >options is unclear, and contains errors.
} >
} >The reason these things exist at all is to enable the owner of a
} >wildcard-bound socket to find out which interface and address an
} >incoming packet was actually received on, and, in the case of a
} >UDP socket, to set the source address of an outgoing packet, typically
} >so that the sender of a UDP request can recognize the response.  For
} >ease of use, recvmsg() delivers the extra information as a control
} >message which may then be supplied unchanged to sendmsg() when sending
} >the response, setting the source address to the original destination.
} >
} >The IPv4 implementation of the *PKTINFO options is not standardized.
} >It has been implemented several times, modeled, with varying degrees
} >of accuracy, on the IPv6 version, standardized by RFC3542.
} >
} >Here's a summary of the IPv6 functionality:
} >
} >Option IPV6_RECVPKTINFO on socket:
} >   recvmsg() will supply IPV6_PKTINFO cmsgs for incoming packets
} >
} >Option IPV6_PKTINFO on socket:
} >   sets the default source address to be used when sending packets
} >
} >Control message IPV6_PKTINFO from recvmsg():
} >   contains an in6_pktinfo structure with the specific destination address
} >   
} >Control message IPV6_PKTINFO to sendmsg():
} >   supply an in6_pktinfo structure with the source address to be used
} >
} >All of these work the same way on BSD, Solaris, and Linux (as per
} >RFC3542).  The in6_pktinfo structure holds the address (in ipi6_addr),
} >and the interface index (ipi6_ifindex).
} >
} >Note how the IPV6_RECVPKTINFO option is used to request IPV6_PKTINFO
} >control messages with incoming packets, while the IPV6_PKTINFO option
} >sets a default source address for the socket, and the IPV6_PKTINFO
} >control message on an outgoing packet sets the source address for that
} >particular packet.
} >
} >Now to the IPv4 implementation.  In Solaris, this was done as a direct
} >translation of the IPv6 option set:
} >
} >Option IP_RECVPKTINFO on socket:
} >   recvmsg() will supply IP_PKTINFO cmsgs for incoming packets
} >
} >Option IP_PKTINFO on socket:
} >   sets the default source address to be used when sending packets
} >
} >Control message IP_PKTINFO from recvmsg():
} >   contains an in_pktinfo structure with the specific destination address
} >
} >Control message IP_PKTINFO to sendmsg():
} >   supply an in_pktinfo structure with the source address to be used
} >
} >Then Linux almost copied this scheme, but they dropped IP_RECVPKTINFO,
} >instead using the IP_PKTINFO option to control the delivery of
} >IP_PKTINFO control messages with incoming packets.  In doing so, they
} >lost the ability to set a default outgoing source address.  This is
} >arguably not a great loss, but it does break compatibility with
} >Solaris, and it gratuitously breaks orthogonality with IPv6.
} >
} >Next, while Solaris and Linux still have the ipi_ifindex and ipi_addr
} >fields, they decided to add a new field, ipi_spec_dst.  The name is
} >supposed to refer to the "specific destination" described in RFCs 1122
} >and 1123.  They chose to differentiate between the destination address
} >as supplied in the incoming IP packet itself, and the local address
} >the packet was, in fact, delivered to (specifically, ipi_spec_dst is
} >said to be "the destination address of the routing table entry").  For
} >outgoing packets, the ipi_spec_dst field of the IP_PKTINFO control
} >message will be used as the source address.
} >
} >The only real example I can think of is where you listen on 0/0, and
} >receive a packet on the loopback interface, addressed not to
} >127.0.0.1, but, say, 127.1.2.3.  By the documentation, this should
} >give an IP_PKTINFO control message with ipi_addr set to 127.1.2.3, and
} >ipi_spec_dst 127.0.0.1.  That's not how Linux works, though: it will
} >set both to 127.1.2.3.  Sending a response, if you pass that control
} >message unchanged to sendmsg(), you'll be sending from 127.1.2.3
} >(instead of the documented 127.0.0.1, which wouldn't work), and this
} >may be a hint as to why Linux puts the packet header destination in both
} >fields.  On NetBSD, sending to 127.1.2.3 doesn't work at all.
} >
} >(This is a general difference in the handling of the loopback
} >interface: if you 'ping 127.1.2.3' on Linux, you get responses from
} >127.1.2.3.  On NetBSD, you get a 'network unreachable' instead.)
} >
} >Now, on to NetBSD.
} >
} >We've mostly copied the way things work in Solaris and Linux, but with
} >a couple of little twists that break source compatibility with both.
} >
} >First, we don't have the ipi_spec_dst field at all.  Since a lot of
} >source code out there is written with Solaris and/or Linux in mind,
} >this breaks compatibility at the source level.  I don't have a Solaris
} >system handy for testing, but from what I observe on Linux, and how
} >its loopback handling differs from NetBSD, as described above, we
} >could just toss in a "#define ipi_spec_dst ipi_addr" and be good.
} >
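To illustrate why such a macro would be enough for source
compatibility, here is a small sketch.  The struct compat_pktinfo is a
hypothetical stand-in for an in_pktinfo that lacks ipi_spec_dst (it is
not copied from any header), and reply_source() is likewise invented
for the example.

```c
#include <netinet/in.h>

/* Hypothetical stand-in for an in_pktinfo without ipi_spec_dst. */
struct compat_pktinfo {
	struct in_addr	ipi_addr;	/* address carried in the cmsg */
	unsigned int	ipi_ifindex;	/* interface index */
};

/* The alias suggested in the text: both names reach the same member. */
#ifndef ipi_spec_dst
#define ipi_spec_dst ipi_addr
#endif

/*
 * Code written with Linux or Solaris in mind, naming ipi_spec_dst,
 * now compiles against the two-member structure.
 */
struct in_addr
reply_source(const struct compat_pktinfo *pi)
{
	return pi->ipi_spec_dst;	/* expands to pi->ipi_addr */
}
```

The obvious limitation is that the two names can never hold different
values, which matches the Linux behaviour described above where both
fields carry the same address.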
} >Next, we do something really silly with the name IP_RECVPKTINFO.
} >Remember that this is the option to turn on the generation of
} >IP_PKTINFO control messages for recvmsg(), and that Linux dropped it,
} >changing the IP_PKTINFO option to do this instead of setting the
} >default source address for outgoing packets?  Well, we've reinstated
} >the option, but in NetBSD it enables the generation of IP_RECVPKTINFO
} >control messages containing the *source* addresses of the incoming
} >packets.  This is completely meaningless, as we have that information
} >in the standard message header from recvmsg() already, so it'll never
} >be used for this purpose.
} >
} >What it does do, though, is trick source code that supports the
} >Solaris IP_RECVPKTINFO option into thinking we work the same way.  See
} >external/bsd/dhcp/dist/common/socket.c for an example of functionality
} >we're missing.  Note how they test for the presence of both symbols
} >IP_PKTINFO and IP_RECVPKTINFO, and then assume that the functionality
} >of Solaris is present.  Other code I've read checks for IP_PKTINFO
} >first, and then uses IP_RECVPKTINFO to decide whether to do things the
} >Solaris or the Linux way.  Our use of the latter symbol breaks this.
} >
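The symbol-based detection described here looks roughly like the
following sketch.  The function name enable_recv_pktinfo() is made up
for illustration; the preprocessor tests mirror the pattern described
in the text, and they are exactly what gives the wrong answer on a
system where IP_RECVPKTINFO exists but means something else.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: ask for IP_PKTINFO control messages on a
 * datagram socket.  Returns the setsockopt() result (0 on success,
 * -1 on error), or -1 if the platform offers neither option.
 */
int
enable_recv_pktinfo(int fd)
{
	int on = 1;

#if defined(IP_PKTINFO) && defined(IP_RECVPKTINFO)
	/* Solaris convention: a separate option requests the cmsgs. */
	return setsockopt(fd, IPPROTO_IP, IP_RECVPKTINFO, &on, sizeof(on));
#elif defined(IP_PKTINFO)
	/* Linux convention: the IP_PKTINFO option itself requests them. */
	return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
#else
	(void)fd;
	(void)on;
	return -1;
#endif
}
```

Nothing in this code can tell at compile time what the option actually
does, so it depends entirely on the symbol names carrying the same
meaning everywhere.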
} >Finally, here's what I'd like to change:
} >
} >1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
} >
} >2) Change the IP_RECVPKTINFO option to control the generation of
} >   IP_PKTINFO control messages, the way it's done in Solaris.
} >
} >3) Remove the superfluous IP_RECVPKTINFO control message.
} >
} >4) Change the IP_PKTINFO option to do different things depending on
} >   the parameter it's supplied with:
} >   - If it's sizeof(int), assume it's being used as in Linux:
} >     - If it's non-zero, turn on the IP_RECVPKTINFO option.
} >     - If it's zero, turn off the IP_RECVPKTINFO option.
} >   - If it's sizeof(struct in_pktinfo), assume it's being used as in
} >     Solaris, to set a default for the source interface and/or
} >     source address for outgoing packets on the socket.
} >
} >5) Fix our documentation.  Both ip(4) and ip6(4) contain errors in
} >   their descriptions of these particular options and control messages.
} >
} >With this, we should have automatic source code compatibility with
} >pretty much everything, and orthogonality between IPv6 and IPv4.
} 
} I like and I support this proposal.

     For what it's worth, me too.  :-)  The lack of source code
compatibility has really been annoying me when working on some
packages.  In addition, tftpd not sending packets from the correct
source address has been a problem (this may have been fixed in the
meantime).  Good work on the research, Tom!

}-- End of excerpt from Christos Zoulas

