Subject: IP_HDRINCL revisited
To: None <tech-net@netbsd.org>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-net
Date: 01/21/1999 13:32:53
Hi folks...

I'm finally getting around to revisiting IP_HDRINCL.  As you all probably
know, historic BSD IP_HDRINCL implementations specify that the application
supplies a raw IP packet, including header, with everything as it will go
out onto the network _EXCEPT_ the ip_len field in the header.  That field
is in host order, since the stack uses it as the length of the packet.

This is somewhat annoying for a couple of reasons:

	(1) Linux and some other OSs use ip_len in network order
	    for IP_HDRINCL.

	(2) If you have a program (e.g. our dhcp server) which can
	    interface to many network access methods, you have to create
	    the IP header differently depending on whether you send it
	    w/ a SOCK_RAW socket or via BPF.

	(3) "If it's really a raw IP packet, why isn't it like it will
	    be when it goes out onto the wire?"

Fixing this is easy enough, and one can even retain compatibility with
the old semantics by versioning the socket option.

However, it creates an interesting asymmetry... The two fields in the
IP header that NetBSD-current (1.3I, 1.3J once I commit the IP_HDRINCL
change) swaps are ip_len and ip_off.  These are for the convenience of
the stack, internally.  ip_off, once IP fragment reassembly is complete,
will be 0, so we can ignore that.  ip_len, however, will still be in host
order.

This is not a big deal for most applications, since they never see an IP
header.  However, if you're using SOCK_RAW to do IP, you get the IP header
with your datagram, and its ip_len is still in host order.  That would then
be _different_ from what you supply when you transmit with IP_HDRINCL.

So, my question is:

	(1) Should we bother making the IP_HDRINCL change at all, or just
	    keep the traditional BSD semantics of ip_len in host order, or

	(2) If we do change IP_HDRINCL, should we also change rip_input()
	    to swap ip_len back into network order before handing it off
	    to the socket, so that the byte order will be consistent for
	    sending and receiving?

        -- Jason R. Thorpe <thorpej@nas.nasa.gov>