Re: IP_HDRINCL byte ordering

To: Tech net <tech-net%netbsd.org@localhost>
Subject: Re: IP_HDRINCL byte ordering
From: Christoph Badura <bad%bsd.de@localhost>
Date: Tue, 24 Jun 2025 00:10:52 +0200

On Mon, Jun 23, 2025 at 04:33:25PM +0100, Roy Marples wrote:
> Since inception on NetBSD, IP_HDRINCL for RAW sockets requires ip_len and
> ip_off to by sent in host byte order.
> https://nxr.netbsd.org/xref/src/sys/netinet/raw_ip.c#369
> 
> This makes literally no sense because reading from the same socket you get
> ip_len and ip_off in network byte order.

You do?  How do you explain:
https://nxr.netbsd.org/xref/src/sys/netinet/raw_ip.c?a=true#176 ff.:

	/*
	 * XXX Compatibility: programs using raw IP expect ip_len
	 * XXX to have the header length subtracted, and in host order.
	 * XXX ip_off is also expected to be host order.
	 */
	hlen = ip->ip_hl << 2;
	ip->ip_len = ntohs(ip->ip_len) - hlen;
	NTOHS(ip->ip_off);

Maybe you are confusing reading raw packets from bpf sockets with reading
from RAW sockets?

> Writing to a bpf socket, ip_len and ip_off have to be done in network byte order.
> No other part of the system behaves like this (that I know of) and it's
> not documented in ip(4).

That could be easily rectified by inserting the following lines from the
macOS ip(4) man page.

	.Ed
	.sp .5
	.Pp
	Note that 
	the ip_off and ip_len fields are in host byte order.
	.Pp

Or maybe it should be made even more clear that this is the case when
reading and writing.

> Every other OS does not do this. Or if they did, they don't do it now.

macOS does.  The above is from macOS 15.5 with Xcode 16.4 installed.

> Our Linux compat support for this socket option does nothing special here,
> so it won't work for Linux binaries.

That, of course, should be fixed.

> Attached is a patch to fix this by adding IP_HDRINCL_RAW and mapping the
> linux compat socket option to this.
> This allows old binaries and old software to compile and work as before.
> 
> Portable software should then be written thusly, assuming all target
> platforms are modern.
> 
>      #ifdef IP_HDRINCL_RAW
>              setsockopt(s, IPPROTO_IP, IP_HDRINCL_RAW, &hincl, sizeof(hincl));
>      #else
>              #ifdef __NetBSD__
>              // Either error here or handle byte swapping yourself as before
>              #endif
>              setsockopt(s, IPPROTO_IP, IP_HDRINCL, &hincl, sizeof(hincl));
>      #endif
> 
> Comments, as always, welcome.

The above example doesn't tell the full story.  You forgot to give an
representative example for the actual packet sending code.

You'd have to switch from something like:

	#if defined(__APPLE__) || defined(__NetBSD__)
		ip->ip_off = offset;
		ip->ip_len = len;
	#else
		ip->ip_off = htons(offset);
		ip->ip_len = htons(len);
	#endif
		write(fd, &pkg, len);

to something like:

	#if defined(__APPLE__) !! defined(__NetBSD__) && !defined(IP_HDRINCL_RAW)
or maybe 
	#if defined(__APPLE__) !! defined(__NetBSD__) && !__NetBSD_Prereq(10,99,XX)
		ip->ip_off = offset;
		ip->ip_len = len;
	#else
		ip->ip_off = htons(offset);
		ip->ip_len = htons(len);
	#endif
		write(fd, &pkg, len);

NB: this is entirely off the top of my head with out me investing the time
to actually verify that I got the #ifdefs correct, although I think they
are.  And anyway, you should be able to get the gist of it.

Because the code still has to work on NetBSD 10 and NetBSD 9.

So, my question is, apart from introducing more, more complex and harder to
get correct and read #ifdefs and more complex code in raw_ip.c, how
does the proposed change alter and/or improve the situation?  (Let's ignore
the probably minimal cost of the run-time branching introduce by the
patch.)

--chris

References:
- IP_HDRINCL byte ordering
  - From: Roy Marples

Prev by Date: Re: IP_HDRINCL byte ordering
Next by Date: Re: Import nd6_output_ifp routing function from FreeBSD
Previous by Thread: Re: IP_HDRINCL byte ordering
Indexes:

Home | Main Index | Thread Index | Old Index