Subject: rip_ctloutput() and IP_HDRINCL
To: NetBSD Kernel <tech-kern@NetBSD.org>
From: Markus Mayer <mmayer@redback.com>
List: tech-kern
Date: 11/27/2007 12:10:49
This is a multi-part message in MIME format.
--------------040301050303060903090801
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I found what I believe is a little problem in the way rip_ctloutput() 
handles setting the IP_HDRINCL on an mbuf.

The diff I attached is against NetBSD 3.0, but I checked HEAD and it 
still has the same code.

I am talking about this code snippet. It checks to make sure that *m 
points to an area at least sizeof(int) bytes long. However, it does not 
check whether *m points to an area *bigger* than sizeof(int).

	case PRCO_SETOPT:
		switch (optname) {
		case IP_HDRINCL:
			if (*m == 0 || (*m)->m_len < sizeof (int))
				error = EINVAL;
			else {
				if (*mtod(*m, int *))
					inp->inp_flags |= INP_HDRINCL;
				else
					inp->inp_flags &= ~INP_HDRINCL;
			}
			if (*m != 0)
				(void) m_free(*m);
			break;


While it may seem unnecessary at first glance to check for "bigger than 
sizeof(int)", we actually ran into an issue because of this. Admittedly, 
it was unclean code in the userland application, which passed the 
address of a u_long to setsockopt() and not an int, but the kernel 
should catch such problems and either handle them or return an error.

We are running 64 bit code on a big endian CPU. You probably won't see 
any issues on little endian machines.

The problem is that dereferencing the mbuf pointer (mtod(*m, int *)) 
assumes that it is an integer pointer. However, if the pointer passed in 
by userland is a long, not an int, this code will be looking in the 
wrong location (on a big endian machine) if longs and ints are not the 
same size.

I have tried two approaches in the kernel, both of which seem to work. 
The second solution is probably the better one in the sense that it 
doesn't require any overhead over what's currently done and it is POSIX 
conformant (which requires the 4th argument to setsockopt to be an (int 
*) unless it is a pointer to a structure).

Work around it:

	case IP_HDRINCL:
		if (*m == 0 || (*m)->m_len < sizeof (int))
			error = EINVAL;
		else {
			if (m->m_len == sizeof(int) &&
			    *mtod(*m, int *))
			else if (m->m_len == sizeof(long) &&
			    *mtod(*m, long *))
				inp->inp_flags |= INP_HDRINCL;
			else
				inp->inp_flags &= ~INP_HDRINCL;


Or fail gracefully:

	case IP_HDRINCL:
		if (*m == 0 || (*m)->m_len != sizeof (int))
			error = EINVAL;
		else {
			if (*mtod(*m, int *))

The attached patch does the latter.

Regards,
-Markus

-- 
Markus Mayer
Redback Networks
An Ericsson Company

--------------040301050303060903090801
Content-Type: text/plain;
 name="raw_ip.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="raw_ip.c.diff"

Index: raw_ip.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/raw_ip.c,v
retrieving revision 1.86
diff -c -p -5 -r1.86 raw_ip.c
*** raw_ip.c	11 Mar 2005 06:16:16 -0000	1.86
--- raw_ip.c	27 Nov 2007 20:08:55 -0000
*************** rip_ctloutput(int op, struct socket *so,
*** 391,401 ****
  	} else switch (op) {
  
  	case PRCO_SETOPT:
  		switch (optname) {
  		case IP_HDRINCL:
! 			if (*m == 0 || (*m)->m_len < sizeof (int))
  				error = EINVAL;
  			else {
  				if (*mtod(*m, int *))
  					inp->inp_flags |= INP_HDRINCL;
  				else
--- 391,401 ----
  	} else switch (op) {
  
  	case PRCO_SETOPT:
  		switch (optname) {
  		case IP_HDRINCL:
! 			if (*m == 0 || (*m)->m_len != sizeof (int))
  				error = EINVAL;
  			else {
  				if (*mtod(*m, int *))
  					inp->inp_flags |= INP_HDRINCL;
  				else

--------------040301050303060903090801--