Subject: Re: sys_select() EBADF bug
To: matthew green <mrg@eterna.com.au>
From: Tad Hunt <tad@entrisphere.com>
List: tech-kern
Date: 11/14/2002 21:54:55
In message <1856.1037336207@splode.eterna.com.au>, you said:
;
;   		if (SCARG(uap, nd) > p->p_fd->fd_nfiles) {
;   			/* forgiving; slightly wrong */
;   			SCARG(uap, nd) = p->p_fd->fd_nfiles;
;   		}
;
;
;while this is marked as "slightly wrong" it is done so *because*
;it was actually chosen to be that way.  why, exactly, do you want
;to "fix" this (and break backwards compatibility for old programs,
;both binary & source.)  IMO it should be left alone.

Summary:
	I'm not clear that changing the current behavior will break
	backwards compatibility, because the current behavior
	varies.  Sometimes it will return EBADF, and sometimes it
	will just allow the select to continue, completely ignoring
	the bad fds.  Whether or not it returns EBADF or allows the
	select to proceed without signalling an error depends on
	the size of the kernel fd array, which the user process has
	no visibility into.  It just so happens that on my machine it
	starts out at size 20, but how is a user process to know this?

Detail:

So, let's take a look at the current behavior with this simple test
program attached below.

	sh$ ./a.out 5
	error Bad file descriptor
	sh$ ./a.out 19
	error Bad file descriptor
	sh$ ./a.out 20	# hangs forever
	^C
	sh$ 

Notice that sometimes select(2) returns EBADF, and sometimes it
just proceeds happily along ignoring the fd which is set but happens
to be greater than the size of the current process fd array, (which
starts out with 20 entries).  The user process knows nothing about
how large the kernel has grown this array.

Short of fstat()'ing every fd before putting it into the fd_set
(won't work in a multi-threaded application, and is really expensive!),
the process has no way to know if all of the files in the fd_set are
valid or not.

If the kernel array happens to grow above 20, then I close a bunch
of files, I end up with different behavior.  Assume I uncomment the
"dup2(0, fd+1);" line from the example program.  Now the select
always returns EBADF.

Now the fd array in the kernel has grown big enough that a select(2)
on fd 20 will return EBADF, instead of hanging.

According to the manpage, the user can rely on select(2) to return
EBADF if "One of the descriptor sets specified an invalid descriptor".

I would have liked to make the kernel return EBADF if there are
any fd's in the fd_set which are invalid, but it doesn't have enough
information about what the user process FD_SETSIZE is to be able
to constrain "nd" properly.  The best it can do is to return an
error if "nd" is bigger than the array.

The manpage already said that select(2) will return EINVAL for a bad
timeval, but the kernel *also* returns EINVAL for a "nd" which is
< 0:
	sys_select()
	{
		...
		if (SCARG(uap, nd) < 0)
			return (EINVAL);
		...
	}

This behavior is not currently documented, but some form of it has
been around for a Very Long Time.  In NetBSD 1.0, select(2) would
return EINVAL if "nd > FD_SETSIZE", so obviously this interface has
changed since then.  programs calling select(2) already have to handle
EINVAL, so I don't think this is an earth shattering change.

With this change, at least the select doesn't block:

	sh$ ./a.out 5
	error Bad file descriptor
	sh$ ./a.out 19
	error Bad file descriptor
	sh$ ./a.out 20
	error Invalid argument
	sh$ 

It helps select catch more completely a class of programming errors
which it is obviously intended to catch, without adding additional
runtime overhead.

-Tad

P.S.  Interesting behavior on Solaris.  Apparently, even though
their manpage says that it returns EINVAL if "nfds" < 0 or >=
FD_SETSIZE, it doesn't really.  It looks to me like it doesn't
range check the nfds parameter passed from userland, it just lets
the copy of the fd_set from userland segfault.  Hmm.  I don't think
select is a system call on Solaris.  Must be the library segfaulting
on the copy.

	tad:tad$ ./a.out 5
	error Bad file number
	tad:tad$ ./a.out 19
	error Bad file number
	tad:tad$ ./a.out 20
	error Bad file number
	tad:tad$ ./a.out 55
	error Bad file number
	tad:tad$ ./a.out 999
	error Bad file number
	tad:tad$ ./a.out 50000
	Segmentation Fault(coredump)
	tad:tad$ 

------------------------------------------------------------------
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

int
main(int argc, char *argv[])
{
	int r, fd;
	fd_set rfds;

	if(argc != 2) {
		printf("usage: %s fdnum\n", argv[0]);
		exit(1);
	}

	fd = strtoul(argv[1], NULL, 0);

	// dup2(0, fd+1); /* uncommenting gives different behavior */

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);

	r = select(fd+1, &rfds, NULL, NULL, NULL);
	if(r == -1) {
		printf("error %s\n", strerror(errno));
		exit(1);
	}

	printf("selected OK: %d\n", r);
	exit(0);
}