Subject: implementing closeall via a syscall
To: None <tech-kern@netbsd.org>
From: mouss <usebsd@free.fr>
List: tech-kern
Date: 01/04/2004 23:44:29
This is a multi-part message in MIME format.
--------------070908070407060203050908
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

I have implemented a syscall that closes all descriptors from a given 
value up to the highest open one, that is:
	closem(k) will close descriptors k, k+1, ..., max_open_fd

[rationale]
Daemons (and other apps) sometimes need to close almost all descriptors.
The usual way to do this is to call close() on fds from k to N, 
where N is either a fixed value or the result of a function (SOPEN_MAX, 
getrlimit, getdtablesize, ...). Unfortunately, this has two problems:
	* a program may lower its limits while having a lot of fds open, so the 
return values of getrlimit, sysconf, ... do not necessarily cover all 
the open files inherited from a parent.
	* most of these close() calls hit descriptors that are not open, so 
many syscalls are wasted
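
As an illustration of the loop described above, here is a minimal sketch 
of the brute-force approach. The name close_from_limit() and the fallback 
value of 256 are assumptions made for this example; they are not part of 
the patch or of any libc. The function returns the number of close() 
calls it issued so the cost is visible.

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Close every descriptor >= lowfd by brute force (hypothetical helper).
 * Returns the number of close() calls issued.
 */
long
close_from_limit(int lowfd)
{
	struct rlimit rl;
	long fd, maxfd, ncalls = 0;

	if (lowfd < 0)
		lowfd = 0;

	/* The bound is the current soft limit, not what is actually open. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1 ||
	    rl.rlim_cur == RLIM_INFINITY)
		maxfd = sysconf(_SC_OPEN_MAX);
	else
		maxfd = (long)rl.rlim_cur;
	if (maxfd < 0)
		maxfd = 256;	/* arbitrary fallback, an assumption */

	for (fd = lowfd; fd < maxfd; fd++) {
		(void)close((int)fd);	/* EBADF on unused slots is ignored */
		ncalls++;
	}
	return (ncalls);
}
```

Both problems show up here: a descriptor numbered above the current soft 
limit is never reached, and the loop issues one syscall per slot whether 
or not anything is open there.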

An alternative is to read /proc and close each open fd listed there. 
This gets the open fds right, but still costs one syscall per open fd. 
While this may be acceptable, procfs is not necessarily the right place 
(kernfs maybe?).
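
For comparison, the /proc variant could look roughly like the sketch 
below. The helper name close_from_dir() is made up for this example, and 
the directory to scan is passed in by the caller because its location 
differs between systems (for instance /proc/self/fd on Linux); whether a 
given procfs exposes such a directory is an assumption to check.

```c
#include <dirent.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Close every descriptor >= lowfd that is listed in fddir
 * (hypothetical helper). Returns the number of descriptors
 * closed, or -1 if the directory cannot be opened.
 */
int
close_from_dir(const char *fddir, int lowfd)
{
	DIR *dp;
	struct dirent *de;
	char *end;
	long fd;
	int nclosed = 0;

	if ((dp = opendir(fddir)) == NULL)
		return (-1);

	while ((de = readdir(dp)) != NULL) {
		fd = strtol(de->d_name, &end, 10);
		if (end == de->d_name || *end != '\0')
			continue;	/* skips "." and ".." */
		/* Do not close the descriptor used to read the directory. */
		if (fd < lowfd || fd > INT_MAX || fd == dirfd(dp))
			continue;
		if (close((int)fd) == 0)
			nclosed++;
	}
	(void)closedir(dp);
	return (nclosed);
}
```

Only descriptors that are really open get a close() call, but the scan 
still needs the directory reads on top of one close() per open fd.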

AIX has an F_CLOSEM cmd to fcntl to do just that. I originally intended 
to implement this, but the fcntl code checks that the fd arg is valid, 
which is not relevant for the closem() function. Also, I got comments (a 
very long time ago) that this would change the semantics of fcntl (which 
so far acts on a single fd and doesn't touch other fds), which seems a 
reasonable counter-argument. Finally, I'm not aware of any Unix that 
followed the AIX path, and chances are none will, so compatibility with 
F_CLOSEM is not critical.

Thus the syscall approach...

[name]
How to name the syscall? For now, I just called it closem().
closeall() would be a bad name because the current closeall() closes 
_all_ descriptors, so confusion would result if the same name is reused.
Also, closeall may be present in user apps?

Solaris has closefrom(), which does the same thing (using /proc if I'm 
not mistaken). So that would be a better name.

[questions]
- is such a thing desired/desirable?
- can someone review the code to check it's correct?
(The libc part and the manpage are still missing. I tested it using 
syscall() directly, but I might have missed something.)
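
For anyone who wants to repeat the test, one way to check the result from 
user space is to count the descriptors still open at or above k once the 
call returns. The helper below is only a sketch; count_open_from() is a 
name invented for this example and is not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Count descriptors in [lowfd, maxfd) that are currently open
 * (hypothetical test helper). F_GETFD fails with EBADF on a
 * slot with no open file.
 */
int
count_open_from(int lowfd, int maxfd)
{
	int fd, nopen = 0;

	for (fd = lowfd; fd < maxfd; fd++) {
		if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
			nopen++;
	}
	return (nopen);
}
```

On a kernel with the patch applied, the count for the range starting at k 
should be 0 right after syscall(SYS_closem, k), where SYS_closem is the 
number 355 defined in the sys/syscall.h part of the diff.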

mouss



--------------070908070407060203050908
Content-Type: text/plain;
 name="closem.diffs"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="closem.diffs"

*** kern/init_sysent.c.orig	Wed Nov 19 13:02:11 2003
--- kern/init_sysent.c	Sun Jan  4 22:55:53 2004
***************
*** 954,960 ****
  	{ 4, s(struct sys_fsync_range_args), 0,
  	    sys_fsync_range },			/* 354 = fsync_range */
! 	{ 0, 0, 0,
! 	    sys_nosys },			/* 355 = filler */
  	{ 0, 0, 0,
  	    sys_nosys },			/* 356 = filler */
  	{ 0, 0, 0,
--- 954,960 ----
  	{ 4, s(struct sys_fsync_range_args), 0,
  	    sys_fsync_range },			/* 354 = fsync_range */
! 	{ 1, s(struct sys_close_args), 0,
! 	    sys_closem },			/* 355 = closem */
  	{ 0, 0, 0,
  	    sys_nosys },			/* 356 = filler */
  	{ 0, 0, 0,
*** kern/kern_descrip.c.orig	Sun Jan  4 18:57:42 2004
--- kern/kern_descrip.c	Sun Jan  4 22:53:28 2004
***************
*** 610,615 ****
--- 610,646 ----
  	return (fdrelease(p, fd));
  }
  
+ 
+ /*
+  * Close multiple file descriptors.
+  */
+ /* ARGSUSED */
+ int
+ sys_closem(struct lwp *l, void *v, register_t *retval)
+ {
+ 	struct sys_close_args /* {
+ 		syscallarg(int)	fd;
+ 	} */ *uap = v;
+ 	int		fd;
+ 	struct filedesc	*fdp;
+ 	struct proc *p;
+ 	int i;
+ 
+ 	p = l->l_proc;
+ 	fd = SCARG(uap, fd);
+ 	fdp = p->p_fd;
+ 
+ 	if ((u_int) fd >= fdp->fd_nfiles)
+ 		return (EBADF);
+ 
+ 	/* Close from the highest open fd down to fd; errors are ignored. */
+ 	for (i = fdp->fd_lastfile; i >= fd; i--)
+ 		(void)fdrelease(p, i);
+ 
+ 	return (0);
+ }
+ 
+ 
  /*
   * Return status information about a file descriptor.
   */
*** kern/syscalls.c.orig	Sun Jan  4 22:00:13 2004
--- kern/syscalls.c	Sun Jan  4 22:52:44 2004
***************
*** 494,497 ****
--- 494,498 ----
  	"#352 (unimplemented sys_sched_get_priority_min)",		/* 352 = unimplemented sys_sched_get_priority_min */
  	"#353 (unimplemented sys_sched_rr_get_interval)",		/* 353 = unimplemented sys_sched_rr_get_interval */
  	"fsync_range",			/* 354 = fsync_range */
+ 	"closem",			/* 355 = closem */
  };
*** kern/syscalls.master.orig	Sun Jan  4 21:53:25 2004
--- kern/syscalls.master	Sun Jan  4 21:55:10 2004
***************
*** 708,710 ****
--- 708,715 ----
  
  354	STD		{ int sys_fsync_range(int fd, int flags, off_t start, \
  			    off_t length); }
+ 
+ ;
+ ;
+ ;
+ 355	STD		{ int sys_closem(int fd); }
*** sys/syscall.h.orig	Sun Jan  4 22:01:34 2004
--- sys/syscall.h	Sun Jan  4 22:54:00 2004
***************
*** 973,977 ****
  /* syscall: "fsync_range" ret: "int" args: "int" "int" "off_t" "off_t" */
  #define	SYS_fsync_range	354
  
! #define	SYS_MAXSYSCALL	355
  #define	SYS_NSYSENT	512
--- 973,980 ----
  /* syscall: "fsync_range" ret: "int" args: "int" "int" "off_t" "off_t" */
  #define	SYS_fsync_range	354
  
! /* syscall: "closem" ret: "int" args: "int" */
! #define	SYS_closem	355
! 
! #define	SYS_MAXSYSCALL	356
  #define	SYS_NSYSENT	512
*** sys/syscallargs.h.orig	Sun Jan  4 22:01:43 2004
--- sys/syscallargs.h	Sun Jan  4 22:54:09 2004
***************
*** 1476,1481 ****
--- 1476,1483 ----
  
  int	sys_close(struct lwp *, void *, register_t *);
  
+ int	sys_closem(struct lwp *, void *, register_t *);
+ 
  int	sys_wait4(struct lwp *, void *, register_t *);
  
  int	compat_43_sys_creat(struct lwp *, void *, register_t *);

--------------070908070407060203050908--