Subject: Allow pkill races without aborting?
To: None <tech-userlevel@NetBSD.org, ad@netbsd.org>
From: Dave Sainty <dave@dtsp.co.nz>
List: tech-userlevel
Date: 07/16/2005 19:34:29
Currently pkill exits immediately with an internal error if kill()
fails because the process has disappeared.  This can happen repeatably
if a previous kill() has an implication of shutting down another
process that pkill will also try and kill.

E.g. as an outcome of my infrared remote power off button :) I'm left
with...

% pgrep -lf poweroff
7516 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
4218 rsh chartreuse poweroff &
4428 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
10524 rsh chartreuse poweroff &
14492 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
8425 rsh chartreuse poweroff &
11508 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
7103 rsh chartreuse poweroff &
11502 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
895 rsh chartreuse poweroff &
4965 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
3272 rsh chartreuse poweroff &
9239 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
19705 rsh chartreuse poweroff &

% pkill -f poweroff
pkill: signalling pid 4218: No such process

% pgrep -lf poweroff
4428 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
10524 rsh chartreuse poweroff &
14492 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
8425 rsh chartreuse poweroff &
11508 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
7103 rsh chartreuse poweroff &
11502 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
895 rsh chartreuse poweroff &
4965 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
3272 rsh chartreuse poweroff &
9239 rcmd chartreuse.dave.dtsp.co.nz -l dave poweroff &
19705 rsh chartreuse poweroff &

The same scenario could also occur when one of the processes happens
to exit at the same time as the pkill is running.  This is a quite
plausible event, and it seems reasonable for pkill to simply accept
that it got there too late for that process - and continue.

The current behaviour doesn't seem right, though there is more than
one alternative.  Should it always carry on after any kind of kill()
error?  Should it output error messages as they occur?

The case of the process disappearing feels like a largely irrelevant
one - quite often it will simply be a matter of timing.  So I think an
improvement would be to simply ignore the ESRCH error, but still exit
on any other error.

However, an exit level of 1 seems appropriate if pkill never matches a
process that it successfully signalled (or, indeed, if pgrep never
matches a process that it successfully output).

So, I suggest the following patch:

--- src/usr.bin/pkill/pkill.c.orig	2005-05-30 22:30:36.000000000 +1200
+++ src/usr.bin/pkill/pkill.c	2005-07-16 19:12:19.000000000 +1200
@@ -108,8 +108,8 @@
 
 int	main(int, char **);
 void	usage(void);
-void	killact(struct kinfo_proc2 *);
-void	grepact(struct kinfo_proc2 *);
+int	killact(struct kinfo_proc2 *);
+int	grepact(struct kinfo_proc2 *);
 void	makelist(struct listhead *, enum listtype, char *);
 
 int
@@ -119,7 +119,7 @@
 	extern int optind;
 	char buf[_POSIX2_LINE_MAX], *mstr, **pargv, *p, *q;
 	int i, j, ch, bestidx, rv, criteria;
-	void (*action)(struct kinfo_proc2 *);
+	int (*action)(struct kinfo_proc2 *);
 	struct kinfo_proc2 *kp;
 	struct list *li;
 	u_int32_t bestsec, bestusec;
@@ -398,8 +398,7 @@
 		if ((kp->p_flag & P_SYSTEM) != 0)
 			continue;
 
-		rv = 1;
-		(*action)(kp);
+		rv |= (*action)(kp);
 	}
 
 	exit(rv ? STATUS_MATCH : STATUS_NOMATCH);
@@ -423,22 +422,38 @@
 	exit(STATUS_ERROR);
 }
 
-void
+int
 killact(struct kinfo_proc2 *kp)
 {
+	if (kill(kp->p_pid, signum) == -1) {
+		if (errno == ESRCH)
+			/*
+			 * The process disappeared between us matching
+			 * it and us signalling it.  Return 0 to
+			 * indicate that the process should not be
+			 * considered a match.
+			 */
+			return 0;
 
-	if (kill(kp->p_pid, signum) == -1)
 		err(STATUS_ERROR, "signalling pid %d", (int)kp->p_pid);
+	}
+
+	return 1;
 }
 
-void
+int
 grepact(struct kinfo_proc2 *kp)
 {
 	char **argv;
 
 	if (longfmt && matchargs) {
 		if ((argv = kvm_getargv2(kd, kp, 0)) == NULL)
-			return;
+			/*
+			 * The process disappeared?  Return 0 to
+			 * indicate that the process should not be
+			 * considered a match.
+			 */
+			return 0;
 
 		printf("%d ", (int)kp->p_pid);
 		for (; *argv != NULL; argv++) {
@@ -452,6 +467,8 @@
 		printf("%d", (int)kp->p_pid);
 
 	printf("%s", delim);
+
+	return 1;
 }
 
 void