tech-misc archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

tail: robustness



While debugging a problem in tail(1) (which resulted in
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50322), I have run
across two other situations in which tail -F will stall indefinitely, or exit
prematurely, respectively.

Both problems are triggered only by borderline conditions, though, so I am not
sure if you feel they're worthy of inclusion.

Both problems can be triggered by running
$ touch /tmp/foo
$ tail -F /tmp/foo

in one shell, and

$ while true; do rm /tmp/foo; echo bar >/tmp/foo; done

in another shell.

Sooner or later, (in my experiemnts rather sooner than later), one of the
following two things happen:

a) /tmp/foo is removed and recreated after the last freopen(3), but before the
   vnode events are registered/polled using kevent(2), in which case we're
   never seeing the NOTE_DELETE event. (At least that's how I understand the
   situation).  The effect is that we're kevent(2)-ing a file that has already
   been unlinked, and since there's no timeout, tail will sit there forever.

b) /tmp/foo is removed and after the stat(2) and before the freopen(3)
   calls in this code from usr.bin/tail/forward.c:

| if (fflag == 2 && fp != stdin &&
|     stat(fname, &statbuf) != -1) {
[...]
| 		fp = freopen(fname, "r", fp);
|		if (fp == NULL) {
|			ierr();
|			goto out;
|		}

..in which case tail terminates with ``No such file or directory''.
This is obviously incorrect given that tail -F should be unimpressed
by the file disappearing; furthermore it is the only actual exit
condition in this mode of operation.


To address a), kevent(2) could be given a timeout.  If it does time out, 
a stat(2) is in order to check whether the file has been replaced.
The following patch would achieve that, with an abitrarily chosen
timeout of five seconds.

-----------8<---------------------8<---------------------8<-----------
--- forward.c.orig	2015-10-10 01:39:10.000000000 +0200
+++ forward.c	2015-10-10 01:39:22.000000000 +0200
@@ -218,9 +218,15 @@
 			break;
 
 		case USE_KQUEUE:
-			if (kevent(kq, NULL, 0, ev, 1, NULL) == -1)
+			n = kevent(kq, NULL, 0, ev, 1, &(struct timespec){5,0});
+			if (n == -1)
 				xerr(1, "kevent");
 
+			if (n == 0) {
+				action = USE_SLEEP;
+				break;
+			}
+
 			if (ev[0].filter == EVFILT_VNODE) {
 				/* file was rotated, wait until it reappears */
 				action = USE_SLEEP;
-----------8<---------------------8<---------------------8<-----------



In order to address b), we should check errno for ENOENT when freopen(3)
gives us NULL.  Using freopen(3) is problematic, because it will fclose(3)
the old file even if the reopen itself fails.  So a more robust approach
is to fopen(3) first, and if successful, fclose(3) the old file.
Note that we cannot just fclose(3) the old file if we're not planning
to terminate, because the next iteration of the loop that encloses all
this will attempt to read from the file.

The following patch would address this issue:

-----------8<---------------------8<---------------------8<-----------
--- forward.c.orig	2015-10-10 01:36:47.000000000 +0200
+++ forward.c	2015-10-10 01:37:15.000000000 +0200
@@ -246,11 +246,17 @@
 				    statbuf.st_dev != sbp->st_dev ||
 				    statbuf.st_rdev != sbp->st_rdev ||
 				    statbuf.st_nlink == 0) {
-					fp = freopen(fname, "r", fp);
-					if (fp == NULL) {
+					FILE *nfp = fopen(fname, "r");
+					if (nfp == NULL) {
+						if (errno == ENOENT)
+							break;
 						ierr();
 						goto out;
 					}
+
+					fclose(fp);
+					fp = nfp;
+
 					*sbp = statbuf;
 					if (kq != -1)
 						action = ADD_EVENTS;
-----------8<---------------------8<---------------------8<-----------

Cheers,

Timo Buhrmester



Home | Main Index | Thread Index | Old Index