Subject: Re: bin/10775: cron exits on stat failure
To: <>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-userlevel
Date: 08/26/2000 06:16:54
    Date:        Wed, 09 Aug 2000 17:20:55 +1000
    From:        Robert Elz <kre@munnari.OZ.AU>
    Message-ID:  <17699.965805655@mundamutti.cs.mu.OZ.AU>

  | The next time cron needs to restart on my web server, it will start a modified
  | version which logs the value of errno, and then immediately attempts the
  | stat() again, and if that works, just continues (otherwise still exits).

I had almost forgotten about this one...

This is the patch I made to cron (database.c) ...

*** database.c.OK	Sun Feb  1 01:40:26 1998
--- database.c	Tue Aug  8 11:36:23 2000
***************
*** 34,39 ****
--- 34,40 ----
  #include <fcntl.h>
  #include <sys/stat.h>
  #include <sys/file.h>
+ #include <errno.h>
  
  
  #define TMAX(a,b) ((a)>(b)?(a):(b))
***************
*** 62,69 ****
  	 * cached any of the database), we'll see the changes next time.
  	 */
  	if (stat(SPOOL_DIR, &statbuf) < OK) {
  		log_it("CRON", getpid(), "STAT FAILED", SPOOL_DIR);
! 		(void) exit(ERROR_EXIT);
  	}
  
  	/* track system crontab file
--- 63,76 ----
  	 * cached any of the database), we'll see the changes next time.
  	 */
  	if (stat(SPOOL_DIR, &statbuf) < OK) {
+ 		int err = errno;
+ 
  		log_it("CRON", getpid(), "STAT FAILED", SPOOL_DIR);
! 		log_it("CRON", getpid(), "STAT ERROR", strerror(err));
! 		if (stat(SPOOL_DIR, &statbuf) == OK)
! 			log_it("CRON", getpid(), "STAT RECOVERED", "one retry");
! 		else
! 			(void) exit(ERROR_EXIT);
  	}
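
For anyone who wants to try the same diagnostic outside cron, here is a
minimal standalone sketch of the logic in the patch. It is an illustration
only, not code from cron: the function name stat_retry_once and its return
convention are made up for the example, and it writes to stderr where the
patch calls cron's log_it().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of cron): stat() a path and, if that
 * fails, report the saved errno and retry exactly once.
 *
 * Returns  0 if the first stat() succeeded,
 *          1 if the first failed but the retry succeeded,
 *         -1 if both failed (errno is then left from the retry).
 */
int
stat_retry_once(const char *path, struct stat *sb)
{
	int err;

	assert(path != NULL && sb != NULL);

	if (stat(path, sb) == 0)
		return 0;

	/* Save errno first; the stdio calls below may overwrite it. */
	err = errno;
	fprintf(stderr, "STAT FAILED (%s)\n", path);
	fprintf(stderr, "STAT ERROR (%s)\n", strerror(err));

	if (stat(path, sb) == 0) {
		fprintf(stderr, "STAT RECOVERED (one retry)\n");
		return 1;
	}
	return -1;
}
```

A caller would treat 0 and 1 as success and exit only on -1, which is what
the patched cron does. A return of 1 is the interesting case: the same path
failed and then succeeded on an immediate retry.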
  
And this is what has been happening recently (going back as far as I
still have logs; these are just the relevant lines, of course) ...


Aug 19 23:28:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 19 23:28:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 19 23:28:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 20 01:07:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 20 01:07:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 20 01:07:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 20 20:43:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 20 20:43:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 20 20:43:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 22 10:07:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 22 10:07:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 22 10:07:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 23 15:42:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 23 15:42:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 23 15:42:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 23 18:58:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 23 18:58:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 23 18:58:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 24 11:29:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 24 11:29:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 24 11:29:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 24 11:40:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 24 11:40:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 24 11:40:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 24 23:40:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 24 23:40:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 24 23:40:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 
Aug 25 02:12:00 muckleshoot cron[212]: (CRON) STAT FAILED (tabs) 
Aug 25 02:12:00 muckleshoot cron[212]: (CRON) STAT ERROR (No such file or directory) 
Aug 25 02:12:00 muckleshoot cron[212]: (CRON) STAT RECOVERED (one retry) 


Cron hasn't exited since the patch was installed (not that I am suggesting
putting that patch into cron; it is a diagnostic, not a cure).

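If there was ever any doubt that this is a kernel problem, this squelches it:
a stat() of the spool directory fails with ENOENT, and the same stat(),
retried immediately, succeeds.
Someone who is able should move this PR so it is listed as a kernel
problem rather than a cron problem.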

  | If I can create an environment that will force this to happen...

This one I'm afraid I haven't had time to work on yet.   It is still on
my list of things to attempt.

kre