Subject: Re: bin/11087: syslogd not working, HUP doesn't fix, requires hard restart
To: None <gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org>
From: Dave Olson <olson@bengaltech.com>
List: netbsd-bugs
Date: 09/26/2000 23:31:48
Greg A. Woods wrote: 
|  Long-running local processes that only call openlog() once at their
|  initialisation will not continue to log after syslogd has been restarted
|  because they use local-domain sockets to communicate with syslogd and it
|  would seem that there's no existing mechanism in use to signal when
|  the syslogd has gone away, and since syslogd does the further injustice
|  of re-creating the pathname representing the local-domain socket
|  (i.e. _PATH_LOG, or /var/run/log) every time it starts up, it's probably
|  necessary to re-open the client-side socket too.  Note how different
|  this is semantically from normal "datagram" style sockets where you can
|  always just blindly spew stuff out and hope someone is still listening!
|  
|  Maybe fixing the latter could be as simple as this (this is pure
|  un-tested speculation though):

Here's the change we implemented at Geocast for programs that
call openlog() once but may outlive a restart of syslogd.  It's
against the 1999-11-03 version of syslog.c.  We have a fair amount
of testing on this version.  It's similar to yours, but a bit different.

Index: syslog.c
===================================================================
RCS file: lib/libc/gen/syslog.c,v
retrieving revision 1.1.1.8
retrieving revision 1.2
diff -u -r1.1.1.8 -r1.2
--- syslog.c	1999/11/04 20:21:51	1.1.1.8
+++ syslog.c	2000/07/19 02:16:56	1.2
@@ -135,7 +135,10 @@
 	char *stdp = NULL;	/* pacify gcc */
 	char tbuf[TBUF_LEN], fmt_cpy[FMT_LEN];
 	size_t tbuf_left, fmt_left, prlen;
+	int firsttry;
 
+	firsttry = 1;
+
 #define	INTERNALLOG	LOG_ERR|LOG_CONS|LOG_PERROR|LOG_PID
 	/* Check for invalid bits. */
 	if (pri & ~(LOG_PRIMASK|LOG_FACMASK)) {
@@ -246,12 +249,25 @@
 
 	/* Get connected, output the message to the local logger. */
 	mutex_lock(&syslog_mutex);
-	if (!connected)
+retry:
+	if (!connected) {
 		openlog_unlocked(LogTag, LogStat | LOG_NDELAY, 0);
+		firsttry--;	/* no point in retrying open if we just did it */
+	}
 	if (send(LogFile, tbuf, cnt, 0) >= 0) {
 		mutex_unlock(&syslog_mutex);
 		return;
 	} 
+	if(firsttry>0 && connected) {
+		/* if send() failed, and we already had a connection open, one likely
+		 * cause is that syslogd exit'ed.  It may have been restarted since, so
+		 * try once to open a new connection.  This could be made conditional
+		 * on particular errno values, but the potential list is long, and changes
+		 * over time even within OSes.  (Geocast bug 1884) */
+		closelog_unlocked();
+		firsttry = 0;
+		goto retry;
+	}
 	mutex_unlock(&syslog_mutex);
 
 	/*
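
For anyone who wants to exercise the retry-once logic outside libc,
here is a minimal standalone sketch of the same control flow.  It is
not the libc code: struct transport, struct fake_daemon,
send_retry_once() and the fake_* helpers are all hypothetical names
made up for illustration, and the "daemon" is simulated with a
generation counter rather than a real local-domain socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport standing in for the socket to syslogd. */
struct transport {
	int connected;		/* nonzero once open_fn has succeeded */
	int (*open_fn)(struct transport *);	/* returns 0 on success */
	void (*close_fn)(struct transport *);
	int (*send_fn)(struct transport *, const char *, size_t); /* < 0 on failure */
	void *state;		/* opaque data for the callbacks */
};

/*
 * Send one message.  If the send fails on a connection that was
 * already open, assume the peer may have been restarted: close,
 * reopen and try exactly once more.  Returns 0 on success, -1 on
 * failure.
 */
int
send_retry_once(struct transport *t, const char *buf, size_t len)
{
	int firsttry = 1;

	assert(t != NULL);
retry:
	if (!t->connected) {
		if (t->open_fn(t) == 0)
			t->connected = 1;
		firsttry = 0;	/* no point in retrying an open we just did */
	}
	if (t->connected && t->send_fn(t, buf, len) >= 0)
		return 0;
	if (firsttry && t->connected) {
		/* Stale connection: drop it and retry with a fresh one. */
		t->close_fn(t);
		t->connected = 0;
		firsttry = 0;
		goto retry;
	}
	return -1;
}

/* Simulated daemon: "generation" is bumped on every restart. */
struct fake_daemon {
	int generation;		/* current incarnation of the daemon */
	int client_gen;		/* incarnation the client is connected to */
	int opens;		/* how many times the client has opened */
	int delivered;		/* messages that actually arrived */
};

static int
fake_open(struct transport *t)
{
	struct fake_daemon *d = t->state;

	d->client_gen = d->generation;
	d->opens++;
	return 0;
}

static void
fake_close(struct transport *t)
{
	(void)t;		/* nothing to release in the simulation */
}

static int
fake_send(struct transport *t, const char *buf, size_t len)
{
	struct fake_daemon *d = t->state;

	(void)buf;
	(void)len;
	if (d->client_gen != d->generation)
		return -1;	/* connected to a daemon that has gone away */
	d->delivered++;
	return 0;
}
```

As in the patch, a send that fails right after a fresh open is not
retried, so a daemon that is really down costs at most one extra
open per message.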