Subject: bin/8065: processes using openlog() can become disconnected from syslogd's log socket(s)
To: None <gnats-bugs@gnats.netbsd.org>
From: None <woods@weird.com>
List: netbsd-bugs
Date: 07/24/1999 13:51:26
>Number:         8065
>Category:       bin
>Synopsis:       processes using openlog() can become disconnected from syslogd's log socket(s)
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 24 13:50:01 1999
>Last-Modified:
>Originator:     Greg A. Woods
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Release:        NetBSD-current; July 24, 1999
>Environment:

	So far observed only on NetBSD 1.3.1, 1.3.2, and 1.3.3, but
	given the evidence in the code, likely in all versions up to and
	including today's -current.

>Description:

	Long running processes using openlog() [i.e. processes with the
	_PATH_LOG socket open] become disconnected from syslogd if the
	latter is killed and re-started.  This seems to be due to the
	fact that syslogd unlinks _PATH_LOG and creates it anew every
	time it starts, while syslog(3) is not prepared to deal with
	this eventuality.

	The symptom is that processes such as named, dhcpd, ipmon, and
	others fail to generate log entries after syslogd has been
	killed and restarted.  Beware that any process using LOG_CONS
	*will* keep writing to /dev/console because syslog(3) closes and
	re-opens /dev/console with every message [this caused me no end
	of grief until I got around to looking at the code! :-)].

	What originally led me to kill and restart syslogd was that it
	had stopped working properly, probably because _PATH_LOG had
	been flooded at boot (by a noisy named).  The symptom of this
	is that netstat shows a large Recv-Q for the _PATH_LOG socket.
	I've no idea what's getting stuck, but whatever it is, it's a
	separate bug, and I'll worry about it later if it still exists
	in -current and/or 1.4.1.

>How-To-Repeat:

	Kill and restart syslogd and then observe that long running
	processes such as named, dhcpd, ipmon, etc. fail to generate any
	more log entries (other than to /dev/console with LOG_CONS).

>Fix:

	unknown

    Suggestions:

	perhaps the local interface between syslog(3) and syslogd(8)
	shouldn't be a transient socket file, but rather a device
	driver, permanent shared memory segment, named-pipe (mkfifo),
	etc.  [only the latter will preserve the ability to use the '-p'
	and '-P' options in a compatible manner]

	maybe _PATH_LOG (and all the paths given with -p/-P options)
	need only be created by /etc/rc at system boot time, not by
	syslogd itself, unless they don't already exist, or are not
	openable, or don't seem to be working sockets (though I'm not
	100% sure this will work as I expect it to) [this would be
	required anyway if _PATH_LOG was created with mkfifo()]

	maybe the client code in syslog(3) can detect when the server
	has "gone away" and try to "Do The Right Thing" (and whatever
	that is it should not lose any messages unless the world seems
	ready to come to an end anyway) [this option is not very good
	because it doesn't fix any statically linked binaries created
	prior to the implementation of the fix]
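
	As a rough illustration of that last suggestion, here is a
	minimal sketch of a client that notices a failed send and
	reconnects once before retrying.  It is not the syslog(3)
	source: server_start(), client_connect(), send_with_retry() and
	restart_demo() are hypothetical names invented for this example,
	and the "server" is only a stand-in that unlinks and re-binds
	its socket the way the description above says syslogd does.
	The errno from the stale send differs between systems, so the
	sketch only checks that the send fails.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for the given path. */
static void
fill_addr(struct sockaddr_un *sa, const char *path)
{
	memset(sa, 0, sizeof(*sa));
	sa->sun_family = AF_UNIX;
	strncpy(sa->sun_path, path, sizeof(sa->sun_path) - 1);
}

/* Hypothetical stand-in for syslogd startup: unlink the path, bind anew. */
static int
server_start(const char *path)
{
	struct sockaddr_un sa;
	int fd;

	if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;
	fill_addr(&sa, path);
	(void)unlink(path);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical stand-in for openlog(): connect a datagram socket to path. */
static int
client_connect(const char *path)
{
	struct sockaddr_un sa;
	int fd;

	if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;
	fill_addr(&sa, path);
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Send msg on *fdp.  If that fails, assume the server went away,
 * reconnect once and retry.  Returns 0 if the first send worked,
 * 1 if the retry worked, -1 if the message could not be sent.
 */
static int
send_with_retry(int *fdp, const char *path, const char *msg)
{
	size_t len = strlen(msg);

	if (*fdp != -1 && send(*fdp, msg, len, 0) >= 0)
		return 0;
	if (*fdp != -1)
		close(*fdp);
	if ((*fdp = client_connect(path)) == -1)
		return -1;
	return send(*fdp, msg, len, 0) >= 0 ? 1 : -1;
}

/* Returns 0 if the stale send failed and the retry reached the new server. */
int
restart_demo(const char *path)
{
	char buf[16];
	int srv, cli, rc = -1;

	srv = server_start(path);
	cli = client_connect(path);
	if (srv == -1 || cli == -1)
		goto out;
	if (send_with_retry(&cli, path, "one") != 0)
		goto out;
	/* "Kill and restart" the server: new socket at the same path. */
	close(srv);
	if ((srv = server_start(path)) == -1)
		goto out;
	/* The old descriptor is stale, so only the retry can succeed. */
	if (send_with_retry(&cli, path, "two") != 1)
		goto out;
	memset(buf, 0, sizeof(buf));
	if (recv(srv, buf, sizeof(buf) - 1, MSG_DONTWAIT) == 3 &&
	    strcmp(buf, "two") == 0)
		rc = 0;
out:
	if (cli != -1)
		close(cli);
	if (srv != -1)
		close(srv);
	(void)unlink(path);
	return rc;
}
```

	Calling restart_demo() with a scratch path (not the real
	_PATH_LOG) should return 0.  Note that the message that
	triggered the failure is re-sent here, but anything queued in
	the old socket when the server died is still lost.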

	I personally prefer converting _PATH_LOG (and the files given by
	'-p' and/or '-P') to FIFOs (named pipes created by mkfifo())
	instead of sockets, which means each one can be created once in
	the filesystem and then opened and used by syslogd and openlog()
	as necessary (with syslogd re-creating it on startup (and
	SIGHUP?) if it disappears).
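
	To make the FIFO idea concrete, the following is a small
	sketch, under the assumption that the reader simply closes and
	re-opens an existing FIFO instead of unlinking it.  The
	function name fifo_restart_demo() is hypothetical and nothing
	here is taken from syslogd; it only shows that a writer's
	already-open descriptor keeps working across a reader
	"restart".

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if the writer's old descriptor still reaches the new reader. */
int
fifo_restart_demo(const char *path)
{
	char buf[16];
	int rd = -1, wr = -1, rc = -1;

	(void)unlink(path);
	if (mkfifo(path, 0600) == -1)
		return -1;

	/* Reader ("syslogd"): O_NONBLOCK so open() need not wait for a writer. */
	if ((rd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
		goto out;
	/* Writer (long-running client): opened once and kept open. */
	if ((wr = open(path, O_WRONLY | O_NONBLOCK)) == -1)
		goto out;
	if (write(wr, "one", 3) != 3 || read(rd, buf, sizeof(buf)) != 3)
		goto out;

	/* "Restart" the reader: close and re-open, leaving the FIFO in place. */
	close(rd);
	if ((rd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
		goto out;

	/* The writer's original descriptor is still attached to the FIFO. */
	memset(buf, 0, sizeof(buf));
	if (write(wr, "two", 3) == 3 &&
	    read(rd, buf, sizeof(buf) - 1) == 3 &&
	    strcmp(buf, "two") == 0)
		rc = 0;
out:
	if (wr != -1)
		close(wr);
	if (rd != -1)
		close(rd);
	(void)unlink(path);
	return rc;
}
```

	Two caveats the sketch sidesteps: a write made while no reader
	has the FIFO open fails with EPIPE (and raises SIGPIPE unless
	it is ignored), and a FIFO is a byte stream rather than a
	datagram channel, so messages would need some framing of their
	own.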

    Workaround:

	Kill and restart named, dhcpd, ipmon, and others that have
	_PATH_LOG open after killing and restarting syslogd.  The
	following command, run before killing syslogd, will print a list
	of potential candidates that'll need restarting:

	# fstat |
	  grep $(
		fstat -p $(
			cat /var/run/syslog*.pid
		) |
		grep 'syslogd *[0-9]* *[0-9]*. unix dgram' |
		awk '{print $NF}'
	  )'$'

	It is probably easier to just reboot, though if syslogd gets
	stuck on boot, as mine does now, this won't help.

	Note also that syslogd currently *must* start before anything
	opens any of the log sockets it might be recreating....

>Audit-Trail:
>Unformatted: