Subject: bin/8065: processes using openlog() can become disconnected from syslogd's log socket(s)
To: None <gnats-bugs@gnats.netbsd.org>
From: None <woods@weird.com>
List: netbsd-bugs
Date: 07/24/1999 13:51:26
>Number: 8065
>Category: bin
>Synopsis: processes using openlog() can become disconnected from syslogd's log socket(s)
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: bin-bug-people (Utility Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jul 24 13:50:01 1999
>Last-Modified:
>Originator: Greg A. Woods
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Release: NetBSD-current; July 24, 1999
>Environment:
So far observed only on NetBSD 1.3.1, 1.3.2, and 1.3.3, but
given the evidence in the code, likely in all versions up to and
including today's -current.
>Description:
Long-running processes using openlog() [i.e. processes with the
_PATH_LOG socket open] become disconnected from syslogd if the
latter is killed and restarted. This seems to be because
syslogd unlinks _PATH_LOG and creates it anew every
time it starts, while syslog(3) is not prepared to deal with
this eventuality.
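The mechanism described above can be reproduced outside of syslogd
with a pair of UNIX-domain datagram sockets. The following is only a
minimal sketch, not the libc or syslogd code: server_start(),
client_open(), survives_restart() and the socket path passed in are
all hypothetical names made up for illustration, and the use of a
connected SOCK_DGRAM socket on the client side is an assumption based
on the "unix dgram" sockets mentioned in the workaround.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: build the address of a local log socket. */
static void make_addr(struct sockaddr_un *sa, const char *path)
{
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    strncpy(sa->sun_path, path, sizeof(sa->sun_path) - 1);
}

/* Stand-in for syslogd startup: unlink the path, then bind a new socket. */
static int server_start(const char *path)
{
    struct sockaddr_un sa;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    make_addr(&sa, path);
    (void)unlink(path);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Stand-in for openlog(): connect once and keep the descriptor. */
static int client_open(const char *path)
{
    struct sockaddr_un sa;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    make_addr(&sa, path);
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Returns 1 if a message sent on the old client descriptor reaches the
 * restarted server, 0 if it does not, and -1 if the setup itself fails.
 */
int survives_restart(const char *path)
{
    char buf[16];
    int reached = 0;
    int srv = server_start(path);
    int cli = (srv < 0) ? -1 : client_open(path);

    if (srv < 0 || cli < 0 || send(cli, "before", 6, 0) != 6) {
        if (srv >= 0)
            close(srv);
        if (cli >= 0)
            close(cli);
        return -1;
    }
    close(srv);                 /* "kill" the server ... */
    srv = server_start(path);   /* ... and restart it on the same path */
    if (srv < 0) {
        close(cli);
        return -1;
    }
    /* The client still holds a descriptor connected to the old socket. */
    if (send(cli, "after", 5, 0) == 5 &&
        recv(srv, buf, sizeof(buf), MSG_DONTWAIT) == 5)
        reached = 1;
    close(cli);
    close(srv);
    (void)unlink(path);
    return reached;
}
```

Calling survives_restart() with a scratch path (say, something under
/tmp) should return 0 if the behaviour matches what is described in
this report: the old descriptor refers to the socket that was
unlinked, not to the one newly bound at the same path.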
The symptom is that processes such as named, dhcpd, ipmon, and
others fail to generate log entries after syslogd has been
killed and restarted. Beware that any process using LOG_CONS
*will* keep writing to /dev/console because syslog(3) closes and
re-opens /dev/console with every message [this caused me no end
of grief until I got around to looking at the code! :-)].
I was initially led to kill and restart syslogd because it had
stopped working properly, probably because _PATH_LOG had been
flooded at boot (by a noisy named). The symptom of this is that
netstat shows a large Recv-Q for the _PATH_LOG socket. I've no
idea what's getting stuck, and whatever it is, it's a separate
bug and I'll worry about it later if indeed it still exists in
-current and/or 1.4.1.
>How-To-Repeat:
Kill and restart syslogd and then observe that long-running
processes such as named, dhcpd, ipmon, etc. fail to generate any
more log entries (other than to /dev/console with LOG_CONS).
>Fix:
unknown
Suggestions:
perhaps the local interface between syslog(3) and syslogd(8)
shouldn't be a transient socket file, but rather a device
driver, permanent shared memory segment, named-pipe (mkfifo),
etc. [only the latter will preserve the ability to use the '-p'
and '-P' options in a compatible manner]
maybe _PATH_LOG (and all the paths given with -p/-P options)
need only be created by /etc/rc at system boot time, not by
syslogd itself, unless they don't already exist, or are not
openable, or don't seem to be working sockets (though I'm not
100% sure this will work as I expect it to) [this would be
required anyway if _PATH_LOG was created with mkfifo()]
maybe the client code in syslog(3) can detect when the server
has "gone away" and try to "Do The Right Thing" (and whatever
that is it should not lose any messages unless the world seems
ready to come to an end anyway) [this option is not very good
because it doesn't fix any statically linked binaries created
prior to the implementation of the fix]
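As an illustration of the last suggestion only, here is a minimal
sketch of a client that notices a failed send, drops its stale
descriptor, reconnects to the path, and retries once. It is not the
syslog(3) implementation: struct log_client, log_connect(),
log_send() and demo_server_start() are hypothetical names, the
connected SOCK_DGRAM client is an assumption, and the sketch makes no
attempt to recover messages that were already queued on the old
socket.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical client state: the descriptor and the path it came from. */
struct log_client {
    int fd;             /* -1 when not connected */
    const char *path;   /* e.g. whatever _PATH_LOG names */
};

/* Hypothetical helper: build the address of a local log socket. */
static void fill_addr(struct sockaddr_un *sa, const char *path)
{
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    strncpy(sa->sun_path, path, sizeof(sa->sun_path) - 1);
}

/* (Re)connect the client to whatever socket is currently at lc->path. */
static int log_connect(struct log_client *lc)
{
    struct sockaddr_un sa;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    fill_addr(&sa, lc->path);
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    lc->fd = fd;
    return 0;
}

/* Send one message; on failure, reconnect and retry exactly once. */
int log_send(struct log_client *lc, const char *msg, size_t len)
{
    if (lc->fd < 0 && log_connect(lc) < 0)
        return -1;
    if (send(lc->fd, msg, len, 0) == (ssize_t)len)
        return 0;
    /* The server may have been restarted: drop the stale descriptor. */
    close(lc->fd);
    lc->fd = -1;
    if (log_connect(lc) < 0)
        return -1;
    return send(lc->fd, msg, len, 0) == (ssize_t)len ? 0 : -1;
}

/* Demo-only stand-in for syslogd startup: unlink the path and rebind. */
int demo_server_start(const char *path)
{
    struct sockaddr_un sa;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    fill_addr(&sa, path);
    (void)unlink(path);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would initialise the structure with fd set to -1 and the
socket path, then call log_send() for each message. The retry relies
on the send to the dead socket failing with an error; a message that
was accepted by the old socket before the server went away is still
lost, which is the concern raised above about not losing messages.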
I personally prefer converting _PATH_LOG (and the files given by
'-p' and/or '-P') to FIFOs (named pipes created by
mkfifo()) instead of sockets, which means each can be created
once in the filesystem and then opened and used by syslogd and
openlog() as necessary (with syslogd re-creating it on startup
(and SIGHUP?) if it disappears).
Workaround:
Kill and restart named, dhcpd, ipmon, and others that have
_PATH_LOG open after killing and restarting syslogd. The
following command, run before killing syslogd, will print a list
of potential candidates that'll need restarting:
# fstat |
    grep $(
        fstat -p $(
            cat /var/run/syslog*.pid
        ) |
        grep 'syslogd *[0-9]* *[0-9]*. unix dgram' |
        awk '{print $NF}'
    )'$'
It is probably easier to just reboot, though if syslogd gets
stuck on boot, as mine does now, this won't help.
Note also that syslogd currently *must* start before anything
opens any of the log sockets it might be recreating....
>Audit-Trail:
>Unformatted: