Current-Users archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
syslog, ENOBUFS and non-C implementations
Hi,
over the last couple of months I have seen at least two non-C
(rust and python) implementations of syslog() equivalent
functionality causing applications written in those languages to
become brittle.
The reason, I hear you ask?
In C, the return type of syslog() is void, so it can't return any
error. Our C implementation makes a reasonable attempt at
re-trying in the face of OS-errors:
/*
* If the send() failed, there are two likely scenarios:
* 1) syslogd was restarted
* 2) /dev/log is out of socket buffer space
* We attempt to reconnect to /dev/log to take care of
* case #1 and keep send()ing data to cover case #2
* to give syslogd a chance to empty its socket buffer.
*/
for (tries = 0; tries < MAXTRIES; tries++) {
if (send(data->log_file, tbuf, cnt, 0) != -1)
break;
if (errno != ENOBUFS) {
disconnectlog_r(data);
connectlog_r(data);
} else
(void)usleep(1);
}
and if the number of retries is exceeded, our C code tries to
send the syslog message to the console instead.
However, the rust and python implementations have the possibility
of returning errors or raising exceptions, but the applications
using those syslog-like functions are evidently unprepared to
deal with any errors from that functionality, causing those
applications to exit if an error occurred during syslog'ing.
This has caused me to file
https://github.com/Geal/rust-syslog/issues/79
which has not seen any activity or comments since I submitted it.
This issue caused the net/routinator program (an RPKI validator
written in rust) to exit if I had turned up the logging level
"too high" (to trigger the ENOBUFS condition) when running it on
NetBSD. I worked around this issue by dialing down the syslog
level in my routinator configuration.
The python issue I'm having is that similarly, the
sysutils/py-borgmatic package is not prepared to handle errors
from syslog'ing, causing it to exit with this error message:
--- Logging error ---
Traceback (most recent call last):
File "/usr/pkg/lib/python3.10/logging/handlers.py", line 987, in emit
self.socket.send(msg)
OSError: [Errno 55] No buffer space available
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/pkg/lib/python3.10/logging/handlers.py", line 991, in emit
self.socket.send(msg)
OSError: [Errno 55] No buffer space available
Call stack:
File "/usr/pkg/bin/borgmatic", line 33, in <module>
sys.exit(load_entry_point('borgmatic==1.8.5', 'console_scripts', 'borgmatic')())
File "/usr/pkg/lib/python3.10/site-packages/borgmatic/commands/borgmatic.py", line 894, in main
logger.handle(log)
This is operationally troublesome, to say the least, for
potentially long-running programs.
Since this points to the syslog code inside python itself (in
this case python 3.10), I tried replicating parts of the C
semantics and eliminate raising exceptions in the syslog code by
applying the attached local patch. I've not yet tried to submit
this one upstream, and I'm still testing it locally.
IMHO, having long-running programs become brittle just because
logging failed is just ... silly, and it appears that almost
every programmer out there expects the C semantics that
syslog()'ing never fails.
Secondly: is it something particular we are doing on the NetBSD
end of things which contributes to this problem? Doesn't other
OSes return ENOBUFS if syslogd isn't able to keep up by consuming
the messages at the receiving end?
Other comments?
Regards,
- Håvard
$NetBSD$
Introduce code to re-try sending of log message up to
10 times, and drop messages if the retry count is exceeded,
instead of raising an error. Calling code is seldom prepared
to handle exceptions from syslog-like functions, and become
needlessly brittle if syslog-ing can raise an exception.
--- Lib/logging/handlers.py.orig 2024-03-05 08:27:17.479574742 +0000
+++ Lib/logging/handlers.py
@@ -25,6 +25,7 @@ To use, simply 'import logging.handlers'
import io, logging, socket, os, pickle, struct, time, re
from stat import ST_DEV, ST_INO, ST_MTIME
+import errno
import queue
import threading
import copy
@@ -983,16 +984,39 @@ class SysLogHandler(logging.Handler):
msg = msg.encode('utf-8')
msg = prio + msg
if self.unixsocket:
- try:
- self.socket.send(msg)
- except OSError:
- self.socket.close()
- self._connect_unixsocket(self.address)
- self.socket.send(msg)
+ tries = 10
+ while tries > 0:
+ try:
+ self.socket.send(msg)
+ except OSError as err:
+ tries -= 1
+ if err.errno == errno.ENOBUFS:
+ time.sleep(0.001)
+ else:
+ self.socket.close()
+ self._connect_unixsocket(self.address)
+ # if we exhaust retries, just drop message
+ # possibly, we could count the number of dropped messages
elif self.socktype == socket.SOCK_DGRAM:
- self.socket.sendto(msg, self.address)
+ tries = 10
+ while tries > 0:
+ tries -= 1
+ try:
+ self.socket.sendto(msg, self.address)
+ except OSError:
+ time.sleep(0.001)
+ # if we exhaust retries, just drop message
+ # possibly, we could count the number of dropped messages
else:
- self.socket.sendall(msg)
+ tries = 10
+ while tries > 0:
+ tries -= 1
+ try:
+ self.socket.sendall(msg)
+ except OSError:
+ time.sleep(0.001)
+ # if we exhaust retries, just drop message
+ # possibly, we could count the number of dropped messages
except Exception:
self.handleError(record)
Home |
Main Index |
Thread Index |
Old Index