syslog, ENOBUFS and non-C implementations

To: current-users%NetBSD.org@localhost
Subject: syslog, ENOBUFS and non-C implementations
From: Havard Eidnes <he%NetBSD.org@localhost>
Date: Tue, 05 Mar 2024 11:19:34 +0100 (CET)

Hi,

over the last couple of months I have seen at least two non-C
(rust and python) implementations of syslog() equivalent
functionality causing applications written in those languages to
become brittle.

The reason, I hear you ask?

In C, the return type of syslog() is void, so it can't return any
error.  Our C implementation makes a reasonable attempt at
re-trying in the face of OS-errors:

        /*
         * If the send() failed, there are two likely scenarios:
         *  1) syslogd was restarted
         *  2) /dev/log is out of socket buffer space
         * We attempt to reconnect to /dev/log to take care of
         * case #1 and keep send()ing data to cover case #2
         * to give syslogd a chance to empty its socket buffer.
         */
        for (tries = 0; tries < MAXTRIES; tries++) {
                if (send(data->log_file, tbuf, cnt, 0) != -1)
                        break;
                if (errno != ENOBUFS) {
                        disconnectlog_r(data);
                        connectlog_r(data);
                } else
                        (void)usleep(1);
        }

and if the number of retries is exceeded, our C code tries to
send the syslog message to the console instead.

However, the rust and python implementations have the possibility
of returning errors or raising exceptions, but the applications
using those syslog-like functions are evidently unprepared to
deal with any errors from that functionality, causing those
applications to exit if an error occurred during syslog'ing.

This has caused me to file

  https://github.com/Geal/rust-syslog/issues/79

which has not seen any activity or comments since I submitted it.
This issue caused the net/routinator program (an RPKI validator
written in rust) to exit if I had turned up the logging level
"too high" (to trigger the ENOBUFS condition) when running it on
NetBSD.  I worked around this issue by dialing down the syslog
level in my routinator configuration.


The python issue I'm having is that similarly, the
sysutils/py-borgmatic package is not prepared to handle errors
from syslog'ing, causing it to exit with this error message:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.10/logging/handlers.py", line 987, in emit
    self.socket.send(msg)
OSError: [Errno 55] No buffer space available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/pkg/lib/python3.10/logging/handlers.py", line 991, in emit
    self.socket.send(msg)
OSError: [Errno 55] No buffer space available
Call stack:
  File "/usr/pkg/bin/borgmatic", line 33, in <module>
    sys.exit(load_entry_point('borgmatic==1.8.5', 'console_scripts', 'borgmatic')())
  File "/usr/pkg/lib/python3.10/site-packages/borgmatic/commands/borgmatic.py", line 894, in main
    logger.handle(log)

This is operationally troublesome, to say the least, for
potentially long-running programs.

Since this points to the syslog code inside python itself (in
this case python 3.10), I tried replicating parts of the C
semantics and eliminate raising exceptions in the syslog code by
applying the attached local patch.  I've not yet tried to submit
this one upstream, and I'm still testing it locally.


IMHO, having long-running programs become brittle just because
logging failed is just ... silly, and it appears that almost
every programmer out there expects the C semantics that
syslog()'ing never fails.

Secondly: is it something particular we are doing on the NetBSD
end of things which contributes to this problem?  Doesn't other
OSes return ENOBUFS if syslogd isn't able to keep up by consuming
the messages at the receiving end?


Other comments?


Regards,

- Håvard

$NetBSD$

Introduce code to re-try sending of log message up to
10 times, and drop messages if the retry count is exceeded,
instead of raising an error.  Calling code is seldom prepared
to handle exceptions from syslog-like functions, and become
needlessly brittle if syslog-ing can raise an exception.

--- Lib/logging/handlers.py.orig	2024-03-05 08:27:17.479574742 +0000
+++ Lib/logging/handlers.py
@@ -25,6 +25,7 @@ To use, simply 'import logging.handlers'
 
 import io, logging, socket, os, pickle, struct, time, re
 from stat import ST_DEV, ST_INO, ST_MTIME
+import errno
 import queue
 import threading
 import copy
@@ -983,16 +984,39 @@ class SysLogHandler(logging.Handler):
             msg = msg.encode('utf-8')
             msg = prio + msg
             if self.unixsocket:
-                try:
-                    self.socket.send(msg)
-                except OSError:
-                    self.socket.close()
-                    self._connect_unixsocket(self.address)
-                    self.socket.send(msg)
+                tries = 10
+                while tries > 0:
+                    try:
+                        self.socket.send(msg)
+                    except OSError as err:
+                        tries -= 1
+                        if err.errno == errno.ENOBUFS:
+                            time.sleep(0.001)
+                        else:
+                            self.socket.close()
+                            self._connect_unixsocket(self.address)
+                # if we exhaust retries, just drop message
+                # possibly, we could count the number of dropped messages
             elif self.socktype == socket.SOCK_DGRAM:
-                self.socket.sendto(msg, self.address)
+                tries = 10
+                while tries > 0:
+                    tries -= 1
+                    try:
+                        self.socket.sendto(msg, self.address)
+                    except OSError:
+                        time.sleep(0.001)
+                # if we exhaust retries, just drop message
+                # possibly, we could count the number of dropped messages
             else:
-                self.socket.sendall(msg)
+                tries = 10
+                while tries > 0:
+                    tries -= 1
+                    try:
+                        self.socket.sendall(msg)
+                    except OSError:
+                        time.sleep(0.001)
+                # if we exhaust retries, just drop message
+                # possibly, we could count the number of dropped messages
         except Exception:
             self.handleError(record)

Follow-Ups:
- Re: syslog, ENOBUFS and non-C implementations
  - From: Lloyd Parkes

Prev by Date: re: new BIND in 10.0_RC5/sparc dies w/Bus error
Next by Date: Re: Unacceptable firefox behavior with nouveau graphics card
Previous by Thread: rc.d start order
Next by Thread: Re: syslog, ENOBUFS and non-C implementations
Indexes:

Home | Main Index | Thread Index | Old Index