Re: so_rerror

To: Roy Marples <roy%marples.name@localhost>
Subject: Re: so_rerror
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Wed, 07 Nov 2018 12:50:04 +0700

    Date:        Sun, 4 Nov 2018 21:02:26 +0000
    From:        Roy Marples <roy%marples.name@localhost>
    Message-ID:  <d3c2a03d-357a-89d1-c4b4-bbc8ba9c26bd%marples.name@localhost>

  | Whether it arrived at the kernel by UDP or carrier pigeon and could not 
  | be delivered for reason X, we should not be discarding this silently.

The error was never silently discarded; it was always counted, and
the count is available from netstat -m.

The pushback is against informing applications which cannot rationally
do anything about it.   In general, there is no way to prevent an
occasional overflow when circumstances conspire to make a very
large number of messages all arrive at once.   All that making the
buffer bigger does is allow more of them to be queued (separating the
time of logging from the time of the event more than it should be), and
it means that an even bigger buffer perhaps gets allocated, to handle
once in a century type events.

  | However, I do buy into the argument that syslogd can't keep up with 
  | incoming data in all situations. To facilitate this, I've added the -B 
  | option so you can specify a large buffer.

That's usually going to be the wrong thing to do.   There might be some
very busy syslog servers where the default buffer size is simply not
enough, and for those, this is a reasonable solution.   But for most,
overreacting to an occasional spike is not a good solution.
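
For reference, asking for a bigger receive buffer normally comes down to a single socket option. The sketch below assumes that is roughly what an option like -B does (I have not checked the syslogd code), and the helper name request_rcvbuf is made up for illustration. It reads the value back because the kernel is free to clamp or round the request.

```python
import socket

def request_rcvbuf(sock, nbytes):
    """Ask for a receive buffer of nbytes; return the size the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    # The kernel may clamp or round the request, so read the value back
    # rather than assuming the requested size was granted.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    before = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    after = request_rcvbuf(s, 64 * 1024)
    print("receive buffer:", before, "->", after)
    s.close()
```

Whatever size is granted, a large enough burst still overflows it, which is the point made above.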

  | Also, it might be that the system is just to slow to log the amount of 
  | incoming data so I've added the -X option so ENOBUFS can be silently 
  | discarded.

That's good, but even better would be to not bother syslogd with the
"error" in the first place.

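To illustrate why there is nothing rational to do with the report: about the best a datagram reader can manage when the overflow is delivered as ENOBUFS is to count it and carry on. This is only a sketch under that assumption; recv_ignoring_overflow and the stats dict are hypothetical names, not anything from syslogd.

```python
import errno

def recv_ignoring_overflow(sock, bufsize, stats):
    """Return the next datagram, counting (not raising) ENOBUFS reports."""
    while True:
        try:
            return sock.recv(bufsize)
        except OSError as e:
            if e.errno != errno.ENOBUFS:
                raise  # any other error is still a real error
            # The dropped datagrams are already gone, so all that is
            # left to do is note that it happened and keep reading.
            stats["overflows"] = stats.get("overflows", 0) + 1
```

The counter kept here duplicates what the kernel already counts, which is why not delivering the error at all would be simpler.
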
  | So as of right now, the admin can see overflow and they can make the 
  | choice about how to handle it. Surely you must agree that this is a good 
  | default rather than leaving the admin to worry if their logger actually 
  | logged everything or not.

There is no way not to worry - syslog messages can be relayed over
normal UDP from host to host - and can be dropped anywhere.   All
this is doing is catching one odd case of lost messages.   Allowing the
admin to believe that seeing no "buffer overflow" messages means
that they're not losing any messages is irresponsible.

  | And still dhcpcd reports overruns before we increased the size of the 
  | buffers. It still does, but only on my router and only at boot time, but 
  | thankfully it now has the code resync itself to the real system state.

The routing socket (and its clone, when we get it, the mobile-ip socket)
is special - probably should not be a socket at all, but some other kind
of entity (though inventing something new is also not necessarily the
best thing to do.)   For that one, the buffer overflow message is useful,
both because that is (I suspect) the only way that messages can be lost,
and because the recipient has a way to recover (expensive, but possible)
when it happens.   Had all of this mechanism been confined to the
routing socket, there never would have been a problem.
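
The recovery pattern for that case can be sketched as follows: apply incremental updates as they arrive, and when an overflow is reported, throw the incremental view away and rebuild it from a full snapshot. The callbacks next_event and fetch_full_state are hypothetical stand-ins for reading the routing socket and for dumping the real system state; this is not how dhcpcd is actually written.

```python
import errno

def track_state(next_event, fetch_full_state, state):
    """Apply one incremental update, or resync after a reported overflow.

    next_event() returns a (key, value) update or raises OSError.
    fetch_full_state() returns a complete snapshot as a dict.
    Both are hypothetical callbacks supplied by the caller.
    """
    try:
        key, value = next_event()
    except OSError as e:
        if e.errno != errno.ENOBUFS:
            raise
        # Updates were lost, so the incremental view cannot be trusted.
        # Rebuild it from scratch (expensive, but possible).
        state.clear()
        state.update(fetch_full_state())
        return "resynced"
    if value is None:
        state.pop(key, None)  # None marks a deletion in this sketch
    else:
        state[key] = value
    return "updated"
```

A syslog receiver has no equivalent of fetch_full_state, which is the difference between the two cases.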

kre
