tech-net archive

Re: so_rerror



On 07/11/2018 05:50, Robert Elz wrote:
     Date:        Sun, 4 Nov 2018 21:02:26 +0000
     From:        Roy Marples <roy%marples.name@localhost>
     Message-ID:  <d3c2a03d-357a-89d1-c4b4-bbc8ba9c26bd%marples.name@localhost>


   | Whether it arrived at the kernel by UDP or carrier pigeon and could not
   | be delivered for reason X, we should not be discarding this silently.

The error was never silently discarded; it was always counted, and
available from netstat -m.

It's clear that you don't care about auditability or traceability.
If I have many things running, which one was it discarded for and why?


The pushback is against informing applications which cannot rationally
do anything about it. In general ... there is no way to prevent an
occasional overflow when circumstances conspire to make a very
large number of messages all arrive at once - all making the buffer
bigger does is to allow more of all of that to be queued (removing the
logging from the time of event more than it should be) and meaning that
an even bigger buffer perhaps gets allocated, to handle once in a
century type events.

There isn't much I can do about an ENOSYS error either.
Maybe we should silently discard those too?


   | However, I do buy into the argument that syslogd can't keep up with
   | incoming data in all situations. To facilitate this, I've added the -B
   | option so you can specify a large buffer.

That's usually going to be the wrong thing to do.   There might be some
very busy syslog servers where the default buffer size is simply not
enough, and for those, this is a reasonable solution.   But for most,
overreacting to an occasional spike is not a good solution.

   | Also, it might be that the system is just too slow to log the amount of
   | incoming data so I've added the -X option so ENOBUFS can be silently
   | discarded.

That's good, but even better would be to not bother syslogd with the
"error" in the first place.

   | So as of right now, the admin can see overflow and they can make the
   | choice about how to handle it. Surely you must agree that this is a good
   | default rather than leaving the admin to worry if their logger actually
   | logged everything or not.

There is no way not to worry - syslog messages can be relayed over
normal udp from host to host - and can be dropped anywhere.   All
this is doing is catching one odd case of lost messages - allowing the
admin to believe that if they see no "buffer overflow" messages then
that means that they're not losing any messages is irresponsible.

   | And still dhcpcd reports overruns before we increased the size of the
   | buffers. It still does, but only on my router and only at boot time, but
   | thankfully it now has the code resync itself to the real system state.

The routing socket (and its clone, when we get it, the mobile-ip socket)
is special - probably should not be a socket at all, but some other kind
of entity (though inventing something new is also not necessarily the
best thing to do.)   For that one, the buffer overflow message is useful,
both because that is (I suspect) the only way that messages can be lost,
and because the recipient has a way to recover (expensive, but possible)
when it happens.   Had all of this mechanism been confined to the
routing socket, there never would have been a problem.
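
The recovery pattern described above (treat the overflow report as "state
may be stale" and resynchronise) can be sketched roughly as follows. This
is only an illustration, not code from dhcpcd or syslogd: the names
drain, handle_message, resync and FakeSocket are hypothetical, and
whether recv() reports ENOBUFS at all depends on the platform, the
socket type and its options.

```python
import errno


def drain(sock, handle_message, resync, bufsize=4096):
    """Read every pending message from a non-blocking socket-like object.

    ENOBUFS from recv() is taken to mean "at least one message was
    dropped", so the caller's resync() hook runs instead of aborting.
    Returns the number of overflow reports seen.
    """
    overflows = 0
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            # Nothing left to read for now.
            return overflows
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise  # any other error is not ours to swallow
            # Messages were lost: incremental updates can no longer be
            # trusted, so rebuild state from the authoritative source.
            overflows += 1
            resync()
            continue
        handle_message(data)


class FakeSocket:
    """Stand-in for a real socket (assumption: only recv() is needed)."""

    def __init__(self, events):
        self._events = list(events)

    def recv(self, bufsize):
        if not self._events:
            raise BlockingIOError
        event = self._events.pop(0)
        if isinstance(event, Exception):
            raise event
        return event
```

A receiver with no way to recover (the syslogd case argued over in this
thread) could pass a resync hook that only increments a counter or logs
once, which is the policy choice under discussion here.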

You know what?
I no longer care.
I no longer care that I spent months figuring out a long-standing issue where packets were being lost in a self-contained system because the important error WAS NOT BEING REPORTED.

I no longer care to inform the admin that their logging server just can't keep up with demand and pretend that life is dandy.

Feel free to rip my code out and put back all the comments saying XXX report this to userland. I'm done with this shit that wants to make a developer's life harder and not easier.

Roy

