tech-net archive


Re: so_rerror



On 04/11/2018 15:18, Christos Zoulas wrote:
| Can you explain how it was broken and what you did to make it work again?

I turned off logging completely to fix it. Logging ended up taking
up all the I/O cycles because each time logging overflowed syslogd
ended up logging that logging overflowed... This worked just fine
before the changes.

| Which is why we need a better solution than what we have.
| Dynamically increasing/decreasing the buffer size is a good solution for
| this, which should make everyone happy.

That will never fix the problem; in fact it will make the situation
worse because of bufferbloat, resource consumption on low-resource
systems, and increased latency. As people have explained numerous
times before, this is UDP and you should be prepared to lose packets
(the transport is unreliable). If you want to build a reliable
transport on top of UDP, rerror is not enough; you need to use a
packet sequence number or something similar to detect lost packets.
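
For illustration only, here is a minimal sketch of receiver-side loss
detection with a per-packet sequence number. The GapDetector class and
its observe() method are hypothetical names, not part of any existing
protocol or of syslogd. It assumes the sender numbers packets
consecutively and ignores counter wraparound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GapDetector:
    """Counts packets missed by a receiver, based on sequence numbers."""

    expected: Optional[int] = None  # next sequence number we expect to see
    lost: int = 0                   # running total of packets presumed lost

    def observe(self, seq: int) -> int:
        """Record one received sequence number.

        Returns how many packets were skipped immediately before it.
        Late or duplicate packets (seq below the expected value) are
        reported as zero and are not subtracted from the lost total.
        """
        if self.expected is None:
            # First packet: nothing to compare against yet.
            self.expected = seq + 1
            return 0
        if seq < self.expected:
            return 0
        missing = seq - self.expected
        self.lost += missing
        self.expected = seq + 1
        return missing
```

A receiver would call observe() on every datagram and treat any
non-zero result as a cue to ask for retransmission or to resync.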

Yes, it is good to detect lost packets when you can, so rerror is
generally a good thing, and if it had been done on day one it would
probably be fine to keep. It would also be nice to have it on by
default eventually, but right now it makes the situation worse than
before.

Whether it arrived at the kernel by UDP or carrier pigeon and could not be delivered for reason X, we should not be discarding this silently.

However, I do buy into the argument that syslogd can't keep up with incoming data in all situations. To accommodate this, I've added the -B option so you can specify a larger buffer. Also, it might be that the system is just too slow to log the amount of incoming data, so I've added the -X option so that ENOBUFS errors can be silently discarded.
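
As a rough sketch of what asking for a larger receive buffer looks like at the socket level: the helper below sets SO_RCVBUF and reads back what the kernel actually granted. The name request_rcvbuf is hypothetical, and whether syslogd's -B option is implemented this way is an assumption, not something taken from the syslogd source.

```python
import socket


def request_rcvbuf(sock: socket.socket, nbytes: int) -> int:
    """Ask the kernel for a receive buffer of nbytes and return the
    size it reports afterwards.

    The kernel may clamp or round the request, so the returned value
    can differ from nbytes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Reading the value back matters because a request above the system limit may be reduced or rejected, depending on the platform.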

So as of right now, the admin can see overflow and they can make the choice about how to handle it. Surely you must agree that this is a good default rather than leaving the admin to worry if their logger actually logged everything or not.


| > Nevertheless now everyone can have it the way they like... There is
| > a sysctl to turn it on globally and a per-socket setsockopt to override.
|
| And we want a secure system where a lot of useful programs don't run and
| which sweeps overflow issues under the carpet by default? Not me!

Yes, for the programs that want this behavior. Let us not forget that
this started because of the aberrant behavior of the routing socket,
where because of the compatibility messages we ended up overflowing
and losing messages. Instead of fixing the root cause (don't send
compat stuff to the programs that don't need it, since programs
understand only one version of the messages and throw away the rest),
we decided to detect the dropped packet problem by introducing
so_rerror. This detection could also have been done by using the
sequence number, or a similar ID-based protocol.

This is actually untrue.
All programs that care about the routing socket have already set socket filters to avoid the messages (i.e. compat versions) they don't care about. These filters run before overflow can happen. Even so, dhcpcd reported overruns before we increased the size of the buffers. It still does, but only on my router and only at boot time, and thankfully it now has the code to resync itself with the real system state.
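
To illustrate the resync idea in general terms (this is not dhcpcd's actual code), here is a small sketch of a read loop that treats ENOBUFS on receive as "messages were dropped, reload state from the system". The names read_events, handle and resync are hypothetical, and stopping on an empty read is a convention of this sketch only.

```python
import errno


def read_events(sock, handle, resync, bufsize=4096):
    """Read messages from sock until an empty read.

    handle(data) is called for each message. If recv() fails with
    ENOBUFS, the receive buffer overflowed and some messages are gone,
    so resync() is called to rebuild state from the system. Any other
    error is re-raised. Returns the number of overflows seen.
    """
    overflows = 0
    while True:
        try:
            data = sock.recv(bufsize)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise
            overflows += 1
            resync()
            continue
        if not data:
            return overflows
        handle(data)
```

The point is that the overflow is reported to the program, which can then decide what to do, rather than being discarded without notice.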

Roy

