Current-Users archive


Re: -current cloner interfaces broken/gone/unusable



It is not only a boot time issue - I also see during normal operation:

2018-04-24T10:30:10.466723+00:00 gateway blacklistd 611 - - bl_recv: recvmsg failed (No buffer space available)
2018-04-24T10:30:10.466821+00:00 gateway blacklistd 611 - - no message (No buffer space available)
2018-04-24T10:56:47.223562+00:00 gateway sshd 13053 - - error: maximum authentication attempts exceeded for invalid user root from 106.113.147.190 port 63303 ssh2 [preauth]
2018-04-24T11:15:09.240247+00:00 gateway blacklistd 611 - - bl_recv: recvmsg failed (No buffer space available)
2018-04-24T11:15:09.240791+00:00 gateway blacklistd 611 - - no message (No buffer space available)

I don't expect major resource usage for blacklistd though.

Also named does not seem to be too happy and ceases interface scanning. This does not yet give a warm fuzzy feeling :-) && :-(

Frank

On 04/24/18 09:56, Roy Marples wrote:
On 24/04/2018 08:26, Martin Husemann wrote:
On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:
syslogd sometimes has issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix
`/var/run/log': No buffer space available

This is a separate change and unrelated to compatibility. It happens
with up to date binaries as well. I think it was a silent bug before
and has now been made more verbose. Still pretty annoying and happens
for me on various machines on every boot. Roy, did you have a chance to
look at it?

Not yet no. But yes, in all releases prior it was a silent bug on all types of socket and in all the BSDs as well. I know, I checked - only OpenBSD has an overflow check like this and they solve that with a magic message on route(4) only which is just yuck as it makes the problem worse.

I only have one machine where I can reliably repro this, my erlite, and that only happens because route(4) overflows (detected in dhcpcd) as it's a router and the box isn't up yet and a load of address validation flows over the socket when the link comes up. This is a good thing, because dhcpcd can then react to the error and sync its state using getifaddrs().
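
As an illustration of the "react to the error and resync" idea only (this is not dhcpcd's code, which is C and uses getifaddrs()), a minimal Python sketch might look like the following. The names recv_or_resync and resync are hypothetical; resync stands in for whatever rebuilds the application's state from scratch.

```python
import errno


def recv_or_resync(sock, resync, bufsize=4096):
    """Read one message from sock.

    If the kernel reports ENOBUFS (messages were dropped because the
    receive buffer overflowed), call resync() and return None.
    Any other error is re-raised unchanged.
    """
    try:
        return sock.recv(bufsize)
    except OSError as e:
        if e.errno != errno.ENOBUFS:
            raise
        # Some messages were lost, so cached state may be stale:
        # rebuild it instead of trusting the message stream.
        resync()
        return None
```

The point of the design is that an overflow is treated as "state unknown" rather than as a fatal error, which only works for applications that can re-query the full state.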

I think the easiest fix is to increase the default size of the socket buffer. Where this is done, I don't know but could find out if pushed.
This would fix everything if the default buffer was big enough.

That said, from what I'm hearing this only happens at boot time, so if we consider dynamically growing the buffer in the kernel as well, we could potentially shrink it back down again afterwards. No idea if that's even possible or what performance impact it would have.

The last option is to increase the socket buffer size in all affected applications using ioctl (or is it setsockopt?). But to what value I don't know. Trial and error?
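
For reference, the per-socket receive buffer is normally adjusted with setsockopt() and the SO_RCVBUF option. A minimal sketch in Python follows; the helper name grow_rcvbuf and the size used in the usage lines are assumptions for illustration, not recommended values, and the kernel may clamp or adjust whatever is requested.

```python
import socket


def grow_rcvbuf(sock, wanted):
    """Try to raise SO_RCVBUF to at least `wanted` bytes.

    Returns the size the kernel reports afterwards, which may differ
    from `wanted` (kernels can cap or round the request).
    """
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    if current < wanted:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted)
        except OSError:
            # Request exceeded a system limit: keep the existing size.
            pass
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # 256 KiB is an arbitrary example value, not a tuned one.
    print(grow_rcvbuf(s, 256 * 1024))
    s.close()
```

Reading the value back with getsockopt() is the only way to see what was actually granted, which would help with the trial-and-error part.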

Roy


