NetBSD-Users archive


Re: FreeRADIUS instability



Hi Christos,


On 9/17/21 15:19, Christos Zoulas wrote:
I think your problem is mprotect. (man 7 sysctl).
Try setting security.pax.mprotect.ptrace=2

Thank you, that got us a working GDB.

On Sep 17, 2021, at 3:40 PM, Pawel S. Veselov <pawel.veselov%gmail.com@localhost> wrote:
On 9/16/21 11:31 AM, Dima Veselov wrote:
I'm trying to help Dima with this problem.
I do not know if this is NetBSD-related, but I have been suffering from
FreeRADIUS instability on NetBSD for a long time and do not know how to
debug it. Symptoms: the RADIUS server randomly (once a day or once a week)
stops answering, and this is not connected to the actual load. While in that
state it can only be killed with -9; other signals do nothing.
Well, it seems that the signals are blocked and this does not have to
do with kevent (probably FreeRADIUS does it explicitly).
There is no issue with the kevent() call being responsive. kevent() obviously cycles at least on signals.
So perhaps the handler does something and does not exit?
The signal handlers respond just fine. The code isn't stuck in a signal handler, and I can see that kevent() exits on a signal, but the app just re-enters it shortly thereafter.
Yes, the question is what happened to fd#3 (presumably the kqueue).
If you can get into the debugger (gdb <radiusd> <pid>) and look at
queue call and see what fd is passed to it?
It's still fd#3

What we have determined from tracing the process is that fd#3 is
simply closed and then re-opened as another kqueue (due to fd reuse).
radiusd then keeps using it as its own, but since none of its filters
are registered there, the process is effectively dead.
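
To make the fd reuse concrete, here is a minimal sketch in portable C. It
is an illustration only: /dev/null and /dev/zero stand in for the old and
the new kqueue, and the function name fd_number_reused() is made up for
this example. It relies on open() returning the lowest free descriptor
number, so it assumes a single-threaded caller.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a closed descriptor number was handed out again and now
 * refers to a different object, 0 if not, -1 on error. */
int fd_number_reused(void)
{
    char byte;

    /* The application opens "its" descriptor and remembers the number.
     * Reading /dev/null always returns 0 bytes (end of file). */
    int app_fd = open("/dev/null", O_RDONLY);
    if (app_fd < 0)
        return -1;
    if (read(app_fd, &byte, 1) != 0) {
        close(app_fd);
        return -1;
    }

    /* Other code closes that number behind the application's back ... */
    close(app_fd);

    /* ... and opens something else. open() returns the lowest free
     * number, which is the one that was just vacated. */
    int other_fd = open("/dev/zero", O_RDONLY);
    if (other_fd < 0)
        return -1;

    /* The application keeps using the old number, but read() now
     * returns data: the number refers to a different object. */
    int same_number = (other_fd == app_fd);
    int different_object = (read(app_fd, &byte, 1) == 1);

    close(other_fd);
    return same_number && different_object;
}
```

The application never sees an error such as EBADF in this situation,
because the number it holds is valid again; it just no longer refers to
the object the application set up.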

So we caught where the queue is closed, and traced it back to
getaddrinfo(). That call closes fd#3, then creates a new kqueue
and leaves it open. This is the backtrace from close:

#0  0x0000732d69c07892 in close () from /usr/lib/libpthread.so.1
#1  0x0000732d68f25da9 in __res_ndestroy () from /usr/lib/libc.so.12
#2  0x0000732d68f2676b in __res_vinit () from /usr/lib/libc.so.12
#3  0x0000732d68f26bef in __res_check () from /usr/lib/libc.so.12
#4  0x0000732d68f22220 in __res_nsend () from /usr/lib/libc.so.12
#5  0x0000732d68f2719c in ?? () from /usr/lib/libc.so.12
#6  0x0000732d68f27420 in ?? () from /usr/lib/libc.so.12
#7  0x0000732d68f2a5a9 in ?? () from /usr/lib/libc.so.12
#8  0x0000732d68f2a8bd in ?? () from /usr/lib/libc.so.12
#9  0x0000732d68f3ed49 in nsdispatch () from /usr/lib/libc.so.12
#10 0x0000732d68f286c8 in getaddrinfo () from /usr/lib/libc.so.12

The full stack traces and ktraces can be found here:

https://github.com/FreeRADIUS/freeradius-server/issues/4244

Our next step is to recompile libc with debugging symbols and start
poking around there to see why it is closing an fd that doesn't
belong to it. If somebody knows why that might happen, that'd be
great.
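
For discussion, here is one generic way a library can end up closing a
descriptor that does not belong to it: it caches a descriptor number, the
number is closed and reused elsewhere, and a later re-initialisation
closes the cached number without checking. Nothing in our traces confirms
that this is what libc does; the sketch below is an assumption, and
struct toy_lib, toy_lib_init(), toy_lib_reinit() and stale_fd_scenario()
are hypothetical names, not libc interfaces. /dev/null and /dev/zero
again stand in for kqueues.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for library state that caches a descriptor. */
struct toy_lib {
    int cached_fd;
};

/* Open a private descriptor and remember its number. */
int toy_lib_init(struct toy_lib *lib)
{
    lib->cached_fd = open("/dev/null", O_RDONLY);
    return lib->cached_fd < 0 ? -1 : 0;
}

/* Re-initialise: close the cached number, then open a fresh descriptor.
 * Nothing checks that the cached number still belongs to the library. */
int toy_lib_reinit(struct toy_lib *lib)
{
    if (lib->cached_fd >= 0)
        close(lib->cached_fd);
    return toy_lib_init(lib);
}

/* Returns 1 if the library closed the application's descriptor and now
 * owns that number itself, 0 if not, -1 on error. */
int stale_fd_scenario(void)
{
    struct toy_lib lib;
    char byte;

    if (toy_lib_init(&lib) < 0)
        return -1;
    int lib_fd = lib.cached_fd;

    /* The application closes a descriptor it did not open itself;
     * the library is not told and keeps the stale number. */
    close(lib_fd);

    /* The application opens its own descriptor and gets that number. */
    int app_fd = open("/dev/zero", O_RDONLY);
    if (app_fd < 0)
        return -1;

    /* The library re-initialises: it closes the application's
     * descriptor and opens a new one, which lands on the same number. */
    if (toy_lib_reinit(&lib) < 0)
        return -1;

    /* The application's number now reads like /dev/null (0 bytes)
     * instead of /dev/zero (1 byte). */
    int hijacked = (app_fd == lib_fd) &&
                   (lib.cached_fd == app_fd) &&
                   (read(app_fd, &byte, 1) == 0);

    close(lib.cached_fd);
    return hijacked;
}
```

If something like this is going on, the interesting question is what
closed the library's descriptor in the first place, since the later
close() in the backtrace would only be the visible symptom.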

[skipped]



