NetBSD-Bugs archive


bin/59571: new bind crashes on arm64



>Number:         59571
>Category:       bin
>Synopsis:       new bind crashes on arm64
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 03 23:15:00 +0000 2025
>Originator:     matthew green
>Release:        netbsd 11 and netbsd-current
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
both arm64el and arm64eb crash for me.
>Description:
on both -current and the netbsd-11 branch, named crashes during startup
on an arm64 host.

when running inside GDB, the same problem occurs, though usually only after
some hundreds of calls to the _cds_wfcq_enqueue() function. because named is
heavily threaded, the crash isn't 100% identical every time.

one thing i've noticed is that the corrupted pointer has the same byte pattern
in memory on both, just read in opposite byte order on arm64 little and big endian:

little endian:
(gdb) p tail
$1 = (struct cds_wfcq_tail *) 0x1

big endian:
(gdb) p tail
$9 = (struct cds_wfcq_tail *) 0x100000000000000
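
to illustrate why these two values count as the same pattern, here is a minimal
sketch (not taken from named or userspace-rcu) that decodes one 8-byte sequence
both ways. the name `raw` and the assumption that the pointer's storage holds
01 followed by seven zero bytes are illustrative only. that layout is consistent
with the two GDB values above, but it says nothing about what wrote those bytes.

```python
import struct

# assumed in-memory contents of the 8-byte pointer slot, lowest address first
raw = bytes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])

# decode the same bytes as an unsigned 64-bit value in each byte order
as_little_endian = struct.unpack("<Q", raw)[0]
as_big_endian = struct.unpack(">Q", raw)[0]

print(hex(as_little_endian))  # 0x1
print(hex(as_big_endian))     # 0x100000000000000
```

if that reading is right, the corruption looks the same in memory on both ports,
which would point away from an endian-specific bug. that is an inference, not
something GDB has confirmed.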

here's the crash and bt:

Thread 4 "isc-loop-0003" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 3884 of process 27833]
_cds_wfcq_enqueue (new_tail=0xf989b44182e8, tail=0x1, head=...)
    at /usr/src/external/lgpl2/userspace-rcu/lib/liburcu-memb/../../dist/include/urcu/static/wfcqueue.h:217
217             return ___cds_wfcq_append(head, tail, new_tail, new_tail);
(gdb) bt
#0  _cds_wfcq_enqueue (new_tail=0xf989b44182e8, tail=0x1, head=...)
    at /usr/src/external/lgpl2/userspace-rcu/lib/liburcu-memb/../../dist/include/urcu/static/wfcqueue.h:217
#1  _call_rcu (head=head@entry=0xf989b44182e8, func=<optimized out>, crdp=0x1)
    at /usr/src/external/lgpl2/userspace-rcu/lib/liburcu-memb/../../dist/src/urcu-call-rcu-impl.h:705
#2  0x0000000003010028 in urcu_memb_call_rcu (head=0xf989b44182e8, func=<optimized out>)
    at /usr/src/external/lgpl2/userspace-rcu/lib/liburcu-memb/../../dist/src/urcu-call-rcu-impl.h:732
#3  0x0000f989b586c11c in dispentry_destroy (resp=0xf989b0728290) at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/dispatch.c:445
#4  dns_dispentry_unref (ptr=0xf989b0728290) at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/dispatch.c:453
#5  0x0000f989b586d11c in udp_connected (handle=0x0, eresult=ISC_R_FAMILYNOSUPPORT, arg=<optimized out>)
    at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/dispatch.c:1950
#6  0x0000f989b56bdeb4 in isc_nm_udpconnect (mgr=<optimized out>, local=local@entry=0xf989b07282e0, peer=peer@entry=0xf989b0728310, 
    cb=cb@entry=0xf989b586d020 <udp_connected>, cbarg=cbarg@entry=0xf989b0728290, timeout=<optimized out>)
    at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/netmgr/udp.c:844
#7  0x0000f989b586af18 in udp_dispatch_connect (disp=disp@entry=0xf989b4418210, resp=resp@entry=0xf989b0728290)
    at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/dispatch.c:1961
#8  0x0000f989b586ea1c in dns_dispatch_connect (resp=0xf989b0728290) at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/dispatch.c:2086
#9  0x0000f989b5834964 in dns_request_create (requestmgr=0xf989b5656150, message=0xf989b0886290, srcaddr=srcaddr@entry=0xf989b2788340, 
    destaddr=destaddr@entry=0xf989b07281f0, transport=0x0, tlsctx_cache=0xf989b55e1610, options=options@entry=0, key=0x0, timeout=timeout@entry=16, 
    udptimeout=udptimeout@entry=5, udpretries=udpretries@entry=2, loop=0xf989b4c84588, cb=cb@entry=0xf989b57ec980 <notify_done>, arg=arg@entry=0xf989b0728150, 
    requestp=requestp@entry=0xf989b0728170) at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/request.c:643
#10 0x0000f989b57ec304 in notify_send_toaddr (arg=0xf989b0728150) at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/zone.c:12678
#11 0x0000f989b56e95a4 in isc__async_cb (handle=<optimized out>) at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/async.c:113
#12 0x0000f989b56fc22c in uv__async_io (loop=0xf989b4c845a0, w=<optimized out>, events=<optimized out>)
    at /usr/src/external/mit/libuv/lib/../dist/src/unix/async.c:163
#13 0x0000f989b56f1708 in uv__io_poll (loop=0xf989b4c845a0, timeout=<optimized out>) at /usr/src/external/mit/libuv/lib/../dist/src/unix/kqueue.c:390
#14 0x0000f989b56f9794 in uv_run (loop=0xf989b4c845a0, mode=UV_RUN_DEFAULT) at /usr/src/external/mit/libuv/lib/../dist/src/unix/core.c:406
#15 0x0000f989b56e4410 in loop_thread (arg=arg@entry=0xf989b4c84588) at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/loop.c:330
#16 0x0000f989b56e96c0 in thread_body (wrap=0xf989b450b660) at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/thread.c:87
#17 thread_run (wrap=0xf989b450b660) at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/thread.c:102
#18 0x0000f989b54dddb4 in pthread__create_tramp (cookie=0xf989b44b4800) at /usr/src/lib/libpthread/pthread.c:605
#19 0x0000f989b47334e0 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

we can see at this point that "disp" from frame 3 has become NULL already:

(gdb) f 3
#3  0x0000f989b586c11c in dispentry_destroy (resp=0xf989b0728290) at /usr/src/external/mpl/bind/lib/libdns/../../dist/lib/dns/dispatch.c:445
445             dns_dispatch_detach(&disp); /* DISPATCH001 */
(gdb) p disp
$1 = (dns_dispatch_t *) 0x0

but this may be OK, since it appears to be the final reference and is
expected to go away. disp does appear to be valid on entry to frame #3,
dispentry_destroy(), but GDB on arm64 makes it hard to be sure without
adding direct instrumentation. the issue does appear to occur during the
call to dns_dispatch_detach() in dispentry_destroy().

>How-To-Repeat:
# /usr/sbin/named -f

crashes.  see above for what GDB has revealed.
>Fix:
spent an hour or two in GDB and the sources and i don't know what
is happening, though it seems to occur the first time a udp connection
is destroyed via the udp_connected() -> dispentry_destroy() path shown
in the description.

i haven't figured out where the pointer is corrupted, though it
may be something in the userspace-rcu library, as most of the code
below dns_dispatch_detach() lives there.


