NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
kern/58694: race condition when ptracing threaded applications
>Number: 58694
>Category: kern
>Synopsis: race condition when ptracing threaded applications
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Sep 25 09:55:00 +0000 2024
>Originator: Martin Husemann
>Release: NetBSD 10.99.12
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD big-apple.aprisoft.de 10.99.12 NetBSD 10.99.12 (POWERMAC_G5.MP) #119: Tue Sep 24 16:48:52 CEST 2024 martin%seven-days-to-the-wolves.aprisoft.de@localhost:/work/src/sys/arch/macppc/compile/POWERMAC_G5.MP macppc
Architecture: powerpc
Machine: macppc
>Description:
I am not sure if this should be a kern bug for the ptrace interface or a
toolchain bug for gdb issues around it - all existing ptrace ATF tests
pass fine, and AFAICT also include a threaded variant where new threads
exit - so this must be special somehow.
It also may be related to PR 58691.
I run a browser under gdb and while it is still starting up gdb gets confused
about threads. Note that LWP 1090 seems to come and go in the output below,
this is strange as pid/lid recycling shouldn't happen this fast and the machine
was otherwise mostly idle. So maybe there is some hickup in ptrace event
reporting or gdb gets confused on the userland side - I am open to ideas
what to investigate next.
To get this output I added a DPRINTF to powerpc process_machdep.c that
would have explained EBUSY but I never saw that fire, and I added
options PTRACE_DEBUG to my kernel config. I also slightly modified the
DPRINTF that creates the last kernel output line below to get more information
about the lwp being passed as "l" into that code (where previously it only
printed the "t" target details pid, stat and flags = !t->p_waited.
[ 9539.8398799] ptrace_startstop: lwp=2041 request=21
[ 9539.8398799] ptrace_startstop: lwp=2850 request=21
[ 9540.0598920] ptrace_get_siginfo: lwp=1090 signal=5
[ 9540.0598920] ptrace_get_process_state: lwp=1090 event=0x8 pid=1090 lwp=1090
[New LWP 1090 of process 2020]
[ 9540.0598920] ptrace_regs: lwp=1596 request=33
[ 9540.0598920] ptrace_startstop: lwp=2020 request=21
[ 9540.0598920] ptrace_startstop: lwp=2020 request=21
[ 9540.0598920] ptrace_startstop: lwp=2668 request=21
[ 9540.0598920] ptrace_startstop: lwp=2621 request=21
[ 9540.0598920] ptrace_startstop: lwp=3048 request=21
[ 9540.0598920] ptrace_startstop: lwp=2388 request=21
[ 9540.0598920] ptrace_startstop: lwp=2034 request=21
[ 9540.0598920] ptrace_startstop: lwp=2040 request=21
[ 9540.0598920] ptrace_startstop: lwp=1780 request=21
[ 9540.0598920] ptrace_startstop: lwp=2537 request=21
[ 9540.0598920] ptrace_startstop: lwp=2648 request=21
[ 9540.0598920] ptrace_startstop: lwp=2042 request=21
[ 9540.0598920] ptrace_startstop: lwp=1329 request=21
[ 9540.0598920] ptrace_startstop: lwp=3050 request=21
[ 9540.0598920] ptrace_startstop: lwp=2266 request=21
[ 9540.0598920] ptrace_startstop: lwp=1336 request=21
[ 9540.0598920] ptrace_startstop: lwp=1248 request=21
[ 9540.0598920] ptrace_startstop: lwp=1371 request=21
[ 9540.0598920] ptrace_startstop: lwp=1576 request=21
[ 9540.0598920] ptrace_startstop: lwp=2264 request=21
[ 9540.0598920] ptrace_startstop: lwp=1829 request=21
[ 9540.0598920] ptrace_startstop: lwp=2045 request=21
[ 9540.0598920] ptrace_startstop: lwp=1613 request=21
[ 9540.0598920] ptrace_startstop: lwp=2041 request=21
[ 9540.0598920] ptrace_startstop: lwp=2850 request=21
[ 9540.0598920] ptrace_startstop: lwp=1090 request=21
[ 9540.0798926] ptrace_get_siginfo: lwp=1090 signal=5
[ 9540.0798926] ptrace_get_process_state: lwp=1090 event=0x10 pid=1090 lwp=1090
[LWP 1090 of process 2020 exited]
[New LWP 1090 of process 2020]
[LWP 1090 of process 2020 exited]
[ 9540.0798926] req 33 pid 2020 (l->l_proc->p_pid 1596 lid 1596) stat 2 flag 0 -> EBUSY
Couldn't get registers: Device busy.
I am confused by LWP 1090 being noted as new and exited and new and exited
by gdb, and then PT_GETREGS (req 33) failing for some lwp that I did not
print out (will try to add that).
The userland state after the failure is:
[~] martin@big-apple > ps -sp 2020
UID PID PPID CPU LID NLWP PRI NI VSZ RSS WCHAN STAT TTY LTIME COMMAND
205 2020 1596 194389 1736 25 30 0 403004 82800 - T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2850 25 37 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2041 25 37 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1613 25 37 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2045 25 37 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1829 25 37 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2264 25 37 0 403004 82800 parked T pts/2 0:00.04 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1576 25 38 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1371 25 38 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1248 25 40 0 403004 82800 parked T pts/2 0:00.02 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1336 25 40 0 403004 82800 parked T pts/2 0:00.01 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2266 25 43 0 403004 82800 - T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 3050 25 42 0 403004 82800 parked T pts/2 0:00.03 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1329 25 43 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2042 25 42 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2648 25 43 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2537 25 43 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 1780 25 43 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2034 25 43 0 403004 82800 parked T pts/2 0:00.01 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2040 25 43 0 403004 82800 parked T pts/2 0:00.14 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2388 25 43 0 403004 82800 parked T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 3048 25 43 0 403004 82800 - T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2621 25 85 0 403004 82800 poll T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2668 25 85 0 403004 82800 kqueue T pts/2 0:00.00 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
205 2020 1596 194389 2020 25 30 0 403004 82800 parked T pts/2 0:02.36 /pkgbuild/usr/pkgobj/www/palemoon/work/build/dist/bin/palemoon
>How-To-Repeat:
(on my machine > 70% reproducable)
cd $(pkgsrc www/palemoon work dir)
cd build/dist/bin
./run-mozilla.sh -g ./palemoon
>Fix:
n/a
Home |
Main Index |
Thread Index |
Old Index