NetBSD-Bugs archive
Re: kern/55352: access_regs_set_unaligned_pc_0x7 test cases sometimes fail
The following reply was made to PR kern/55352; it has been noted by GNATS.
From: Andreas Gustafsson <gson%gson.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/55352: access_regs_set_unaligned_pc_0x7 test cases sometimes fail
Date: Fri, 15 Oct 2021 19:34:13 +0300
My NetBSD/amd64 real hardware testbed is still reporting several
access_regs_set_unaligned_pc_0x7 test case failures per run.
Here's log output from a recent run with five of them:
https://www.gson.org/netbsd/bugs/build/amd64-baremetal/2021/2021.10.08.21.32.28/test.html#failed-tcs-summary
I tried to debug this by setting the variable "debug" in
t_ptrace_wait.c, but that didn't work. Setting it from the debugger
doesn't work because the variable has been optimized away, and if you
change the initializer in the source and rebuild t_ptrace_wait, you do
get debug output but the test no longer fails.
When the test fails, the t_ptrace_wait process hangs until the ATF
5-minute timeout, and ps shows it has forked a child. If I attach to
the parent with gdb, it's hung in a wait() syscall:
(gdb) bt
#0 0x000076c7db846a8a in _sys___wait450 () from /usr/lib/libc.so.12
#1 0x000076c7dc008821 in __wait450 (wpid=wpid@entry=-1, status=status@entry=0x7f7fff88ec5c, options=options@entry=0, rusage=rusage@entry=0x0)
at /usr/src/lib/libpthread/pthread_cancelstub.c:661
#2 0x000076c7db872f28 in _wait (istat=istat@entry=0x7f7fff88ec5c) at /usr/src/lib/libc/gen/wait.c:55
#3 0x00000001b2e270e4 in access_regs (regset=<optimized out>, aux=<optimized out>) at /usr/src/tests/lib/libc/sys/t_ptrace_register_wait.h:147
#4 0x000076c7dc40a434 in atf_tc_run (tc=0x1b30541d0 <atfu_access_regs_set_unaligned_pc_0x7_tc>, resfile=<optimized out>) at /usr/src/external/bsd/atf/dist/atf-c/tc.c:1024
#5 0x000076c7dc406dac in run_tc (exitcode=<synthetic pointer>, p=0x7f7fff88f040, tp=0x7f7fff88f020) at /usr/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
#6 controlled_main (exitcode=<synthetic pointer>, add_tcs_hook=0x1b2e0a51c <atfu_tp_add_tcs>, argv=<optimized out>, argc=<optimized out>)
at /usr/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
#7 atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=0x1b2e0a51c <atfu_tp_add_tcs>) at /usr/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
#8 0x00000001b2e0994d in ___start ()
Specifically, it hangs in the second of the two wait calls performed
by the test, the one that is expected to fail with ECHILD because the
first one has already succeeded:
#3 0x00000001b2e270e4 in access_regs (regset=<optimized out>, aux=<optimized out>) at /usr/src/tests/lib/libc/sys/t_ptrace_register_wait.h:147
147 TWAIT_REQUIRE_FAILURE(ECHILD,
I can't attach to the child process with gdb because it's already
being ptrace'd by the parent process.
If I understand the code correctly, the test sets the program counter
of a traced process to a somewhat arbitrary value that may well point
into the middle of a multi-byte instruction, tells the process to
continue execution from that point, then immediately kills it, and
finally waits for it to exit, twice, expecting the first wait to
succeed and the second one to fail.
Did I get that right? If so, it's a weird test in that it invokes
undefined behavior in the child process, but arguably it should still
be possible to trace, kill, and wait for such a process without a
second wait hanging.
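For reference, the wait behavior the test relies on can be sketched
without ptrace. This is only a minimal illustration and not code from
t_ptrace_wait.c: the helper name kill_and_wait_twice is hypothetical,
and the child is simply paused instead of being resumed at a modified
program counter. It forks a child, kills it, and checks that the first
wait reaps it while the second fails with ECHILD instead of blocking.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from t_ptrace_wait.c.
 * No ptrace is involved; this only shows the expected wait semantics.
 * Returns 0 if the first wait reaped the SIGKILLed child and the
 * second wait failed with ECHILD, -1 otherwise.
 */
int
kill_and_wait_twice(void)
{
	int status;
	pid_t child, wpid;

	child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		/* Child: sleep until the parent kills us. */
		for (;;)
			pause();
	}

	if (kill(child, SIGKILL) == -1)
		return -1;

	/* First wait: expected to reap the killed child. */
	wpid = waitpid(child, &status, 0);
	if (wpid != child || !WIFSIGNALED(status) ||
	    WTERMSIG(status) != SIGKILL)
		return -1;

	/* Second wait: no children remain, so ECHILD is expected. */
	errno = 0;
	wpid = wait(&status);
	if (wpid != -1 || errno != ECHILD)
		return -1;

	return 0;
}
```

In the failing runs, it is the equivalent of the second wait() above
that never returns.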
Can anyone else reproduce this? This hangs with about 50% probability
on my machine:
cd /usr/tests/lib/libc/sys
./t_ptrace_wait access_regs_set_unaligned_pc_0x7
--
Andreas Gustafsson, gson%gson.org@localhost