NetBSD-Bugs archive


Re: kern/55352: access_regs_set_unaligned_pc_0x7 test cases sometimes fail



The following reply was made to PR kern/55352; it has been noted by GNATS.

From: Andreas Gustafsson <gson%gson.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/55352: access_regs_set_unaligned_pc_0x7 test cases sometimes fail
Date: Fri, 15 Oct 2021 19:34:13 +0300

 My NetBSD/amd64 real hardware testbed is still reporting several
 access_regs_set_unaligned_pc_0x7 test case failures per run.
 Here's log output from a recent run with five of them:
 
   https://www.gson.org/netbsd/bugs/build/amd64-baremetal/2021/2021.10.08.21.32.28/test.html#failed-tcs-summary
 
 I tried to debug this by setting the variable "debug" in
 t_ptrace_wait.c, but that didn't work.  Setting it from the debugger
 doesn't work because the variable has been optimized away, and if you
 change the initializer in the source and rebuild t_ptrace_wait, you do
 get debug output but the test no longer fails.
 
 When the test fails, the t_ptrace_wait process hangs until the ATF
 5-minute timeout, and ps shows it has forked a child.  If I attach to
 the parent with gdb, it's hung in a wait() syscall:
 
   (gdb) bt
   #0  0x000076c7db846a8a in _sys___wait450 () from /usr/lib/libc.so.12
   #1  0x000076c7dc008821 in __wait450 (wpid=wpid@entry=-1, status=status@entry=0x7f7fff88ec5c, options=options@entry=0, rusage=rusage@entry=0x0)
       at /usr/src/lib/libpthread/pthread_cancelstub.c:661
   #2  0x000076c7db872f28 in _wait (istat=istat@entry=0x7f7fff88ec5c) at /usr/src/lib/libc/gen/wait.c:55
   #3  0x00000001b2e270e4 in access_regs (regset=<optimized out>, aux=<optimized out>) at /usr/src/tests/lib/libc/sys/t_ptrace_register_wait.h:147
   #4  0x000076c7dc40a434 in atf_tc_run (tc=0x1b30541d0 <atfu_access_regs_set_unaligned_pc_0x7_tc>, resfile=<optimized out>) at /usr/src/external/bsd/atf/dist/atf-c/tc.c:1024
   #5  0x000076c7dc406dac in run_tc (exitcode=<synthetic pointer>, p=0x7f7fff88f040, tp=0x7f7fff88f020) at /usr/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
   #6  controlled_main (exitcode=<synthetic pointer>, add_tcs_hook=0x1b2e0a51c <atfu_tp_add_tcs>, argv=<optimized out>, argc=<optimized out>)
       at /usr/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
   #7  atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=0x1b2e0a51c <atfu_tp_add_tcs>) at /usr/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
   #8  0x00000001b2e0994d in ___start ()
 
 Specifically, it hangs in the second of the two wait calls performed
 by the test, the one expected to fail with ECHILD because the first
 one succeeded:
 
   #3  0x00000001b2e270e4 in access_regs (regset=<optimized out>, aux=<optimized out>) at /usr/src/tests/lib/libc/sys/t_ptrace_register_wait.h:147
   147                     TWAIT_REQUIRE_FAILURE(ECHILD,
 
 I can't attach to the child process with gdb because it's already
 being ptrace'd by the parent process.
 
 If I understand the code correctly, the test sets the program counter
 of a traced process to a somewhat arbitrary value that may well point
 into the middle of a multi-byte instruction, tells the process to
 continue execution from that point, then immediately kills it, and
 finally waits for it to exit, twice, expecting the first wait to
 succeed and the second one to fail.
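 
 The kill-then-wait-twice part of that sequence can be sketched in
 portable C.  This is only an illustration under stated assumptions,
 not the code from t_ptrace_register_wait.h: the ptrace and
 register-setting steps are left out entirely, the child simply idles
 in pause(), and the function name kill_and_wait_twice is hypothetical.
 
```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the NetBSD test suite.
 * Forks a child, kills it with SIGKILL, then calls wait() twice.
 * Returns 0 if the first wait reaps the killed child and the second
 * fails with ECHILD; returns a negative value otherwise.
 * Assumes the calling process has no other children.
 */
int
kill_and_wait_twice(void)
{
	int status;
	pid_t child, reaped;

	child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		/* Child: idle until the parent kills it. */
		for (;;)
			pause();
	}

	if (kill(child, SIGKILL) == -1)
		return -2;

	/* First wait: expected to succeed and report death by SIGKILL. */
	reaped = wait(&status);
	if (reaped != child || !WIFSIGNALED(status) ||
	    WTERMSIG(status) != SIGKILL)
		return -3;

	/* Second wait: no children remain, so it should fail right away. */
	errno = 0;
	if (wait(&status) != -1 || errno != ECHILD)
		return -4;

	return 0;
}
```
 
 Calling kill_and_wait_twice() from a main() and checking for a return
 value of 0 exercises the expected behavior.  In the failing case
 described here, it is the second wait that blocks instead of
 returning ECHILD.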
 
 Did I get that right?  If so, it's a weird test in that it invokes
 undefined behavior in the child process, but arguably it should still
 be possible to trace, kill, and wait for such a process without a
 second wait hanging.
 
 Can anyone else reproduce this?  This hangs with about 50% probability
 on my machine:
 
   cd /usr/tests/lib/libc/sys
   ./t_ptrace_wait access_regs_set_unaligned_pc_0x7
 
 --
 Andreas Gustafsson, gson%gson.org@localhost
 

