NetBSD-Bugs archive

misc/59381: t_ptrace_wait* and t_ptrace_sigchld tests are flaky



>Number:         59381
>Category:       misc
>Synopsis:       t_ptrace_wait* and t_ptrace_sigchld tests are flaky
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    misc-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 02 01:30:00 +0000 2025
>Originator:     Taylor R Campbell
>Release:        
>Organization:
The NetPtraceD Esperation
>Environment:
>Description:
1. Enabling debug=1 in t_ptrace_wait.c and t_ptrace_sigchld.c seems to generate so much output that it breaks the releng testbeds by filling up the disk.  Oops.  I tried to mitigate it in https://mail-index.netbsd.org/source-changes/2025/04/29/msg156694.html and https://mail-index.netbsd.org/source-changes/2025/05/01/msg156711.html after disasters like https://releng.netbsd.org/b5reports/amd64/2025/2025.04.29.23.32.35/test.log, but thousands of tests are still failing on, e.g., earmv7hf because the disk is full: https://releng.netbsd.org/b5reports/evbarm-earmv7hf/2025/2025.05.01.08.43.02/test.log
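
One possible way to keep debug output from filling a disk is to give it a fixed byte budget.  The following is only a minimal sketch of that idea, not the change made in the commits linked above: debug_printf_capped, debug_bytes_used, and DEBUG_BUDGET_BYTES are hypothetical names, and the 4096-byte limit is an arbitrary assumption.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cap; not a value taken from the NetBSD tests. */
#define DEBUG_BUDGET_BYTES 4096

/* Total number of debug bytes written so far by this process. */
static size_t debug_bytes_used;

/*
 * Write a formatted debug message to out, but never let the total
 * written through this function exceed DEBUG_BUDGET_BYTES.  Returns
 * the number of bytes actually written (0 once the budget is spent).
 */
int
debug_printf_capped(FILE *out, const char *fmt, ...)
{
	char buf[256];
	va_list ap;
	int n;

	if (debug_bytes_used >= DEBUG_BUDGET_BYTES)
		return 0;	/* budget exhausted: drop the message */

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;	/* formatting error: write nothing */
	if ((size_t)n >= sizeof(buf))
		n = (int)sizeof(buf) - 1;	/* message truncated to fit buf */
	if ((size_t)n > DEBUG_BUDGET_BYTES - debug_bytes_used)
		n = (int)(DEBUG_BUDGET_BYTES - debug_bytes_used);

	fwrite(buf, 1, (size_t)n, out);
	debug_bytes_used += (size_t)n;
	return n;
}
```

With a helper like this, a test that loops thousands of times can still print its first few messages, which are often the interesting ones, while the log size stays bounded.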

2. t_ptrace_wait4:x86_fpregs_fpu_write sometimes fails on this assertion:

   3327 				FORKEE_ASSERT(vals_fpu.ip.fa_64
   3328 				    == expected_fpu.ip.fa_64);

   https://nxr.netbsd.org/xref/src/tests/lib/libc/sys/t_ptrace_x86_wait.h?r=1.31#3327

   By improving the debug prints, I caught it in the act with vals_fpu.ip.fa_64=0x76543210, expected_fpu.ip.fa_64=0xa9876543210.  No idea what happened to the upper 0x00000a98!

   Similar tests were failing on amd64 throughout April, on the same assertion: https://releng.netbsd.org/b5reports/amd64/commits-2025.04.html#end
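
As a check on the arithmetic in item 2, the observed value is exactly the expected value with everything above bit 31 cleared.  The sketch below only verifies that relationship between the two quoted numbers; low32 and high32 are hypothetical helper names, and nothing here says where in the kernel or the test the upper bits are lost.

```c
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit value. */
uint64_t
low32(uint64_t v)
{
	return v & UINT64_C(0xffffffff);
}

/* Return the bits of a 64-bit value above bit 31. */
uint64_t
high32(uint64_t v)
{
	return v >> 32;
}
```

Applied to the expected value 0xa9876543210, low32 gives 0x76543210 (the value the test read back) and high32 gives 0xa98 (the part that went missing), which is consistent with a 32-bit truncation somewhere along the path.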

3. t_ptrace_wait4:x86_cve_2018_8897 failed with:

   FAILED: /tmp/build/2025.04.15.22.40.20-i386/src/tests/lib/libc/sys/t_ptrace_wait.h:320: WSTOPSIG(status) != expected: Unexpected stop signal received [Segmentation fault] != [Suspended (signal)]

   https://releng.netbsd.org/b5reports/i386/2025/2025.04.15.22.40.20/test.html#lib_libc_sys_t_ptrace_wait4_x86_cve_2018_8897
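
The failure in item 3 is a comparison of WSTOPSIG(status) against an expected signal.  Assuming the two strings in the message are the usual descriptions of SIGSEGV and SIGSTOP, the tracee stopped with SIGSEGV where the test expected SIGSTOP.  The sketch below shows the general shape of such a check using only fork(2) and waitpid(2) with WUNTRACED; it does not use ptrace(2) and is not the code of the actual test, and observed_stop_signal is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <signal.h>
#include <unistd.h>

/*
 * Fork a child that raises sig on itself, wait for it to stop, and
 * return the signal reported by WSTOPSIG().  Returns -1 on error or
 * if the child did not stop.  The child is killed and reaped before
 * returning.
 */
int
observed_stop_signal(int sig)
{
	pid_t child;
	int status, stopsig;

	child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		raise(sig);
		_exit(0);
	}

	/* WUNTRACED makes waitpid() report stopped children too. */
	if (waitpid(child, &status, WUNTRACED) != child)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child exited or was killed instead */
	stopsig = WSTOPSIG(status);

	/* Clean up the stopped child. */
	kill(child, SIGKILL);
	(void)waitpid(child, &status, 0);
	return stopsig;
}
```

A caller would compare the result against the signal it expects, for example observed_stop_signal(SIGSTOP) == SIGSTOP; the failing test performs the analogous comparison on a traced process.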

4. The testbed seems to be confused by t_ptrace_wait*:access_regs_set_unaligned_pc_0x7, which are not failing but which the b5reports summary thinks are failing:

   https://releng.netbsd.org/b5reports/riscv-riscv64/commits-2025.04.html#end

   I wonder whether this may happen because the riscv64 kernel prints a scary message in the middle of the test which is interpreted as a test failure:

    access_regs_set_unaligned_pc_0x7: [ 4739.4096919] Trapframe @ 0xffffffc029eaeee0 (cause=2 (illegal instruction), status=0x4020, pc=      0x3ff83bd2c7, va=0x8300):
    [ 4739.4096919]                         ra =           0x206ba  sp =      0x3ffffee480  gp =           0x3ba70
    [ 4739.4096919] tp =      0x3ff825a010  t0 =      0x3ff83bd2b0  t1 =0xffffffffffffffff  t2 =      0x3ff842f170
    [ 4739.4096919] s0 =           0x3bc08  s1 =                 0  a0 =                 0  a1 =                 0
    [ 4739.4096919] a2 =                 0  a3 =                 0  a4 =        0x33330003  a5 =      0x3ff825a000
    [ 4739.4096919] a6 =      0x3ff825a110  a7 =      0x3ff825a120  s2 =           0x304f0  s3 =0xffffffffffffffff
    [ 4739.4096919] s4 =      0x3ffffee7b8  s5 =           0x303b8  s6 =      0x3ffffee758  s7 =      0x3ffffee7a0
    [ 4739.4096919] s8 =      0x3ff8431eac  s9 =      0x3ff867aa28  s10=      0x3ff84373ec  s11=      0x3ffffee7b8
    [ 4739.4096919] t3 =      0x3ffffee2c8  t4 =      0x2300000000  t5 =              0x23  t6 =             0x13e
    [0.211157s] Passed.

   https://releng.netbsd.org/b5reports/riscv-riscv64/2025/2025.04.03.14.59.05/test.log
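
If the hypothesis in item 4 is right, a log parser would need some way to tell kernel console lines apart from test output.  In the excerpt above, kernel lines carry a timestamp of the form "[ 4739.4096919]", while the test's elapsed time "[0.211157s]" ends in "s".  The sketch below recognizes the former and not the latter.  It is purely illustrative: how b5reports actually parses logs is not known here, and both function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Return true if s begins with something shaped like a kernel console
 * timestamp: '[', optional spaces, digits, '.', digits, ']'.
 */
bool
looks_like_kernel_timestamp(const char *s)
{
	if (*s++ != '[')
		return false;
	while (*s == ' ')
		s++;
	if (!isdigit((unsigned char)*s))
		return false;
	while (isdigit((unsigned char)*s))
		s++;
	if (*s++ != '.')
		return false;
	if (!isdigit((unsigned char)*s))
		return false;
	while (isdigit((unsigned char)*s))
		s++;
	return *s == ']';
}

/*
 * Return true if a kernel-style timestamp appears anywhere in line,
 * not only at the start.
 */
bool
contains_kernel_timestamp(const char *line)
{
	for (; *line != '\0'; line++)
		if (looks_like_kernel_timestamp(line))
			return true;
	return false;
}
```

The second function matters because the first kernel line in the excerpt is appended to the "access_regs_set_unaligned_pc_0x7:" line rather than starting a line of its own, so a check anchored at the start of the line would miss it.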

5. Lots of t_ptrace_* tests are failing on pmax (though this might be a qemu mips fpu emulation bug): https://releng.netbsd.org/b5reports/pmax/2025/2025.04.30.15.40.38/test.html#failed-tcs-summary
>How-To-Repeat:
cd /usr/tests/lib/libc/sys
atf-run t_ptrace_* | atf-report

(t_ptrace_kill flakiness is tracked separately, since it is not related to the other t_ptrace_* tests -- see PR misc/59380: t_ptrace_kill is flaky, https://gnats.NetBSD.org/59380)
>Fix:
Yes, please!


