NetBSD-Bugs archive
misc/59381: t_ptrace_wait* and t_ptrace_sigchld tests are flaky
>Number: 59381
>Category: misc
>Synopsis: t_ptrace_wait* and t_ptrace_sigchld tests are flaky
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: misc-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 02 01:30:00 +0000 2025
>Originator: Taylor R Campbell
>Release:
>Organization:
The NetPtraceD Esperation
>Environment:
>Description:
1. Enabling debug=1 in t_ptrace_wait.c and t_ptrace_sigchld.c seems to generate so much output it breaks the releng testbeds by filling up the disk. Oops. (See https://mail-index.netbsd.org/source-changes/2025/04/29/msg156694.html and https://mail-index.netbsd.org/source-changes/2025/05/01/msg156711.html, in which I tried to mitigate it after disasters like https://releng.netbsd.org/b5reports/amd64/2025/2025.04.29.23.32.35/test.log, but, e.g., thousands of tests are still failing on earmv7hf because the disk is full: https://releng.netbsd.org/b5reports/evbarm-earmv7hf/2025/2025.05.01.08.43.02/test.log)
2. t_ptrace_wait4:x86_fpregs_fpu_write sometimes fails on:
3327 FORKEE_ASSERT(vals_fpu.ip.fa_64
3328 == expected_fpu.ip.fa_64);
https://nxr.netbsd.org/xref/src/tests/lib/libc/sys/t_ptrace_x86_wait.h?r=1.31#3327
By improving the debug prints, I caught it in the act with vals_fpu.ip.fa_64=0x76543210, expected_fpu.ip.fa_64=0xa9876543210. No idea what happened to the upper 0x00000a98!
Similar tests were failing on amd64 throughout April, on the same assertion: https://releng.netbsd.org/b5reports/amd64/commits-2025.04.html#end
3. t_ptrace_wait4:x86_cve_2018_8897 failed with:
FAILED: /tmp/build/2025.04.15.22.40.20-i386/src/tests/lib/libc/sys/t_ptrace_wait.h:320: WSTOPSIG(status) != expected: Unexpected stop signal received [Segmentation fault] != [Suspended (signal)]
https://releng.netbsd.org/b5reports/i386/2025/2025.04.15.22.40.20/test.html#lib_libc_sys_t_ptrace_wait4_x86_cve_2018_8897
4. The testbed seems to be confused by t_ptrace_wait*:access_regs_set_unaligned_pc_0x7, which are not failing but which the b5reports summary thinks are failing:
https://releng.netbsd.org/b5reports/riscv-riscv64/commits-2025.04.html#end
I wonder whether this may happen because the riscv64 kernel prints a scary message in the middle of the test which is interpreted as a test failure:
access_regs_set_unaligned_pc_0x7: [ 4739.4096919] Trapframe @ 0xffffffc029eaeee0 (cause=2 (illegal instruction), status=0x4020, pc= 0x3ff83bd2c7, va=0x8300):
[ 4739.4096919] ra = 0x206ba sp = 0x3ffffee480 gp = 0x3ba70
[ 4739.4096919] tp = 0x3ff825a010 t0 = 0x3ff83bd2b0 t1 =0xffffffffffffffff t2 = 0x3ff842f170
[ 4739.4096919] s0 = 0x3bc08 s1 = 0 a0 = 0 a1 = 0
[ 4739.4096919] a2 = 0 a3 = 0 a4 = 0x33330003 a5 = 0x3ff825a000
[ 4739.4096919] a6 = 0x3ff825a110 a7 = 0x3ff825a120 s2 = 0x304f0 s3 =0xffffffffffffffff
[ 4739.4096919] s4 = 0x3ffffee7b8 s5 = 0x303b8 s6 = 0x3ffffee758 s7 = 0x3ffffee7a0
[ 4739.4096919] s8 = 0x3ff8431eac s9 = 0x3ff867aa28 s10= 0x3ff84373ec s11= 0x3ffffee7b8
[ 4739.4096919] t3 = 0x3ffffee2c8 t4 = 0x2300000000 t5 = 0x23 t6 = 0x13e
[0.211157s] Passed.
https://releng.netbsd.org/b5reports/riscv-riscv64/2025/2025.04.03.14.59.05/test.log
5. Lots of t_ptrace_* tests are failing on pmax (though this might be a qemu mips fpu emulation bug): https://releng.netbsd.org/b5reports/pmax/2025/2025.04.30.15.40.38/test.html#failed-tcs-summary
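Regarding item 2, the two values quoted there agree in their low 32 bits and differ only in the upper word, which would be consistent with fa_64 being truncated to 32 bits somewhere along the way. This does not say where or why. The following is a minimal sketch of that arithmetic only; the helper names (low32, high32, report_fa64) and the macros are made up for illustration and are not part of the test suite:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the debug output quoted in item 2. */
#define EXPECTED_FA64	UINT64_C(0xa9876543210)
#define OBSERVED_FA64	UINT64_C(0x76543210)

/* Hypothetical helper: low 32 bits of a 64-bit value. */
static uint32_t
low32(uint64_t v)
{
	return (uint32_t)(v & UINT64_C(0xffffffff));
}

/* Hypothetical helper: high 32 bits of a 64-bit value. */
static uint32_t
high32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Hypothetical helper: show how the observed value relates to the expected one. */
static void
report_fa64(void)
{
	/* The low halves match, so only the upper word differs. */
	assert(low32(EXPECTED_FA64) == low32(OBSERVED_FA64));

	printf("expected=0x%" PRIx64 " observed=0x%" PRIx64
	    " missing high word=0x%" PRIx32 "\n",
	    (uint64_t)EXPECTED_FA64, (uint64_t)OBSERVED_FA64,
	    high32(EXPECTED_FA64));
}
```

Calling report_fa64() from a main() prints both values along with the missing high word, 0xa98, which matches the "upper 0x00000a98" noted in item 2.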
>How-To-Repeat:
cd /usr/tests/lib/libc/sys
atf-run t_ptrace_* | atf-report
(t_ptrace_kill flakiness is tracked separately, since it is not related to the other t_ptrace_* tests -- see PR misc/59380: t_ptrace_kill is flaky, https://gnats.NetBSD.org/59380)
>Fix:
Yes, please!