NetBSD-Bugs archive


port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases



>Number:         56867
>Category:       port-hppa
>Synopsis:       hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-hppa-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 07 02:35:00 +0000 2022
>Originator:     Tom Lane
>Release:        HEAD/202206030100Z
>Organization:
PostgreSQL Global Development Group
>Environment:
NetBSD sss2.sss.pgh.pa.us 9.99.97 NetBSD 9.99.97 (SD2) #0: Fri Jun  3 12:30:06 EDT 2022  tgl%nuc1.sss.pgh.pa.us@localhost:/home/tgl/netbsd-H-202206030100Z/obj.hppa/sys/arch/hppa/compile/SD2 hppa
>Description:
After applying the fixes proposed in PRs 56864, 56865, and 56866, I still see one class of failures in t_ptrace_wait and its sibling test programs: the stepN and setstepN test cases frequently complain that WSTOPSIG(status) is SIGSEGV rather than SIGTRAP after an attempted single step.  The failure rate is near 100% when the tests are run via atf-run, but when these test cases are invoked individually they frequently pass, so something nondeterministic is involved.
>How-To-Repeat:
This way fails pretty reproducibly:

$ cd /usr/tests/
$ atf-run lib/libc/sys/t_ptrace_wait

This way succeeds more often than not for me, but sometimes fails with the same symptom:

$ /usr/tests/lib/libc/sys/t_ptrace_wait step1

(replace step1 with any related test case, same results)
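
To put a rough number on the intermittent failure rate, the individual invocation above can be repeated in a loop.  The following is only a sketch: count_failures is a hypothetical helper (not part of ATF or the test suite), and it assumes the test program exits non-zero when the selected test case fails.

```sh
# Hypothetical helper: run a command N times and print how many runs
# exited with a non-zero status.
# usage: count_failures N command [args...]
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}
```

For example, "count_failures 50 /usr/tests/lib/libc/sys/t_ptrace_wait step1" prints how many of 50 runs failed.  It does not distinguish the SIGSEGV symptom from any other failure, so the output of failing runs still needs to be inspected by hand.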

>Fix:
I have not isolated the cause, and may not be able to because my lone HPPA machine has developed hardware issues.  But I wanted to memorialize this issue just to clarify that the preceding PRs don't fully fix this test program.

Given the evident nondeterminism, the hypothesis I was about to investigate when my machine suddenly started making weird noises is this: if we get a TLB miss when trying to execute the single instruction being stepped, trap.c somehow misbehaves while handling the TLB miss trap and reaches the place where it reports SIGSEGV.  It might be something quite different, though.


