NetBSD-Bugs archive


Re: port-mips/52892: Tests hang on MIPS



The following reply was made to PR port-mips/52892; it has been noted by GNATS.

From: christos%zoulas.com@localhost (Christos Zoulas)
To: Andreas Gustafsson <gson%gson.org@localhost>, gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: port-mips/52892: Tests hang on MIPS
Date: Thu, 19 Apr 2018 08:44:46 -0400

 On Apr 19,  3:40pm, gson%gson.org@localhost (Andreas Gustafsson) wrote:
 -- Subject: Re: port-mips/52892: Tests hang on MIPS
 
 | I have now looked into this some more, and have the following
 | observations:
 | 
 | 1. The tests are still hanging every time on b5:
 | 
 |   http://releng.netbsd.org/b5reports/hpcmips/
 |   http://releng.netbsd.org/b5reports/pmax/
 | 
 | 2. I can reliably reproduce the hang on hpcmips under gxemul
 | by running
 | 
 |   cd /usr/tests/net/icmp
 |   while true; do atf-run t_ping|atf-report; done
 | 
 | On b5, this usually hangs within an hour, but on my own test machine,
 | it took a day or two to hang.
 | 
 | 3. It's the kernel that's hanging, not just ATF.  For example, if I
 | run the above test script in the background with output redirected to
 | a log file, and "tail -f" the log file on the console, I'm unable to
 | kill the tail process using the interrupt character after the test has
 | hung.
 | 
 | 4. By bisection, I found that the hpcmips tests started hanging on b5
 | at the time of the commit
 | 
 |   2017.12.02.22.51.22 christos src/sys/kern/kern_lwp.c 1.191
 | 
 | Since this caused the tests to hang earlier in the test run, in the
 | lib/libc/sys/t_ptrace_wait3:resume1 test rather than in the network
 | related tests where it is now hanging, it was not immediately clear
 | whether that commit also triggered the present issue.  However...
 | 
 | 5. If I revert that commit, the tests do run to completion on b5.
 
 That's strange, because the change was put there to prevent a hang (in Go),
 i.e. to sleep interruptibly instead of sleeping with signals blocked.
 There must be some other race that's causing it. Can you try to always
 set errno to EAGAIN after cv_wait_sig() returns?
 
 christos
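
 The reproduction described in observations 2 and 3 of the quoted text
 could be scripted roughly as follows. This is only a sketch: the
 run_loop helper, the LOG path and the MAX variable are hypothetical
 names introduced for illustration and are not part of the original
 report; only the atf-run/atf-report pipeline comes from it. A timestamp
 is logged per iteration so that the last line in the log shows roughly
 when the machine stopped making progress.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): repeat a command,
# appending a timestamp and the command's output to a log file.
LOG=${LOG:-/tmp/t_ping-loop.log}   # assumed log location
MAX=${MAX:-0}                      # 0 = loop forever, as in the report

run_loop() {
    i=0
    while [ "$MAX" -eq 0 ] || [ "$i" -lt "$MAX" ]; do
        i=$((i + 1))
        echo "iteration $i started $(date)" >> "$LOG"
        # Run the given command; keep both stdout and stderr in the log.
        "$@" >> "$LOG" 2>&1
    done
}

# Usage matching the report (run from /usr/tests/net/icmp):
#   run_loop sh -c 'atf-run t_ping | atf-report' &
#   tail -f "$LOG"
```

 If the interrupt character no longer kills the tail process once the
 log stops growing, that matches the behaviour reported in observation 3.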
 

