NetBSD-Bugs archive


bin/56506: sys/rc/t_rc_d_cli tests randomly fail



>Number:         56506
>Category:       bin
>Synopsis:       sys/rc/t_rc_d_cli tests randomly fail
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 17 20:25:01 +0000 2021
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:
  
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

On one of my testbeds, a physical i386 laptop, various test cases of
the sys/rc/t_rc_d_cli test program fail randomly.  The log output from
a typical failure is here:

  https://www.gson.org/netbsd/bugs/build/i386-laptop/2021/2021.11.14.18.36.13/test.html#sys_rc_t_rc_d_cli_default_stop_no_args

In this case, the default_stop_no_args test case failed with the
error message "h_simple not running?".

This looks like a race condition in rc.subr, which in some cases
checks whether a service is running by examining the output of ps(1).
If ps runs shortly after a service has been started, the process that
will run the service may have forked but not yet completed its exec().
In that case it still carries the name of the shell that forked it,
and it does not show up in the ps output under the expected name.
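The fork/exec window can be illustrated outside rc.subr with a small
sh(1) script.  This is a hypothetical sketch, not code from rc.subr:
it backgrounds sleep(1) as a stand-in for h_simple and samples the
child's command name with "ps -o comm=" both immediately and after a
short delay.  Whether the early sample shows the shell's name depends
on timing, just as in the failing test.  The later sample should show
"sleep".

```shell
#!/bin/sh
# Hypothetical demonstration of the fork/exec race; sleep(1) stands in
# for the service binary (h_simple in the test).
sleep 2 &
pid=$!

# Sampled right away, the child may still be the forked copy of this
# shell that has not yet exec'd sleep, so its name may be the shell's.
early=$(ps -p "$pid" -o comm=)

# After a short delay the exec has normally completed.
sleep 1
late=$(ps -p "$pid" -o comm=)

echo "early: $early"
echo "late: $late"
wait "$pid"
```

Running the script in a loop on a suitably loaded machine should
occasionally print a shell name on the "early" line.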

To test this theory, I modified rc.subr to save the ps output to a
file using tee(1), and found that when the test fails, the ps output
shows a process with the name "(sh)" in place of the expected
"h_simple".
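A name-based "is it running?" check with the kind of tee(1)
instrumentation described above might look like the following.  The
temporary log file, the use of sleep(1) as the service, and the ps
options are assumptions for illustration; rc.subr's actual process
check is not reproduced here.

```shell
#!/bin/sh
# Hypothetical sketch: a ps-based name check that also saves each ps
# snapshot with tee(1), so the output can be inspected after a failure.
log=$(mktemp)   # illustrative log location
name=sleep      # stand-in for the service name

sleep 2 &
pid=$!
sleep 1         # give the child time to exec

# Copy the ps output to the log before matching on the command name.
if ps -p "$pid" -o pid= -o comm= | tee -a "$log" | grep -q "$name"; then
    echo "$name running"
else
    echo "$name not running?"
fi

echo "ps output saved in $log"
wait "$pid"
```

If the check fails, the saved snapshot shows which name the process
actually had at the time ps ran.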

>How-To-Repeat:

  cd /usr/tests/sys/rc
  while atf-run t_rc_d_cli:default_stop_no_args; do true; done

The :default_stop_no_args part is only supported on -current;
omit it if testing on a release.  Repeat on different machines
until you find one that happens to have the right timing for
the test to fail.

>Fix:


