NetBSD-Bugs archive


kern/56442: Tests hang under NVMM with options DEBUG



>Number:         56442
>Category:       kern
>Synopsis:       Tests hang under NVMM with options DEBUG
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 06 06:45:00 +0000 2021
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2019.12.01.13.20.42
>Organization:
  
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

If I enable "options DEBUG" in the NetBSD-current/amd64 GENERIC kernel
configuration, build a release, boot it in "qemu -accel nvmm", and run
the kernel/t_trapsignal test, the test hangs during the fpe_ignore
test case.  The hung test is not killed by the ATF timeout mechanism,
nor can it be killed manually by typing control-c or control-\.
Console keystrokes are still echoed, and I am able to enter ddb by
sending a serial break:

  # cd /usr/tests/kernel
  # atf-run t_trapsignal | atf-report
  Tests root: /usr/tests/kernel

  t_trapsignal (1/1): 20 test cases
      bus_handle: [0.213148s] Passed.
      bus_handle_recurse: [0.209931s] Passed.
      bus_ignore: [0.199009s] Passed.
      bus_mask: [0.200927s] Passed.
      bus_simple: [0.200246s] Passed.
      fpe_handle: [0.219451s] Passed.
      fpe_handle_recurse: [598.294510s] Failed: Test case timed out after 300 seconds
      fpe_ignore:^Z^C^C
  ^C^C^C^C
  ^\^\
  ^C^C[ 115781.8123080] fatal breakpoint trap in supervisor mode
  [ 115781.8123080] trap type 1 code 0 rip 0xffffffff8021dd8d cs 0x8 rflags 0x202 cr2 0x72bf48be8fe0 ilevel 0x8 rsp 0xffffcb002a623e88
  [ 115781.8123080] curlwp 0xffffeb5ed06676c0 pid 881.1 lowest kstack 0xffffcb002a6202c0
  Stopped in pid 881.1 (h_segv) at        netbsd:breakpoint+0x5:  leave
  db{0}> bt
  breakpoint() at netbsd:breakpoint+0x5
  comintr() at netbsd:comintr+0x8e5
  db{0}>

A bisection identified the following commit as the point where the
problem started:

  2019.12.01.13.20.42 ad src/sys/kern/kern_runq.c 1.51
  2019.12.01.13.20.42 ad src/sys/kern/sched_4bsd.c 1.39
  2019.12.01.13.20.42 ad src/sys/kern/sched_m2.c 1.35

The problem does not occur
 - without "options DEBUG"
 - in qemu without "-accel nvmm"
 - in qemu with "-accel nvmm -smp 2"
 - on real amd64 multiprocessor hardware

I also tried to test on real amd64 uniprocessor hardware, but was
thwarted by the unrelated problem reported in PR 51531.

>How-To-Repeat:

See above.
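
As a rough sketch of the setup described above, and not necessarily
the exact configuration used for this report: the option can be
enabled by adding (or uncommenting) a line like the following in
sys/arch/amd64/conf/GENERIC before building the release.  The
trailing comment is illustrative only.

  options 	DEBUG	# enabled to reproduce the hang

The resulting system can then be booted as a uniprocessor guest along
these lines.  The image name "netbsd-debug.img", the memory size, and
the use of -nographic for a serial console are assumptions made for
illustration, not details taken from the report:

  $ qemu-system-x86_64 -accel nvmm -m 1024 -nographic \
      -drive file=netbsd-debug.img,format=raw

Once the guest is up, the test is run as shown in the description:

  # cd /usr/tests/kernel
  # atf-run t_trapsignal | atf-report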

>Fix:


