NetBSD-Bugs archive


kern/54166: NetBSD tests hang after lost IDE interrupt under qemu



>Number:         54166
>Category:       kern
>Synopsis:       NetBSD tests hang after lost IDE interrupt under qemu
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 06 07:30:00 +0000 2019
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2019.03.19.16.56.29
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

Running the ATF tests on recent versions of NetBSD/i386, /amd64, or
/sparc64 under recent versions of qemu frequently ends with the NetBSD
IDE disk driver reporting a lost interrupt, after which the tests
produce no further output.  For example:

  lib/librt/t_sem (418/799): 4 test cases
      basic: [0.285119s] Passed.
      child: piixide0:0:0: lost interrupt
  [ 5969.7132325]		type: ata tc_bcount: 2048 tc_skip: 0
  piixide0:0:0: bus-master DMA error: missing interrupt, status=0x61
  [ 5969.7953625] wd0a: DMA error writing fsbn 128 of 128-131 (wd0 bn 2176; cn 2 tn 2 sn 34), xfer f5c, retry 0
  [ 5971.7754220] wd0: soft error (corrected) xfer f5c

This was extracted from:

  http://www.gson.org/netbsd/bugs/build/i386/2019/2019.04.26.11.51.56/test.log

More reports of similar hangs:

  http://www.gson.org/netbsd/bugs/build/i386/2019/2019.04.11.14.47.06/test.log
  http://www.gson.org/netbsd/bugs/build/i386/2019/2019.04.12.09.29.26/test.log
  http://www.gson.org/netbsd/bugs/build/sparc64/2019/2019.04.26.14.36.40/test.log
  http://releng.netbsd.org/b5reports/sparc64/2019/2019.03.05.15.18.59/test.log
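Logs like the ones above can be checked for this signature automatically.
The following is only a rough sketch, not part of any existing tool.  It
assumes that test program headers always look like the "(418/799)" line in
the excerpt, and that a log whose last header index is below the total was
cut short.  The names scan and looks_hung are made up for illustration.

```python
import re

# Both patterns are modelled on the log excerpt in this report.
LOST_IRQ = re.compile(r"lost interrupt")
# Test program header, e.g. "lib/librt/t_sem (418/799): 4 test cases"
PROGRAM = re.compile(r"^\s*(\S+) \((\d+)/(\d+)\): \d+ test cases?")


def scan(lines):
    """Return (last_program, lost_count) for the lines of a test log.

    last_program is a (name, index, total) tuple for the last test
    program header seen, or None if no header was found.
    """
    last_program = None
    lost_count = 0
    for line in lines:
        match = PROGRAM.match(line)
        if match:
            last_program = (match.group(1),
                            int(match.group(2)),
                            int(match.group(3)))
        if LOST_IRQ.search(line):
            lost_count += 1
    return last_program, lost_count


def looks_hung(lines):
    """Heuristic (an assumption, not a definitive test): the log shows at
    least one lost interrupt and stops before the last test program."""
    last_program, lost_count = scan(lines)
    if last_program is None or lost_count == 0:
        return False
    _, index, total = last_program
    return index < total


# Usage, with a locally saved copy of one of the logs:
#   with open("test.log", errors="replace") as f:
#       print(looks_hung(f))
```

A run that logs a lost interrupt but still reaches the final test program
is not flagged, which matches the distinction this PR draws between the
lost interrupts themselves and the failure to recover from them.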

This has been tricky to track down because it only happens with the
combination of a recent qemu *and* a recent -current, and even then
not every test run hangs, only most of them.

Specifically, the lost interrupts appear to be related to the qemu
version, whereas the inability to recover from those lost interrupts
appears to be related to the NetBSD-current version.  I believe I have
now identified the commit where the latter problem started, namely

  2019.03.19.16.56.29 mlelstv src/sys/dev/ata/wd.c 1.446

Log output showing the tests running to completion before said commit
and hanging after it can be found at

  http://www.gson.org/netbsd/bugs/build/i386/commits-2019.03.html#2019.03.19.16.56.29

This PR is intended to track the specific issue of the test runs
hanging.  For the lost interrupts themselves, there is already
port-sparc64/54035.

>How-To-Repeat:

Run the ATF tests on NetBSD-current/i386 under qemu 3.1.0 or 4.0.0.
Not every run hangs, so several runs may be needed.
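Since the symptom is that the tests produce no further output, repeated
runs can be supervised by a watchdog that treats prolonged silence as a
hang.  The following is a minimal sketch under stated assumptions.  The
command that boots NetBSD under qemu and runs the tests is not given in
this report, so cmd is a placeholder to be supplied by the user, and the
names run_until_silent and repeat are hypothetical.

```python
import subprocess
import sys
import threading
import time


def run_until_silent(cmd, quiet_limit):
    """Run cmd, echoing its output.  Return "hung" if it prints nothing
    for more than quiet_limit seconds, or "exited" once it terminates."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    last_output = [time.monotonic()]  # list so the thread can update it

    def pump():
        # Copy the child's output and record when it last said anything.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            sys.stdout.write(line.decode(errors="replace"))

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > quiet_limit:
            proc.kill()
            proc.wait()
            return "hung"
        time.sleep(0.1)
    reader.join()
    return "exited"


def repeat(cmd, runs, quiet_limit):
    """Run cmd up to runs times.  Return the number of the first run
    that hung, or None if every run exited on its own."""
    for attempt in range(1, runs + 1):
        if run_until_silent(cmd, quiet_limit) == "hung":
            return attempt
    return None
```

The quiet limit has to be chosen well above the longest silent stretch
of a healthy run, otherwise a slow test will be misreported as a hang.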

>Fix:


