NetBSD-Bugs archive


kern/56847: nouveau autoconfiguration error: fifo: fault 01 [WRITE] -> DROPPED_MMU_FAULT; frozen console output on boot on a GTX 1060



>Number:         56847
>Category:       kern
>Synopsis:       nouveau autoconfiguration error: fifo: fault 01 [WRITE] -> DROPPED_MMU_FAULT; frozen console output on boot on a GTX 1060
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 19 10:10:00 +0000 2022
>Originator:     Paolo Vincenzo Olivo
>Release:        NetBSD HEAD (May the 14th, 2022) / amd64
>Organization:
SDF Public Access UNIX System
>Environment:
NetBSD  9.99.96 NetBSD 9.99.96 (GENERIC) #0: Sat May 14 21:04:34 UTC 2022  mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Once loaded, the nouveau driver apparently fails to reset the display on an NVIDIA GeForce GTX 1060 when running the latest HEAD snapshot.

The console output is stuck at:

```
nouveau0: info: NVIDIA GP106 (13000a1)
nouveau0: info: bios: version 86.06.0e.00.99
nouveau0: interrupting at msi6vec 0 (nouveau0)
nouveau0: info: fb: 6144 MiB GDDR5
```

After that, the console becomes completely unresponsive. The machine nevertheless boots successfully, so I can SSH into it and inspect the dmesg, which contains a long list of autoconfiguration errors like the following:

```
fifo: fault 01 [WRITE] at 000000000102d000 engine 04 [BAR1] client 08 [HUB/HOST_CPU_NB] reason 00 [PDE] on channel -1 [017feaa000 unknown]
[    21.541860] nouveau0: autoconfiguration error: error: fifo: fault 01 [WRITE] at 000000000102f000 engine 04 [BAR1] client 08 [HUB/HOST_CPU_NB] reason 00 [PDE] on channel -1 [017feaa000 unknown]
[    21.541860] nouveau0: autoconfiguration error: error: fifo: fault 01 [WRITE] at 0000000001030000 engine 04 [BAR1] client 08 [HUB/HOST_CPU_NB] reason 00 [PDE] on channel -1 [017feaa000 unknown]
[    21.541860] nouveau0: autoconfiguration error: error: fifo: fault 01 [WRITE] at 0000000001032000 engine 04 [BAR1] client 08 [HUB/HOST_CPU_NB] reason 00 [PDE] on channel -1 [017feaa000 unknown]

[...]

[    22.300998] nouveau0: autoconfiguration error: error: fifo: DROPPED_MMU_FAULT 00000000
```
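To make the long run of fault lines easier to compare, they can be grouped by access type, engine, client and reason. The following is only a minimal sketch: the regular expression is inferred from the lines quoted above rather than from the nouveau source, and the names `FAULT_RE`, `parse_faults`, `summarize` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Field layout inferred from the dmesg lines quoted in this report only;
# it is an assumption, not the driver's documented message format.
FAULT_RE = re.compile(
    r"fifo: fault (?P<code>[0-9a-f]{2}) \[(?P<access>\w+)\] "
    r"at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]{2}) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_id>[0-9a-f]{2}) \[(?P<client>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]{2}) \[(?P<reason>[^\]]*)\] "
    r"on channel (?P<channel>-?\d+)"
)


def parse_faults(dmesg_text):
    """Return one dict per fifo fault line found in dmesg_text."""
    faults = []
    for line in dmesg_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            fault = match.groupdict()
            fault["addr"] = int(fault["addr"], 16)    # faulting address
            fault["channel"] = int(fault["channel"])  # -1 in the lines above
            faults.append(fault)
    return faults


def summarize(faults):
    """Count faults per (access, engine, client, reason) combination."""
    return Counter(
        (f["access"], f["engine"], f["client"], f["reason"]) for f in faults
    )


# Two fault lines and the final DROPPED_MMU_FAULT line, copied from above.
SAMPLE = """\
[    21.541860] nouveau0: autoconfiguration error: error: fifo: fault 01 [WRITE] at 000000000102f000 engine 04 [BAR1] client 08 [HUB/HOST_CPU_NB] reason 00 [PDE] on channel -1 [017feaa000 unknown]
[    21.541860] nouveau0: autoconfiguration error: error: fifo: fault 01 [WRITE] at 0000000001030000 engine 04 [BAR1] client 08 [HUB/HOST_CPU_NB] reason 00 [PDE] on channel -1 [017feaa000 unknown]
[    22.300998] nouveau0: autoconfiguration error: error: fifo: DROPPED_MMU_FAULT 00000000
"""

if __name__ == "__main__":
    faults = parse_faults(SAMPLE)
    for key, count in summarize(faults).items():
        print(count, *key)
    print("addresses:", [hex(f["addr"]) for f in faults])
```

To run it against a real log, replace `SAMPLE` with the saved dmesg text. The DROPPED_MMU_FAULT line is deliberately not matched, since it has a different layout.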

Still, wsdisplay is attached and the ttys are supposedly created:

```
[    22.310713] wsdisplay0 at nouveaufb0 kbdmux 1: console (default, vt100 emulation), using wskbd0
[    22.322939] wsmux1: connecting to wsdisplay0
[    22.322939] wskbd1: connecting to wsdisplay0
[    45.060664] cd0(ahcisata0:1:0):  DEFERRED ERROR, key = 0x2
[    81.420583] wsdisplay0: screen 1 added (default, vt100 emulation)
[    81.420583] wsdisplay0: screen 2 added (default, vt100 emulation)
[    81.420583] wsdisplay0: screen 3 added (default, vt100 emulation)
[    81.420583] wsdisplay0: screen 4 added (default, vt100 emulation)

```
I took a picture of the console buffer, which is available at https://ttm.sh/bva.jpg

The full dmesg is available at https://bhh.sh/6cm.

Disabling `nouveau' and `nouveaufb' at boot eliminates the problem: the whole boot sequence is printed, I can see rc starting the services, and I am eventually greeted with a working wscons.

Here's the standard dmesg with nouveau* modules disabled:
https://bhh.sh/6cl

Here is a recap, produced with a `pcictl'-based script, of the names and IDs of all PCI devices in my workstation:

```
 Core 7G (S, Quad) Host Bridge, DRAM (0x591f)
 Core 6G PCIe x16 (0x1901)
 200 Series xHCI (0xa2af)
 200 Series MEI (0xa2ba)
 200 Series SATA (AHCI) (0xa282)
 200 Series PCIe (0xa2e9)
 200 Series PCIe (0xa292)
 200 Series PCIe (0xa298)
 H270 LPC (0xa2c4)
 GeForce GTX 1060 6GB (0x1c03)
 BCM5751 10/100/1000 Ethernet (0x1677)
 Wireless AC 9260 (0x2526)
 ASM1083/1085 PCIe-PCI Bridge (0x1080)
```

The problem appears to be independent of the monitor's resolution and of the display interface used. I have tried both:

- an iiyama ProLite 2560x1440 monitor, connected through the DP port
- a BenQ 1920x1080 monitor, connected through the HDMI port

The result is the same in both cases.

The same bug was also reported recently by another user, who owns a GTX 770:
https://marc.info/?l=netbsd-bugs&m=165167531709750&w=2 

It seems reasonably safe to assume that, with the graphics stack currently in HEAD, this bug affects many GPU models across multiple generations.

It also appears to be a known regression in nouveau, affecting at least some users since Linux 4.19, and possibly fixed upstream:
https://bugzilla.kernel.org/show_bug.cgi?id=201847
https://www.spinics.net/lists/kernel/msg3773355.html
>How-To-Repeat:
Boot a NetBSD/amd64 HEAD (9.99.96) snapshot on a desktop equipped with an NVIDIA GeForce GTX 1060, making sure that the gpufw.tar.xz set is properly installed^[1].


^[1] Otherwise the driver will fail to attach (`acr: failed to load firmware'), resulting in a kernel panic. Loading firmware on GPUs that require it seems to be mandatory starting with Linux 5.6. See: https://www.mail-archive.com/nouveau%lists.freedesktop.org@localhost/msg35424.html
>Fix:


