NetBSD-Bugs archive


kern/58745: nouveau triggered assert in linux_dma_fence.c



>Number:         58745
>Category:       kern
>Synopsis:       nouveau triggered assert in linux_dma_fence.c
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 12 01:25:00 +0000 2024
>Originator:     matthew green
>Release:        NetBSD 10.99.12
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
System: NetBSD aches.eterna23.net 10.99.12 NetBSD 10.99.12 (_aches_) #52: Sat Sep 14 15:08:46 CDT 2024  mrg%aches.eterna23.net@localhost:/var/obj/amd64-x86_64/usr/src/sys/arch/amd64/compile/_aches_ amd64
Architecture: amd64
>Description:

	on a ryzen 5600G system that was recently moved from a radeon
	hd 6450 to a nvidia GT 730 (the 6450 does not work with two
	larger monitors), i saw this while playing a video:

	[ 312382.0409707] nouveau0: autoconfiguration error: error: fifo: fault 01 [WRITE] at 00000000082e0000 engine 1b [CE2] client 18 [HUB/GR_CE] reason 02 [PTE] on channel 2 [007f952000 user]
	[ 312382.0409707] nouveau0: notice: fifo: channel 2: killed
	[ 312382.0409707] nouveau0: notice: fifo: runlist 0: scheduled for recovery
	[ 312382.0409707] nouveau0: warn: user: channel 2 killed!
	[ 312382.0409707] nouveau0: notice: fifo: engine 0: scheduled for recovery
	[ 312382.0409707] nouveau0: notice: fifo: engine 7: scheduled for recovery

	at this point, X crashed.  i've seen this before on 710, 730,
	and 1030 cards and been able to restart X, so i restarted it,
	but it then died immediately:

	[ 313138.7008055] nouveau0: autoconfiguration error: error: gr: TRAP ch 7 [007f7bc000 user]
	[ 313138.7008055] nouveau0: autoconfiguration error: error: gr: GPC0/TPC0/TEX: 80000041
	[ 313138.7008055] nouveau0: autoconfiguration error: error: gr: GPC0/TPC1/TEX: 80000041
	[ 313138.7008055] nouveau0: autoconfiguration error: error: fifo: fault 00 [READ] at 0000000000ed2000 engine 00 [GR] client 04 [GPC0/T1_1] reason 02 [PTE] on channel 7 [007f7bc000 user]
	[ 313138.7008055] nouveau0: notice: fifo: channel 7: killed
	[ 313138.7008055] nouveau0: notice: fifo: runlist 0: scheduled for recovery
	[ 313138.7008055] nouveau0: notice: fifo: engine 0: scheduled for recovery
	[ 313138.7008055] nouveau0: warn: user: channel 7 killed!

	and then:

	panic: kernel diagnostic assertion "(atomic_load_relaxed(&fence->flags) & (1u << DMA_FENCE_FLAG_SIGNALED_BIT)) == 0" failed: file "/usr/src/sys/external/bsd/drm2/linux/linux_dma_fence.c", line 696

	it may be that the assert is checking something that linux
	does not enforce, and that this is an uncommon error code path
	we haven't seen before.  i have previously had this error leave
	X unable to restart, needing a reboot to work properly again,
	but it has never crashed the kernel before.  e.g., perhaps the
	fence was signaled _twice_ in the error case, and the second
	one is triggering the assert.
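
	the double-signal hypothesis above can be sketched as a small
	userland model.  this is not the code in linux_dma_fence.c or
	nouveau: every model_* name, the flag value, and the idea that
	an already-signaled fence is still on the list the kill path
	walks are assumptions made only for illustration.  the model
	returns false where the kernel assertion would fire instead of
	panicking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical stand-in for (1u << DMA_FENCE_FLAG_SIGNALED_BIT) */
#define MODEL_FLAG_SIGNALED	(1u << 0)

/* hypothetical, heavily reduced fence: only the state the assert looks at */
struct model_fence {
	unsigned flags;
	int error;
};

/*
 * Model of setting an error on a fence.  Returns false in the case
 * where the kernel diagnostic assertion from the panic message would
 * fail, i.e. the fence is already signaled.
 */
static bool
model_fence_set_error(struct model_fence *f, int error)
{
	/* model assumption: errors are negative errno-style values */
	assert(error < 0);

	if (f->flags & MODEL_FLAG_SIGNALED)
		return false;	/* the kernel would panic here */
	f->error = error;
	return true;
}

/* Model of signaling a fence: just mark it signaled. */
static void
model_fence_signal(struct model_fence *f)
{
	f->flags |= MODEL_FLAG_SIGNALED;
}

/*
 * Model of a kill path: set an error on every fence in the list and
 * then signal it.  Returns how many fences were already signaled,
 * i.e. how many times the assertion would have tripped.
 */
static size_t
model_context_kill(struct model_fence **pending, size_t n, int error)
{
	size_t tripped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_fence_set_error(pending[i], error))
			tripped++;
		model_fence_signal(pending[i]);
	}
	return tripped;
}
```

	in this model, a first model_context_kill() pass over two fresh
	fences returns 0, and a second pass over the same list returns
	2, because both fences were signaled by the first pass.  whether
	the real driver can reach that state is exactly the open
	question in this report.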

	i have a core file and a netbsd.gdb for this one.  the backtrace
	is not especially interesting i think:

	vpanic() at vpanic+0x17b
	kern_assert() at __x86_indirect_thunk_rax
	linux_dma_fence_set_error() at linux_dma_fence_set_error+0x160
	nouveau_fence_context_kill() at nouveau_fence_context_kill+0x44
	nouveau_channel_killed() at nouveau_channel_killed+0x5c
	nvif_notify_work() at nvif_notify_work+0x2b
	linux_workqueue_thread() at linux_workqueue_thread+0x154

>How-To-Repeat:
>Fix:


