NetBSD-Bugs archive


kern/55488: Occasional panic upon resume (i915drmkms-related)



>Number:         55488
>Category:       kern
>Synopsis:       Occasional panic upon resume (i915drmkms-related)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 14 05:05:00 +0000 2020
>Originator:     Jukka Ruohonen
>Release:        NetBSD 9.0_STABLE
>Organization:
>Environment:
NetBSD camus 9.0_STABLE NetBSD 9.0_STABLE (GENERIC_KASLR) #0: Sat Jul  4 18:46:38 EEST 2020 jruoho@camus:/usr/obj/sys/arch/amd64/compile/GENERIC_KASLR amd64
>Description:
The i915drmkms driver sometimes causes a panic when resuming from suspend:

[ 65582.854289] ioapic0 reenabling
[ 65589.565336] kern info: [drm] stuck on render ring
[ 65589.565336] kern info: [drm] GPU HANG: ecode 7:0:0xfffffffe, reason: Ring hung, action: reset
[ 65589.565336] kern info: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 65589.565336] kern info: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI ->  DRM/Intel
[ 65589.565336] kern info: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 65589.565336] kern info: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 65589.565336] kern info: [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 65589.565336] kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/i915_drv.c:841)i915_drm_resume] *ERROR* failed to re-initialize GPU, declaring wedged!
[ 65589.565336] drm/i915: Resetting chip after gpu hang
[ 65589.565336] uvm_fault(0xffffffff8ffe2c80, 0x0, 1) -> e
[ 65589.565336] fatal page fault in supervisor mode
[ 65589.565336] trap type 6 code 0 rip 0xffffffffc7b3e670 cs 0x8 rflags 0x10293 cr2 0x8 ilevel 0 rsp 0xffffb880aafa5d10
[ 65589.565336] curlwp 0xfffff1c521abe4a0 pid 0.89 lowest kstack 0xffffb880aafa32c0
[ 65589.565336] panic: trap
[ 65589.565336] cpu3: Begin traceback...
[ 65589.575344] vpanic() at netbsd:vpanic+0x160
[ 65589.575344] snprintf() at netbsd:snprintf
[ 65589.585351] startlwp() at netbsd:startlwp
[ 65589.585351] warning: /usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_display.c:2457: WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex))alltraps() at netbsd:alltraps+0xbb
[ 65589.585351] intel_cleanup_ring_buffer() at netbsd:intel_cleanup_ring_buffer+0xec
[ 65589.595359] i915_gem_cleanup_ringbuffer() at netbsd:i915_gem_cleanup_ringbuffer+0x51
[ 65589.595359] i915_gem_init_hw() at netbsd:i915_gem_init_hw+0x58e
[ 65589.605367] i915_reset() at netbsd:i915_reset+0x89
[ 65589.605367] i915_handle_error() at netbsd:i915_handle_error+0x9ba
[ 65589.615374] linux_workqueue_thread() at netbsd:linux_workqueue_thread+0xdd
[ 65589.615374] cpu3: End traceback...
[ 65589.615374] dumping to dev 20,0 (offset=50954631, size=2019279):
[ 65589.615374] dump ehci1: config timeout
[ 65589.805522] ahcisata0 port 0: device present, speed: 6.0Gb/s
[ 65589.805522] autoconfiguration error: ahcisata0 port 0: clearing WDCTL_RST failed for drive 0
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Caches] (0x4) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Could not allocate an object descriptor (20190405/utcopy-1050)
[ 65589.805522] ACPI Error: Aborting method \_PR.CPU0._CST due to previous error (AE_NO_MEMORY) (20190405/psparse-581)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Caches] (0x4) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Failed to extend the result stack (20190405/dswstate-184)
[ 65589.805522] ACPI Error: Aborting method \_PR.CPU0._CST due to previous error (AE_NO_MEMORY) (20190405/psparse-581)
[ 65589.805522] ACPI Error: Aborting method \_PR.CPU1._CST due to previous error (AE_NO_MEMORY) (20190405/psparse-581)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Caches] (0x4) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Namespace] (0x1) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Could not release AML Namespace mutex (20190405/exutils-151)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Interpreter] (0x0) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Could not release AML Interpreter mutex (20190405/exutils-156)
>How-To-Repeat:
1. Reproduction is difficult: the panic occurs in only about one in ten
suspend/resume cycles.

2. However, I have not seen the panic on identical hardware running
-current, so the problematic code may already have been fixed there.
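
Because the panic is intermittent, a handful of clean resumes on -current
says little. The sketch below estimates how many suspend/resume cycles
would be needed before a panic-free run becomes meaningful. It assumes the
roughly 1-in-10 rate reported above and that cycles are independent of each
other; both are assumptions, not measurements, and the function names are
illustrative only.

```python
import math


def p_at_least_one_panic(cycles, p_panic=0.1):
    """Probability of seeing one or more panics in `cycles` resumes.

    Assumes each resume panics independently with probability `p_panic`
    (0.1 is the reporter's rough estimate, not a measured value).
    """
    return 1.0 - (1.0 - p_panic) ** cycles


def cycles_needed(confidence=0.95, p_panic=0.1):
    """Smallest number of cycles after which a panic would have shown up
    with at least `confidence` probability, if the bug were still present."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_panic))


if __name__ == "__main__":
    print("P(panic within 10 cycles):", round(p_at_least_one_panic(10), 3))
    print("Cycles for 95% confidence:", cycles_needed(0.95))
    print("Cycles for 99% confidence:", cycles_needed(0.99))
```

Under these assumptions, about 29 panic-free cycles on -current would be
needed before concluding with 95% confidence that the problem is gone, and
about 44 for 99%.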
>Fix:
N/A


