Current-Users archive

Re: Testing/Debugging amdgpu kernel driver in 10.0_BETA



On Fri, Dec 30, 2022 at 2:02 PM Jeff Frasca <thatguy%jeff-frasca.name@localhost> wrote:
>
> Hi all.  I have a Lenovo ThinkPad E485 with a Ryzen 3 2200U APU in it
> (a Vega/RAVEN RIDGE GPU).  I installed the NetBSD 10.0 BETA on it
> this week, and it's actually working better than Slackware 15.0
> (which just refuses to boot).  Everything seems to be working
> reasonably well, and X11 runs with the framebuffer drivers and
> llvmpipe.  However, that only gets me 1024x768, and I know there's a
> big chunk of code sitting in the src tree that has the real drivers
> for it.
>
> I've got three kernels installed right now.  The first is the GENERIC
> kernel from the install, and the second is a custom kernel that has
> just the drivers I need and is set up to run with the framebuffer
> drivers.  Both of those seem to work equally well.  The third is
> another custom config, identical to the working custom kernel except
> that it has the amdgpu drivers configured.
>
> That third kernel panics on boot in one of two ways:
>
> On cold boot, I get an error "PSP load tmr failed!"
>
> On a warm boot (after first loading any of the three kernels), I get
> an ETIMEDOUT error from the ring_test function in the driver.
>
> In both cases, that isn't the actual panic; the panic is a failed
> 'cv_is_valid()' assertion in a call to 'cv_destroy()' in
> 'drm_sched_fini()'.  I think both of those errors cause
> 'amdgpu_driver_load_kms()' to fail, emit a "Fatal error during GPU
> init", and then jump to an error block which calls
> 'amdgpu_driver_unload_kms()', which then panics while trying to unload
> the driver.

After reading a bunch of code this afternoon, I think the cv_destroy()
call is failing because the ring structures never get initialized.
I'm pretty sure my APU uses the functions in amdgpu_vcn.c, and that
file doesn't have any calls to amdgpu_init_ring(), which shows up in a
lot of the other files.  Hopefully in the next day or two I'll figure
out some code changes to try.  Then I can move on to the next bug.
(Maybe the firmware doesn't load on cold boot, or maybe something is
hiding behind the missing-initialization bug.  Probably the latter.)
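
To make the suspected failure mode concrete, here is a small userland
sketch of the pattern (not the real driver code).  Every name in it
(fake_cv, fake_sched, demo_unload_after_failed_load, and so on) is
made up for illustration, and the "ready" flag is only one possible
way to guard the teardown; I am assuming the real fix may look quite
different.  The assert() stands in for the cv_is_valid() assertion
that fires in cv_destroy().

```c
/* Userland sketch only; all names are hypothetical, not NetBSD APIs. */
#include <assert.h>
#include <stdbool.h>

struct fake_cv {
	bool valid;		/* stands in for what cv_is_valid() checks */
};

struct fake_sched {
	struct fake_cv wake;
	bool ready;		/* set only once init has run for this ring */
};

static void
fake_cv_init(struct fake_cv *cv)
{
	cv->valid = true;
}

static void
fake_cv_destroy(struct fake_cv *cv)
{
	/* Destroying a never-initialized cv trips the assertion. */
	assert(cv->valid);
	cv->valid = false;
}

static void
fake_sched_init(struct fake_sched *s)
{
	fake_cv_init(&s->wake);
	s->ready = true;
}

/* Guarded teardown: returns 1 if a cv was destroyed, 0 if skipped. */
static int
fake_sched_fini(struct fake_sched *s)
{
	if (!s->ready)
		return 0;	/* init never ran, so there is nothing to undo */
	s->ready = false;
	fake_cv_destroy(&s->wake);
	return 1;
}

/*
 * Simulates a load that fails partway: only ring 0 is initialized
 * before the error path runs the unload over every ring.
 */
static int
demo_unload_after_failed_load(void)
{
	struct fake_sched rings[2] = {{{false}, false}, {{false}, false}};
	int destroyed = 0;

	fake_sched_init(&rings[0]);
	for (int i = 0; i < 2; i++)
		destroyed += fake_sched_fini(&rings[i]);
	return destroyed;
}
```

Without the "ready" check in fake_sched_fini(), the second ring would
reach fake_cv_destroy() with valid == false and the assert would fire,
which is the same shape as the panic I'm seeing on unload.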

> Anyone have any tips on how to further debug this?  I'm going to keep
> poking at it on my own, but any suggestions on where to look, things
> to try, I would love to hear them.  Is there any more information I
> can/should send?  Is there a good way to save backtraces and dmesg
> rather than typing it into an email manually on a different computer?
> Any cute ddb tricks that would help?  I know this is going to need
> code changes (which I will try to work out).  Is there a better
> mailing list to send this to?
>
> (I'm typing this from a Kaveri APU based desktop that's dual booting
> Slackware 15 and NetBSD 9.99.x right now.  If I make headway on the
> laptop, the desktop will get 10.0 next and I'll try to get bugs in
> its graphics drivers tested and ironed out.)
>
> Thanks!
> Jeff

