Current-Users archive

Re: AMDGPU Driver patches/bugs



> Date: Mon, 20 Feb 2023 23:15:35 -0800
> From: Jeff Frasca <thatguy%jeff-frasca.name@localhost>
> 
> I have two machines with AMD graphics hardware: a laptop with a raven
> ridge APU (GCN 5) and a desktop with a kaveri APU (GCN 2).
> [...]
> After upgrading my system in place from tarballs, and compiling a
> custom kernel with the AMDGPU drivers instead of the radeon drivers, I
> was pleasantly surprised to find that the frame buffer worked with the
> newer, less tested drivers!

Cool!

By the way, you may be able to just add `load amdgpu' to boot.cfg
instead of compiling a custom kernel with amdgpu.

Loading amdgpu as a module also has the advantage that it doesn't
break dtrace (due to annoying technical restrictions in CTF which the
kernel violates when amdgpu is statically linked because it is so
large).  You can rebuild just the module with:

cd src/sys/modules/amdgpu
$TOOLDIR/bin/nbmake-$ARCH -j4 dependall
$TOOLDIR/bin/nbmake-$ARCH -j4 install

> The problem with the doorbell code is that the Linux code uses
> adev->doorbell.ptr + index to get the address to write to.  ptr is
> ultimately a pointer to a 32 bit wide value (rather than the 64 bit
> wide value it actually is :-/ ), so the compiler's pointer math
> multiplies index by 4 instead of 8, as the NetBSD dev who wrote the
> code would have expected.

Amazing!  I must have stared at that code for hours trying to track
down the ring test failures, without realizing that the pointer was
typed 32-bit instead of 64-bit.
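
For readers following along, the scaling issue can be reproduced in
isolation.  The following is only a minimal sketch, not the driver
code: doorbell_page, NDOORBELLS, and the two offset_via_* helpers are
hypothetical names invented for illustration.  It shows that the same
`ptr + index' expression lands on a different byte offset depending on
the pointed-to type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDOORBELLS 8	/* hypothetical size, for illustration only */

/* Hypothetical stand-in for a mapped region of 64-bit doorbells. */
static uint64_t doorbell_page[NDOORBELLS];

/*
 * Byte offset reached by `ptr + index' when ptr is typed as a pointer
 * to a 32-bit value: the compiler scales index by sizeof(uint32_t).
 */
static size_t
offset_via_u32_ptr(size_t index)
{
	uint32_t *ptr = (uint32_t *)doorbell_page;

	assert(index <= NDOORBELLS);	/* stay inside the array */
	return (size_t)((char *)(ptr + index) - (char *)doorbell_page);
}

/*
 * Same expression with ptr typed as a pointer to a 64-bit value:
 * index is scaled by sizeof(uint64_t) instead.
 */
static size_t
offset_via_u64_ptr(size_t index)
{
	uint64_t *ptr = doorbell_page;

	assert(index <= NDOORBELLS);	/* one past the end is still valid */
	return (size_t)((char *)(ptr + index) - (char *)doorbell_page);
}
```

With index 3, offset_via_u32_ptr returns 12 and offset_via_u64_ptr
returns 24, so code that assumes 64-bit slots but does its arithmetic
on a 32-bit-typed pointer writes to the wrong address.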

...I don't suppose you have another trick up your sleeve for the
radeon driver, do you?  We've also been seeing intermittent ring test
failures at boot there, but radeon doesn't use any 64-bit doorbells,
so this fix doesn't apply, alas.

> (The driver blows up spectacularly shortly thereafter by causing a
> floating point exception in kernel mode.  I don't have a full fix for
> that yet.  The thing I did try that seems to get further causes the
> screen to go blank.  I have a plan for debugging this, but I haven't
> gotten there yet.)

If you have a stack trace or crash dump I might be able to help.  The
amdgpu driver apparently uses FP/SIMD instructions in the kernel, and
I wired it up to NetBSD's mechanism for allowing it to do that, but I
don't know if I've ever seen those parts of the code get hit and
perhaps I missed something.

> I've attached patches.  Should I open a bug?  Send these to the kernel
> mailing list?

Patches applied, thanks!  I tweaked them a little bit, including
fixing an arithmetic overflow bug that you had copied & pasted from one
Taylor R Campbell, riastradh%NetBSD.org@localhost, in kern_ksyms.c...oops.  (Fix
also applied in kern_ksyms.c now.)

Feel free to file PRs with patches and/or cc me and tech-kern -- I
don't always follow current-users.

