Current-Users archive


AMDGPU Driver patches/bugs



I have two machines with AMD graphics hardware: a laptop with a Raven
Ridge APU (GCN 5) and a desktop with a Kaveri APU (GCN 2).

I'm currently trying to get wlroots and my Wayland compositor running
on either machine.  The Kaveri APU should technically work with the
radeon driver, and it mostly did with some of the 9.99.x releases (the
frame buffer worked and X could start, but I didn't necessarily get
full acceleration).  With 10.0_BETA, the installer kernel causes my
monitors to report "HDMI Signal Out of Range" after the graphics
drivers initialize and the kernel tries to switch to the radeon
framebuffer.

After upgrading my system in place from tarballs, and compiling a
custom kernel with the AMDGPU drivers instead of the radeon drivers, I
was pleasantly surprised to find that the frame buffer worked with the
newer, less tested drivers!

X was only sort of making use of the driver (Mesa was still using
llvmpipe for OpenGL applications), but I didn't test this too much, as
it was an old build of the modular X server from pkgsrc and I wanted
to start over with the latest pkgsrc and just try to get wlroots
compiling against the pkgsrc wayland libraries and its dependencies
(most importantly, Mesa and libdrm).

From trying to get this to work, I found that libdrm_amdgpu calls
lseek(2) against DMA BUF file descriptors to figure out the buffer
size.  Seeking is not implemented for those descriptors, so I added a
quick version based on the ksymsseek() function.  That's in the patch
below, and that got me
to being able to get Mesa to draw to the screen (with a bunch of other
changes to wlroots that don't yet work well enough to have a working
compositor).  I have no idea how much this does or does not help
X11+AMD users.  It shouldn't hurt anyone.
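
For anyone unfamiliar with the idiom, here is a minimal userland sketch
of "ask the descriptor for its size by seeking to the end".  The
function name fd_size_by_seek() is made up for illustration; this is
not the libdrm_amdgpu code, only the general pattern it relies on.  If
the descriptor's seek operation is missing, the first lseek(2) fails
and the caller never learns the buffer size.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not taken from libdrm):
 * return the size in bytes of the object behind fd by seeking to its
 * end, or (off_t)-1 if the descriptor cannot seek.
 */
off_t
fd_size_by_seek(int fd)
{
	off_t size;

	/* Offset 0 relative to the end is the object's size. */
	size = lseek(fd, 0, SEEK_END);
	if (size == (off_t)-1)
		return (off_t)-1;

	/* Rewind so later users of fd start from offset 0 again. */
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
		return (off_t)-1;

	return size;
}
```

On a descriptor without a working seek (a pipe, for example) this
returns -1, which is the failure a caller would see on an unpatched
kernel.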

The other patch is a bugfix for later AMD graphics cards that use
64-bit doorbells.  My laptop currently crashes on boot with the AMDGPU
driver.  The first bug is that the NetBSD implementation of
amdgpu_mm_wdoorbell64() doesn't work, so the doorbells fail, and this
causes the ring tests to fail, which causes the initialization code to
bail.  It then crashes because it doesn't de-initialize properly.

The problem with the doorbell code is that the Linux code uses
adev->doorbell.ptr + index to get the address to write to.  ptr is
ultimately a pointer to a 32-bit value (even though the doorbell being
written is 64 bits wide :-/ ), so the compiler's pointer math
multiplies index by 4, not by 8 as the NetBSD dev who wrote the code
would have expected.  Elsewhere, the indexes get left shifted by 1 to
account for this, and a bunch of the bit field macros for setting up
registers were also written with this extra offset built in.  So
rather than try to track down all the places where this was done in a
dumb/broken manner, I changed the multiplier from 8 to 4 in the
bus_space_write calls for the NetBSD code.  This gets the doorbells
working.
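
To make the arithmetic concrete, here is a small standalone sketch.
The names (aperture, offset_via_u32_ptr, offset_scaled) are invented
for illustration and none of this is driver code.  It only shows that
adding an index to a pointer to 32-bit elements advances by 4 bytes per
index, so an index that has already been left shifted by 1 lands on an
8-byte boundary when scaled by 4 and overshoots when scaled by 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the mapped doorbell aperture (illustration only). */
static uint32_t aperture[64];

/*
 * Byte offset produced by Linux-style `ptr + index` when ptr points
 * at 32-bit elements.  index must be at most 64 here so the result
 * stays within (or one past) the array.
 */
size_t
offset_via_u32_ptr(size_t index)
{
	uint32_t *ptr = aperture;

	return (size_t)((char *)(ptr + index) - (char *)ptr);
}

/* Byte offset when the index is multiplied by an explicit scale. */
size_t
offset_scaled(size_t index, size_t scale)
{
	return index * scale;
}
```

Taking 64-bit doorbell slot 3 as an example, the pre-shifted index is
3 << 1 = 6.  offset_via_u32_ptr(6) and offset_scaled(6, 4) both give 24
bytes (3 * 8), while offset_scaled(6, 8) gives 48, which is the wrong
slot.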

(The driver blows up spectacularly shortly thereafter by causing a
floating point exception in kernel mode.  I don't have a full fix for
that yet.  The thing I did try that seems to get further causes the
screen to go blank.  I have a plan for debugging this, but I haven't
gotten there yet.)

I've attached patches.  Should I open a bug?  Send these to the kernel
mailing list?

Thanks,
Jeff

Attachment: linux_dma_buf.patch
Description: Binary data

Attachment: amdgpu_device.patch
Description: Binary data


