NetBSD-Bugs archive


Re: xsrc/58133: X server crashes; radeon 5450; modesetting



The following reply was made to PR xsrc/58133; it has been noted by GNATS.

From: Rhialto <rhialto%falu.nl@localhost>
To: matthew green <mrg%eterna23.net@localhost>
Cc: Rhialto <rhialto%falu.nl@localhost>, xsrc-manager%netbsd.org@localhost,
	gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
	gnats-bugs%netbsd.org@localhost
Subject: Re: xsrc/58133: X server crashes; radeon 5450; modesetting
Date: Sun, 12 May 2024 20:25:59 +0200

 My experiments today reproduce the crash easily.
 It happened at the familiar place in glamor_text() every time.
 
 Procedure: With ctwm as window manager, run "while :; do ls -lR; done"
 in an xterm. Possibly the associated disk I/O contributes to the issue.
 The font is lucidasanstypewriter-12.
 
 In another workspace (so the ls is not even visible), run image viewer
 geeqie (graphics/geeqie) in a directory with images - at least one
 should be larger than the screen. Show such an image at unscaled
 resolution (shortcut key z), with the window maximized or the image
 fullscreen (f). I had best "success" with just the maximized window.
 Then drag the image around, to view other parts. Within a few seconds,
 the SIGSEGV should occur. The attached gdb should report it and show
 its prompt.
 
 It happened every time today in glamor_text(), while printing some
 output from "ls".
 
 Typing "c" for continue in gdb simply continued X without causing the
 expected crash.
 
 This seems to mean that the mmap(2) call made shortly before, to map
 the vbo into memory, only takes effect after a delay.
 
 The protocol for mapping seems to be (simplified):
 
 - ioctl(fd, RADEON_GEM_MMAP, &args, ...), returns an args.addr_ptr
 - mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, args.addr_ptr)
 
 There are 3 occurrences of this:
 - xsrc/external/mit/MesaLib/dist/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 - xsrc/external/mit/MesaLib.old/dist/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 - xsrc/external/mit/libdrm/dist/radeon/radeon_bo_gem.c
 
 The one in MesaLib.old seems to be the one in play: I instrumented the
 others and they were never triggered, and one of the observed crashes
 happened in
 /xsrc/external/mit/MesaLib.old/dist/src/gallium/drivers/r600/r600_shader.c
 as reported above.
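
 For illustration, the two-step mapping described above can be sketched
 in C roughly as follows. This is only a sketch under assumptions: the
 names map_buffer, get_map_offset_fn, zero_offset and demo_map_tempfile
 are made up here, the callback is a stand-in for the RADEON_GEM_MMAP
 ioctl so that nothing depends on libdrm or radeon hardware, and the
 demo maps an ordinary temporary file rather than the DRM device. It is
 not the code from radeon_drm_bo.c.

```c
#define _DEFAULT_SOURCE 1	/* expose POSIX/BSD declarations on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for step 1. With the real driver this would issue
 * the RADEON_GEM_MMAP ioctl and hand back args.addr_ptr; here it is a
 * callback. Returns 0 on success and stores the offset for mmap(2).
 */
typedef int (*get_map_offset_fn)(int fd, void *cookie, off_t *offset_out);

/*
 * Two-step mapping: ask for the offset, then mmap(2) the fd at that offset.
 * Returns the mapped address, or MAP_FAILED if either step fails.
 */
static void *
map_buffer(int fd, size_t size, get_map_offset_fn get_offset, void *cookie)
{
	off_t offset = 0;

	assert(size > 0 && get_offset != NULL);

	/* Step 1: obtain the offset that identifies the buffer. */
	if (get_offset(fd, cookie, &offset) != 0)
		return MAP_FAILED;

	/* Step 2: map it shared and read/write, as in the list above. */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
}

/* Demo offset provider (assumption): always map from offset 0. */
static int
zero_offset(int fd, void *cookie, off_t *offset_out)
{
	(void)fd;
	(void)cookie;
	*offset_out = 0;
	return 0;
}

/*
 * Usage example against a temporary file (not the radeon device): map it,
 * write through the mapping, and read the byte back via the descriptor.
 * Returns 0 on success, -1 on any failure.
 */
static int
demo_map_tempfile(void)
{
	char path[] = "/tmp/mapdemoXXXXXX";
	char back = 0;
	size_t size = 4096;
	int fd = mkstemp(path);
	int rv = -1;
	char *p;

	if (fd == -1)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)size) == 0) {
		p = map_buffer(fd, size, zero_offset, NULL);
		if (p != MAP_FAILED) {
			p[0] = 'x';	/* first touch of the new mapping */
			if (msync(p, size, MS_SYNC) == 0 &&
			    pread(fd, &back, 1, 0) == 1 && back == 'x')
				rv = 0;
			munmap(p, size);
		}
	}
	close(fd);
	return rv;
}
```

 The first store through the returned pointer (p[0] above) corresponds
 to the kind of access that faults in the X server shortly after the
 mapping is set up.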
 
 My first theory was that the page table entries created by the mmap(2)
 call might not be propagated to all cpus yet, and that a cpu switch
 would (sometimes) have occurred before using the mapped memory. That
 would explain why a delay caused by human reaction time in gdb would fix
 up the memory access.
 
 I tried to test this theory by setting the cpu affinity of the X process
 to just a single cpu:  "sudo schedctl -p 1701 -A 0". After that all 4
 threads of the process were shown to have affinity to cpu 0.
 
 However the SIGSEGV could still be reproduced with this. This makes this
 theory less likely. Can anyone think of some other mechanism that
 "delays" the validity of mapped memory?
 (Or maybe the command I used doesn't have the effect I thought?)
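
 One way to narrow this down might be to touch the mapping immediately
 after mmap(2) returns and record whether that first access faults,
 instead of waiting for the crash in glamor_text(). The following is a
 minimal sketch of such a probe, not something taken from the X server
 or Mesa: the names probe_read, probe_handler and demo_probe are
 hypothetical, it is not thread-safe, and the demo exercises an
 anonymous page rather than a radeon buffer.

```c
#define _DEFAULT_SOURCE 1	/* expose POSIX/BSD declarations on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

static sigjmp_buf probe_env;

/* Jump back into probe_read() instead of letting the fault kill us. */
static void
probe_handler(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);
}

/*
 * Read one byte at addr. Returns 1 if the read succeeded, 0 if it raised
 * SIGSEGV or SIGBUS, -1 if the handlers could not be installed.
 * Not thread-safe: global jump buffer, process-wide signal handlers.
 */
static int
probe_read(const volatile char *addr)
{
	struct sigaction sa, old_segv, old_bus;
	volatile int ok = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old_segv) == -1)
		return -1;
	if (sigaction(SIGBUS, &sa, &old_bus) == -1) {
		sigaction(SIGSEGV, &old_segv, NULL);
		return -1;
	}

	/* savemask = 1 so the signal mask is restored after the jump. */
	if (sigsetjmp(probe_env, 1) == 0) {
		(void)*addr;	/* the access under test */
		ok = 1;
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return ok;
}

/*
 * Usage example: probe a readable anonymous page, then the same page after
 * access is revoked with mprotect(2). Returns 0 if the probe reported
 * "readable" and then "faulted", otherwise -1.
 */
static int
demo_probe(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t size;
	char *p;
	int first, second;

	if (pagesize <= 0)
		return -1;
	size = (size_t)pagesize;
	p = mmap(NULL, size, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	first = probe_read(p);
	if (mprotect(p, size, PROT_NONE) == -1) {
		munmap(p, size);
		return -1;
	}
	second = probe_read(p);
	munmap(p, size);
	return (first == 1 && second == 0) ? 0 : -1;
}
```

 If a probe like this, placed right after the mmap(2) call, already
 reported a fault, the delay would not depend on the later code path;
 if it always succeeded, the mapping would seem to become invalid only
 afterwards.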
 

