NetBSD-Bugs archive


Re: xsrc/58133: X server crashes; radeon 5450; modesetting



My experiments today reproduce the crash easily.
Every time, it happened at the familiar place in glamor_text().

Procedure: With ctwm as window manager, run "while :; do ls -lR; done"
in an xterm. Possibly the associated disk I/O contributes to the issue.
The font is lucidasanstypewriter-12.

In another workspace (so the ls is not even visible), run image viewer
geeqie (graphics/geeqie) in a directory with images - at least one
should be larger than the screen. Show such an image at unscaled
resolution (shortcut key z), with the window maximized or the image
fullscreen (f). I had best "success" with just the maximized window.
Then drag the image around, to view other parts. Within a few seconds,
the SIGSEGV should occur. A gdb attached to the X server should report
it and show its prompt.

Today it happened every time in glamor_text(), while printing some
output from "ls".

Typing "c" for continue in gdb simply continued X without causing the
expected crash.

This seems to mean that the mmap(2) call made shortly before, to map
the vbo into memory, only takes effect after a delay.

The protocol for mapping seems to be (simplified):

- ioctl(fd, RADEON_GEM_MMAP, &args, ...), returns an args.addr_ptr
- mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, args.addr_ptr)
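
A minimal C sketch of the second step, under loud assumptions: an
ordinary temporary file stands in for the DRM fd, and a plain file
offset stands in for args.addr_ptr; the RADEON_GEM_MMAP ioctl itself is
not performed. The helper names map_shared() and touch_immediately() are
made up for illustration. It only shows the baseline expectation that a
MAP_SHARED mapping is usable as soon as mmap(2) returns; it does not
model the radeon path and does not reproduce the crash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Step 2 of the protocol: map `size` bytes of `fd` starting at `offset`.
 * For a real radeon buffer object, `offset` would be the args.addr_ptr
 * value returned by the ioctl in step 1 (not shown here).
 * Returns NULL on failure.
 */
static void *
map_shared(int fd, off_t offset, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, offset);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Stand-in experiment (ASSUMPTION: a temp file replaces the DRM fd).
 * Creates a two-page file, maps its second page, writes to the mapping
 * right after mmap() returns, and reads the byte back through the fd.
 * Returns 0 if the mapping was immediately usable, -1 on any error.
 */
static int
touch_immediately(void)
{
	char path[] = "/tmp/mapdemoXXXXXX";
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char byte = 0;
	unsigned char *p;
	ssize_t n;
	int fd;

	if (pagesz <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd == -1)
		return -1;
	unlink(path);		/* file disappears once fd is closed */

	if (ftruncate(fd, (off_t)(2 * pagesz)) == -1) {
		close(fd);
		return -1;
	}

	p = map_shared(fd, (off_t)pagesz, (size_t)pagesz);
	if (p == NULL) {
		close(fd);
		return -1;
	}

	memset(p, 0xA5, (size_t)pagesz);	/* first touch, no delay */
	msync(p, (size_t)pagesz, MS_SYNC);	/* flush before reading via fd */
	n = pread(fd, &byte, 1, (off_t)pagesz);

	munmap(p, (size_t)pagesz);
	close(fd);
	return (n == 1 && byte == 0xA5) ? 0 : -1;
}
```

Calling touch_immediately() from a small test program is expected to
return 0; a SIGSEGV at the memset() would be the analogue of the
behaviour seen in the X server.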

There are 3 occurrences of this:
- xsrc/external/mit/MesaLib/dist/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
- xsrc/external/mit/MesaLib.old/dist/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
- xsrc/external/mit/libdrm/dist/radeon/radeon_bo_gem.c

The one in MesaLib.old seems to be the one in play (I instrumented the
others and they didn't trigger; also, one of the observed crashes
happened in
/xsrc/external/mit/MesaLib.old/dist/src/gallium/drivers/r600/r600_shader.c
as reported above).

My first theory was that the page table entries created by the mmap(2)
call might not be propagated to all cpus yet, and that a cpu switch
would (sometimes) have occurred before using the mapped memory. That
would explain why a delay caused by human reaction time in gdb would fix
up the memory access.

I tried to test this theory by setting the cpu affinity of the X process
to just a single cpu:  "sudo schedctl -p 1701 -A 0". After that all 4
threads of the process were shown to have affinity to cpu 0.

However, the SIGSEGV could still be reproduced with this setting, which
makes the theory less likely. Can anyone think of some other mechanism
that "delays" the validity of mapped memory?
(Or maybe the command I used doesn't have the effect I thought?)

