tech-kern archive


Re: DRM/KMS: vmwgfx driver is now available



On 7/24/23 00:03, Taylor R Campbell wrote:
Date: Sun, 23 Jul 2023 22:57:14 +0900
From: PHO <pho%cielonegro.org@localhost>

On 7/23/23 21:24, Taylor R Campbell wrote:
Why can't the loop just use dma_fence_get_rcu and dma_fence_put?

Might need to release and reacquire the lock around dma_fence_put.

I tried that, but it didn't work, apparently because someone was holding
a pointer to a dying fence and waiting for it to be signaled. Using
dma_fence_get_rcu() meant we never signaled such fences, so the kernel
hung and eventually got killed by heartbeat().

What exactly did you try and what was the heartbeat panic stack trace?
Nothing here should wait for signalling in a way that trips the kernel
heartbeat monitor -- that should only happen if you hold onto a spin
lock for too long.

In __vmw_fences_update():

	...
	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
			list_del_init(&fence->head);
			struct dma_fence *f = dma_fence_get_rcu(&fence->base);
			if (f) {
				dma_fence_signal_locked(f);
				spin_unlock(&fman->lock);
				dma_fence_put(f);
				spin_lock(&fman->lock);
			}
			INIT_LIST_HEAD(&action_list);
			...
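
(Side note on why dying fences get skipped here: dma_fence_get_rcu() only
takes a reference while the refcount is still non-zero, so a fence that is
already on its way out comes back as NULL and never gets signalled by this
loop. Roughly, quoting the upstream inline from memory rather than the exact
linux_dma_fence.c code:)

static inline struct dma_fence *
dma_fence_get_rcu(struct dma_fence *fence)
{
	/* Succeeds only while the refcount is still non-zero; a dying
	 * fence yields NULL and is therefore skipped by the loop above. */
	if (kref_get_unless_zero(&fence->refcount))
		return fence;
	else
		return NULL;
}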

And the stack trace was this:

__kernel_end() at 0
sys_reboot() at sys_reboot+0x30
vpanic() at vpanic+0x1ad
printf_nostamp() at printf_nostamp+0x30
heartbeat() at heartbeat+0x21c
hardclock() at hardclock+0xbb
Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
--- interrupt ---
mutex_vector_enter() at mutex_vector_enter+0x36f
vmw_fifo_send_fence() at vmw_fifo_send_fence+0xea
vmw_execbuf_fence_commands() at vmw_execbuf_fence_commands+0x3b
vmw_execbuf_process() at vmw_execbuf_process+0x91e
vmw_execbuf_ioctl() at vmw_execbuf_ioctl+0x97
drm_ioctl() at drm_ioctl+0x26d
drm_read() at drm_read+0x15
sys_ioctl() at sys_ioctl+0x59d
syscall() at syscall+0x196
--- syscall (number 54) ---
syscall+0x196:

vmw_fifo_send_fence+0xea was at vmwgfx_fifo.c:622:

	spin_lock(&dev_priv->fence_lock);

But let's not worry about this too much, as it no longer occurs with the latest code (see below).

(That said: yay, the heartbeat monitor is catching what would
otherwise be a silent nonresponsive hang!)

Yup, it has helped me so many times.

1. Move vmw_fence_obj_destroy to an `RCU' callback so that we can
    safely do dma_fence_put under the lock.

I can't see how to do it without interfering with dma_fence_free(), so I went with your second suggestion instead.
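
For reference (a sketch of the upstream function from memory, not the exact
source): dma_fence_free() already defers the actual kfree through the rcu_head
embedded in struct dma_fence, and any extra RCU deferral in the destroy path
would have to coexist with that:

/* Upstream dma_fence_free(), roughly: the final free is already
 * RCU-deferred via the fence's embedded rcu_head. */
void dma_fence_free(struct dma_fence *fence)
{
	kfree_rcu(fence, rcu);
}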

2. Create a list on the stack of signalled fences to put in
    __vmw_fences_update and put them after the loop, outside the lock:

I tried this. Your code didn't work as-is because it could release the same
fence twice, but in the end I managed to get it to work! Yay, now I can
withdraw my changes to linux_dma_fence.c:

static void __vmw_fences_update(struct vmw_fence_manager *fman)
{
	...
	struct list_head to_put = LIST_HEAD_INIT(to_put);
	...
	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
			list_del_init(&fence->head);
			/*
			 * Take a reference before signalling.  Only fences we
			 * actually got a reference on go onto to_put (reusing
			 * fence->head, which was just unlinked above), so each
			 * one is put exactly once.
			 */
			if (dma_fence_get_rcu(&fence->base) != NULL) {
				list_add(&fence->head, &to_put);
				dma_fence_signal_locked(&fence->base);
			}

			INIT_LIST_HEAD(&action_list);
			list_splice_init(&fence->seq_passed_actions,
					 &action_list);
			vmw_fences_perform_actions(fman, &action_list);
		} else
			break;
	}
	...
	/*
	 * Drop the references after the loop, outside fman->lock, since
	 * dma_fence_put() is not safe to call with the lock held.
	 */
	if (!list_empty(&to_put)) {
		spin_unlock(&fman->lock);
		list_for_each_entry_safe(fence, next_fence, &to_put, head) {
			list_del_init(&fence->head);
			dma_fence_put(&fence->base);
		}
		spin_lock(&fman->lock);
	}
}
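
For context, the caller takes fman->lock around the whole update, which is why
the helper has to drop and retake it around the deferred puts. A sketch from
memory (double-check against vmwgfx_fence.c):

void vmw_fences_update(struct vmw_fence_manager *fman)
{
	/* __vmw_fences_update() runs with fman->lock held throughout,
	 * hence the unlock/relock dance around the dma_fence_put()s. */
	spin_lock(&fman->lock);
	__vmw_fences_update(fman);
	spin_unlock(&fman->lock);
}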

