tech-kern archive


Re: Strange crash of DIAGNOSTIC kernel on cv_destroy(9)



On 7/23/23 17:27, PHO wrote:
On 7/22/23 22:41, Taylor R Campbell wrote:
Date: Sat, 22 Jul 2023 21:52:40 +0900
From: PHO <pho%cielonegro.org@localhost>

Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161]
vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc

Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in
gdb?

And, if you can get to the frame in gdb, what does gdb say &cb.wq is
in the vmw_fence_wait frame, and what cv is in the cv_destroy frame?

Let's confirm it is the cv you think it is -- I suspect it might be a
different one.

I just hit the crash again and was able to obtain a crash dump. It is indeed the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait(), but the contents of cb do not make sense to me:

...

CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string?

I realized the cause of this:

static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout)
{
        ...
	if (likely(vmw_fence_obj_signaled(fence)))
		return timeout;
        ...
	spin_lock(f->lock);

	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
		goto out; // <-- THIS ONE

	if (intr && signal_pending(current)) {
		ret = -ERESTARTSYS;
		goto out; // <-- OR THIS
	}

#ifdef __NetBSD__
	DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
#else
	cb.task = current;
#endif
        ...
out:
	spin_unlock(f->lock);
#ifdef __NetBSD__
	DRM_DESTROY_WAITQUEUE(&cb.wq);
#endif
        ...
}

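Both of the marked "goto out" paths jump past DRM_INIT_WAITQUEUE(), yet the code at "out:" calls DRM_DESTROY_WAITQUEUE(&cb.wq) unconditionally. So in those cases the function was destroying a condvar that it never initialized! That would also explain the nonsense in CV_SLEEPQ(cv) and CV_WMESG(cv): presumably cb.wq still held whatever garbage was there before. Ugh, this is the very reason why I dislike C...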
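
For illustration only, here is a minimal user-space sketch of one way to restructure such a function so that the early-exit paths never reach the destroy call. It is not the actual vmwgfx code or the committed fix: fake_waitqueue, fake_wq_init(), fake_wq_destroy() and fence_wait_sketch() are hypothetical stand-ins, EINTR stands in for ERESTARTSYS, and the locking and the real wait loop are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel condvar / DRM wait queue. */
struct fake_waitqueue {
	bool initialized;
};

/* Counters so a caller can check that init and destroy are paired. */
static int wq_inits;
static int wq_destroys;

static void
fake_wq_init(struct fake_waitqueue *wq)
{
	wq->initialized = true;
	wq_inits++;
}

static void
fake_wq_destroy(struct fake_waitqueue *wq)
{
	/* Loosely models a DIAGNOSTIC-style sanity check on destroy. */
	assert(wq->initialized);
	wq->initialized = false;
	wq_destroys++;
}

static long
fence_wait_sketch(bool signaled, bool intr_pending, long timeout)
{
	struct fake_waitqueue wq;	/* uninitialized on entry, like cb.wq */
	long ret = timeout;

	/* Early exits: wq has not been initialized yet. */
	if (signaled)
		goto out;
	if (intr_pending) {
		ret = -EINTR;	/* stand-in for -ERESTARTSYS */
		goto out;
	}

	fake_wq_init(&wq);
	/* (the wait loop of the original is elided here) */
	fake_wq_destroy(&wq);	/* reached only after the init above */
out:
	return ret;
}
```

The point of the design is simply that the destroy sits before the "out:" label, so it is only reachable on paths that ran the init. Moving the init above the first "goto out" would be another way to get the same pairing.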
