tech-kern archive
Strange crash of DIAGNOSTIC kernel on cv_destroy(9)
Hi,
While further testing my DRM/KMS vmwgfx driver patches by playing
games/minetest from pkgsrc, I experienced inexplicable kernel panics on
this code:
https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L289
Here is a simplified version of the function to illustrate what's going
on. The function is called in process context:
    kmutex_t mtx;
    mutex_init(&mtx, MUTEX_DEFAULT, IPL_VM);
    ...
    mutex_spin_enter(&mtx);

    kcondvar_t cv;
    cv_init(&cv, "whatever");
    {{ Store &mtx and &cv somewhere else so that a softint will later
       signal the cv. }}
    while (true) {
        if (!...) {
            cv_timedwait_sig(&cv, &mtx, timeout);
        }
    }
    {{ Ask the softint not to signal it anymore. }}
    mutex_spin_exit(&mtx);
    cv_destroy(&cv); // <-- Panics!
It occasionally, but not always, panics on KASSERT(!cv_has_waiters(cv))
in cv_destroy(). The panic seems to happen when cv_timedwait_sig()
returns because the timeout expired before the cv was signaled.
In this particular case the only LWP that can possibly be put into the
sleep queue is curlwp, which should have been removed by
sleepq_timeout() before leaving cv_timedwait_sig(), yet CV_SLEEPQ(cv)
contains some totally unrelated LWPs and even invalid pointers to lwp_t
when it panics.
The implementation of cv_destroy(9) is as follows:
    void
    cv_destroy(kcondvar_t *cv)
    {
        sleepq_destroy(CV_SLEEPQ(cv));
    #ifdef DIAGNOSTIC
        KASSERT(cv_is_valid(cv));
        KASSERT(!cv_has_waiters(cv));
        CV_SET_WMESG(cv, deadcv);
    #endif
    }
Yes, we are calling cv_has_waiters() without holding the mutex. I could
work around the panic by destroying the cv before unlocking the mutex,
but I really can't see why the original order is problematic or why the
reordering avoids the panic. Does anyone have any clue what's going on?