Strange crash of DIAGNOSTIC kernel on cv_destroy(9)

To: tech-kern%netbsd.org@localhost
Subject: Strange crash of DIAGNOSTIC kernel on cv_destroy(9)
From: PHO <pho%cielonegro.org@localhost>
Date: Mon, 17 Jul 2023 12:57:42 +0900

Hi,

While further testing my DRM/KMS vmwgfx driver patches by playing games/minetest from pkgsrc, I experienced inexplicable kernel panics in this code:

https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L289

Here is a simplified version of the function to illustrate what's going on. The function is called in process context:

  kmutex_t mtx;
  mutex_init(&mtx, MUTEX_DEFAULT, IPL_VM);

  ...

  mutex_spin_enter(&mtx);

  kcondvar_t cv;
  cv_init(&cv, "whatever");

  {{ Store &mtx and &cv somewhere else so that a softint will later signal the cv. }}

  while (!...) {
      if (cv_timedwait_sig(&cv, &mtx, timeout) != 0)
          break; /* EWOULDBLOCK on timeout, EINTR/ERESTART on a signal */
  }

  {{ Ask the softint not to signal it anymore. }}

  mutex_spin_exit(&mtx);

  cv_destroy(&cv); // <-- Panics!

It occasionally, but not always, panics on KASSERT(!cv_has_waiters(cv)) in cv_destroy(). The panic seems to happen when cv_timedwait_sig() returns because the timeout expired before the cv was signaled.

In this particular case the only LWP that can possibly be put into the sleep queue is curlwp, which should have been removed by sleepq_timeout() before cv_timedwait_sig() returns, yet when it panics CV_SLEEPQ(cv) contains some totally unrelated LWPs and even invalid pointers to lwp_t.
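As a point of comparison for the reasoning above, POSIX states the analogous guarantee explicitly in user space: once pthread_cond_timedwait() has returned, the calling thread is no longer blocked on the condition variable, and a condition variable on which no threads are blocked may be destroyed. The sketch below is only a pthreads analogue, not NetBSD kernel code, and timed_wait_then_destroy() is a hypothetical helper. It waits with a short timeout that nobody interrupts, unlocks the mutex, and only then destroys the condition variable, mirroring the order used in the simplified function above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper, a user-space analogue of the kernel pattern above:
 * wait on a private condition variable until the timeout expires (nobody
 * ever signals it), unlock, then destroy it.  Returns the last result of
 * pthread_cond_timedwait() (ETIMEDOUT on the normal path) and stores the
 * result of pthread_cond_destroy() in *destroy_rc. */
static int timed_wait_then_destroy(long timeout_ms, int *destroy_rc)
{
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    struct timespec deadline;
    int error = 0;

    pthread_mutex_init(&mtx, NULL);
    pthread_cond_init(&cv, NULL);  /* default attributes: CLOCK_REALTIME */

    /* Absolute deadline = now + timeout_ms. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mtx);
    /* A return of 0 is a spurious wakeup, so keep waiting until an error
     * such as ETIMEDOUT is reported. */
    while (error == 0)
        error = pthread_cond_timedwait(&cv, &mtx, &deadline);
    pthread_mutex_unlock(&mtx);

    /* No thread is blocked on cv any more, so POSIX allows destroying it
     * even though the mutex has already been released. */
    *destroy_rc = pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&mtx);
    return error;
}
```

The deadline is computed as an absolute time because pthread_cond_timedwait() takes one, unlike the relative tick count passed to cv_timedwait_sig(9).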


The implementation of cv_destroy(9) is as follows:

  void
  cv_destroy(kcondvar_t *cv)
  {
      sleepq_destroy(CV_SLEEPQ(cv));
  #ifdef DIAGNOSTIC
      KASSERT(cv_is_valid(cv));
      KASSERT(!cv_has_waiters(cv));
      CV_SET_WMESG(cv, deadcv);
  #endif
  }

Yes, we are calling cv_has_waiters() without holding the mutex. I could work around the panic by destroying the cv before unlocking the mutex, but I really can't see why the original order is problematic or why changing it avoids the panic. Does anyone have any clues about what's going on?
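To see the workaround ordering in isolation, here is a user-space pthreads analogue of it: the waiter unregisters its condition variable and destroys it while still holding the mutex, and only then unlocks. This is a sketch under stated assumptions, not vmwgfx or NetBSD code; struct waiter_slot, signaler() and wait_for_event() are hypothetical stand-ins for the driver state shared with the softint. Because the signaler dereferences the registered condition variable only while holding the same mutex, it can never touch the object after it has been unregistered and destroyed. It illustrates the ordering only and makes no claim about what causes the kernel panic.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical shared state, standing in for what the driver hands to its
 * softint: mtx protects both cv and fired. */
struct waiter_slot {
    pthread_mutex_t mtx;
    pthread_cond_t *cv;   /* non-NULL while a waiter wants to be woken */
    bool fired;           /* the event being waited for */
};

/* Hypothetical signaler playing the softint's role: it dereferences
 * slot->cv only while holding slot->mtx. */
static void *signaler(void *arg)
{
    struct waiter_slot *slot = arg;

    pthread_mutex_lock(&slot->mtx);
    slot->fired = true;
    if (slot->cv != NULL)
        pthread_cond_signal(slot->cv);
    pthread_mutex_unlock(&slot->mtx);
    return NULL;
}

/* Waits until the event fires or timeout_ms elapses; returns whether it
 * fired.  The cv is unregistered and destroyed before mtx is released. */
static bool wait_for_event(struct waiter_slot *slot, long timeout_ms)
{
    pthread_cond_t cv;
    struct timespec deadline;
    int error = 0;
    bool fired;

    pthread_cond_init(&cv, NULL);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&slot->mtx);
    slot->cv = &cv;                     /* register for wakeups */
    while (!slot->fired && error == 0)  /* error becomes ETIMEDOUT on timeout */
        error = pthread_cond_timedwait(&cv, &slot->mtx, &deadline);
    fired = slot->fired;
    slot->cv = NULL;                    /* ask the signaler to stop */
    pthread_cond_destroy(&cv);          /* destroy while still holding mtx */
    pthread_mutex_unlock(&slot->mtx);
    return fired;
}
```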
