tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Locking strategy for device deletion (also see PR kern/48536)



   Date: Thu, 9 Jun 2016 07:12:08 +0800 (PHT)
   From: Paul Goyette <paul%whooppee.com@localhost>

   For now, I'm just going to maintain a zero/non-zero "active" flag in the 
   driver, setting it on open() and clearing on close().

   Unloading of the module will first call devsw_detach(), and then check 
   the flag.  If the flag is set, call devsw_attach() to reinsert the 
   device, and fail the unload operation.

   All accesses to the flag (set, clear, and check) will be protected by a 
   mutex.

That's a reasonable first approximation -- though it does not prevent
unmapping the code for ipfopen/ipfioctl/ipfclose/&c. before all CPUs
are done with them.  Granted, that is probably a smaller race than
you're mainly concerned about!

   BTW, reading Taylor's sketch, it looks like there are some very small 
   windows where the mutex is released (within the xcall?) after the CPU's 
   count has been added to the total.  Is it possible for someone to sneak 
   in and add a new reference before everything is finished?

If a CPU sneaks in after it has executed device_usecount_xc, then it
has *also* witnessed dev->dv_dying = true (cross-calls serve as full
memory barriers), so device_lookup_acquire will fail on that CPU.

   Also, I'm not quite sure I understand why we have the pointer-to-total 
   and why we have the possibility of setting to NULL (and creating an 
   intentional chance for a crash?)

           /* Cause subsequent attempts to add to the total to crash.  */

   Are these two observations related?  And would it be reasonable to do a 
   KASSERT(dev->dv_usecount_total != NULL) within the xcall() routine so we 
   would at least get a meaningful panic message?   :)

Sure, that would be fine.  That's just a bit of paranoia to sneakily
double-check my intended invariants.  This sketch is not as tight as
it could be: we could use the nullness-or-nonnullness of
dev->dv_usecount_total instead of an extra dev->dv_dying field, to
save a few bytes of memory per device.

   Just wondering here - for pseudo-devices, is it really necessary to 
   remove the autoconf stuff at all?  It's not clear to me where it even 
   gets added in the case of modularly-loaded pseudo-devices...

Yes.  See, e.g., cgd_modcmd.  In addition to cgd_cdevsw, there is also
a struct cfdriver object, cgd_cd, that must be detached, i.e. removed
from allcfdrivers.


Home | Main Index | Thread Index | Old Index