On Tue, Jun 07, 2016 at 06:28:11PM +0800, Paul Goyette wrote:
Can anyone suggest a reliable way to ensure that a device-driver
module can be _really_ safely detached?
The module could theoretically maintain an open/ref counter, but
making this MP-safe is "difficult"!  Even if the module were to
provide a mutex to control increment/decrement of its counter,
there's still a problem:
Thread 1 initiates a module-unload, which takes the mutex
Thread 2 attempts to open the device (or one of its units), attempts to
grab the mutex, and waits
Back in thread 1, the driver's module unload code determines that it
is safe to unload (no current activities queued, no current opens),
so it goes forward and unmaps the module - including the mutex!
I think that what's missing is a flag on the module that says it is
unloading, and module entrance/exit counters.  I think it could work
sort of like this---the devil is in the details:
Thread 1 initiates a module unload:
	1) Acquires mutex
	2) Sets the module's unloading flag
	3) Unlinks module entry points---that is, they're still mapped,
	   but there are no more globally-visible pointers to them
	4) While module entrances > exits, sleeps on module condition
	   variable C, thus temporarily releasing mutex
	5) Releases mutex
	6) Unmaps module
Thread 2 attempts to open the device
	1) Increases module-entrance count
	2) Acquires mutex
	3) Examines unloading flag
		a) Finding it set, signals condition variable C,
		b) OR, finding it NOT set, performs open
	4) Increases module-exit count
	5) Releases mutex
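
The two step lists above can be sketched in user space roughly as follows.
This is only an illustration under assumptions: POSIX threads and C11 atomics
stand in for kernel mutexes, condition variables, and counters; struct
module_gate, gate_init(), gate_open(), and gate_unload() are hypothetical names,
not an existing kernel API; and a single shared counter pair stands in for the
per-CPU counts. The numbered comments refer to the steps in the lists.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-module state; none of these names are a real kernel API. */
struct module_gate {
	pthread_mutex_t lock;
	pthread_cond_t  drained;	/* condition variable C */
	bool            unloading;	/* protected by lock */
	atomic_ulong    entrances;	/* stand-in for per-CPU counters */
	atomic_ulong    exits;
};

static void
gate_init(struct module_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->drained, NULL);
	g->unloading = false;
	atomic_init(&g->entrances, 0);
	atomic_init(&g->exits, 0);
}

/* Thread 2: returns true if the open went ahead, false if it was refused. */
static bool
gate_open(struct module_gate *g)
{
	bool opened;

	atomic_fetch_add(&g->entrances, 1);		/* 1) */
	pthread_mutex_lock(&g->lock);			/* 2) */
	if (g->unloading) {				/* 3) */
		pthread_cond_signal(&g->drained);	/* 3a) */
		opened = false;
	} else {
		opened = true;	/* 3b) the real open would go here */
	}
	atomic_fetch_add(&g->exits, 1);			/* 4) */
	pthread_mutex_unlock(&g->lock);			/* 5) */
	return opened;
}

/* Thread 1: returns once no thread is between its entrance and its exit. */
static void
gate_unload(struct module_gate *g)
{
	pthread_mutex_lock(&g->lock);			/* 1) */
	assert(!g->unloading);	/* a second unload is not modeled */
	g->unloading = true;				/* 2) */
	/* 3) unlinking the entry points is not modeled here */
	for (;;) {					/* 4) */
		unsigned long out = atomic_load(&g->exits);
		unsigned long in = atomic_load(&g->entrances);

		if (in <= out)
			break;
		pthread_cond_wait(&g->drained, &g->lock);
	}
	pthread_mutex_unlock(&g->lock);			/* 5) */
	/* 6) unmapping the module is not modeled here */
}
```

Nothing is actually unmapped in this sketch, so calling gate_open() after
gate_unload() is harmless here and simply reports a refused open. In a real
module, step 3 is what would have to keep late callers away from the mutex.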
The module entrance/exit counts can be per-CPU variables that you
increment using non-interlocked atomic instructions, which are not very
expensive.
Now, I am trying to remember if/why counting entrances and exits
separately is necessary.  ISTM that to avoid races, you want to add up
exits across all CPUs first, then add up entrances, and compare.
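
One way to see why the order of the two sums matters is the small simulation
below. It is a sketch under assumptions, not kernel code: plain arrays stand in
for per-CPU variables, NCPU is an arbitrary value, and the `between` hook is a
purely illustrative device for replaying activity that happens between the two
scans. All names are hypothetical. If entrances are summed first, a thread that
enters and exits during the scan can make the totals match while an earlier
thread is still inside; summing exits first can only err towards "not drained".

```c
#include <stdbool.h>
#include <stddef.h>

#define NCPU 4	/* hypothetical CPU count */

/* Plain arrays standing in for per-CPU entrance/exit counters. */
struct percpu_counts {
	unsigned long entrances[NCPU];
	unsigned long exits[NCPU];
};

/* Illustrative hook type: activity that races with the scan. */
typedef void (*race_hook)(struct percpu_counts *);

static unsigned long
sum_counts(const unsigned long *v)
{
	unsigned long total = 0;

	for (size_t i = 0; i < NCPU; i++)
		total += v[i];
	return total;
}

/* Another thread enters and exits on CPU 1 while the scan is in progress. */
static void
enter_and_exit_on_cpu1(struct percpu_counts *c)
{
	c->entrances[1]++;
	c->exits[1]++;
}

/* Order suggested above: exits first, then entrances. */
static bool
drained_exits_first(struct percpu_counts *c, race_hook between)
{
	unsigned long out = sum_counts(c->exits);

	between(c);
	return sum_counts(c->entrances) == out;
}

/* Reversed order, shown only for contrast: can report a false "drained". */
static bool
drained_entrances_first(struct percpu_counts *c, race_hook between)
{
	unsigned long in = sum_counts(c->entrances);

	between(c);
	return in == sum_counts(c->exits);
}
```

Starting with one thread still inside (one entrance on CPU 0, no exits), the
exits-first scan reports "not drained", while the entrances-first scan is
fooled by the thread that came and went on CPU 1.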
This is not necessarily the best or only way to handle this, and I feel
sure that I've overlooked a fatal flaw in this first draft.