tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: envsys/gttwsi deadlock



On Thu, Oct 15, 2015 at 05:11:47PM +0000, Taylor R Campbell wrote:
>    Date: Wed, 14 Oct 2015 15:53:16 +0000
>    From: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
> 
>    This is not gttwsi_wait's fault -- this is envsys's fault.  The
>    correct solution should be to change the API to avoid holding a mutex
>    across calls to a device sensor refresh routine.  Something like
>    kern/kern_rndsink.c, perhaps.
> 
> The code structure should be something like this:
> 
> struct sysmon_envsys {
> 	...
> 	unsigned sme_refcnt;
> 	unsigned sme_busy:1;
> 	...
> };
> 
> mutex_enter(&sme_global_mtx);
> LIST_FOREACH(sme, &sysmon_envsys_list, sme_list) {
> 	sysmon_envsys_acquire(sme);	/* Bump refcnt.  */
> 	mutex_exit(&sme_global_mtx);
> 	...
> 	/* sme_update_dictionary */
> 	TAILQ_FOREACH(sensor, &sme->sme_sensors_list, sensors_head) {
> 		sysmon_envsys_busy(sme);	/* Wait for busy flag.  */
> 		sysmon_envsys_refresh_sensor(sme, sensor);
> 						/* Call sme->sme_refresh.  */
> 		sysmon_envsys_unbusy(sme);	/* Clear busy flag.  */
> 	}
> 	...
> 	mutex_enter(&sme_global_mtx);
> 	sysmon_envsys_release(sme);	/* Drop refcnt.  */
> }
> mutex_exit(&sme_global_mtx);
> 
> During the call to sysmon_envsys_refresh_sensor, the only `lock' that
> should be held is the busy flag.  The reference count ensures sme
> doesn't go away while sme_global_mtx is dropped; I assume here the
> sensors list is stable during the lifetime of sme.
> 
> Both sysmon_envysys_acquire/release and sysmon_envsys_busy/unbusy
> would hold sme->sme_mtx only internally.  sme_events_check need not
> busy the sensor, so there is no chance it may attempt to wait for I/O.

I looked a bit, and it's not that easy.
sysmon_envsys_refresh_sensor() is called from various places with sme_mtx
held; sometimes the function calling sysmon_envsys_refresh_sensor() is
entered with the lock held. sme_mtx protects various states of sme, and
reworking this requires quite a bit of work (dropping it before calling
sysmon_envsys_refresh_sensor() and reaquiring after is not an option).
More work than I can put in this right now ...

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index