envsys/gttwsi deadlock

To: tech-kern%netbsd.org@localhost
Subject: envsys/gttwsi deadlock
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Wed, 14 Oct 2015 12:35:41 +0200

Hello,
I've written (not yet committed) a driver for the axp209 power management IC
as found in the cubieboard, olimex lime2, and probably other A20-based
boards. The A20 I2C uses the gttwsi_core.c MI driver.

I've found that repeated use of envstat would block callouts, and I
tracked it down to this scenario:
envstat from userland ends up in cv_timedwait_sig() from gttwsi_wait(),
with the sme->sme_mtx lock held (it gets it in sme_update_dictionary()
before calling sysmon_envsys_refresh_sensor()):
db{1}> tr/a bc696380
trace: pid 17030 lid 1 at 0xa88dfa4c
0xa88dfa4c: netbsd:mi_switch+0x10
0xa88dfa7c: netbsd:sleepq_block+0x16c
0xa88dfab4: netbsd:cv_timedwait_sig+0x108
0xa88dfaec: netbsd:gttwsi_wait+0x98
0xa88dfb14: netbsd:gttwsi_initiate_xfer+0x20
0xa88dfb8c: netbsd:iic_exec+0x28c
0xa88dfbc4: netbsd:iic_smbus_block_read+0x48
0xa88dfbf4: netbsd:axp20x_read.isra.1+0x54
0xa88dfc1c: netbsd:axp20x_sensors_refresh+0xe0
0xa88dfc34: netbsd:sysmon_envsys_refresh_sensor+0x70
0xa88dfc5c: netbsd:sme_update_dictionary+0xcc
0xa88dfcec: netbsd:sysmonioctl_envsys+0x1a4
0xa88dfd1c: netbsd:cdev_ioctl+0x88
0xa88dfd44: netbsd:spec_ioctl+0x9c
0xa88dfd74: netbsd:VOP_IOCTL+0x50
0xa88dfe44: netbsd:vn_ioctl+0xc8
0xa88dff0c: netbsd:sys_ioctl+0x274
0xa88dff7c: netbsd:syscall+0x84
0xa88dffac: netbsd:swi_handler+0x98

Meanwhile the softclk soft interrupt runs the sme callout, which tries to
acquire sme->sme_mtx:
db{1}>  tr/a bff01340
trace: pid 0 lid 5 at 0xbfef7e34
0xbfef7e34: netbsd:mi_switch+0x10
0xbfef7e64: netbsd:sleepq_block+0xb4
0xbfef7ea4: netbsd:turnstile_block+0x3e8
0xbfef7f0c: netbsd:mutex_enter+0x234
0xbfef7f34: netbsd:sme_events_check+0x68
0xbfef7f64: netbsd:callout_softclock+0x194
0xbfef7fac: netbsd:softint_dispatch+0xd4

This blocks the softclk thread in the turnstile, and the cv_timedwait_sig()
from envstat will never wake up, presumably because its timeout is itself
delivered by a callout. (For some reason it probably missed an interrupt
here, but then envstat should see an error, not block forever and stall
all callouts in the system.)

A way to fix this would be to use mutex_tryenter() in sme_events_check(),
and just reschedule the callout if it can't get the lock.
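
To illustrate the idea outside the kernel, here is a minimal user-space
sketch using pthreads. It is not the sysmon code: sensor_lock,
periodic_check(), reschedules and checks_done are hypothetical names,
and pthread_mutex_trylock() only stands in for mutex_tryenter().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for sme->sme_mtx. */
static pthread_mutex_t sensor_lock = PTHREAD_MUTEX_INITIALIZER;

static int reschedules;	/* times the check had to be deferred */
static int checks_done;	/* times the check actually ran */

/*
 * Hypothetical periodic handler, playing the role of the callout.
 * It must never sleep on the lock: if the lock is busy it records
 * that it wants to run again and returns immediately.
 * Returns true if the work was done, false if it was deferred.
 */
static bool
periodic_check(void)
{
	int rc;

	if (pthread_mutex_trylock(&sensor_lock) != 0) {
		/* can't get lock - try again later */
		reschedules++;
		return false;
	}

	/* Lock held: the real work would be queued here. */
	checks_done++;

	rc = pthread_mutex_unlock(&sensor_lock);
	assert(rc == 0);
	(void)rc;
	return true;
}
```

If another context holds sensor_lock (as envstat does while waiting on
the I2C bus), periodic_check() returns false at once instead of
sleeping, so the thread that runs the timers stays free to deliver the
timeout.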

Does anyone see a problem with this approach?

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

Index: sysmon_envsys_events.c
===================================================================
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys_events.c,v
retrieving revision 1.110.4.2
diff -u -p -u -r1.110.4.2 sysmon_envsys_events.c
--- sysmon_envsys_events.c	6 Apr 2015 18:45:30 -0000	1.110.4.2
+++ sysmon_envsys_events.c	14 Oct 2015 10:34:54 -0000
@@ -720,18 +720,21 @@ sme_events_check(void *arg)
 		mutex_exit(&sme->sme_work_mtx);
 		return;
 	}
-	mutex_exit(&sme->sme_work_mtx);
-
-	mutex_enter(&sme->sme_mtx);
-	mutex_enter(&sme->sme_work_mtx);
+	if (!mutex_tryenter(&sme->sme_mtx)) {
+		/* can't get lock - try again later */
+		if (!sysmon_low_power)
+			sme_schedule_callout(sme);
+		mutex_exit(&sme->sme_work_mtx);
+		return;
+	}
 	LIST_FOREACH(see, &sme->sme_events_list, see_list) {
 		workqueue_enqueue(sme->sme_wq, &see->see_wk, NULL);
 		see->see_edata->flags |= ENVSYS_FNEED_REFRESH;
 		sme->sme_busy++;
 	}
-	mutex_exit(&sme->sme_work_mtx);
 	if (!sysmon_low_power)
 		sme_schedule_callout(sme);
+	mutex_exit(&sme->sme_work_mtx);
 	mutex_exit(&sme->sme_mtx);
 }
