tech-kern archive
IIC locking when shutting down
Hi all,
When testing some iic changes, we saw this panic on shutdown (on a 4 CPU
system):
[ 356534.4468099] Skipping crash dump on recursive panic
[ 356534.5093055] panic: lock error: Mutex: mutex_vector_exit,747: assertion failed: MUTEX_OWNER(mtx->mtx_owner) == curthread: lock 0x1037806c8 cpu 0 lwp 0x1044e6e80
CPU 0 and 2 are both in the iic code (for backtraces, see below). I think
that I know why this happened. On CPU 0 we crash when we release the I2C
bus lock at:
https://nxr.netbsd.org/xref/src/sys/arch/sparc64/dev/pcf8591_envctrl.c#296
presumably because we didn't acquire it, because CPU 2 is running code that
acquired the lock:
https://nxr.netbsd.org/xref/src/sys/dev/i2c/lm75.c#352
If I'm reading the code correctly, this means that either we're
polling or that the controller's iic_acquire_bus() routine failed. As pcfiic
doesn't have an iic_acquire_bus() routine, we must be polling. Neither
ecadc nor lm75 (lmtemp) set the polling flag, and the sparc64 frontend
(pcfiic_ebus) only sets it if we don't have an interrupt, but I can see
interrupts in the node (/pci/ebus@1/SUNW,envctrl@14,600000 on this E450).
However, we call iic_op_flags() at the start of iic_acquire_bus():
https://nxr.netbsd.org/xref/src/sys/dev/i2c/i2c_exec.c#62
so I think that what happened was:
- we set the polling flag in iic_op_flags() during shutdown
- iic_acquire_bus() acquired the mutex for the 1st caller (lmtemp)
- iic_acquire_bus() returned EBUSY for the 2nd caller (ecadc)
- we don't handle errors from iic_acquire_bus() in the driver code
- we continue and then call iic_release_bus()
- boom
It looks like every call to iic_acquire_bus() will need to handle the error
return. We seem to do this in a few drivers, but not all of them. However,
is there a simple solution that I'm missing?
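To illustrate what handling the error return could look like, here is a
minimal userland sketch in C. It is not NetBSD code: mock_bus,
mock_acquire_bus(), mock_release_bus() and mock_refresh() are hypothetical
stand-ins that only model an acquire that fails with EBUSY instead of
sleeping, and a release that asserts ownership. The value 42 is a
placeholder for the real transfer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userland stand-in for the I2C bus lock; not NetBSD code. */
struct mock_bus {
	const void *owner;	/* NULL when the bus is free */
};

/* Models a polled, try-lock style acquire: fail instead of sleeping. */
static int
mock_acquire_bus(struct mock_bus *bus, const void *self)
{
	if (bus->owner != NULL)
		return EBUSY;
	bus->owner = self;
	return 0;
}

/* Models the owner check that fired in the panic above. */
static void
mock_release_bus(struct mock_bus *bus, const void *self)
{
	assert(bus->owner == self);
	bus->owner = NULL;
}

/*
 * Sensor refresh that checks the acquire result, so it never releases
 * a lock it does not hold.
 */
static int
mock_refresh(struct mock_bus *bus, const void *self, int *value)
{
	int error;

	error = mock_acquire_bus(bus, self);
	if (error != 0)
		return error;	/* bus is held elsewhere; skip this reading */

	*value = 42;		/* placeholder for the real bus transfer */
	mock_release_bus(bus, self);
	return 0;
}
```

With this pattern, the 2nd caller in the sequence above would get EBUSY
back and skip the reading, rather than going on to release a lock that is
held on another CPU.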
Regards,
Julian
- - -
Backtraces:
db{0}> mach cpu 0
db{0}> bt
panic(1aa5210, 1a9d000, 19e79a8, 2eb, 1a9cfc0, 1037806c8) at netbsd:panic+0x20
lockdebug_abort(19e79a8, 2eb, 1037806c8, 1c606c0, 1a9cfc0, e0048000) at netbsd:lockdebug_abort+0xa4
mutex_spin_exit(1037806c8, 1044e6e80, 0, 0, 0, 1cefa9c00) at netbsd:mutex_spin_exit+0xc4
ecadc_refresh(103780a40, 1042aa8e0, e0048000, 0, 1c60c98, 1042aa5c0) at netbsd:ecadc_refresh+0x98
sysmon_envsys_refresh_sensor(103780a40, 1042aa8e0, 1044e6e80, 1a65000, 1cc1000, 1044e6e80) at netbsd:sysmon_envsys_refresh_sensor+0x1c
sme_events_worker(103780b10, 103780a40, 1044e6e80, 1042aa8e0, 103780a40, 1044dce80) at netbsd:sme_events_worker+0x130
workqueue_worker(1037811c0, 103781218, 103781228, 103781208, 103781200, 0) at netbsd:workqueue_worker+0xf8
lwp_trampoline(f0075a4c, 113800, 112f40, fff7fdf8, 0, fff7fc78) at netbsd:lwp_trampoline+0x8
db{1}> mach cpu 2
db{2}> bt
pcfiic_xmit(103780680, 0, 1cefb1b38, 1, ffffffffffffffff, 1) at netbsd:pcfiic_xmit+0xa4
pcfiic_i2c_exec(103780680, 4d, 4d, 1, 1cefb1b40, 1cefb1b40) at netbsd:pcfiic_i2c_exec+0x68
iic_exec(1037806c0, 1, 4d, 1cefb1b38, 1, 1cefb1b40) at netbsd:iic_exec+0x1a4
lmtemp_temp_read(104288200, 0, 1cefb1c0c, 0, 104288200, 1c60c98) at netbsd:lmtemp_temp_read+0x40
lmtemp_refresh(103780cc0, 104288228, e0048000, 0, 1cee98000, 104288200) at netbsd:lmtemp_refresh+0x20
sysmon_envsys_refresh_sensor(103780cc0, 104288228, 1044e7700, 1a65000, 1cc1000, 1044e7700) at netbsd:sysmon_envsys_refresh_sensor+0x1c
sme_events_worker(103780d90, 103780cc0, 1044e7700, 104288228, 103780cc0, 1044dd080) at netbsd:sme_events_worker+0x130
workqueue_worker(103781440, 103781498, 1037814a8, 103781488, 103781480, 0) at netbsd:workqueue_worker+0xf8
lwp_trampoline(f0075a4c, 113800, 112f40, fff7fdf8, 0, fff7fc78) at netbsd:lwp_trampoline+0x8