tech-kern archive


IIC locking when shutting down



Hi all,

While testing some iic changes, we saw this panic on shutdown (on a 4-CPU
system):

  [ 356534.4468099] Skipping crash dump on recursive panic
  [ 356534.5093055] panic: lock error: Mutex: mutex_vector_exit,747: assertion failed: MUTEX_OWNER(mtx->mtx_owner) == curthread: lock 0x1037806c8 cpu 0 lwp 0x1044e6e80

CPUs 0 and 2 are both in the iic code (for backtraces, see below).  I think
that I know why this happened.  On CPU 0 we crash when we release the I2C
bus lock at:

  https://nxr.netbsd.org/xref/src/sys/arch/sparc64/dev/pcf8591_envctrl.c#296

presumably because we didn't acquire it, because CPU 2 is running code that
acquired the lock:

  https://nxr.netbsd.org/xref/src/sys/dev/i2c/lm75.c#352

If I'm reading the code correctly, this means that either we're polling
or the controller's iic_acquire_bus() routine failed.  As pcfiic
doesn't have an iic_acquire_bus() routine, we must be polling.  Neither
ecadc nor lm75 (lmtemp) set the polling flag, and the sparc64 frontend
(pcfiic_ebus) only sets it if we don't have an interrupt, but I can see
interrupts in the node (/pci/ebus@1/SUNW,envctrl@14,600000 on this E450).

However, we call iic_op_flags() at the start of iic_acquire_bus():

  https://nxr.netbsd.org/xref/src/sys/dev/i2c/i2c_exec.c#62

so I think that what happened was:

 - we set the polling flag in iic_op_flags() during shutdown
 - iic_acquire_bus() acquired the mutex for the 1st caller (lmtemp)
 - iic_acquire_bus() returned EBUSY for the 2nd caller (ecadc)
 - we don't handle errors from iic_acquire_bus() in the driver code
 - we continue and then call iic_release_bus()
 - boom

It looks like every call to iic_acquire_bus() will need to handle the error
return.  We seem to do this in a few drivers, but not all of them.  However,
is there a simple solution that I'm missing?
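
For reference, the per-driver change would look roughly like the sketch
below.  It is a simplified single-threaded model, not a patch: the sketch_*
names, the integer caller ids and the placeholder sensor value are all made
up for illustration.  The point is only that a failed acquire returns early,
so the release is never reached by a caller that does not hold the bus:

```c
#include <errno.h>

/* Hypothetical stand-in for an I2C tag; owner is 0 when the bus is free. */
struct sketch_bus {
	int owner;		/* id of the caller holding the bus */
	int bad_releases;	/* releases attempted by a non-owner */
};

/* Polled acquire: never waits, fails with EBUSY if the bus is held. */
static int
sketch_acquire_bus(struct sketch_bus *bus, int caller)
{
	if (bus->owner != 0)
		return EBUSY;
	bus->owner = caller;
	return 0;
}

static void
sketch_release_bus(struct sketch_bus *bus, int caller)
{
	if (bus->owner != caller) {
		bus->bad_releases++;	/* the kernel would panic here */
		return;
	}
	bus->owner = 0;
}

/* Refresh routine that checks the acquire result before using the bus. */
static int
sketch_refresh(struct sketch_bus *bus, int caller, int *value)
{
	int error;

	error = sketch_acquire_bus(bus, caller);
	if (error != 0)
		return error;	/* bus is not ours: no I/O, no release */

	*value = 42;		/* placeholder for the real transfer */
	sketch_release_bus(bus, caller);
	return 0;
}
```

With this shape, a refresh that loses the race leaves the sensor value
untouched and reports the error to its caller instead of releasing a lock
it never took.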

Regards,

Julian

 - - -

Backtraces:

  db{0}> mach cpu 0
  db{0}> bt
  panic(1aa5210, 1a9d000, 19e79a8, 2eb, 1a9cfc0, 1037806c8) at netbsd:panic+0x20
  lockdebug_abort(19e79a8, 2eb, 1037806c8, 1c606c0, 1a9cfc0, e0048000) at netbsd:lockdebug_abort+0xa4
  mutex_spin_exit(1037806c8, 1044e6e80, 0, 0, 0, 1cefa9c00) at netbsd:mutex_spin_exit+0xc4
  ecadc_refresh(103780a40, 1042aa8e0, e0048000, 0, 1c60c98, 1042aa5c0) at netbsd:ecadc_refresh+0x98
  sysmon_envsys_refresh_sensor(103780a40, 1042aa8e0, 1044e6e80, 1a65000, 1cc1000, 1044e6e80) at netbsd:sysmon_envsys_refresh_sensor+0x1c
  sme_events_worker(103780b10, 103780a40, 1044e6e80, 1042aa8e0, 103780a40, 1044dce80) at netbsd:sme_events_worker+0x130
  workqueue_worker(1037811c0, 103781218, 103781228, 103781208, 103781200, 0) at netbsd:workqueue_worker+0xf8
  lwp_trampoline(f0075a4c, 113800, 112f40, fff7fdf8, 0, fff7fc78) at netbsd:lwp_trampoline+0x8

  db{1}> mach cpu 2
  db{2}> bt
  pcfiic_xmit(103780680, 0, 1cefb1b38, 1, ffffffffffffffff, 1) at netbsd:pcfiic_xmit+0xa4
  pcfiic_i2c_exec(103780680, 4d, 4d, 1, 1cefb1b40, 1cefb1b40) at netbsd:pcfiic_i2c_exec+0x68
  iic_exec(1037806c0, 1, 4d, 1cefb1b38, 1, 1cefb1b40) at netbsd:iic_exec+0x1a4
  lmtemp_temp_read(104288200, 0, 1cefb1c0c, 0, 104288200, 1c60c98) at netbsd:lmtemp_temp_read+0x40
  lmtemp_refresh(103780cc0, 104288228, e0048000, 0, 1cee98000, 104288200) at netbsd:lmtemp_refresh+0x20
  sysmon_envsys_refresh_sensor(103780cc0, 104288228, 1044e7700, 1a65000, 1cc1000, 1044e7700) at netbsd:sysmon_envsys_refresh_sensor+0x1c
  sme_events_worker(103780d90, 103780cc0, 1044e7700, 104288228, 103780cc0, 1044dd080) at netbsd:sme_events_worker+0x130
  workqueue_worker(103781440, 103781498, 1037814a8, 103781488, 103781480, 0) at netbsd:workqueue_worker+0xf8
  lwp_trampoline(f0075a4c, 113800, 112f40, fff7fdf8, 0, fff7fc78) at netbsd:lwp_trampoline+0x8

