While testing changes to an envsys driver, I saw this crash on shutdown:
[ 1651.0108940] cpu0: data fault: pc=155ea68 rpc=101db8ca4 addr=0
[ 1651.0108940] kernel trap 30: data access exception
Stopped in pid 0.5 (system) at netbsd:mutex_oncpu.part.0+0x8: ldx [%g1 + 0x18], %g2
sme_events_check(101db6718, 101d8a041, 0, 1c63348, 101db6640, 101d8a040) at netbsd:sme_events_check+0xc
The faulting line is 739: mutex_enter(&sme->sme_work_mtx);
The driver calls sysmon_envsys_destroy() in its detach routine. Looking at
the code, that could race with sme_events_check() whilst the sme sensors
list is being removed: both start by checking that sme != NULL, but
sysmon_envsys_destroy() could remove the sme structure whilst
sme_events_check() is still running. I'm guessing that's what happened
in the above case. Note that I only saw this once in about 50 reboots,
so it's quite rare.
It seems sensible to take sme_mtx in sysmon_envsys_destroy(), but that
only narrows the window: sme_events_check() checks sme != NULL, and the
mutexes are part of the sme structure that we want to remove.
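To illustrate why a mutex embedded in the structure cannot protect that
structure's own lifetime, here is a minimal userland sketch using POSIX
threads. The names (struct dev_state, state_lock, state_create(),
state_check(), state_destroy()) are hypothetical and are not the
sysmon_envsys API. The assumption is that the lock guarding the pointer
lives outside the object, so the NULL test and the use happen under one
lock, and destroy unpublishes the pointer before freeing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-device envsys state (not the real
 * struct sysmon_envsys). */
struct dev_state {
    int nevents;            /* stand-in for the sensor/event list */
};

/* The lock guarding the pointer lives OUTSIDE the object it protects,
 * so it is still valid after the object has been freed. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_state *state;     /* NULL once destroyed */

bool state_create(int nevents)
{
    struct dev_state *s = malloc(sizeof(*s));
    if (s == NULL)
        return false;
    s->nevents = nevents;
    pthread_mutex_lock(&state_lock);
    assert(state == NULL);          /* creating twice is a programming error */
    state = s;
    pthread_mutex_unlock(&state_lock);
    return true;
}

/* Checker: the NULL test and the dereference happen under the same lock,
 * so the object cannot be freed between them. Returns false if the state
 * has already been destroyed. */
bool state_check(int *out)
{
    bool ok = false;
    pthread_mutex_lock(&state_lock);
    if (state != NULL) {
        *out = state->nevents;
        ok = true;
    }
    pthread_mutex_unlock(&state_lock);
    return ok;
}

/* Destroy: unpublish under the lock, then free outside it. Any checker
 * that held the lock earlier has finished with the object; any later
 * checker sees NULL. */
void state_destroy(void)
{
    struct dev_state *old;
    pthread_mutex_lock(&state_lock);
    old = state;
    state = NULL;
    pthread_mutex_unlock(&state_lock);
    free(old);
}
```

Contrast this with taking a mutex that is a member of the object: the
destroyer can free the memory containing that mutex while a checker is
about to lock it, which is exactly the window described above.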
There is code in sysmon_envsys_sensor_detach() that removes callouts,
so a better solution might be to call sysmon_envsys_sensor_detach() from
sysmon_envsys_destroy(), or to audit every driver to check that this is done.
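A rough sketch of the "stop the periodic check, wait for it, then free"
ordering, again in userland C with POSIX threads. A thread stands in for
the callout, and pthread_join() plays the role that something like
callout_halt(9) would play in the kernel. struct dev, dev_attach() and
dev_detach() are hypothetical names, not the real driver or envsys code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical device state that the periodic checker dereferences. */
struct dev {
    atomic_bool stop;       /* asks the checker to exit */
    atomic_int  nchecks;    /* number of checks run so far */
    pthread_t   checker;    /* stand-in for the refresh callout */
};

static void *checker_main(void *arg)
{
    struct dev *d = arg;
    const struct timespec tick = { 0, 1000000 };    /* 1 ms */

    while (!atomic_load(&d->stop)) {
        atomic_fetch_add(&d->nchecks, 1);   /* stand-in for sme_events_check() */
        nanosleep(&tick, NULL);
    }
    return NULL;
}

struct dev *dev_attach(void)
{
    struct dev *d = malloc(sizeof(*d));
    if (d == NULL)
        return NULL;
    atomic_init(&d->stop, false);
    atomic_init(&d->nchecks, 0);
    if (pthread_create(&d->checker, NULL, checker_main, d) != 0) {
        free(d);
        return NULL;
    }
    return d;
}

/* Teardown order: request stop, wait for any in-flight check to finish,
 * and only then free. Returns the number of checks that ran. */
int dev_detach(struct dev *d)
{
    int n;

    atomic_store(&d->stop, true);
    pthread_join(d->checker, NULL);     /* analogous to callout_halt(9) */
    n = atomic_load(&d->nchecks);
    free(d);
    return n;
}
```

Because dev_detach() calls free() only after pthread_join() returns, no
check can still be dereferencing d at that point; freeing before the
wait reintroduces the race described above.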
Any other solutions would be appreciated.