Subject: Race condition in generic soft interrupts code?
To: None <port-arm@netbsd.org, thorpej@wasabisystems.com>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 07/14/2003 22:50:08
I'm posting this here because I believe this problem may affect (at least
in theory) all platforms based on Jason's generic soft-interrupts
implementation for the ARM.
In trying to port the generic soft-ints code to the Integrator board I've
been running into a race condition that can lead to particular interrupts
becoming permanently blocked, even though the SPL level is 0 and the
processor is sitting in the idle loop. The main symptom is that
XXX_ipending has a bit set, and the corresponding bit in intr_enabled is
zero, so the interrupt is permanently blocked. Here is how I think this
can happen:
A high-priority interrupt (high SPL) occurs (lets say it's statclock)
The code enters XXX_intr_dispatch and blocks that interrupt. Eg:
hwpend = XXX_iintsrc_read();
/*
* Disable all the interrupts that are pending. We will
* reenable them once they are processed and not masked.
*/
intr_enabled &= ~hwpend;
XXX_set_intrmask();
It then enters the loop below that scans over hwpend bits. While
processing the statclock interrupt the SPL is effectively raised to
IPL_STAT_CLOCK with
current_spl_level |= iq->iq_mask;
oldirqstate = enable_interrupts(I32_bit);
Interrupts are then enabled and the statclock handler code is called:
for (ih = TAILQ_FIRST(&iq->iq_list); ih != NULL;
ih = TAILQ_NEXT(ih, ih_list)) {
(void) (*ih->ih_func)(ih->ih_arg ? ih->ih_arg : frame);
}
Now while this is being processed, another lower-priority interrupt occurs
(say hardclock): We again enter XXX_intr_dispatch and block that source
of interrupt as before. This time, however, we find that
(pcpl & ibit)
is true for hardclock (since IPL_CLOCK < IPL_STAT_CLOCK), so we fall into
the code
if (pcpl & ibit) {
/*
* IRQ is masked; mark it as pending and check
* the next one. Note: the IRQ is already disabled.
*/
ifpga_ipending |= ibit;
continue;
}
and then there is nothing more to do (so we return to the previous
interrupt invocation.
We now complete the handling of the statclock interrupt and restore the
IPL level, re-enable the statclock interrupt and exit the loop (there was
only one bit set in hwpend, remember):
current_spl_level = pcpl;
/* Re-enable this interrupt now that's it's cleared. */
intr_enabled |= ibit;
ifpga_set_intrmask();
At no time do we check for pending interrupts that need to be handled at
this level.
It seems that the fix for this is trivial, simply insert the following
line in the code that cleans up after running the handler list:
restore_interrupts(oldirqstate);
current_spl_level = pcpl;
+ hwpend |= (XXX_ipending & XXX_INTR_HWMASK) & ~pcpl;
/* Re-enable this interrupt now that's it's cleared. */
intr_enabled |= ibit;
XXX_set_intrmask();
and this will cause the interrupt that was deferred to be handled in the
current loop before returning.
This patch was certainly sufficient to correct the problem I've been
encountering (where the main clock interrupt was getting completely lost,
causing the system to hang during scsi-bus probing -- no clock, so no
timeout).
Jason, does this look correct to you? If so, then all ports derived from
your code will need patching.
R.