Subject: Re: Race condition in generic soft interrupts code?
To: None <Richard.Earnshaw@arm.com>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 07/14/2003 23:47:58
This is a multi-part message in MIME format.
--Multipart_Mon__14_Jul_2003_23:47:58_+0100_08215a00
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
On Mon, 14 Jul 2003 22:50:08 +0100
Richard Earnshaw <rearnsha@arm.com> wrote:
> I'm posting this here because I believe this problem may affect (at
> least in theory) all platforms based on Jason's generic
> soft-interrupts implementation for the ARM.
>
> In trying to port the generic soft-ints code to the Integrator board
> I've been running into a race condition that can lead to particular
> interrupts becoming permanently blocked, even though the SPL level is
> 0 and the processor is sitting in the idle loop. The main symptom is
> that XXX_ipending has a bit set, and the corresponding bit in
> intr_enabled is zero, so the interrupt is permanently blocked. Here
> is how I think this can happen:
>
> A high-priority interrupt (high SPL) occurs (lets say it's statclock)
>
> The code enters XXX_intr_dispatch and blocks that interrupt. Eg:
>
> hwpend = XXX_iintsrc_read();
>
> /*
> * Disable all the interrupts that are pending. We will
> * reenable them once they are processed and not masked.
> */
> intr_enabled &= ~hwpend;
> XXX_set_intrmask();
>
> It then enters the loop below that scans over hwpend bits. While
> processing the statclock interrupt the SPL is effectively raised to
> IPL_STAT_CLOCK with
>
> current_spl_level |= iq->iq_mask;
> oldirqstate = enable_interrupts(I32_bit);
>
> Interrupts are then enabled and the statclock handler code is called:
>
> for (ih = TAILQ_FIRST(&iq->iq_list); ih != NULL;
> ih = TAILQ_NEXT(ih, ih_list)) {
> (void) (*ih->ih_func)(ih->ih_arg ? ih->ih_arg : frame);
> }
>
> Now while this is being processed, another lower-priority interrupt
> occurs (say hardclock): We again enter XXX_intr_dispatch and block
> that source of interrupt as before. This time, however, we find that
> (pcpl & ibit)
> is true for hardclock (since IPL_CLOCK < IPL_STAT_CLOCK), so we fall
> into the code
>
> if (pcpl & ibit) {
> /*
> * IRQ is masked; mark it as pending and check
> * the next one. Note: the IRQ is already disabled.
> */
> ifpga_ipending |= ibit;
> continue;
> }
>
> and then there is nothing more to do (so we return to the previous
> interrupt invocation.
>
> We now complete the handling of the statclock interrupt and restore
> the IPL level, re-enable the statclock interrupt and exit the loop
> (there was only one bit set in hwpend, remember):
>
> current_spl_level = pcpl;
>
> /* Re-enable this interrupt now that's it's cleared. */
> intr_enabled |= ibit;
> ifpga_set_intrmask();
>
> At no time do we check for pending interrupts that need to be handled
> at this level.
>
> It seems that the fix for this is trivial, simply insert the following
>
> line in the code that cleans up after running the handler list:
>
> restore_interrupts(oldirqstate);
> current_spl_level = pcpl;
>
> + hwpend |= (XXX_ipending & XXX_INTR_HWMASK) & ~pcpl;
>
> /* Re-enable this interrupt now that's it's cleared. */
> intr_enabled |= ibit;
> XXX_set_intrmask();
>
> and this will cause the interrupt that was deferred to be handled in
> the current loop before returning.
>
> This patch was certainly sufficient to correct the problem I've been
> encountering (where the main clock interrupt was getting completely
> lost, causing the system to hang during scsi-bus probing -- no clock,
> so no timeout).
Hmmm, which platform was this? I fixed this on cats, note the tail end
of footbridge_intr_dispatch does an splx(pcpl), I did mail jason about
this, see attached, but he probably lost it in his mountains of emails.
> Jason, does this look correct to you? If so, then all ports derived
> from your code will need patching.
I added an splx as it felt the right thing to do, but may be less
efficient...
Chris
--Multipart_Mon__14_Jul_2003_23:47:58_+0100_08215a00
Content-Type: text/plain;
name="xscale_interrupt_buglet"
Content-Disposition: attachment;
filename="xscale_interrupt_buglet"
Content-Transfer-Encoding: base64
RGF0ZTogRnJpLCAxIE5vdiAyMDAyIDE3OjAyOjE2ICswMDAwCkZyb206IENocmlzIEdpbGJlcnQg
PGNocmlzQGRva2Vpbi5jby51az4KVG86IHRob3JwZWpAbmV0YnNkLm9yZwpTdWJqZWN0OiB4c2Nh
bGUgaW50ZXJydXB0IGJ1Z2xldApNZXNzYWdlLUlkOiA8MjAwMjExMDExNzAyMTYuMDZlNzA2NDgu
Y2hyaXNAZG9rZWluLmNvLnVrPgpYLU1haWxlcjogU3lscGhlZWQgdmVyc2lvbiAwLjguNSAoR1RL
KyAxLjIuMTA7IGkzODYtLW5ldGJzZGVsZikKTWltZS1WZXJzaW9uOiAxLjAKQ29udGVudC1UeXBl
OiB0ZXh0L3BsYWluOyBjaGFyc2V0PVVTLUFTQ0lJCkNvbnRlbnQtVHJhbnNmZXItRW5jb2Rpbmc6
IDdiaXQKCkhpIEphc29uLAoKQXMgbWVudGlvbmVkIG9uIGljYiwgSSB0aGluayB0aGVyZSdzIGEg
bWlub3IgYnVnIGluIHRoZSBpbnRlcnJ1cHQgY29kZSwKdGhlIHJlYXNvbiBJIHNwb3R0ZWQgaXMg
YmVjYXVzZSBjYXRzIGhhcyBhIHNlcGVyYXRlIHN0YXRjbG9jaywgd2hpY2gKY2F1c2VkIHRoZSBj
bG9jayBpbnRlcnJ1cHQgdG8gYmUgYmxvY2tlZC4KCkFueXdheSB0aGUgZ2lzdCBpczoKd2UgdGFr
ZSBhbiBpbnRlcnJ1cHQsIHN0YXRjbG9jaywgd2UgbWFzayBvdXQgZXZlcnl0aGluZyBiZWxvdyBp
dCBpbiB0aGUKY3VycmVudF9zcGxfbGV2ZWwsIGluY2x1ZGluZyB0aGUgY2xvY2sgaW50ZXJydXB0
LgoKd2hpbGUgaGFuZGxpbmcgdGhlIHN0YXRjbG9jaywgYSBjbG9jayBpbnRlcnJ1cHQgaGFwcGVu
cywgaW4gdGhlIGRpc3BhdGNoCmNvZGUgd2UgYmxvY2sgdGhlIGNsb2NrIGludGVycnVwdCBpbiB0
aGUgaGFyZHdhcmUsIGFkZCBpdCB0byB0aGUgcGVuZGluZwppbnRlcnJ1cHRzIG1hc2ssIGFuZCB3
ZSByZXR1cm4gdG8gdGhlIHN0YXRjbG9jayBpbnRlcnJ1cHQuCgpUaGUgc3RhdGNsb2NrIGludGVy
cnVwdCBmaW5pc2hlcywgcmV0dXJucyB0byB0aGUgaW50ZXJydXB0IGRpc3BhdGNoLCBpdAp1bm1h
c2tzIHN0YXRjbG9jayBpbiB0aGUgaGFyZHdhcmUsIGl0IHRoZW4gcmVzZXRzIHRoZSBjdXJyZW50
X3NwbF9sZXZlbAp0byBwY3BsLCBjaGVja3MgZm9yIHNvZnQgaW50ZXJydXB0cywgYXNzdW1pbmcg
dGhlcmUgYXJlIG5vbmUsIHdlIHRoZW4KcmV0dXJuIHRvIHdoYXRldmVyIHdlIHdlcmUgZG9pbmcu
CgpOb3RlIHRoYXQgb24gZXhpdCBmcm9tIHRoZSBkaXNwYXRjaCB3ZSBkb24ndCBjaGVjayBmb3Ig
cGVuZGluZwppbnRlcnJ1cHRzIGFuZCB1bm1hc2sgdGhlbS4KClRoZSBlYXNpZXN0IHdheSBJIGNh
biBzZWUgdG8gaGFuZGxlIHRoaXMgaXMgdG8gY2hhbmdlIHRoZSBjaGVja2luZyBvZgp0aGUgc29m
dGludGVycnVwdHMgYXQgYWJvdXQgbGluZSA1MTUgb2YgaTgwMzIxX2ljdS5jIHRvIHNwbHgocGNw
bCkuIApUaGF0IHdheSB5b3UgdW5tYXNrIGFueSBoYXJkd2FyZSBpbnRlcnJ1cHRzIGFuZCBkbyBz
b2Z0IGludGVycnVwdHMuCgpJIHRoaW5rIHlvdSBkb24ndCBzZWUgaXQgb24geHNjYWxlIGFzIHlv
dSBkb24ndCBoYXZlIGFueSBpbnRlcnJ1cHRzCmhpZ2hlciB0aGFuIGlwbF9jbG9jaywgc28gSSdt
IGd1ZXNzaW5nIHRoYXQgYW55IHNwbHggdGhhdCBzb2Z0CmludGVycnVwdHMgZG8gcmVlbmFibGVz
IHRoZSBoYXJkd2FyZSBtYXNrcywgd2hpY2ggcHJvYmFibHkgbWVhbnMgeW91CmRvbid0IHNlZSBh
bnkgcHJvYmxlbXMsIHdvcnNlIGNhc2UgZm9yIHlvdSBpcyBwcm9iYWJseSB3YWl0aW5nIGZvciB0
aGUKbmV4dCBjbG9jayBpbnRlcnJ1cHQgdG8gaGFwcGVuLCBhbmQgY2F1c2UgYSBzb2Z0Y2xvY2su
CgpOb3RlIEknZCBoYXBwaWx5IGJlIHByb3ZlZCB3cm9uZywgYnV0IGRvaW5nIGEgc2ltaWxpYXIg
Y2hhbmdlIHRvIHRoZSBuZXcKZm9vdGJyaWRnZSBpbnRlcnJ1cHQgaGFzIGN1cmVkIHRoZSBsYXN0
IGlzc3VlIEkgd2FzIGhhdmluZyB3aXRoIGl0Cih3aGVyZSBJIHdhcyBzZWVpbmcgcGVuZGluZyBj
bG9jayBpbnRlcnJ1cHRzIGJ1dCB0aGUgc3lzdGVtIHdhcyBoYW5kbGluZwp0aGVtIHVudGlsIEkg
Y2F1c2VkIGFub3RoZXIgaW50ZXJydXB0LCBlZyBuZXR3b3JrLi4uKQoKQ2hlZXJzLApDaHJpcwo=
--Multipart_Mon__14_Jul_2003_23:47:58_+0100_08215a00--