Subject: Re: Race condition in generic soft interrupts code?
To: None <Richard.Earnshaw@arm.com>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 07/14/2003 23:47:58
This is a multi-part message in MIME format.

--Multipart_Mon__14_Jul_2003_23:47:58_+0100_08215a00
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Mon, 14 Jul 2003 22:50:08 +0100
Richard Earnshaw <rearnsha@arm.com> wrote:

> I'm posting this here because I believe this problem may affect (at
> least in theory) all platforms based on Jason's generic
> soft-interrupts implementation for the ARM.
> 
> In trying to port the generic soft-ints code to the Integrator board
> I've been running into a race condition that can lead to particular
> interrupts becoming permanently blocked, even though the SPL level is
> 0 and the processor is sitting in the idle loop.  The main symptom is
> that XXX_ipending has a bit set, and the corresponding bit in
> intr_enabled is zero, so the interrupt is permanently blocked.  Here
> is how I think this can happen:
> 
> A high-priority interrupt (high SPL) occurs (lets say it's statclock)
> 
> The code enters XXX_intr_dispatch and blocks that interrupt. Eg:
> 
> 	hwpend = XXX_iintsrc_read();
> 
> 	/*
> 	 * Disable all the interrupts that are pending.  We will
> 	 * reenable them once they are processed and not masked.
> 	 */
> 	intr_enabled &= ~hwpend;
> 	XXX_set_intrmask();
> 
> It then enters the loop below that scans over hwpend bits.  While 
> processing the statclock interrupt the SPL is effectively raised to 
> IPL_STAT_CLOCK with
> 
> 		current_spl_level |= iq->iq_mask;
> 		oldirqstate = enable_interrupts(I32_bit);
> 
> Interrupts are then enabled and the statclock handler code is called:
> 
> 		for (ih = TAILQ_FIRST(&iq->iq_list); ih != NULL;
> 		     ih = TAILQ_NEXT(ih, ih_list)) {
> 			(void) (*ih->ih_func)(ih->ih_arg ? ih->ih_arg : frame);
> 		}
> 
> Now while this is being processed, another lower-priority interrupt
> occurs (say hardclock):  We again enter XXX_intr_dispatch and block
> that source of interrupt as before.  This time, however, we find that
> 	(pcpl & ibit)
> is true for hardclock (since IPL_CLOCK < IPL_STAT_CLOCK), so we fall
> into the code
> 
> 		if (pcpl & ibit) {
> 			/*
> 			 * IRQ is masked; mark it as pending and check
> 			 * the next one.  Note: the IRQ is already disabled.
> 			 */
> 			ifpga_ipending |= ibit;
> 			continue;
> 		}
> 
> and then there is nothing more to do (so we return to the previous 
> interrupt invocation.
> 
> We now complete the handling of the statclock interrupt and restore
> the IPL level, re-enable the statclock interrupt and exit the loop
> (there was only one bit set in hwpend, remember):
> 
> 		current_spl_level = pcpl;
> 
> 		/* Re-enable this interrupt now that's it's cleared. */
> 		intr_enabled |= ibit;
> 		ifpga_set_intrmask();
> 
> At no time do we check for pending interrupts that need to be handled
> at this level.
> 
> It seems that the fix for this is trivial, simply insert the following
> 
> line in the code that cleans up after running the handler list:
> 
> 		restore_interrupts(oldirqstate);
> 		current_spl_level = pcpl;
> 
> +		hwpend |= (XXX_ipending & XXX_INTR_HWMASK) & ~pcpl;
> 
> 		/* Re-enable this interrupt now that's it's cleared. */
> 		intr_enabled |= ibit;
> 		XXX_set_intrmask();
> 
> and this will cause the interrupt that was deferred to be handled in
> the current loop before returning.
> 
> This patch was certainly sufficient to correct the problem I've been 
> encountering (where the main clock interrupt was getting completely
> lost, causing the system to hang during scsi-bus probing -- no clock,
> so no timeout).

Hmmm, which platform was this?  I fixed this on cats, note the tail end
of footbridge_intr_dispatch does an splx(pcpl), I did mail jason about
this, see attached, but he probably lost it in his mountains of emails.

> Jason, does this look correct to you?  If so, then all ports derived
> from your code will need patching.

I added an splx as it felt the right thing to do, but may be less
efficient...

Chris

--Multipart_Mon__14_Jul_2003_23:47:58_+0100_08215a00
Content-Type: text/plain;
 name="xscale_interrupt_buglet"
Content-Disposition: attachment;
 filename="xscale_interrupt_buglet"
Content-Transfer-Encoding: base64

RGF0ZTogRnJpLCAxIE5vdiAyMDAyIDE3OjAyOjE2ICswMDAwCkZyb206IENocmlzIEdpbGJlcnQg
PGNocmlzQGRva2Vpbi5jby51az4KVG86IHRob3JwZWpAbmV0YnNkLm9yZwpTdWJqZWN0OiB4c2Nh
bGUgaW50ZXJydXB0IGJ1Z2xldApNZXNzYWdlLUlkOiA8MjAwMjExMDExNzAyMTYuMDZlNzA2NDgu
Y2hyaXNAZG9rZWluLmNvLnVrPgpYLU1haWxlcjogU3lscGhlZWQgdmVyc2lvbiAwLjguNSAoR1RL
KyAxLjIuMTA7IGkzODYtLW5ldGJzZGVsZikKTWltZS1WZXJzaW9uOiAxLjAKQ29udGVudC1UeXBl
OiB0ZXh0L3BsYWluOyBjaGFyc2V0PVVTLUFTQ0lJCkNvbnRlbnQtVHJhbnNmZXItRW5jb2Rpbmc6
IDdiaXQKCkhpIEphc29uLAoKQXMgbWVudGlvbmVkIG9uIGljYiwgSSB0aGluayB0aGVyZSdzIGEg
bWlub3IgYnVnIGluIHRoZSBpbnRlcnJ1cHQgY29kZSwKdGhlIHJlYXNvbiBJIHNwb3R0ZWQgaXMg
YmVjYXVzZSBjYXRzIGhhcyBhIHNlcGVyYXRlIHN0YXRjbG9jaywgd2hpY2gKY2F1c2VkIHRoZSBj
bG9jayBpbnRlcnJ1cHQgdG8gYmUgYmxvY2tlZC4KCkFueXdheSB0aGUgZ2lzdCBpczoKd2UgdGFr
ZSBhbiBpbnRlcnJ1cHQsIHN0YXRjbG9jaywgd2UgbWFzayBvdXQgZXZlcnl0aGluZyBiZWxvdyBp
dCBpbiB0aGUKY3VycmVudF9zcGxfbGV2ZWwsIGluY2x1ZGluZyB0aGUgY2xvY2sgaW50ZXJydXB0
LgoKd2hpbGUgaGFuZGxpbmcgdGhlIHN0YXRjbG9jaywgYSBjbG9jayBpbnRlcnJ1cHQgaGFwcGVu
cywgaW4gdGhlIGRpc3BhdGNoCmNvZGUgd2UgYmxvY2sgdGhlIGNsb2NrIGludGVycnVwdCBpbiB0
aGUgaGFyZHdhcmUsIGFkZCBpdCB0byB0aGUgcGVuZGluZwppbnRlcnJ1cHRzIG1hc2ssIGFuZCB3
ZSByZXR1cm4gdG8gdGhlIHN0YXRjbG9jayBpbnRlcnJ1cHQuCgpUaGUgc3RhdGNsb2NrIGludGVy
cnVwdCBmaW5pc2hlcywgcmV0dXJucyB0byB0aGUgaW50ZXJydXB0IGRpc3BhdGNoLCBpdAp1bm1h
c2tzIHN0YXRjbG9jayBpbiB0aGUgaGFyZHdhcmUsIGl0IHRoZW4gcmVzZXRzIHRoZSBjdXJyZW50
X3NwbF9sZXZlbAp0byBwY3BsLCBjaGVja3MgZm9yIHNvZnQgaW50ZXJydXB0cywgYXNzdW1pbmcg
dGhlcmUgYXJlIG5vbmUsIHdlIHRoZW4KcmV0dXJuIHRvIHdoYXRldmVyIHdlIHdlcmUgZG9pbmcu
CgpOb3RlIHRoYXQgb24gZXhpdCBmcm9tIHRoZSBkaXNwYXRjaCB3ZSBkb24ndCBjaGVjayBmb3Ig
cGVuZGluZwppbnRlcnJ1cHRzIGFuZCB1bm1hc2sgdGhlbS4KClRoZSBlYXNpZXN0IHdheSBJIGNh
biBzZWUgdG8gaGFuZGxlIHRoaXMgaXMgdG8gY2hhbmdlIHRoZSBjaGVja2luZyBvZgp0aGUgc29m
dGludGVycnVwdHMgYXQgYWJvdXQgbGluZSA1MTUgb2YgaTgwMzIxX2ljdS5jIHRvIHNwbHgocGNw
bCkuIApUaGF0IHdheSB5b3UgdW5tYXNrIGFueSBoYXJkd2FyZSBpbnRlcnJ1cHRzIGFuZCBkbyBz
b2Z0IGludGVycnVwdHMuCgpJIHRoaW5rIHlvdSBkb24ndCBzZWUgaXQgb24geHNjYWxlIGFzIHlv
dSBkb24ndCBoYXZlIGFueSBpbnRlcnJ1cHRzCmhpZ2hlciB0aGFuIGlwbF9jbG9jaywgc28gSSdt
IGd1ZXNzaW5nIHRoYXQgYW55IHNwbHggdGhhdCBzb2Z0CmludGVycnVwdHMgZG8gcmVlbmFibGVz
IHRoZSBoYXJkd2FyZSBtYXNrcywgd2hpY2ggcHJvYmFibHkgbWVhbnMgeW91CmRvbid0IHNlZSBh
bnkgcHJvYmxlbXMsIHdvcnNlIGNhc2UgZm9yIHlvdSBpcyBwcm9iYWJseSB3YWl0aW5nIGZvciB0
aGUKbmV4dCBjbG9jayBpbnRlcnJ1cHQgdG8gaGFwcGVuLCBhbmQgY2F1c2UgYSBzb2Z0Y2xvY2su
CgpOb3RlIEknZCBoYXBwaWx5IGJlIHByb3ZlZCB3cm9uZywgYnV0IGRvaW5nIGEgc2ltaWxpYXIg
Y2hhbmdlIHRvIHRoZSBuZXcKZm9vdGJyaWRnZSBpbnRlcnJ1cHQgaGFzIGN1cmVkIHRoZSBsYXN0
IGlzc3VlIEkgd2FzIGhhdmluZyB3aXRoIGl0Cih3aGVyZSBJIHdhcyBzZWVpbmcgcGVuZGluZyBj
bG9jayBpbnRlcnJ1cHRzIGJ1dCB0aGUgc3lzdGVtIHdhcyBoYW5kbGluZwp0aGVtIHVudGlsIEkg
Y2F1c2VkIGFub3RoZXIgaW50ZXJydXB0LCBlZyBuZXR3b3JrLi4uKQoKQ2hlZXJzLApDaHJpcwo=

--Multipart_Mon__14_Jul_2003_23:47:58_+0100_08215a00--