Subject: Re: port-shark/22355 [was: Help needed to fix NetBSD/shark]
To: Julio M. Merino Vidal <jmmv84@gmail.com>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 08/04/2007 11:50:31
Julio M. Merino Vidal wrote:
> Hi,
> 
> Based on my limited understanding of ARM assembly (as in "just learned
> the basics yesterday") and after countless hours of crappy debugging, I
> think I have found THE^Wa bug in the isa_irq.S file.  With my change
> (see below) the machine seems to work fine, but I have also made it work
> in so many different and flawed ways (see the beginning of this thread
> or the contents of the PR) that I'm unsure if this is correct or not.
> 
> The thing is that the file contains this loop:
> 
> Lfind_highest_ipl:
>     ldr    r2, [r7, r9, lsl #2]
>     tst    r8, r2
>     subeq    r9, r9, #1
>     beq    Lfind_highest_ipl

I think what you're missing is that this code looks for the first
IPL/SPL at which the interrupt is still enabled, so it starts at the top
and works downwards.  The clock, which is masked at SPL_CLOCK, will have
its interrupt line clear in the masks for SPL_CLOCK and above.  Only when
the code reaches SPL_AUDIO will tst find the line unmasked, and so r9
will be SPL_AUDIO on exit from that code.
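
A rough C model of that loop may make the downward search easier to
follow.  This is only a sketch: the level numbers, the IRQ bit positions
and the example_spl_mask table are made up for illustration (the real
values live in the shark headers), and the level > 0 guard is my
addition, since the assembly above has no such bound.

```c
#include <stdint.h>

/* Hypothetical level numbering, for illustration only. */
enum { LVL_NONE = 0, LVL_BIO = 1, LVL_AUDIO = 2, LVL_CLOCK = 3, LVL_COUNT = 4 };

/* Hypothetical IRQ bit positions. */
#define IRQ_CLOCK_BIT (1u << 0)
#define IRQ_DISK_BIT  (1u << 14)

/* example_spl_mask[level] has a bit set for each IRQ still enabled at
 * that level; a clear bit means the IRQ is masked there. */
static const uint32_t example_spl_mask[LVL_COUNT] = {
    [LVL_NONE]  = IRQ_CLOCK_BIT | IRQ_DISK_BIT,
    [LVL_BIO]   = IRQ_CLOCK_BIT,
    [LVL_AUDIO] = IRQ_CLOCK_BIT,
    [LVL_CLOCK] = 0,
};

/* Mirrors ldr/tst/subeq/beq: start at the top level and step down until
 * one of the pending IRQs is enabled in that level's mask. */
static int find_highest_ipl(const uint32_t *spl_mask, int top, uint32_t pending)
{
    int level = top;                        /* r9 */
    while (level > 0 && (pending & spl_mask[level]) == 0)
        level--;                            /* subeq r9, r9, #1 */
    return level;
}
```

With this table a pending clock interrupt stops the search at LVL_AUDIO,
and a pending disk interrupt only at LVL_NONE.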

Your change means that the interrupt is masked (due to it being added to
disabled_mask) but the spl isn't at the correct level, e.g. IPL_BIO stuff
will be running at IPL_NONE :)

> AIUI, this locates the highest IPL at which the received IRQs have to be
> served.  After the beq, r9 contains the number of this IPL, and r2
> contains the spl_mask for that level.

When it hangs, are you able to print the contents of:
i8259_mask
spl_mask
current_mask
disabled_mask
current_spl_level
current_intr_depth

I think they might help track down what's masked out and where.  My
feeling is that the clock interrupt is being left masked out by some
code path somewhere, and not being re-enabled.
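
Something along these lines would do for dumping them in one go.  It is
only a sketch: the struct, its field types and format_intr_snapshot are
hypothetical (the real variables are kernel globals whose types aren't
shown here), spl_mask is left out because it is a per-level table, and in
the kernel you would hand the values to the kernel printf rather than
snprintf.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical snapshot of the variables listed above; field types are
 * assumptions, not taken from the shark sources. */
struct intr_snapshot {
    uint32_t i8259_mask;
    uint32_t current_mask;
    uint32_t disabled_mask;
    int      current_spl_level;
    int      current_intr_depth;
};

/* Format one line of interrupt state into buf; returns snprintf's result. */
static int format_intr_snapshot(char *buf, size_t len,
                                const struct intr_snapshot *s)
{
    return snprintf(buf, len,
                    "i8259=%04x cur=%04x dis=%04x spl=%d depth=%d",
                    (unsigned)s->i8259_mask,
                    (unsigned)s->current_mask,
                    (unsigned)s->disabled_mask,
                    s->current_spl_level,
                    s->current_intr_depth);
}
```

Comparing one line taken at handler entry with one taken at exit should
show whether a bit is left set in disabled_mask or current_mask.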

Given how quickly it happens, inserting a printf call at exit from the
handler that prints current_spl_level may reveal whether it's exiting
with the spl level correctly reset.

If you have the time you might want to look at the ppc rework of
interrupt handling that's been happening in the ppcoea-renovation
branch, as it's trying to improve the handling of 8259s on ppc.

Thanks,
Chris