Subject: Re: port-shark/22355 [was: Help needed to fix NetBSD/shark]
To: Chris Gilbert <chris@dokein.co.uk>
From: Julio M. Merino Vidal <jmmv84@gmail.com>
List: port-arm
Date: 08/04/2007 13:20:54
On 04/08/2007, at 12:50, Chris Gilbert wrote:

> Julio M. Merino Vidal wrote:
>> Hi,
>>
>> Based on my limited understanding of ARM assembly (as in "just learned
>> the basics yesterday") and after countless hours of crappy debugging, I
>> think I have found THE^Wa bug in the isa_irq.S file.  With my change
>> (see below) the machine seems to work fine, but I have also made it work
>> in so many different and flawed ways (see the beginning of this thread
>> or the contents of the PR) that I'm unsure if this is correct or not.
>>
>> The thing is that the file contains this loop:
>>
>> Lfind_highest_ipl:
>>     ldr    r2, [r7, r9, lsl #2]
>>     tst    r8, r2
>>     subeq    r9, r9, #1
>>     beq    Lfind_highest_ipl
>
> I think what you're missing is that this code looks for the first
> IPL/SPL where an interrupt is enabled, so it starts at the top and works
> downwards.  So the clock, which is masked at SPL_CLOCK, will have the
> interrupt line clear in SPL_CLOCK and above.  Only when the code reaches
> SPL_AUDIO will tst not find it masked, and so r9 will be SPL_AUDIO on
> exit from that code.

So, in order to prevent the code after the loop from accessing
spl_masks[_SPL_LEVELS], spl_masks[_SPL_LEVELS - 1] has to be 0 so that
the tst always sets the Z bit at the top level, right?  Otherwise it
would not do the sub, and incrementing r9 again later on could make the
code access an invalid array position.

> Your change means that the interrupt is masked (due to it being added to
> disabled_mask) but the spl isn't at the correct level, e.g. IPL_BIO stuff
> will be running at IPL_NONE :)

Too good to be true :P

>> AIUI, this locates the highest IPL at which the received IRQs have to be
>> served.  After the beq, r9 contains the number of this IPL, and r2
>> contains the spl_mask for that level.
>
> When it hangs, are you able to print the contents of:
> i8259_mask
> spl_mask
> current_mask
> disabled_mask
> current_spl_level
> current_intr_depth
>
> I ask because I think they might help track down what's masked out and
> where.  My feeling is that the clock interrupt is being left masked out
> by some code path somewhere, and not being re-enabled.
>
> Given how quickly it happens, inserting a printf call at exit from the
> handler that prints current_spl_level may reveal whether it's exiting
> with the spl level correctly reset.

I was able to print the value of, e.g., current_spl_level at every place
where it is modified, and I'm fairly sure this part (the current code) is
correct.  When the machine locks up, the last SPL value I see is 0, so
everything should be enabled and working...

I can try again to get the values of all these variables and not just  
the SPL level.

Thanks,

-- 
Julio M. Merino Vidal <jmmv84@gmail.com>