Subject: Re: port-shark/22355 [was: Help needed to fix NetBSD/shark]
To: Julio M. Merino Vidal <jmmv84@gmail.com>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 08/04/2007 12:30:46

Julio M. Merino Vidal wrote:
> On 04/08/2007, at 12:50, Chris Gilbert wrote:
> 
>> Julio M. Merino Vidal wrote:
>>> Hi,
>>>
>>> Based on my limited understanding of ARM assembly (as in "just learned
>>> the basics yesterday") and after countless hours of crappy debugging, I
>>> think I have found THE^Wa bug in the isa_irq.S file.  With my change
>>> (see below) the machine seems to work fine, but I have also made it work
>>> in so many different and flawed ways (see the beginning of this thread
>>> or the contents of the PR) that I'm unsure if this is correct or not.
>>>
>>> The thing is that the file contains this loop:
>>>
>>> Lfind_highest_ipl:
>>>     ldr    r2, [r7, r9, lsl #2]
>>>     tst    r8, r2
>>>     subeq    r9, r9, #1
>>>     beq    Lfind_highest_ipl
>>
>> I think what you're missing is that this code looks for the first
>> IPL/SPL where an interrupt is enabled, so it starts at the top and works
>> downwards.  So the clock, which is masked at SPL_CLOCK will have the
>> interrupt line clear in SPL_CLOCK and above.  Only when the code reaches
>> SPL_AUDIO will tst not find it masked, and so r9 will be SPL_AUDIO on
>> exit from that code.
> 
> So, in order to prevent the code after the loop accessing
> spl_masks[_SPL_LEVELS], spl_masks[_SPL_LEVELS - 1] has to be 0 so that
> the tst always sets the Z bit, right?  Otherwise it'd not do the sub and
> reincrementing r9 later on could make the code access an invalid array
> position.

More or less.  We're looping downwards, and spl_masks[0] = 0xffffffff so
that everything will match on it.  And to handle SPL_SERIAL, you'll
notice that spl_masks[_SPL_LEVELS] is 0.

This is set up in arm/arm32/intr.c, in set_spl_masks.
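
For reference, a minimal C sketch of what the downward scan does with those
sentinel entries.  It is only a model, not the code in isa_irq.S: the level
count, the mask values, the names prefixed with model_ and the starting
level (top level minus one) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical level count for this model; the real _SPL_LEVELS differs. */
#define MODEL_SPL_LEVELS 4

/*
 * Made-up mask table: bit n set means IRQ n is still enabled at that level.
 * Entry 0 is all ones (the sentinel that stops the scan) and the extra
 * top entry is 0, as described above.
 */
static const uint32_t model_spl_masks[MODEL_SPL_LEVELS + 1] = {
    0xffffffffu, 0x000000ffu, 0x0000000fu, 0x00000003u, 0x00000000u
};

/*
 * Return the highest level whose mask still enables one of the pending
 * IRQs.  This follows the ldr/tst/subeq/beq loop; the starting level is
 * an assumption, not taken from the real source.
 */
static int model_find_highest_ipl(const uint32_t *masks, uint32_t pending)
{
    int level = MODEL_SPL_LEVELS - 1;

    assert(pending != 0);            /* only called with an IRQ pending */
    assert(masks[0] == 0xffffffffu); /* sentinel guarantees termination */

    while ((pending & masks[level]) == 0)
        level--;                     /* still masked here, try one lower */
    return level;
}
```

With this table, an IRQ on bit 2 is masked at level 3 and first matches at
level 2, while an IRQ that only level 0 enables falls all the way through to
the all-ones entry, so the index never goes below 0.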

>>> AIUI, this locates the highest IPL at which the received IRQs have to be
>>> served.  After the beq, r9 contains the number of this IPL, and r2
>>> contains the spl_mask for that level.
>>
>> when it hangs are you able to print the contents of:
>> i8259_mask
>> spl_mask
>> current_mask
>> disabled_mask
>> current_spl_level
>> current_intr_depth
>>
>> As I think they might help track down what's masked out and where.  My
>> feeling is that the clock interrupt is being left masked out by some
>> code path somewhere, and not being re-enabled.
>>
>> Given the speed at which it happens, inserting a printf call at exit
>> from the handler with the current_spl_level may reveal if it's exiting
>> with the spl_level correctly reset.
> 
> I was able to print the values of, e.g. current_spl_level at all places
> where it is modified.  And I'm fairly sure this (the current code) is
> correct.  When the machine gets locked, the last SPL value I see is 0,
> so everything should be enabled and working...
> 
> I can try again to get the values of all these variables and not just
> the SPL level.

It might be worthwhile, as I suspect that something isn't being unmasked
correctly somewhere, although in theory the interrupt code should leave
the masks in the state they were found in.

It half crossed my mind to suggest you replace the direct manipulation of
disabled_mask with calls to raisespl and splx, as the softclock code
uses the spl routines for masking, so perhaps it's adjusting something
that's missed in the irq handler.
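
A rough sketch of the save/raise/restore pattern meant here, as a
self-contained C model.  Everything in it is hypothetical: model_raisespl
and model_splx only imitate the general shape of the spl routines (raise
and return the old level, then restore), and the mask table and variable
names are invented.  It is not the NetBSD implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SPL_LEVELS 4   /* hypothetical level count */

/* Made-up table: IRQ bits still enabled at each level. */
static const uint32_t model_masks[MODEL_SPL_LEVELS + 1] = {
    0xffffffffu, 0x000000ffu, 0x0000000fu, 0x00000003u, 0x00000000u
};

static int      model_current_level = 0;
static uint32_t model_hw_mask = 0xffffffffu; /* IRQs enabled right now */
static int      model_seen_level = -1;       /* level seen by the handler */

/* Raise to new_level if it is higher; always return the previous level. */
static int model_raisespl(int new_level)
{
    int old = model_current_level;

    if (new_level > old) {
        model_current_level = new_level;
        model_hw_mask = model_masks[new_level];
    }
    return old;
}

/* Restore a previously saved level and the mask that goes with it. */
static void model_splx(int level)
{
    model_current_level = level;
    model_hw_mask = model_masks[level];
}

/* A handler that does its own nested masking, as softclock-style code might. */
static void model_sample_handler(void)
{
    int saved;

    model_seen_level = model_current_level;
    saved = model_raisespl(MODEL_SPL_LEVELS - 1);
    model_splx(saved);
}

/* Run a handler at ipl, going through the same two routines it uses. */
static void model_dispatch(int ipl, void (*handler)(void))
{
    int saved = model_raisespl(ipl);

    handler();
    model_splx(saved);   /* level and mask both return to the entry state */
}
```

The point of the model is that the dispatcher and the handler change the
level and the mask through one pair of routines, so neither can leave the
other's view of the masks stale.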

Thanks,
Chris