Subject: Re: Simple way to panic arm kernels
To: Richard Earnshaw <Richard.Earnshaw@arm.com>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 04/27/2004 00:34:37
On Mon, 26 Apr 2004 10:44:56 +0100
Richard Earnshaw <Richard.Earnshaw@arm.com> wrote:

> On Mon, 2004-04-26 at 10:42, Steve Woodford wrote:
> > On Monday 26 April 2004 10:17 am, Richard Earnshaw wrote:
> > > On Sun, 2004-04-25 at 21:50, Richard Earnshaw wrote:
> > 
> > > > I've finally managed to track this down to revision 1.8 of
> > > > arch/arm/include/arm32/frame.h
> > 
> > > A 'current' kernel with frame.h backed off to revision 1.7 ran the
> > > above test for 9 hours last night without panicing.  That's about
> > > 9 times longer than I've ever seen a broken kernel manage.
> > 
> > Thanks for tracking this down. I'll look into the problem RSN.
> 
> I think I've spotted the difference in behaviour.
> 
> The change is in the case where we are NOT returning to user mode.  In
> that case the new sequence completes with IRQs preserved in the new
> code, but blocked in the old.

Hmm, I looked over the code last night and didn't notice that.

I guess that if an IRQ happens while we're exiting we risk the spsr_irq
containing the wrong thing, IE if the following is run with irq's
enabled from while in irq mode:

#define PULLFRAMEFROMSVCANDEXIT						   \
        ldr     r0, [sp], #0x0004;	/* Get the SPSR from stack */	   \
        msr     spsr_all, r0;		/* restore SPSR */		   \
        ldmia   sp, {r0-r14}^;		/* Restore registers (usr mode) */ \
        mov     r0, r0;	  		/* NOP for previous instruction */ \
	add	sp, sp, #(4*15);	/* Adjust the stack pointer */	   \
	ldmia	sp, {sp, lr, pc}^	/* Restore lr and exit */

if after the msr sprs_all, r0,  we get an interrupt, the current cprs
(which indicates we're in irq mode) is saved into sprs_irq, overwriting
the value just loaded into it.  When that interrupt returns, we end up
continuing with the wrong value in sprs_irq.  Which could mean we exit
the irq handler in the wrong mode, with the net effect we run code with
the wrong sp and lr registers?

Does the above sound feasible?  Could this be what's happening?  or am I
just barking up the wrong tree?

Paranoia makes me wonder if we shouldn't make PULLFRAMEFROMSVCANDEXIT
disable irq's to protect itself?  or should we fix
DO_AST_AND_RESTORE_ALIGNMENT_FAULTS to keep the old side-effect of irq's
being disabled?

Chris