Port-sparc64 archive


Re: RED State Exception on E3500



On Fri, 19 Aug 2011, Julian Coleman wrote:

> Hi,
> 
> > I expect this could be due to either bad hardware or somehow the 
> > incorrect values are getting into the instruction cache.
> > 
> > If you want to determine for sure whether it's a hardware problem or a 
> > software problem, you can add a little loop like the one in blast_icache 
> > to clear out the instruction cache just before the RESTORE instruction.  
> > Register scheduling may be an issue there, but traps are already disabled 
> > so the code should be simpler.
> 
> Thanks for the explanation and patch.  Unfortunately, running with it made
> no difference - another RED State Exception on CPU UPAID 6.  I ran `cpuctl
> offline 0 6` (the 2 CPUs that I'd seen the exceptions on), but that crashed
> with:
> 
>   trap type 0x10: cpu 4, pc=112ad60
>   trap type 0x10: cpu 2, pc=112ad60 npc=112ad64 pstate=0x44820006<PRIV,IE>
>   trap type 0x10: cpu 7, pc=112ad60 npc=112ad64 pstate=0x44820006<PRIV,IE>
>   trap type 0x10: cpu 1, pc=112ad60kernel trap 10: illegal instruction
> 
> (0x112ad60 is the start of sysctl_kern_arnd() in src/sys/kern/init_sysctl.c.)
> 
> I'll try swapping out boards and see if I can get it stable.
> 
> Thanks again,

Hm.

The kernel originally used one locked 4MB TLB entry to map in kernel text 
and another one to map in kernel data.  Since then the kernel data segment 
has grown beyond 4MB, and the text may have as well, so a single locked 
entry per segment may no longer cover everything.  Also, the code to lock 
the TLB entries has moved from the kernel to ofwboot.
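
To make the 4MB arithmetic concrete, here is a rough sketch of how many 
4MB mappings a segment needs.  The function name and the addresses in the 
usage note are made up for illustration; this is not the ofwboot code.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one locked mapping: 4MB, as described above. */
#define LOCKED_PAGE_SIZE ((uint64_t)4 << 20)

/*
 * Hypothetical helper: count the 4MB-aligned pages touched by the
 * range [start, start + size).  A segment that crosses a 4MB boundary
 * needs more than one entry even if it is smaller than 4MB.
 */
uint64_t
locked_entries_needed(uint64_t start, uint64_t size)
{
	uint64_t first, last;

	if (size == 0)
		return 0;
	assert(start + size - 1 >= start);	/* reject address wrap-around */
	first = start / LOCKED_PAGE_SIZE;
	last = (start + size - 1) / LOCKED_PAGE_SIZE;
	return last - first + 1;
}
```

For example, a segment assumed to start at 0x1000000 needs one entry if it 
is exactly 4MB long, and two entries as soon as it is one byte longer.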

Anyway, you should try dumping the ITLB of each CPU shortly after the 
machine boots so you can record which slots are the locked kernel entries, 
and then do the same after the RED occurs to see if some of those entries 
have been overwritten.  You should be able to do that both from ddb and 
OBP.
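
If you transcribe the two dumps, the comparison could be sketched roughly 
as below.  This is only an illustration: the struct, the function name, 
the 64-slot ITLB size and the position of the lock bit (bit 6 of the TTE 
data) are my assumptions here, so check them against the CPU manual and 
against what ddb or OBP actually prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITLB_SLOTS 64			/* assumed ITLB size */
#define TTE_LOCK ((uint64_t)1 << 6)	/* assumed lock bit in TTE data */

/* Hypothetical record for one transcribed ITLB slot. */
struct tlb_slot {
	uint64_t tag;
	uint64_t data;
};

/*
 * Compare a snapshot taken shortly after boot with one taken after the
 * RED State Exception.  Store in changed[] the index of every slot that
 * was locked at boot but whose tag or data differs later, and return
 * how many such slots were found.  changed[] must have room for nslots
 * entries.
 */
size_t
changed_locked_slots(const struct tlb_slot *boot,
    const struct tlb_slot *later, size_t nslots, size_t *changed)
{
	size_t i, n = 0;

	assert(boot != NULL && later != NULL && changed != NULL);
	for (i = 0; i < nslots; i++) {
		if ((boot[i].data & TTE_LOCK) == 0)
			continue;	/* not locked at boot, ignore */
		if (boot[i].tag != later[i].tag ||
		    boot[i].data != later[i].data)
			changed[n++] = i;
	}
	return n;
}
```

A non-zero result on any CPU would suggest a locked kernel mapping was 
replaced.  If the dump includes status bits that change during normal 
operation, mask them out before comparing.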

Eduardo


