Re: Ultrasparc III+ kernel panic

To: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
Subject: Re: Ultrasparc III+ kernel panic
From: Eduardo Horvath <eeh%NetBSD.org@localhost>
Date: Wed, 25 Feb 2015 16:13:06 +0000 (UTC)

On Wed, 25 Feb 2015, BERTRAND Joël wrote:

> BERTRAND Joël wrote:
> > Eduardo Horvath wrote:
> > > On Tue, 24 Feb 2015, BERTRAND Joël wrote:
> > > 
> > > > matthew green wrote:
> > > > > > Hm.  From what I remember, f000xxxx is inside OBP.
> > > > > 
> > > > > that's correct :-)
> > > > > 
> > > > > > Instead of randomly swapping out hardware you really should try to
> > > > > > diagnose the problem.  I'd turn on ddb and traptrace in the kernel
> > > > > > and examine the contents of the traptrace buffer after the panic.
> > > > > > That should tell us the sequence of traps that caused the panic.
> > > > > 
> > > > > FWIW, traptrace never was updated for SMP.
> > > > > 
> > > > 
> > > >     Is there any hope of a quick fix to get traptrace output in
> > > > syslog? I'm trying to reproduce this bug on a Blade 2000 I have at
> > > > home, without any success.
> > > 
> > > Putting traptrace back in is not trivial.  It basically involves taking
> > > all of the traptrace code that was removed in locore.s version 1.214,
> > > enhancing it for SMP, and reinserting it into locore.s.  How good are
> > > your SPARC assembly language skills?
> > 
> >      I haven't written sparc assembly for a very long time (and only on
> > sparc32...) :-(
> > 
> >      I can try to do something, but I'm not sure I have the required
> > knowledge to do that without help.
> > 
> >      Best regards,
> > 
> >      JKB
> 
> 	Another one:
> 
> Feb 25 13:03:33 legendre /netbsd: trap type 0x34: cpu 0, pc=f0008380text_access_fault: pc=5ac99cd8 va=5ac98000
> Feb 25 13:03:33 legendre /netbsd: npc=f0008384 pstate=0xffffffff88820006<PRIV,IE>
> Feb 25 13:03:33 legendre /netbsd: Skipping crash dump on recursive panic
> Feb 25 13:03:33 legendre /netbsd: panic: kernel fault
> Feb 25 13:03:33 legendre /netbsd: cpu1: Begin traceback...
> Feb 25 13:03:33 legendre /netbsd: cpu1: End traceback...
> Feb 25 13:03:33 legendre /netbsd: cpu0: shutting down
> Feb 25 13:03:33 legendre /netbsd: cpu1: rebooting
> Feb 25 13:03:33 legendre /netbsd:
> 
> 	If I remember correctly, trap 0x34 is triggered when the kernel tries to
> access unaligned memory. I found some messages in the mailing list archive
> about trap 0x34 in ipfilter, and the system that often panics uses ipfilter:

Yes:

#define T_ALIGN         0x034   /* (10) address not properly aligned */
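
For reference, "not properly aligned" just means the address is not a
multiple of the access size.  A minimal userland sketch of that check,
assuming a power-of-two access size; is_aligned() is a hypothetical helper
for illustration and does not come from the kernel sources:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the NetBSD tree): returns nonzero when
 * addr is a multiple of size.  size must be a nonzero power of two,
 * which is what lets the test be a simple mask of the low bits.
 */
static int
is_aligned(uintptr_t addr, size_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (uintptr_t)(size - 1)) == 0;
}
```

With this, is_aligned(0xf0008380, 4) holds, so the pc reported in the log
is itself a 4-byte-aligned instruction address.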

Here's the code in trap.c that's faulting:

                        {
                                char sb[sizeof(PSTATE_BITS) + 64];

                                printf("trap type 0x%x: cpu %d, pc=%lx",
                                       type, cpu_number(), pc);
                                snprintb(sb, sizeof(sb), PSTATE_BITS, pstate);
                                printf(" npc=%lx pstate=%s\n",
                                       (long)tf->tf_npc, sb);
                                DEBUGGER(type, tf);
                                panic("%s", type < N_TRAP_TYPES ?
                                      trap_type[type] : T);
                        }

The first printf succeeds, but the snprintb() does not, and somehow it
generates a text access fault (the text_access_fault message lands between
the output of the two printfs in your log)... unless the text fault is on
another CPU.  You might want to add the CPU# to the text_access_fault spew.
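
Something along these lines would do, as a rough userland sketch.  The
helper name format_text_fault() and the exact message layout are
assumptions for illustration; in the kernel the cpu argument would come
from cpu_number(), as in the trap.c excerpt above, and the line would go
straight to printf:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from trap.c): format a text_access_fault line
 * that is tagged with the CPU that took the fault.  The cpu is a plain
 * parameter here so the sketch builds outside the kernel.
 * Returns the snprintf() result, i.e. the length the full line needs.
 */
static int
format_text_fault(char *buf, size_t len, int cpu,
    unsigned long pc, unsigned long va)
{
	return snprintf(buf, len,
	    "text_access_fault: cpu %d, pc=%lx va=%lx", cpu, pc, va);
}
```

With the CPU in the message, it becomes possible to tell from syslog
whether the text fault and the trap 0x34 were taken on the same CPU.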

Eduardo
