Port-sparc64 archive
Re: How to get a crash dump with recursive panic?
On Tue, 10 Jun 2014, Darren Reed wrote:
> On 10/06/2014 1:45 AM, Eduardo Horvath wrote:
> > On Mon, 9 Jun 2014, Darren Reed wrote:
> >
> > > In testing out ipfilter on sparc64, I see a bunch of "Alignment error"
> > > messages like these:
> > >
> > > Alignment error: pid=24522.1 comm=ipfstat dsfsr=00000000:00800001
> > > dsfar=ffffffff:fea0c252 isfsr=00000000:00808000 pc=10e3b0
> > > Alignment error: pid=22537.1 comm=ipfstat dsfsr=00000000:00800001
> > > dsfar=ffffffff:fea02252 isfsr=00000000:00808000 pc=10e3b0
> > > Alignment error: pid=6845.1 comm=ipfstat dsfsr=00000000:00800001
> > > dsfar=ffffffff:fea02252 isfsr=00000000:00808000 pc=10e3b0
> > >
> > > Followed by a panic like this:
> > >
> > > trap type 0x34: cpu 0, pc=109faac npc=109fab0 pstate=0x820006<PRIV,IE>
> > > Skipping crash dump on recursive panic
> > > panic: mem address not aligned
> > > cpu0: Begin traceback...
> > > cpu0: End traceback...
> > > cpu1: shutting down
> > > cpu0: rebooting
> > >
> > > All that I can do is:
> > > (gdb) x/i 0x109faac
> > > 0x109faac <ipf_fixskip+44>: ldx [ %g4 + 0x20 ], %g4
> > >
> > > Further tips anyone?
> > What's the previous panic look like? (I wonder if we have an SMP bug in
> > vpanic()...)
>
> How do I find it?
The "Skipping crash dump on recursive panic" implies there should have
been a panic before the "panic: mem address not aligned".
vpanic() uses the global variable doing_shutdown to indicate a panic is in
progress. It doesn't look like that variable is protected by a lock, so
if multiple CPUs are panicking at the same time, vpanic() may get
confused and treat them all as recursive panics. Not that it really
matters....
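
To illustrate the kind of race described above, here is a minimal sketch in C11 of a panic flag that can tell a truly recursive panic apart from a second CPU panicking concurrently. This is not NetBSD's vpanic() code: panic_owner, classify_panic() and the enum are hypothetical names invented for the example, and it assumes each CPU can pass in its own id.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical: id of the CPU that panicked first, or -1 if none has. */
static atomic_int panic_owner = -1;

enum panic_kind { PANIC_FIRST, PANIC_RECURSIVE, PANIC_CONCURRENT };

/*
 * Atomically claim the panic for this CPU. Only one CPU can win the
 * compare-and-swap; a later call from the same CPU is a recursive
 * panic, while a call from a different CPU is a concurrent one.
 */
enum panic_kind classify_panic(int cpu)
{
    int expected = -1;

    if (atomic_compare_exchange_strong(&panic_owner, &expected, cpu))
        return PANIC_FIRST;
    /* On failure, expected now holds the id of the owning CPU. */
    return expected == cpu ? PANIC_RECURSIVE : PANIC_CONCURRENT;
}
```

With a plain unlocked flag, the second CPU in this scenario would see the flag already set and be indistinguishable from a recursive panic on the first CPU.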
> As this is from the serial console, I'm assuming that if it never
> gets printed on the console then it never gets printed anywhere.
>
>
> >
> > Trap type 0x34 is an alignment trap. The instruction in question is
> > trying to load an 8-byte integer pointed to by %g4+0x20 into %g4. You can
> > enable DDB and dump the registers to find the contents of %g4. That
> > should not be 8-byte aligned.
> >
> > Beyond that it's a question of debugging the ipfilter code.
> >
> > That ipfstat is getting unaligned accesses implies some data structure is
> > unaligned. You can slap gdb on it to find out what, or you can break into
> > DDB and set the TDB_STOPSIG bit in trapdebug to have the kernel break into
> > DDB on each unaligned access and debug it from there.
>
> Yes - are those messages from the user space code running or kernel space?
The kernel prints those messages, if it was built with DEBUG defined, whenever a
userland process generates an unaligned access exception. What usually
happens after that is a SIGBUS is posted to the process and the process
dies. If the process has the MDP_FIXALIGN flag set, the kernel will
attempt to emulate the instruction instead. (I don't think there's any
code that actually sets that flag, but if it is ever set and you start
getting lots of unaligned accesses you definitely want to know about it
'cause emulating instructions will cause major performance degradation.)
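
For reference, the usual way for userland C code to avoid the SIGBUS described above is to copy the bytes into an aligned variable instead of dereferencing a cast pointer. The sketch below is a generic illustration, not taken from ipfstat; load_u64_unaligned() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 64-bit value from an arbitrary byte address. memcpy() makes
 * no alignment assumption about p, so this is safe on strict-alignment
 * CPUs, whereas *(const uint64_t *)p may trap when p is not a multiple
 * of 8.
 */
uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

The value is returned in host byte order; any endianness conversion is left to the caller.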
Anyway, the simple answer is that all those initial messages are being
generated because the "ipfstat" process is attempting to issue an
unaligned memory access in userland. In this case the contents of the
DSFSR should give details about the type of misalignment and the DSFAR
contains the faulting address. The address appears to be
0xfffffffffea02252 (or 0xfea02252 if you're running in 32-bit mode) in all
but the first message, which shows 0xfffffffffea0c252; both are only
16-bit aligned. The DSFSR should indicate whether it was a 32-bit or
64-bit load or store; it's just a question of looking up the register's
bit definitions.
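
The alignment of a faulting address like the ones in the DSFAR can be checked mechanically. The following is a small sketch in C; natural_alignment() and is_aligned() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest power-of-two byte alignment that addr satisfies, found by
 * isolating its lowest set bit. Returns 0 for address 0.
 */
uint64_t natural_alignment(uint64_t addr)
{
    return addr & (~addr + 1);
}

/* Nonzero if addr is a multiple of size; size must be a power of two. */
int is_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}
```

For 0xfffffffffea02252 the low byte is 0x52 (binary 01010010), so natural_alignment() yields 2: the address is fine for a 16-bit access but not for a 32-bit or 64-bit one.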
Does that help?
Eduardo