Re: amd64-7.0_STABLE trap panic

To: "John D. Baker" <jdbaker%mylinuxisp.com@localhost>
Subject: Re: amd64-7.0_STABLE trap panic
From: Paul Goyette <paul%vps1.whooppee.com@localhost>
Date: Sun, 8 Nov 2015 13:11:49 +0800 (PHT)

On Sat, 7 Nov 2015, John D. Baker wrote:

It happened again.  This time, the crashdump shows:

$ crash -M netbsd.3.core -N netbsd.3
Crash version 7.0_STABLE, image version 7.0_STABLE.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
calltrap() at calltrap+0x11
ufsquota_free() at ufsquota_free+0x15
ufs_reclaim() at ufs_reclaim+0xaf
ffs_reclaim() at ffs_reclaim+0xa1
VOP_RECLAIM() at VOP_RECLAIM+0x2f
vclean() at vclean+0xa6
cleanvnode() at cleanvnode+0xb8
vdrain_thread() at vdrain_thread+0x58
crash>

At the time, it was performing a CVS update to pick up the latest
pull-ups to the netbsd-7 branch while an NFS client was writing to
my home directory with 'scp' copying files from a remote system.

Up through 7.0 (release), I'd never had a problem with the machine.  I'm
reluctant to suspect hardware problems...

Interesting - the stack traceback diverges from your previous report
after the entry for startlwp.


For amd64, this routine is located in sys/arch/amd64/amd64/trap.c and
starts with

void
startlwp(void *arg)
{
        ucontext_t *uc = arg;
        lwp_t *l = curlwp;
        int error __diagused;

        error = cpu_setmcontext(l, &uc->uc_mcontext, uc->uc_flags);
        KASSERT(error == 0);
...

And the machine code at this point looks like:

Dump of assembler code for function startlwp:
   0xffffffff8011b1d7 <+0>:     push   %rbp
   0xffffffff8011b1d8 <+1>:     mov    %rsp,%rbp
   0xffffffff8011b1db <+4>:     push   %r12
   0xffffffff8011b1dd <+6>:     push   %rbx
   0xffffffff8011b1de <+7>:     mov    %rdi,%r12
   0xffffffff8011b1e1 <+10>:    mov    %gs:0x1e8,%rbx
   0xffffffff8011b1ea <+19>:    lea    0x38(%rdi),%rsi
   0xffffffff8011b1ee <+23>:    mov    (%rdi),%edx
   0xffffffff8011b1f0 <+25>:    mov    %rbx,%rdi
   0xffffffff8011b1f3 <+28>:    callq  0xffffffff80119d78 <cpu_setmcontext>
   0xffffffff8011b1f8 <+33>:    test   %eax,%eax

It might be useful if you could use gdb on the crash dump.  Use the
bt command to figure out which frame is for startlwp, then

(gdb) frame <n>
(gdb) info reg

I'm guessing that %rdi is pointing somewhere invalid, and the
'mov (%rdi),%edx' is triggering the fault.  (This is probably the
reference to uc->uc_flags.)
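To spell out why I suspect that particular instruction: in the listing
above, 'lea 0x38(%rdi),%rsi' only computes the address of
uc->uc_mcontext and does not touch memory, while 'mov (%rdi),%edx' is
the first instruction that actually reads through %rdi, 4 bytes at
offset 0.  The following is only a rough sketch with a made-up stand-in
structure (mock_ucontext and its field names are hypothetical, not the
real ucontext_t); the two offsets are simply the ones read off the
disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ucontext_t, NOT the real NetBSD definition.
 * Only the two offsets visible in the disassembly are modelled:
 *   offset 0x00: 32-bit flags word  (read by 'mov (%rdi),%edx')
 *   offset 0x38: machine context    (address taken by 'lea 0x38(%rdi),%rsi')
 * Everything in between is an opaque placeholder.
 */
struct mock_ucontext {
	uint32_t flags;        /* plays the role of uc_flags */
	uint32_t pad;          /* alignment padding */
	uint64_t between[6];   /* placeholder for the intervening fields */
	uint64_t mcontext[4];  /* plays the role of uc_mcontext; size is arbitrary */
};

/* Offset of the flags word: the displacement in 'mov (%rdi),%edx'. */
size_t
mock_flags_offset(void)
{
	return offsetof(struct mock_ucontext, flags);
}

/* Offset of the machine context: the displacement in 'lea 0x38(%rdi),%rsi'. */
size_t
mock_mcontext_offset(void)
{
	return offsetof(struct mock_ucontext, mcontext);
}

/* Compile-time check that the stand-in matches the disassembly. */
static_assert(offsetof(struct mock_ucontext, flags) == 0x00,
    "flags word expected at offset 0");
static_assert(offsetof(struct mock_ucontext, mcontext) == 0x38,
    "machine context expected at offset 0x38");
```

If that reading is right, a bogus 'arg' passed to startlwp would fault
on the flags load before cpu_setmcontext is ever called, which is why
the value of %rdi in that frame is the interesting one.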

Now, as for why this is broken, I have no idea.  :(




+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+
