Subject: Re: multiprocessor i386 1.6ZH system crash (also for SPARC64/1.6ZJ)
To: NetBSD Current <current-users@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: current-users
Date: 02/16/2004 12:39:23
On Sun, Feb 15, 2004 at 07:59:42PM +0100, Hauke Fath wrote:
> At 19:24 +0100 on 15.2.2004, Timo Schoeler wrote:
> >Today my Sun Ultra 2 (Model 2300, 640MByte, second hme/scsi sbus card,
> >qe card) crashed when cleaning the pkgsrc/lang tree and bzipping a file
> >simultaneously -- unreachable via IP (didn't respond to any ICMP
> >requests), had to connect via the console...
> >
> >Running 1.6ZJ built yesterday evening.
> 
> Attempting a "make clean" in pkgsrc with a 1.6ZK kernel built last night
> (SPARCstation 10, 2xSM71, 1.6.2 userland), I get:
> 
> trap type 0x7: pc=0xf01f4880 npc=0xf01f4884 psr=408000c1<S,PS>
> xcall(cpu1,0xf0007974): couldn't ping cpus:kernel: alignment fault trap
> xcall(cpu1,0xf01ef940): couldn't ping cpus:Stopped at
> netbsd:pmap_deactivate+0x18:    ld  [%g1 + %g0], %i0
> 
> db{0}> t
> pmap_deactivate(0xf3fd8670, 0x400006, 0x6, 0xf0237308, 0x554, 0xf02c5e3c) at netbsd:exit1+0x6fc
> exit1(0xf3fd8670, 0x0, 0x1, 0x0, 0xf506af28, 0x117bc) at netbsd:sys_exit+0x30
> sys_exit(0x0, 0xf506af28, 0xf506af20, 0x0, 0xf506af28, 0x116e0) at netbsd:syscall+0x1c8
> syscall(0x1, 0xf506afb0, 0x100fe084, 0xf506afb0, 0x400, 0xf506af28) at 0xf0006524
> db{0}> c
> xcall(cpu0,0xf0007974): couldn't ping cpus: cpu1xcall(cpu0,0xf0007974):
> couldn't ping cpus: cpu1xcall(cpu0,0xf0007974): couldn't ping cpus:
> cpu1panic: xcall(cpu1,0xf01ef940): couldn't ping cpus:alignment
> faultxcall(cpu1,0xf01ef940): couldn't ping cpus:
> 
> and then the machine locked up hard, and had to be power-cycled.
>
> ...

Same here on a SPARCstation-20L, 2x TMS390Z55 @ 85 MHz (kernel compiled with DEBUG and LOCKDEBUG):

sys/arch/sparc/sparc/pmap.c, starting at line 7597:

  pmap_deactivate(l)
          struct lwp *l;
  {
  #if defined(MULTIPROCESSOR)
          pmap_t pm;
          struct proc *p;

          p = l->l_proc;
          if (p->p_vmspace &&
              (pm = p->p_vmspace->vm_map.pmap) != pmap_kernel()) {

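Here "p->p_vmspace" is 0xdeadbeef, which is most likely the pattern the
debugging kernel writes over freed memory. That value is not word-aligned,
so dereferencing it on SPARC raises an alignment fault (trap type 0x7),
matching the quoted trace. This looks like a locking problem: p_vmspace has
already been deallocated by the time pmap_deactivate() dereferences it.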
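To illustrate the failure mode, here is a small user-space model. It assumes a debugging allocator that overwrites freed objects with 0xdeadbeef words: a pointer field read after the object has been freed yields the poison value, and that value fails the 4-byte alignment a SPARC 32-bit load requires. struct proc_model, poison_fill() and word_aligned() are hypothetical names for this sketch, not NetBSD kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Poison pattern assumed to be written over freed memory by a
 * debugging allocator (an assumption for this sketch). */
#define POISON 0xdeadbeefU

/* Hypothetical stand-in for struct proc: only the field of interest,
 * stored as a 32-bit "kernel address" as on 32-bit sparc. */
struct proc_model {
	uint32_t p_vmspace;
};

/* Model of a debugging free(): overwrite the object with POISON words
 * instead of returning it to an allocator. */
static void
poison_fill(void *obj, size_t len)
{
	const uint32_t w = POISON;
	unsigned char *b = obj;
	size_t i;

	assert(len % sizeof(w) == 0);	/* model objects are word multiples */
	for (i = 0; i < len; i += sizeof(w))
		memcpy(b + i, &w, sizeof(w));
}

/* A 32-bit load (such as sparc "ld") needs a 4-byte-aligned address;
 * an unaligned one raises mem_address_not_aligned. */
static int
word_aligned(uint32_t addr)
{
	return (addr & 3U) == 0;
}
```

A stale reader then behaves like pmap_deactivate() in the trace: start with p_vmspace set to an aligned address such as 0x10000, call poison_fill(&p, sizeof p) to simulate the free, and the field reads back as 0xdeadbeef, for which word_aligned() returns 0.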

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)