On Fri, Oct 24, 2008 at 08:54:22PM +0200, Manuel Bouyer wrote:
> Hi,
> I've got a -current NetBSD/sparc (sun4U system) system hang, reliably,
> when building pkgsrc/python24 on a 4.0 userland. Here's what ddb says:
> [BREAK]
> Stopped in pid 7114.1 (conftest) at netbsd:cpu_Debugger+0x4: nop
> db> tr
> sparc_interrupt(b945a30, cc7d700, cc7d700, 0, ded7fc0, ded1f80) at
> netbsd:sparc_interrupt+0x1f0
> sys_sa_enable(cc7d700, d4bbdd0, d4bbe10, 1, 0, 1c14000) at
> netbsd:sys_sa_enable+0xd8
> syscall_plain(d4bbed0, cc52d20, 400f4f34, d4bbdd0, 400, 400f4f38) at
> netbsd:syscall_plain+0x318
> ?(ffe00034, ffffb744, c, 3, ffffb7f7, ffffb7f8) at 0x1008c74
> db>
>
> the stack trace is always the same.
>
> disassembly:
>
> netbsd:sparc_interrupt+0x1c8: subcc %l7, %l2, %g0
> netbsd:sparc_interrupt+0x1cc: bne,pn netbsd:sparc_interrupt+0x1ac
> netbsd:sparc_interrupt+0x1d0: add %sp, 0xb0, %o2
> netbsd:sparc_interrupt+0x1d4: ld [%l2 + 0x18], %l7
> netbsd:sparc_interrupt+0x1d8: membar 0x4l
> netbsd:sparc_interrupt+0x1dc: st %g0, [%l2 + 0x18]
> netbsd:sparc_interrupt+0x1e0: membar 0x4l
> netbsd:sparc_interrupt+0x1e4: ld [%l2 + 0x0], %o4
> netbsd:sparc_interrupt+0x1e8: ld [%l2 + 0x4], %o0
> netbsd:sparc_interrupt+0x1ec: wrpr %g0, 0xe, %pstate
> netbsd:sparc_interrupt+0x1f0: jmpl [%o4 + %g0], %o7
> netbsd:sparc_interrupt+0x1f4: movrz %o0, %o2, %o0
> netbsd:sparc_interrupt+0x1f8: wrpr %g0, 0xc, %pstate
> netbsd:sparc_interrupt+0x1fc: ld [%l2 + 0x20], %l1
> netbsd:sparc_interrupt+0x200: membar 0x4l
> netbsd:sparc_interrupt+0x204: brz,pn %l1, netbsd:sparc_interrupt+0x214
> netbsd:sparc_interrupt+0x208: add %l5, %o0, %l5
> netbsd:sparc_interrupt+0x20c: stx %g0, [%l1 + %g0]
> netbsd:sparc_interrupt+0x210: membar 0x4l
> netbsd:sparc_interrupt+0x214: subcc %l7, -0x1, %g0
> netbsd:sparc_interrupt+0x218: bne,pn netbsd:sparc_interrupt+0x1d0
>
>
>
> netbsd:sys_sa_enable+0xac: ld [%i0 + 0x10], %g1
> netbsd:sys_sa_enable+0xb0: or %g0, %l0, %o1
> netbsd:sys_sa_enable+0xb4: subcc %l0, %g1, %g0
> netbsd:sys_sa_enable+0xb8: bne,pn netbsd:sys_sa_enable+0xfc
> netbsd:sys_sa_enable+0xbc: or %g0, %i0, %o0
> netbsd:sys_sa_enable+0xc0: ld [%i0 + 0x24], %g1
> netbsd:sys_sa_enable+0xc4: ld [%i0 + 0x10], %o0
> netbsd:sys_sa_enable+0xc8: or %g1, 0x400, %g1
> netbsd:sys_sa_enable+0xcc: st %g1, [%i0 + 0x24]
> netbsd:sys_sa_enable+0xd0: call netbsd:mutex_vector_exit
> netbsd:sys_sa_enable+0xd4: or %g0, %l2, %i0
> netbsd:sys_sa_enable+0xd8: call netbsd:mutex_exit
> netbsd:sys_sa_enable+0xdc: ld [%l1 + 0xc], %o0
> netbsd:sys_sa_enable+0xe0: return [%i7 + 0x8]
> netbsd:sys_sa_enable+0xe4: nop
>
> any idea ?

To map those offsets back to source lines, run gdb on the debug kernel:

    gdb netbsd.gdb
    list *(sys_sa_enable+0xd8)
    list *(sparc_interrupt+0x1f0)

From the disassembly, the hang appears to be at the call to mutex_exit
(sys_sa_enable+0xd8). The C code is:

    mutex_enter(p->p_lock);
    vp->savp_lwp = l;
    p->p_sflag |= PS_SA;
    lwp_lock(l);
    l->l_flag |= LW_SA; /* We are now an activation LWP */
    lwp_unlock(l);
    mutex_exit(p->p_lock);

Looking at the other code in sys_sa_enable(), I think all the locking is
fine. I haven't tried a kernel from the last two days, but this code was
working fine on i386.
Try a kernel with LOCKDEBUG?
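
As a minimal sketch of what that means, assuming you build from your own
kernel configuration file (which file that is depends on your setup), add
this line and rebuild the kernel:

    options 	LOCKDEBUG	# extra consistency checks on kernel locks

With LOCKDEBUG enabled, a locking error such as releasing a mutex that the
caller does not hold should normally produce a diagnostic and a panic
rather than a silent hang, which would help narrow down where it goes wrong.
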
Take care,
Bill