On Fri, Oct 24, 2008 at 08:54:22PM +0200, Manuel Bouyer wrote:
> Hi,
> I've got a -current NetBSD/sparc (sun4U system) system hang, reliably,
> when building pkgsrc/python24 on a 4.0 userland. Here's what ddb says:
> [BREAK]
> Stopped in pid 7114.1 (conftest) at netbsd:cpu_Debugger+0x4: nop
> db> tr
> sparc_interrupt(b945a30, cc7d700, cc7d700, 0, ded7fc0, ded1f80) at
> netbsd:sparc_interrupt+0x1f0
> sys_sa_enable(cc7d700, d4bbdd0, d4bbe10, 1, 0, 1c14000) at
> netbsd:sys_sa_enable+0xd8
> syscall_plain(d4bbed0, cc52d20, 400f4f34, d4bbdd0, 400, 400f4f38) at
> netbsd:syscall_plain+0x318
> ?(ffe00034, ffffb744, c, 3, ffffb7f7, ffffb7f8) at 0x1008c74
> db>
>
> the stack trace is always the same.
>
> disassembly:
>
> netbsd:sparc_interrupt+0x1c8:   subcc   %l7, %l2, %g0
> netbsd:sparc_interrupt+0x1cc:   bne,pn  netbsd:sparc_interrupt+0x1ac
> netbsd:sparc_interrupt+0x1d0:   add     %sp, 0xb0, %o2
> netbsd:sparc_interrupt+0x1d4:   ld      [%l2 + 0x18], %l7
> netbsd:sparc_interrupt+0x1d8:   membar  0x4l
> netbsd:sparc_interrupt+0x1dc:   st      %g0, [%l2 + 0x18]
> netbsd:sparc_interrupt+0x1e0:   membar  0x4l
> netbsd:sparc_interrupt+0x1e4:   ld      [%l2 + 0x0], %o4
> netbsd:sparc_interrupt+0x1e8:   ld      [%l2 + 0x4], %o0
> netbsd:sparc_interrupt+0x1ec:   wrpr    %g0, 0xe, %pstate
> netbsd:sparc_interrupt+0x1f0:   jmpl    [%o4 + %g0], %o7
> netbsd:sparc_interrupt+0x1f4:   movrz   %o0, %o2, %o0
> netbsd:sparc_interrupt+0x1f8:   wrpr    %g0, 0xc, %pstate
> netbsd:sparc_interrupt+0x1fc:   ld      [%l2 + 0x20], %l1
> netbsd:sparc_interrupt+0x200:   membar  0x4l
> netbsd:sparc_interrupt+0x204:   brz,pn  %l1,
> netbsd:sparc_interrupt+0x214
> netbsd:sparc_interrupt+0x208:   add     %l5, %o0, %l5
> netbsd:sparc_interrupt+0x20c:   stx     %g0, [%l1 + %g0]
> netbsd:sparc_interrupt+0x210:   membar  0x4l
> netbsd:sparc_interrupt+0x214:   subcc   %l7, -0x1, %g0
> netbsd:sparc_interrupt+0x218:   bne,pn  netbsd:sparc_interrupt+0x1d0
>
>
>
> netbsd:sys_sa_enable+0xac:      ld      [%i0 + 0x10], %g1
> netbsd:sys_sa_enable+0xb0:      or      %g0, %l0, %o1
> netbsd:sys_sa_enable+0xb4:      subcc   %l0, %g1, %g0
> netbsd:sys_sa_enable+0xb8:      bne,pn  netbsd:sys_sa_enable+0xfc
> netbsd:sys_sa_enable+0xbc:      or      %g0, %i0, %o0
> netbsd:sys_sa_enable+0xc0:      ld      [%i0 + 0x24], %g1
> netbsd:sys_sa_enable+0xc4:      ld      [%i0 + 0x10], %o0
> netbsd:sys_sa_enable+0xc8:      or      %g1, 0x400, %g1
> netbsd:sys_sa_enable+0xcc:      st      %g1, [%i0 + 0x24]
> netbsd:sys_sa_enable+0xd0:      call    netbsd:mutex_vector_exit
> netbsd:sys_sa_enable+0xd4:      or      %g0, %l2, %i0
> netbsd:sys_sa_enable+0xd8:      call    netbsd:mutex_exit
> netbsd:sys_sa_enable+0xdc:      ld      [%l1 + 0xc], %o0
> netbsd:sys_sa_enable+0xe0:      return  [%i7 + 0x8]
> netbsd:sys_sa_enable+0xe4:      nop
>
> any idea?

gdb netbsd.gdb
list *(sys_sa_enable+0xd8)
list *(sparc_interrupt+0x1f0)

Looking at it, it sure looks like the call to mutex_exit. The C code is:

        mutex_enter(p->p_lock);
        vp->savp_lwp = l;
        p->p_sflag |= PS_SA;
        lwp_lock(l);
        l->l_flag |= LW_SA;     /* We are now an activation LWP */
        lwp_unlock(l);
        mutex_exit(p->p_lock);

Looking at the other code in sys_sa_enable(), I think all the locking is
fine. While I haven't tried a recent (last 2 days) kernel, this was
working fine on i386.

Try a kernel with LOCKDEBUG?

Take care,

Bill
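
For reference, the LOCKDEBUG suggestion could be tried with a config fragment along these lines. This is only a sketch: the file name LOCKDBG is made up, and it assumes the machine runs the sparc64 port with a kernel derived from GENERIC. DIAGNOSTIC is not mentioned in the message and is added here as an optional extra.

```
# Hypothetical config file: sys/arch/sparc64/conf/LOCKDBG (name is an assumption)
include "arch/sparc64/conf/GENERIC"

options 	LOCKDEBUG	# expensive run-time checks on mutex/rwlock usage
options 	DIAGNOSTIC	# optional: extra internal consistency checks
```

A kernel built this way runs noticeably slower, but a lock misuse should then be reported with the lock address and its last holder rather than showing up as a silent hang.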