tech-kern archive


Re: -current kernel hang with a 4.0 binary



On Fri, Oct 24, 2008 at 08:54:22PM +0200, Manuel Bouyer wrote:
> Hi,
> I've got a -current NetBSD/sparc (sun4u) system that hangs, reliably,
> when building pkgsrc/python24 on a 4.0 userland. Here's what ddb says:
> [BREAK]
> Stopped in pid 7114.1 (conftest) at     netbsd:cpu_Debugger+0x4:        nop
> db> tr
> sparc_interrupt(b945a30, cc7d700, cc7d700, 0, ded7fc0, ded1f80) at netbsd:sparc_interrupt+0x1f0
> sys_sa_enable(cc7d700, d4bbdd0, d4bbe10, 1, 0, 1c14000) at netbsd:sys_sa_enable+0xd8
> syscall_plain(d4bbed0, cc52d20, 400f4f34, d4bbdd0, 400, 400f4f38) at netbsd:syscall_plain+0x318
> ?(ffe00034, ffffb744, c, 3, ffffb7f7, ffffb7f8) at 0x1008c74
> db> 
> 
> the stack trace is always the same.
> 
> disassembly:
> 
> netbsd:sparc_interrupt+0x1c8:   subcc           %l7, %l2, %g0
> netbsd:sparc_interrupt+0x1cc:   bne,pn          netbsd:sparc_interrupt+0x1ac
> netbsd:sparc_interrupt+0x1d0:   add             %sp, 0xb0, %o2
> netbsd:sparc_interrupt+0x1d4:   ld              [%l2 + 0x18], %l7
> netbsd:sparc_interrupt+0x1d8:   membar          0x4l
> netbsd:sparc_interrupt+0x1dc:   st              %g0, [%l2 + 0x18]
> netbsd:sparc_interrupt+0x1e0:   membar          0x4l
> netbsd:sparc_interrupt+0x1e4:   ld              [%l2 + 0x0], %o4
> netbsd:sparc_interrupt+0x1e8:   ld              [%l2 + 0x4], %o0
> netbsd:sparc_interrupt+0x1ec:   wrpr            %g0, 0xe, %pstate
> netbsd:sparc_interrupt+0x1f0:   jmpl            [%o4 + %g0], %o7
> netbsd:sparc_interrupt+0x1f4:   movrz           %o0, %o2, %o0
> netbsd:sparc_interrupt+0x1f8:   wrpr            %g0, 0xc, %pstate
> netbsd:sparc_interrupt+0x1fc:   ld              [%l2 + 0x20], %l1
> netbsd:sparc_interrupt+0x200:   membar          0x4l
> netbsd:sparc_interrupt+0x204:   brz,pn          %l1, netbsd:sparc_interrupt+0x214
> netbsd:sparc_interrupt+0x208:   add             %l5, %o0, %l5
> netbsd:sparc_interrupt+0x20c:   stx             %g0, [%l1 + %g0]
> netbsd:sparc_interrupt+0x210:   membar          0x4l
> netbsd:sparc_interrupt+0x214:   subcc           %l7, -0x1, %g0
> netbsd:sparc_interrupt+0x218:   bne,pn          netbsd:sparc_interrupt+0x1d0
> 
> 
> 
> netbsd:sys_sa_enable+0xac:      ld              [%i0 + 0x10], %g1
> netbsd:sys_sa_enable+0xb0:      or              %g0, %l0, %o1
> netbsd:sys_sa_enable+0xb4:      subcc           %l0, %g1, %g0
> netbsd:sys_sa_enable+0xb8:      bne,pn          netbsd:sys_sa_enable+0xfc
> netbsd:sys_sa_enable+0xbc:      or              %g0, %i0, %o0
> netbsd:sys_sa_enable+0xc0:      ld              [%i0 + 0x24], %g1
> netbsd:sys_sa_enable+0xc4:      ld              [%i0 + 0x10], %o0
> netbsd:sys_sa_enable+0xc8:      or              %g1, 0x400, %g1
> netbsd:sys_sa_enable+0xcc:      st              %g1, [%i0 + 0x24]
> netbsd:sys_sa_enable+0xd0:      call            netbsd:mutex_vector_exit
> netbsd:sys_sa_enable+0xd4:      or              %g0, %l2, %i0
> netbsd:sys_sa_enable+0xd8:      call            netbsd:mutex_exit
> netbsd:sys_sa_enable+0xdc:      ld              [%l1 + 0xc], %o0
> netbsd:sys_sa_enable+0xe0:      return          [%i7 + 0x8]
> netbsd:sys_sa_enable+0xe4:      nop
> 
> any idea ?

To map the addresses in the trace back to source lines, load the debug
kernel in gdb:

        gdb netbsd.gdb
        list *(sys_sa_enable+0xd8)
        list *(sparc_interrupt+0x1f0)

Looking at it, sys_sa_enable+0xd8 sure looks like the call to mutex_exit(). The C code is:

        mutex_enter(p->p_lock);
        vp->savp_lwp = l;
        p->p_sflag |= PS_SA;
        lwp_lock(l);
        l->l_flag |= LW_SA; /* We are now an activation LWP */
        lwp_unlock(l);
        mutex_exit(p->p_lock);

Looking at the other code in sys_sa_enable(), I think all the locking is 
fine. While I haven't tried a recent (last 2 days) kernel, this code was 
working fine on i386.
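
To make the enter/exit pairing in that sequence explicit, here is a small
userland sketch. It is only an illustration under stated assumptions:
toy_lock, toy_proc, toy_lwp, the TOY_* flag values and toy_sa_enable() are
all hypothetical names, not kernel code, and the toy lock merely records an
owner instead of spinning or sleeping. The vp->savp_lwp assignment is left
out.

```c
#include <stddef.h>

/* Toy lock that only records its holder; NOT the kernel's kmutex_t. */
struct toy_lock {
	const void *owner;		/* NULL when the lock is free */
};

/* Hypothetical stand-ins for struct proc and struct lwp. */
struct toy_proc {
	struct toy_lock p_lock;
	unsigned p_sflag;
};

struct toy_lwp {
	struct toy_lock l_mutex;
	unsigned l_flag;
};

#define TOY_PS_SA 0x1u			/* illustrative values only */
#define TOY_LW_SA 0x2u

static int toy_errors;			/* misuse count for the last call */

static void
toy_enter(struct toy_lock *lk, const void *self)
{
	if (lk->owner != NULL)
		toy_errors++;		/* already held; a real mutex would wait here */
	lk->owner = self;
}

static void
toy_exit(struct toy_lock *lk, const void *self)
{
	if (lk->owner != self)
		toy_errors++;		/* released by someone who does not hold it */
	lk->owner = NULL;
}

/*
 * Same order as the quoted fragment: take the process lock, then the
 * LWP lock, and release them in reverse. Returns the number of lock
 * misuses seen; 0 means every enter had a matching exit.
 */
static int
toy_sa_enable(struct toy_proc *p, struct toy_lwp *l)
{
	toy_errors = 0;
	toy_enter(&p->p_lock, l);
	p->p_sflag |= TOY_PS_SA;
	toy_enter(&l->l_mutex, l);
	l->l_flag |= TOY_LW_SA;
	toy_exit(&l->l_mutex, l);
	toy_exit(&p->p_lock, l);
	return toy_errors;
}
```

Called on two free locks, toy_sa_enable() reports no misuse and leaves both
locks released. If p_lock is already held on entry, the first toy_enter()
counts an error, which is the point where a real mutex_enter() would block.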

Try a kernel with LOCKDEBUG?
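
A minimal sketch of the kernel config change, assuming you add it to
whatever config file you normally build from (DIAGNOSTIC is my own
suggestion on top, not something the trace calls for):

```
options 	LOCKDEBUG	# detect incorrect use of locking primitives
options 	DIAGNOSTIC	# extra internal consistency checks
```

Then re-run config(1), rebuild the kernel, and repeat the python24 build.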

Take care,

Bill



