NetBSD-Bugs archive


Re: kern/54227: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself




On 2019/05/24 16:45
stix%stix.id.au@localhost wrote:

> >Number:         54227
> >Category:       kern
> >Synopsis:       Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
> >Confidential:   no
> >Severity:       serious
> >Priority:       medium
> >Responsible:    kern-bug-people
> >State:          open
> >Class:          sw-bug
> >Submitter-Id:   net
> >Arrival-Date:   Fri May 24 07:45:00 +0000 2019
> >Originator:     Paul Ripke
> >Release:        NetBSD 8.1_RC1 2019-05-15
> >Organization:
> Paul Ripke
> "Great minds discuss ideas, average minds discuss events, small minds
>  discuss people."
> -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
> >Environment:
> 	
> 	
> System: NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
> Architecture: sparc
> Machine: sparc
> >Description:
> Repeated panics during or shortly after boot, with matching stacks, on
> an old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been
> running fine with an old kernel built from netbsd-8:
> 
> NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018
> 
> Booting a kernel from netbsd-8 from the last few days:
> 
> NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
> 
> The ORAC config is just an include of GENERIC with unneeded
> drivers+options nulled out.
> 
> panics either during or shortly after boot, with the following console
> log:
> 
> ---
> Starting sshd.
> Mutex error: mutex_vector_enter,552: locking against myself
> 
> lock address : 0x00000000f04aafc0
> current cpu  :                  0
> current lwp  : 0x00000000f0604680
> owner field  : 0x00000000f0604680 wait/spin:                0/0
> 
> panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: lock 0xf04aafc0 cpu 0 lwp 0xf0604680
> cpu0: Begin traceback...
> 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at netbsd:panic+0x20
> panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) at netbsd:lockdebug_abort+0x9c
> lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, 0xf0002000) at netbsd:mutex_enter+0x1cc
> mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 0xf0604680) at netbsd:sosend+0x44
> sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at netbsd:nfs_send+0x90
> nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) at netbsd:nfs_request+0x2f4
> nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at netbsd:nfs_readrpc+0x1dc
> nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 0xf04f6a40) at netbsd:nfs_doio+0x6bc
> nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at netbsd:VOP_STRATEGY+0x3c
> VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 0xf0744000) at netbsd:sw_reg_start.part.0+0x20
> sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x100000, 0xf06d8cb8) at netbsd:swstrategy+0x3fc
> swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, 0xf0518008) at netbsd:bdev_strategy+0x50
> bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at netbsd:spec_strategy+0x88
> spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at netbsd:VOP_STRATEGY+0x3c
> VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, 0xf04fb000) at netbsd:uvm_swap_io+0x10c
> uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x100000, 0x100000, 0xf05b6480) at netbsd:uvm_swap_get+0x3c
> uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at netbsd:uvmfault_anonget+0x2c4
> uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, 0xf044a530) at netbsd:uvm_fault_internal+0xbbc
> uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at netbsd:mem_access_fault4m+0x514
> mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) at netbsd:memfault_sun4m+0xe8
> memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 0x20000) at netbsd:copyout+0x28
> copyout(0x0, 0xf3454d88, 0xedb05000, 0xeffff400, 0x0, 0xf0644e60) at netbsd:rt_walktree_visitor+0xc
> rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xeffff400, 0x0, 0x0) at netbsd:rn_walktree+0xbc
> rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 0xf05a4870) at netbsd:rtbl_walktree+0x30
> rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) at netbsd:sysctl_rtable+0x114
> sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at netbsd:sysctl_dispatch+0x94
> sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at netbsd:sys___sysctl+0xc4
> sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xeffff404, 0x1b54, 0xeffff400) at netbsd:syscall+0x248
> syscall(0xcca, 0xf3454fb0, 0xede028d0, 0xca, 0x4e, 0xf0604680) at netbsd:memfault_sun4m+0x3f4
> cpu0: End traceback...
> Frame pointer is at 0xf3453f20
> Call traceback:
>   pc = 0xf0024fec  args = (0xf02be550, 0x0, 0xffe2, 0xf02aca38, 0xf01dcfc8, 0xf0002000) fp = 0xf3453f90
>   pc = 0xf01dd358  args = (0x104, 0x0, 0xf02cbb88, 0xf0002000, 0xf0321000, 0xf0344c00) fp = 0xf3453ff8
>   pc = 0xf01dd3e4  args = (0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) fp = 0xf3454058
> rebooting
> ---
> 
> >How-To-Repeat:
> I'm assuming this is likely due to remote/nfs swap, on a
> relatively memory starved machine.
> >Fix:
> My hunch is it might be due to this commit?
> 
> sys/net/rtsock.c                                1.247
> 
>         Protect sysctl_rtable with KERNEL_LOCK and softnet_lock.
>         [ozaki-r, ticket #1203]
> 
> System seems stable with this horrible hacky patch:
> 
> --- a/sys/net/rtsock.c
> +++ b/sys/net/rtsock.c
> @@ -1873,6 +1873,11 @@ again:
>         w.w_needed = 0 - w.w_given;
>         w.w_where = where;
>  
> +       /* XXX(stix): prefill user pages */
> +       for (int offset = 0; offset < *given; offset += 4096) {
> +               subyte((char*)where + offset, 0);
> +       }
> +
>         SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
>         s = splsoftnet();
>         switch (w.w_op) {
> 
> 
> So I guess copyout(9) shouldn't be called at splsoftnet(9) for nfs swap
> to be stable?

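softnet_lock is the cause of the panic: per the traceback, sysctl_rtable
still holds it when copyout() faults on a page in NFS swap, and the
resulting nfs_send -> sosend tries to take it a second time.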

So could you try the below patch?

Thanks,
  ozaki-r

---
diff --git a/sys/net/rtsock.c b/sys/net/rtsock.c
index 399b2049130..4f17e716e29 100644
--- a/sys/net/rtsock.c
+++ b/sys/net/rtsock.c
@@ -1873,7 +1873,7 @@ again:
        w.w_needed = 0 - w.w_given;
        w.w_where = where;

-       SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
+       KERNEL_LOCK_UNLESS_NET_MPSAFE();
        s = splsoftnet();
        switch (w.w_op) {

@@ -1932,7 +1932,7 @@ again:
                break;
        }
        splx(s);
-       SOFTNET_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
+       KERNEL_UNLOCK_UNLESS_NET_MPSAFE();

        /* check to see if we couldn't allocate memory with NOWAIT */
        if (error == ENOBUFS && w.w_tmem == 0 && w.w_tmemneeded)


