NetBSD-Bugs archive


Re: kern/54227: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself



The following reply was made to PR kern/54227; it has been noted by GNATS.

From: Ryota Ozaki <ozaki-r%iij.ad.jp@localhost>
To: stix%stix.id.au@localhost
Cc: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/54227: Panic on netbsd 8.1 sparc nfsroot: sosend: locking
 against myself
Date: Fri, 24 May 2019 19:06:20 +0900 (JST)

 On 2019/05/24 16:45
 stix%stix.id.au@localhost wrote:
 
 > >Number:         54227
 > >Category:       kern
 > >Synopsis:       Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Fri May 24 07:45:00 +0000 2019
 > >Originator:     Paul Ripke
 > >Release:        NetBSD 8.1_RC1 2019-05-15
 > >Organization:
 > Paul Ripke
 > "Great minds discuss ideas, average minds discuss events, small minds
 >  discuss people."
 > -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
 > >Environment:
 > 	
 > 	
 > System: NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
 > Architecture: sparc
 > Machine: sparc
 > >Description:
 > Repeated panics during or shortly after boot, with matching stacks, on
 > an old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been
 > running fine with an old kernel built from netbsd-8:
 > 
 > NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018
 > 
 > Booting a kernel from netbsd-8 from the last few days:
 > 
 > NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
 > 
 > The ORAC config is just an include of GENERIC with unneeded
 > drivers+options nulled out.
 > 
 > panics either during or shortly after boot, with the following console
 > log:
 > 
 > ---
 > Starting sshd.
 > Mutex error: mutex_vector_enter,552: locking against myself
 > 
 > lock address : 0x00000000f04aafc0
 > current cpu  :                  0
 > current lwp  : 0x00000000f0604680
 > owner field  : 0x00000000f0604680 wait/spin:                0/0
 > 
 > panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: lock 0xf04aafc0 cpu 0 lwp 0xf0604680
 > cpu0: Begin traceback...
 > 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at netbsd:panic+0x20
 > panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) at netbsd:lockdebug_abort+0x9c
 > lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, 0xf0002000) at netbsd:mutex_enter+0x1cc
 > mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 0xf0604680) at netbsd:sosend+0x44
 > sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at netbsd:nfs_send+0x90
 > nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) at netbsd:nfs_request+0x2f4
 > nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at netbsd:nfs_readrpc+0x1dc
 > nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 0xf04f6a40) at netbsd:nfs_doio+0x6bc
 > nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at netbsd:VOP_STRATEGY+0x3c
 > VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 0xf0744000) at netbsd:sw_reg_start.part.0+0x20
 > sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x100000, 0xf06d8cb8) at netbsd:swstrategy+0x3fc
 > swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, 0xf0518008) at netbsd:bdev_strategy+0x50
 > bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at netbsd:spec_strategy+0x88
 > spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at netbsd:VOP_STRATEGY+0x3c
 > VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, 0xf04fb000) at netbsd:uvm_swap_io+0x10c
 > uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x100000, 0x100000, 0xf05b6480) at netbsd:uvm_swap_get+0x3c
 > uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at netbsd:uvmfault_anonget+0x2c4
 > uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, 0xf044a530) at netbsd:uvm_fault_internal+0xbbc
 > uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at netbsd:mem_access_fault4m+0x514
 > mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) at netbsd:memfault_sun4m+0xe8
 > memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 0x20000) at netbsd:copyout+0x28
 > copyout(0x0, 0xf3454d88, 0xedb05000, 0xeffff400, 0x0, 0xf0644e60) at netbsd:rt_walktree_visitor+0xc
 > rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xeffff400, 0x0, 0x0) at netbsd:rn_walktree+0xbc
 > rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 0xf05a4870) at netbsd:rtbl_walktree+0x30
 > rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) at netbsd:sysctl_rtable+0x114
 > sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at netbsd:sysctl_dispatch+0x94
 > sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at netbsd:sys___sysctl+0xc4
 > sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xeffff404, 0x1b54, 0xeffff400) at netbsd:syscall+0x248
 > syscall(0xcca, 0xf3454fb0, 0xede028d0, 0xca, 0x4e, 0xf0604680) at netbsd:memfault_sun4m+0x3f4
 > cpu0: End traceback...
 > Frame pointer is at 0xf3453f20
 > Call traceback:
 >   pc = 0xf0024fec  args = (0xf02be550, 0x0, 0xffe2, 0xf02aca38, 0xf01dcfc8, 0xf0002000) fp = 0xf3453f90
 >   pc = 0xf01dd358  args = (0x104, 0x0, 0xf02cbb88, 0xf0002000, 0xf0321000, 0xf0344c00) fp = 0xf3453ff8
 >   pc = 0xf01dd3e4  args = (0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) fp = 0xf3454058
 > rebooting
 > ---
 > 
 > >How-To-Repeat:
 > I'm assuming this is likely due to remote/nfs swap, on a
 > relatively memory starved machine.
 > >Fix:
 > My hunch is it might be due to this commit?
 > 
 > sys/net/rtsock.c                                1.247
 > 
 >         Protect sysctl_rtable with KERNEL_LOCK and softnet_lock.
 >         [ozaki-r, ticket #1203]
 > 
 > System seems stable with this horrible hacky patch:
 > 
 > --- a/sys/net/rtsock.c
 > +++ b/sys/net/rtsock.c
 > @@ -1873,6 +1873,11 @@ again:
 >         w.w_needed = 0 - w.w_given;
 >         w.w_where = where;
 >  
 > +       /* XXX(stix): prefill user pages */
 > +       for (int offset = 0; offset < *given; offset += 4096) {
 > +               subyte((char*)where + offset, 0);
 > +       }
 > +
 >         SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
 >         s = splsoftnet();
 >         switch (w.w_op) {
 > 
 > 
 > So I guess copyout(9) shouldn't be called at splsoftnet(9) for nfs swap
 > to be stable?
 
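 softnet_lock is the cause of the panic.  sysctl_rtable holds it across
 copyout; the copyout faults on a page that is swapped out over NFS, and
 sosend then tries to take softnet_lock again.
 
 Could you try the patch below?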
 
 Thanks,
   ozaki-r
 
 ---
 diff --git a/sys/net/rtsock.c b/sys/net/rtsock.c
 index 399b2049130..4f17e716e29 100644
 --- a/sys/net/rtsock.c
 +++ b/sys/net/rtsock.c
 @@ -1873,7 +1873,7 @@ again:
         w.w_needed = 0 - w.w_given;
         w.w_where = where;
 
 -       SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
 +       KERNEL_LOCK_UNLESS_NET_MPSAFE();
         s = splsoftnet();
         switch (w.w_op) {
 
 @@ -1932,7 +1932,7 @@ again:
                 break;
         }
         splx(s);
 -       SOFTNET_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
 +       KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
 
         /* check to see if we couldn't allocate memory with NOWAIT */
         if (error == ENOBUFS && w.w_tmem == 0 && w.w_tmemneeded)
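
For readers following the thread, the re-entry shown in the traceback can be imitated in userland. This is only a rough sketch, not kernel code: every `fake_*` name and `run_scenario` are hypothetical stand-ins, and a POSIX error-checking mutex is assumed as a loose analogue of the LOCKDEBUG check that reports "locking against myself". It shows that re-taking a non-recursive lock you already own fails, and that the same call path succeeds when a different lock is held, which is the idea behind the patch above.

```c
#define _XOPEN_SOURCE 700	/* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for softnet_lock and the kernel lock. */
struct fake_locks {
	pthread_mutex_t softnet;
	pthread_mutex_t kernel;
};

/* An error-checking mutex returns EDEADLK when its owner relocks it. */
static void
init_errorcheck_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(m, &attr);
	assert(rc == 0);
	(void)rc;
	pthread_mutexattr_destroy(&attr);
}

/* Stand-in for sosend(): the send path wants the softnet lock. */
static int
fake_sosend(struct fake_locks *l)
{
	int rc = pthread_mutex_lock(&l->softnet);

	if (rc != 0)
		return rc;	/* EDEADLK: we already own it */
	pthread_mutex_unlock(&l->softnet);
	return 0;
}

/* Stand-in for copyout() faulting on a page swapped out over NFS. */
static int
fake_copyout_with_fault(struct fake_locks *l)
{
	return fake_sosend(l);
}

/*
 * Stand-in for sysctl_rtable(): copy out while holding either the
 * softnet lock (before the patch) or only the kernel lock (after it).
 */
int
run_scenario(int hold_softnet_lock)
{
	struct fake_locks l;
	pthread_mutex_t *held;
	int rc;

	init_errorcheck_mutex(&l.softnet);
	init_errorcheck_mutex(&l.kernel);
	held = hold_softnet_lock ? &l.softnet : &l.kernel;

	pthread_mutex_lock(held);
	rc = fake_copyout_with_fault(&l);
	pthread_mutex_unlock(held);

	pthread_mutex_destroy(&l.softnet);
	pthread_mutex_destroy(&l.kernel);
	return rc;
}
```

Calling `run_scenario(1)` imitates the pre-patch path and returns the relock error; `run_scenario(0)` imitates the patched path and returns 0. The real kernel lock has different semantics from a pthread mutex, so treat this purely as an illustration of the call ordering.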
 


