NetBSD-Bugs archive


kern/54227: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself



>Number:         54227
>Category:       kern
>Synopsis:       Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 24 07:45:00 +0000 2019
>Originator:     Paul Ripke
>Release:        NetBSD 8.1_RC1 2019-05-15
>Organization:
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>Environment:
	
	
System: NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
Architecture: sparc
Machine: sparc
>Description:
Repeated panics during or shortly after boot, with matching stacks, on
an old Sun SPARCstation 5 with 32 MiB RAM, netbooted with NFS root and
swap. The machine had been running fine with an older kernel built from
netbsd-8:

NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018

A kernel built from netbsd-8 within the last few days:

NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019

panics either during or shortly after boot. The ORAC config is just an
include of GENERIC with unneeded drivers and options nulled out. The
console log from one such panic:

---
Starting sshd.
Mutex error: mutex_vector_enter,552: locking against myself

lock address : 0x00000000f04aafc0
current cpu  :                  0
current lwp  : 0x00000000f0604680
owner field  : 0x00000000f0604680 wait/spin:                0/0

panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: lock 0xf04aafc0 cpu 0 lwp 0xf0604680
cpu0: Begin traceback...
0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at netbsd:panic+0x20
panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) at netbsd:lockdebug_abort+0x9c
lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, 0xf0002000) at netbsd:mutex_enter+0x1cc
mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 0xf0604680) at netbsd:sosend+0x44
sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at netbsd:nfs_send+0x90
nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) at netbsd:nfs_request+0x2f4
nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at netbsd:nfs_readrpc+0x1dc
nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 0xf04f6a40) at netbsd:nfs_doio+0x6bc
nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at netbsd:VOP_STRATEGY+0x3c
VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 0xf0744000) at netbsd:sw_reg_start.part.0+0x20
sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x100000, 0xf06d8cb8) at netbsd:swstrategy+0x3fc
swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, 0xf0518008) at netbsd:bdev_strategy+0x50
bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at netbsd:spec_strategy+0x88
spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at netbsd:VOP_STRATEGY+0x3c
VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, 0xf04fb000) at netbsd:uvm_swap_io+0x10c
uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x100000, 0x100000, 0xf05b6480) at netbsd:uvm_swap_get+0x3c
uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at netbsd:uvmfault_anonget+0x2c4
uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, 0xf044a530) at netbsd:uvm_fault_internal+0xbbc
uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at netbsd:mem_access_fault4m+0x514
mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) at netbsd:memfault_sun4m+0xe8
memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 0x20000) at netbsd:copyout+0x28
copyout(0x0, 0xf3454d88, 0xedb05000, 0xeffff400, 0x0, 0xf0644e60) at netbsd:rt_walktree_visitor+0xc
rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xeffff400, 0x0, 0x0) at netbsd:rn_walktree+0xbc
rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 0xf05a4870) at netbsd:rtbl_walktree+0x30
rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) at netbsd:sysctl_rtable+0x114
sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at netbsd:sysctl_dispatch+0x94
sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at netbsd:sys___sysctl+0xc4
sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xeffff404, 0x1b54, 0xeffff400) at netbsd:syscall+0x248
syscall(0xcca, 0xf3454fb0, 0xede028d0, 0xca, 0x4e, 0xf0604680) at netbsd:memfault_sun4m+0x3f4
cpu0: End traceback...
Frame pointer is at 0xf3453f20
Call traceback:
  pc = 0xf0024fec  args = (0xf02be550, 0x0, 0xffe2, 0xf02aca38, 0xf01dcfc8, 0xf0002000) fp = 0xf3453f90
  pc = 0xf01dd358  args = (0x104, 0x0, 0xf02cbb88, 0xf0002000, 0xf0321000, 0xf0344c00) fp = 0xf3453ff8
  pc = 0xf01dd3e4  args = (0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) fp = 0xf3454058
rebooting
---

>How-To-Repeat:
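I'm assuming this needs remote (NFS) swap on a relatively
memory-starved machine: in the trace above, copyout() called from
sysctl_rtable() faults on a user page that has to be swapped in over
NFS, and the resulting sosend() tries to take a mutex that the same
LWP already owns.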
>Fix:
My hunch is that this commit is responsible:

sys/net/rtsock.c                                1.247

        Protect sysctl_rtable with KERNEL_LOCK and softnet_lock.
        [ozaki-r, ticket #1203]
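
To make the suspected sequence concrete, here is a minimal user-space sketch of it. Nothing in it is kernel code: every demo_* name is hypothetical, the lock is only a stand-in for whichever mutex the panic reports (presumably softnet_lock, going by the commit description), and the owner check just mimics the "owner field == current lwp" comparison shown in the console log. Where the kernel panics, the sketch returns an error code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are NetBSD kernel types. */
struct demo_lwp { const char *name; };
struct demo_mutex { const struct demo_lwp *owner; };  /* NULL when unlocked */

enum { DEMO_OK = 0, DEMO_SELF_LOCK = 1, DEMO_BUSY = 2 };

/* Acquire the lock, reporting a self-lock instead of panicking. */
static int demo_mutex_enter(struct demo_mutex *m, const struct demo_lwp *self)
{
    if (m->owner == self)
        return DEMO_SELF_LOCK;   /* "locking against myself" */
    if (m->owner != NULL)
        return DEMO_BUSY;        /* a real mutex would wait here */
    m->owner = self;
    return DEMO_OK;
}

static void demo_mutex_exit(struct demo_mutex *m)
{
    assert(m->owner != NULL);    /* must be held on exit */
    m->owner = NULL;
}

/* Far end of the fault path: sending the NFS request needs the lock. */
static int demo_sosend(struct demo_mutex *lock, const struct demo_lwp *self)
{
    int error = demo_mutex_enter(lock, self);
    if (error != DEMO_OK)
        return error;
    demo_mutex_exit(lock);
    return DEMO_OK;
}

/* Copy to user memory; a non-resident page must be swapped in over NFS. */
static int demo_copyout(struct demo_mutex *lock, const struct demo_lwp *self,
    int page_resident)
{
    if (page_resident)
        return DEMO_OK;
    return demo_sosend(lock, self);
}

/* Caller that holds the lock across the copy, as the trace suggests. */
int demo_sysctl_rtable(struct demo_mutex *lock, const struct demo_lwp *self,
    int page_resident)
{
    int error = demo_mutex_enter(lock, self);
    if (error != DEMO_OK)
        return error;
    error = demo_copyout(lock, self, page_resident);
    demo_mutex_exit(lock);
    return error;
}
```

Calling demo_sysctl_rtable() with page_resident set to 1 succeeds; with 0 it returns DEMO_SELF_LOCK, which is the situation the panic message describes.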

The system seems stable with this horrible hacky patch, which touches
each page of the user buffer before the locks are taken:

--- a/sys/net/rtsock.c
+++ b/sys/net/rtsock.c
@@ -1873,6 +1873,11 @@ again:
        w.w_needed = 0 - w.w_given;
        w.w_where = where;
 
+       /* XXX(stix): prefill user pages */
+       for (int offset = 0; offset < *given; offset += 4096) {
+               subyte((char*)where + offset, 0);
+       }
+
        SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
        s = splsoftnet();
        switch (w.w_op) {
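
One thing worth checking about a page-stride loop like the one in the patch is how many pages it actually reaches when the buffer does not start on a page boundary. The sketch below is plain user-space arithmetic, not kernel code; the demo_* names are hypothetical and DEMO_PAGE_SIZE is assumed to be 4096 only because that is the constant the patch uses. Whether the sysctl buffer is page-aligned in practice is not something the report states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* assumption: same constant as the patch */

/* Number of distinct pages covered by the byte range [addr, addr + len). */
size_t demo_pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    assert(addr <= UINTPTR_MAX - (len - 1));  /* range must not wrap */
    uintptr_t first = addr / DEMO_PAGE_SIZE;
    uintptr_t last = (addr + len - 1) / DEMO_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/*
 * Number of pages reached by touching offsets 0, 4096, 8192, ... < len.
 * Each step lands in a different page because the stride is one page.
 */
size_t demo_pages_touched_by_stride(size_t len)
{
    size_t touched = 0;
    for (size_t offset = 0; offset < len; offset += DEMO_PAGE_SIZE)
        touched++;
    return touched;
}
```

For a 200-byte buffer starting 4000 bytes into a page, demo_pages_spanned() reports two pages while the stride loop reaches only one, so the tail page would stay untouched. Touching a page also does not wire it, so it could in principle be paged out again before the later copy.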


So I guess copyout(9) shouldn't be called at splsoftnet(9), with
softnet_lock held, if NFS swap is to be stable?
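
One common way to keep copies to user memory out of a locked region is to stage the data in a kernel-side buffer while the lock is held and copy it out only after the lock is dropped. The sketch below illustrates that ordering in user space. It is not a proposed change to rtsock.c and not necessarily how this was or will be resolved in NetBSD; all demo_* names are hypothetical, and memcpy() stands in for copyout(9).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: a lock flag and a count of risky copies. */
struct demo_state {
    int lock_held;            /* stand-in for the lock around the walk */
    int copyouts_under_lock;  /* copies to "user" memory made while held */
};

/* Stand-in for copyout(9); records whether the lock was held at the time. */
static void demo_copyout(struct demo_state *st, const void *ksrc, void *udst,
    size_t len)
{
    if (st->lock_held)
        st->copyouts_under_lock++;
    memcpy(udst, ksrc, len);
}

/* Ordering seen in the trace: copy each entry out from inside the walk. */
void demo_dump_locked(struct demo_state *st, const int *table, size_t n,
    int *udst)
{
    assert(!st->lock_held);
    st->lock_held = 1;
    for (size_t i = 0; i < n; i++)
        demo_copyout(st, &table[i], &udst[i], sizeof(table[i]));
    st->lock_held = 0;
}

/* Staged ordering: snapshot under the lock, copy out after releasing it. */
void demo_dump_staged(struct demo_state *st, const int *table, size_t n,
    int *kbuf, int *udst)
{
    assert(!st->lock_held);
    st->lock_held = 1;
    memcpy(kbuf, table, n * sizeof(table[0]));  /* kernel-to-kernel copy */
    st->lock_held = 0;
    demo_copyout(st, kbuf, udst, n * sizeof(table[0]));
}
```

With a three-entry table, demo_dump_locked() leaves copyouts_under_lock at 3, while demo_dump_staged() leaves it at 0 and still delivers the same data. The cost is the extra staging buffer, which matters on a machine with 32 MiB of RAM.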


