NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/54227: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
The following reply was made to PR kern/54227; it has been noted by GNATS.
From: Ryota Ozaki <ozaki-r%iij.ad.jp@localhost>
To: stix%stix.id.au@localhost
Cc: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/54227: Panic on netbsd 8.1 sparc nfsroot: sosend: locking
against myself
Date: Fri, 24 May 2019 19:06:20 +0900 (JST)
On 2019/05/24 16:45
stix%stix.id.au@localhost wrote:
> >Number: 54227
> >Category: kern
> >Synopsis: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
> >Confidential: no
> >Severity: serious
> >Priority: medium
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Fri May 24 07:45:00 +0000 2019
> >Originator: Paul Ripke
> >Release: NetBSD 8.1_RC1 2019-05-15
> >Organization:
> Paul Ripke
> "Great minds discuss ideas, average minds discuss events, small minds
> discuss people."
> -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
> >Environment:
>
>
> System: NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
> Architecture: sparc
> Machine: sparc
> >Description:
> Repeated panics during or shortly after boot, with matching stacks, on
> an old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been
> running fine with an old kernel built from netbsd-8:
>
> NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018
>
> Booting a kernel from netbsd-8 from the last few days:
>
> NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019
>
> The ORAC config is just an include of GENERIC with unneeded
> drivers+options nulled out.
>
> panics either during or shortly after boot, with the following console
> log:
>
> ---
> Starting sshd.
> Mutex error: mutex_vector_enter,552: locking against myself
>
> lock address : 0x00000000f04aafc0
> current cpu : 0
> current lwp : 0x00000000f0604680
> owner field : 0x00000000f0604680 wait/spin: 0/0
>
> panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: lock 0xf04aafc0 cpu 0 lwp 0xf0604680
> cpu0: Begin traceback...
> 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at netbsd:panic+0x20
> panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) at netbsd:lockdebug_abort+0x9c
> lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, 0xf0002000) at netbsd:mutex_enter+0x1cc
> mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 0xf0604680) at netbsd:sosend+0x44
> sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at netbsd:nfs_send+0x90
> nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) at netbsd:nfs_request+0x2f4
> nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at netbsd:nfs_readrpc+0x1dc
> nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 0xf04f6a40) at netbsd:nfs_doio+0x6bc
> nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at netbsd:VOP_STRATEGY+0x3c
> VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 0xf0744000) at netbsd:sw_reg_start.part.0+0x20
> sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x100000, 0xf06d8cb8) at netbsd:swstrategy+0x3fc
> swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, 0xf0518008) at netbsd:bdev_strategy+0x50
> bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at netbsd:spec_strategy+0x88
> spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at netbsd:VOP_STRATEGY+0x3c
> VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, 0xf04fb000) at netbsd:uvm_swap_io+0x10c
> uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x100000, 0x100000, 0xf05b6480) at netbsd:uvm_swap_get+0x3c
> uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at netbsd:uvmfault_anonget+0x2c4
> uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, 0xf044a530) at netbsd:uvm_fault_internal+0xbbc
> uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at netbsd:mem_access_fault4m+0x514
> mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) at netbsd:memfault_sun4m+0xe8
> memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 0x20000) at netbsd:copyout+0x28
> copyout(0x0, 0xf3454d88, 0xedb05000, 0xeffff400, 0x0, 0xf0644e60) at netbsd:rt_walktree_visitor+0xc
> rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xeffff400, 0x0, 0x0) at netbsd:rn_walktree+0xbc
> rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 0xf05a4870) at netbsd:rtbl_walktree+0x30
> rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) at netbsd:sysctl_rtable+0x114
> sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at netbsd:sysctl_dispatch+0x94
> sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at netbsd:sys___sysctl+0xc4
> sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xeffff404, 0x1b54, 0xeffff400) at netbsd:syscall+0x248
> syscall(0xcca, 0xf3454fb0, 0xede028d0, 0xca, 0x4e, 0xf0604680) at netbsd:memfault_sun4m+0x3f4
> cpu0: End traceback...
> Frame pointer is at 0xf3453f20
> Call traceback:
> pc = 0xf0024fec args = (0xf02be550, 0x0, 0xffe2, 0xf02aca38, 0xf01dcfc8, 0xf0002000) fp = 0xf3453f90
> pc = 0xf01dd358 args = (0x104, 0x0, 0xf02cbb88, 0xf0002000, 0xf0321000, 0xf0344c00) fp = 0xf3453ff8
> pc = 0xf01dd3e4 args = (0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) fp = 0xf3454058
> rebooting
> ---
>
> >How-To-Repeat:
> I'm assuming this is likely due to remote/nfs swap, on a
> relatively memory starved machine.
> >Fix:
> My hunch is it might be due to this commit?
>
> sys/net/rtsock.c 1.247
>
> Protect sysctl_rtable with KERNEL_LOCK and softnet_lock.
> [ozaki-r, ticket #1203]
>
> System seems stable with this horrible hacky patch:
>
> --- a/sys/net/rtsock.c
> +++ b/sys/net/rtsock.c
> @@ -1873,6 +1873,11 @@ again:
> w.w_needed = 0 - w.w_given;
> w.w_where = where;
>
> + /* XXX(stix): prefill user pages */
> + for (int offset = 0; offset < *given; offset += 4096) {
> + subyte((char*)where + offset, 0);
> + }
> +
> SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
> s = splsoftnet();
> switch (w.w_op) {
>
>
> So I guess copyout(9) shouldn't be called at splsoftnet(9) for nfs swap
> to be stable?
softnet_lock is the cause of the panic.
So could you try the below patch?
Thanks,
ozaki-r
---
diff --git a/sys/net/rtsock.c b/sys/net/rtsock.c
index 399b2049130..4f17e716e29 100644
--- a/sys/net/rtsock.c
+++ b/sys/net/rtsock.c
@@ -1873,7 +1873,7 @@ again:
w.w_needed = 0 - w.w_given;
w.w_where = where;
- SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
+ KERNEL_LOCK_UNLESS_NET_MPSAFE();
s = splsoftnet();
switch (w.w_op) {
@@ -1932,7 +1932,7 @@ again:
break;
}
splx(s);
- SOFTNET_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
+ KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
/* check to see if we couldn't allocate memory with NOWAIT */
if (error == ENOBUFS && w.w_tmem == 0 && w.w_tmemneeded)
Home |
Main Index |
Thread Index |
Old Index