Re: A few crashes with yesterday's amd64-current -- IPv6 related?

To: Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost>
Subject: Re: A few crashes with yesterday's amd64-current -- IPv6 related?
From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
Date: Sun, 5 Mar 2017 20:18:35 +0900

Hi Tom,

Thank you for the reports.


On Sun, Mar 5, 2017 at 6:59 PM, Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost> wrote:
> I updated again yesterday, and it seems at least one stability issue
> has been introduced since 7.99.59, which I was running before this.
>
> The first crash came when I was trying to shut down to single user after
> booting the new kernel with the existing userland.  I *think* it was
> triggered by the kernel missing the correct module directory; I caught a
> glimpse of it trying to access a module to connect to the console, and I
> later discovered that my ttys file had console enabled instead of ttyE0:
>
> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, but preemption is enabled and the caller is not in a softint or CPU-bound LWP
> cpu1: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> psref_release() at netbsd:psref_release+0xf8
> ip_setmoptions() at netbsd:ip_setmoptions+0x269
> ip_ctloutput() at netbsd:ip_ctloutput+0x1ee
> rip_ctloutput() at netbsd:rip_ctloutput+0xee
> rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c
> sosetopt() at netbsd:sosetopt+0x67
> sys_setsockopt() at netbsd:sys_setsockopt+0x91
> syscall() at netbsd:syscall+0x1d8
> --- syscall (number 105) ---
> 7eb0dacdb16a:
> cpu1: End traceback...

I fixed the panic in -current.

>
> Then it crashed during boot, seemingly related to fsck:
>
> panic: ffs_sync: rofs mod, fs=/
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ffs_sync() at netbsd:ffs_sync+0x26b
> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
> sched_sync() at netbsd:sched_sync+0x27b
> cpu0: End traceback...
>
> Anyway, I installed the complete updated userland on the machine, and
> started updating a bunch of packages from source, with all disk activity
> over NFS over UDP over IPv6.  After about three hours:
>
> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file "/usr/src/sys/dev/ic/rtl8169.c", line 1380
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> re_txeof() at netbsd:re_txeof+0x250
> re_intr() at netbsd:re_intr+0x11b
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
> --- interrupt ---
> x86_mwait() at netbsd:x86_mwait+0xd
> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
> idle_loop() at netbsd:idle_loop+0x18c
> cpu0: End traceback...
> uvm_fault(0xfffffe80cbca48c0, 0x0, 2) -> e
> fatal page fault in supervisor mode
> trap type 6 code 2 rip ffffffff8095500b cs 8 rflags 10282 cr2 84 ilevel 8 rsp fffffe8040afea80
> curlwp 0xfffffe804dedaa20 pid 20873.1 lowest kstack 0xfffffe8040afb2c0
>
> Once more, it crashed during boot, just like after the first crash:
>
> panic: ffs_sync: rofs mod, fs=/
> cpu1: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ffs_sync() at netbsd:ffs_sync+0x26b
> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
> sched_sync() at netbsd:sched_sync+0x27b
> cpu1: End traceback...
>
> I tried to continue building packages over NFS, but this happened again:
>
> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file "/usr/src/sys/dev/ic/rtl8169.c", line 1380
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> re_txeof() at netbsd:re_txeof+0x250
> re_intr() at netbsd:re_intr+0x11b
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
> --- interrupt ---
> x86_mwait() at netbsd:x86_mwait+0xd
> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
> idle_loop() at netbsd:idle_loop+0x18c
> cpu0: End traceback...
>
> This is when I pointed WRKOBJDIR to a local scratch directory in
> /etc/mk.conf, thus reducing the amount of network traffic severely.
> It's now building happily.  :)
>
> I've noticed quite a few IPv6 changes, lately.  Might these mbuf related
> assertions have something to do with that?

I rather suspect the change in rtl8169.c,v 1.149, which applied the
deferred if_start mechanism to re(4).

Could you apply the following patch and try again? If the patch doesn't
help, could you revert rtl8169.c,v 1.149 and try again?

Thanks,
  ozaki-r

diff --git a/sys/net/if.c b/sys/net/if.c
index 482bcbe..61c1b50 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -1008,8 +1008,11 @@ if_deferred_start_softint(void *arg)
 static void
 if_deferred_start_common(struct ifnet *ifp)
 {
+       int s;

+       s = splnet();
        if_start_lock(ifp);
+       splx(s);
 }

 static inline bool
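
As an illustration only, the save/raise/restore shape of the splnet()/splx()
pair used in the patch above can be modelled in user space. This is a toy
sketch, not NetBSD's implementation: every toy_* name, the two priority
levels, and the tx_queue_len counter are hypothetical stand-ins. It assumes
only the documented behaviour that splnet() raises the interrupt priority
level and returns the previous one, and that splx() restores it.

```c
/* Toy user-space model of interrupt priority levels (hypothetical names). */
enum { TOY_IPL_NONE = 0, TOY_IPL_NET = 1 };

static int cur_ipl = TOY_IPL_NONE; /* current priority level of the toy CPU */
static int intr_pending = 0;       /* an interrupt arrived while blocked */
static int tx_queue_len = 0;       /* stands in for driver transmit state */

/* Toy interrupt handler: reclaims the whole transmit queue. */
static void toy_intr_handler(void)
{
    tx_queue_len = 0;
}

/* Analogue of splnet(): raise the level and return the previous one. */
static int toy_splnet(void)
{
    int s = cur_ipl;

    if (cur_ipl < TOY_IPL_NET)
        cur_ipl = TOY_IPL_NET;
    return s;
}

/* Analogue of splx(): restore the level, then run any deferred interrupt. */
static void toy_splx(int s)
{
    cur_ipl = s;
    if (cur_ipl < TOY_IPL_NET && intr_pending) {
        intr_pending = 0;
        toy_intr_handler();
    }
}

/* Deliver an interrupt: run it now if unblocked, otherwise mark it pending. */
static void toy_raise_intr(void)
{
    if (cur_ipl >= TOY_IPL_NET)
        intr_pending = 1;
    else
        toy_intr_handler();
}

/*
 * Analogue of the patched start path: the queue update runs with the
 * "network interrupt" blocked, so the handler cannot observe or modify
 * the queue halfway through. Returns the queue length seen inside the
 * protected section.
 */
static int toy_start(void)
{
    int s, seen;

    s = toy_splnet();
    tx_queue_len++;
    toy_raise_intr();      /* interrupt arrives mid-update, gets deferred */
    tx_queue_len++;
    seen = tx_queue_len;
    toy_splx(s);           /* deferred handler runs here */
    return seen;
}
```

In this model the handler is held off until toy_splx() runs, so toy_start()
sees both of its own queue updates before the handler reclaims them. Whether
the real re(4) assertion is caused by such an interleaving is exactly what
the patch is meant to test.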
