Current-Users archive


Re: A few crashes with yesterday's amd64-current -- IPv6 related?



On Sun, Mar 5, 2017 at 8:18 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> Hi Tom
>
> Thank you for the reports.
>
>
> On Sun, Mar 5, 2017 at 6:59 PM, Tom Ivar Helbekkmo <tih%hamartun.priv.no@localhost> wrote:
>> I updated again yesterday, and it seems at least one stability issue
>> has been introduced since 7.99.59, which I was running before this.
>>
>> The first crash came when I was trying to shut down to single user after
>> booting the new kernel with the existing userland.  I *think* it was
>> triggered by the kernel missing the correct module directory; I caught a
>> glimpse of it trying to access a module to connect to the console, and I
>> later discovered that my ttys file had console enabled instead of ttyE0:
>>
>> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, but preemption is enabled and the caller is not in a softint or CPU-bound LWP
>> cpu1: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
>> psref_release() at netbsd:psref_release+0xf8
>> ip_setmoptions() at netbsd:ip_setmoptions+0x269
>> ip_ctloutput() at netbsd:ip_ctloutput+0x1ee
>> rip_ctloutput() at netbsd:rip_ctloutput+0xee
>> rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c
>> sosetopt() at netbsd:sosetopt+0x67
>> sys_setsockopt() at netbsd:sys_setsockopt+0x91
>> syscall() at netbsd:syscall+0x1d8
>> --- syscall (number 105) ---
>> 7eb0dacdb16a:
>> cpu1: End traceback...
>
> I fixed the panic in -current.
>
>>
>> Then it crashed during boot, seemingly related to fsck:
>>
>> panic: ffs_sync: rofs mod, fs=/
>> cpu0: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> snprintf() at netbsd:snprintf
>> ffs_sync() at netbsd:ffs_sync+0x26b
>> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
>> sched_sync() at netbsd:sched_sync+0x27b
>> cpu0: End traceback...
>>
>> Anyway, I installed the complete updated userland on the machine, and
>> started updating a bunch of packages from source, with all disk activity
>> over NFS over UDP over IPv6.  After about three hours:
>>
>> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file "/usr/src/sys/dev/ic/rtl8169.c", line 1380
>> cpu0: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
>> re_txeof() at netbsd:re_txeof+0x250
>> re_intr() at netbsd:re_intr+0x11b
>> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
>> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
>> --- interrupt ---
>> x86_mwait() at netbsd:x86_mwait+0xd
>> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
>> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
>> idle_loop() at netbsd:idle_loop+0x18c
>> cpu0: End traceback...
>> uvm_fault(0xfffffe80cbca48c0, 0x0, 2) -> e
>> fatal page fault in supervisor mode
>> trap type 6 code 2 rip ffffffff8095500b cs 8 rflags 10282 cr2 84 ilevel 8 rsp fffffe8040afea80
>> curlwp 0xfffffe804dedaa20 pid 20873.1 lowest kstack 0xfffffe8040afb2c0
>>
>> Once more, it crashed during boot, just like after the first crash:
>>
>> panic: ffs_sync: rofs mod, fs=/
>> cpu1: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> snprintf() at netbsd:snprintf
>> ffs_sync() at netbsd:ffs_sync+0x26b
>> VFS_SYNC() at netbsd:VFS_SYNC+0x1c
>> sched_sync() at netbsd:sched_sync+0x27b
>> cpu1: End traceback...
>>
>> I tried to continue building packages over NFS, but this happened again:
>>
>> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file "/usr/src/sys/dev/ic/rtl8169.c", line 1380
>> cpu0: Begin traceback...
>> vpanic() at netbsd:vpanic+0x140
>> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
>> re_txeof() at netbsd:re_txeof+0x250
>> re_intr() at netbsd:re_intr+0x11b
>> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
>> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
>> --- interrupt ---
>> x86_mwait() at netbsd:x86_mwait+0xd
>> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
>> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
>> idle_loop() at netbsd:idle_loop+0x18c
>> cpu0: End traceback...
>>
>> This is when I pointed WRKOBJDIR to a local scratch directory in
>> /etc/mk.conf, thus reducing the amount of network traffic severely.
>> It's now building happily.  :)
>>
>> I've noticed quite a few IPv6 changes, lately.  Might these mbuf related
>> assertions have something to do with that?
>
> I rather suspect the change in rtl8169.c,v 1.149; it applied the deferred
> if_start mechanism to re(4).
>
> Could you apply the following patch and try again? If the patch doesn't
> help, could you revert rtl8169.c,v 1.149 and try again?

Oops. Reverting the commit makes no sense. Please ignore the second
request.

  ozaki-r

>
> Thanks,
>   ozaki-r
>
> diff --git a/sys/net/if.c b/sys/net/if.c
> index 482bcbe..61c1b50 100644
> --- a/sys/net/if.c
> +++ b/sys/net/if.c
> @@ -1008,8 +1008,11 @@ if_deferred_start_softint(void *arg)
>  static void
>  if_deferred_start_common(struct ifnet *ifp)
>  {
> +       int s;
>
> +       s = splnet();
>         if_start_lock(ifp);
> +       splx(s);
>  }
>
>  static inline bool

