NetBSD-Users archive


Re: Proxy server, mode intercept on NetBSD 7.0.1



For what it's worth, I have upgraded my kernel to netbsd-7 as of about
Aug 1, which includes some recent fixes to ipfilter by christos.  The
crash has still occurred with the new kernel.  For some reason the core
dump from that crash does not show up in /var/crash.  I don't know why,
but if it happens again and I get a core dump I'll post a new trace.

Stephen, I don't know if you have noticed a pattern to your crashes, but
mine tend to happen when the box is forwarding VPN traffic.



On Thu, Aug 4, 2016, at 06:39 AM, Stephen Borrill wrote:
> On Wed, 3 Aug 2016, Christos Zoulas wrote:
> > In article <Pine.NEB.4.64.1608021259170.524%ugly.internal.precedence.co.uk@localhost>,
> > Stephen Borrill  <netbsd%precedence.co.uk@localhost> wrote:
> >> On Mon, 1 Aug 2016, metalliqaz%fastmail.fm@localhost wrote:
> >>> I've been very disappointed with the quality of NetBSD 7.0.1 since I
> >>> upgraded from 6.1.5 a few weeks ago.  I've been running pretty much the
> >>> same system config as my home router/NAT/firewall/server since NetBSD
> >>> 1.5.  I believe that's almost 15 years of ipfilter/ipnat.  It has always
> >>> worked well for me... until I moved to NetBSD 7.   I've had several
> >>> issues with various parts of the OS, but ipf is the one that causes
> >>> random kernel panics.
> >>
> >> I've got to agree with you. I've been using NetBSD for commercial products
> >> since 1996 and NetBSD 7 is the first upgrade that's got me nervous.
> >> Kudos to developers who've helped out with USB failing to work, squid
> >> interception, etc. The random lockups and panics with IPfilter are the
> >> most worrying for me though:
> >>
> >> http://gnats.netbsd.org/50168
> >>
> >> I believe that the bugs are triggered by external packets which is why
> >> they are random (disconnecting from the Internet stops the problems).
> >> Machines which have been solid for months have just started locking. I
> >> count this as a remote DoS vulnerability, but haven't yet tracked down
> >> the triggers.
> >>
> >> We need to support an installed base of a mix of netbsd-5 and
> >> netbsd-7 machines. Until we complete the upgrade to netbsd-7, npf will
> >> increase that workload because of duplication of effort. Even so, as the
> >> firewall rules are autogenerated and have been developed over a number of
> >> years, it is not a small change to go into production systems.
> >>
> >>>
> >> -------------------------------------------------------------------------------
> >>>
> >>> bash-4.3# crash -M netbsd.0.core -N netbsd.0
> >>> Crash version 7.0.1, image version 7.0.1.
> >>> System panicked: trap
> >>> Backtrace from time of crash is available.
> >>> crash> bt
> >>> _KERNEL_OPT_NARCNET() at 0
> >>> _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x1
> >>> vpanic() at vpanic+0x145
> >>> snprintf() at snprintf
> >>> startlwp() at startlwp
> >>> calltrap() at calltrap+0x11
> >>> ipf_frag_expire() at ipf_frag_expire+0x76
> >>> ipf_slowtimer() at ipf_slowtimer+0x15
> >>> ipf_timer_func() at ipf_timer_func+0x2d
> >>> callout_softclock() at callout_softclock+0x248
> >>> softint_dispatch() at softint_dispatch+0x7d
> >>> DDB lost frame for Xsoftintr+0x4f, trying 0xfffffe80cefcaff0
> >>> Xsoftintr() at Xsoftintr+0x4f
> >>> --- interrupt ---
> >>> 0:
> >>> crash> q
> >>> bash-4.3# crash -M netbsd.1.core -N netbsd.1
> >>> Crash version 7.0.1, image version 7.0.1.
> >>> System panicked: trap
> >>> Backtrace from time of crash is available.
> >>> crash> bt
> >>> _KERNEL_OPT_NARCNET() at 0
> >>> _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x7
> >>> vpanic() at vpanic+0x145
> >>> snprintf() at snprintf
> >>> startlwp() at startlwp
> >>> calltrap() at calltrap+0x11
> >>> ipf_frag_delete() at ipf_frag_delete+0x74
> 
> So the machine that's just started exhibiting this problem is panicking
> in the same way as above:
> curlwp 0xfffffe847ef2c860 pid 0.5 lowest kstack 0xfffffe811d2042c0
> panic: trap
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x13c
> snprintf() at netbsd:snprintf
> startlwp() at netbsd:startlwp
> alltraps() at netbsd:alltraps+0x96
> ipf_frag_expire() at netbsd:ipf_frag_expire+0x152
> ipf_slowtimer() at netbsd:ipf_slowtimer+0x15
> ipf_timer_func() at netbsd:ipf_timer_func+0x2d
> callout_softclock() at netbsd:callout_softclock+0x248
> softint_dispatch() at netbsd:softint_dispatch+0x79
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe811d206ff0
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> 
> Contrary to my PR 50168 (which has a different traceback), this is
> happening even with the only firewall rules being "pass in all" and
> "pass out all". In my PR, this was a sufficient workaround, so this
> looks like a different problem.
> 
> > This seems to be dying at:
> >
> > static void
> > ipf_frag_free(ipf_frag_softc_t *softf, ipfr_t *fra)
> > {
> >        KFREE(fra);
> > ->        FBUMP(ifs_expire);
> >        softf->ipfr_stats.ifs_inuse--;
> > }
> >
> > I would comment out the last 2 lines and see if I get something better.
> > There seems to be some memory corruption (surprise)....
> 
> So:
> --- sys/external/bsd/ipf/netinet/ip_frag.c      22 Jul 2012 14:27:51 -0000      1.3
> +++ sys/external/bsd/ipf/netinet/ip_frag.c      4 Aug 2016 10:37:03 -0000
> @@ -990,8 +990,8 @@
>   ipf_frag_free(ipf_frag_softc_t *softf, ipfr_t *fra)
>   {
>          KFREE(fra);
> -       FBUMP(ifs_expire);
> -       softf->ipfr_stats.ifs_inuse--;
> +/*     FBUMP(ifs_expire);
> +       softf->ipfr_stats.ifs_inuse--;*/
>   }
> 
> 
> What's the "something better" you are expecting? On this machine (amd64)
> I seem to be unable to get coredumps. I suspect this may be down to the
> 16GB swap partition and 16GB RAM.
> 
> -- 
> Stephen
> 
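
As an aside on the ipf_frag_expire() -> ipf_frag_free() path in the traces
above: the following is only a user-space toy model of a timer-driven
expiry walk, not the ipfilter code. All of the names in it (struct frag,
struct frag_cache, frag_add, frag_free, frag_expire) are hypothetical. It
sketches one alternative to commenting out the counter updates: keep them,
but assert that the in-use counter is sane before decrementing, so that a
bookkeeping problem shows up at the point where it first goes wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the real ipfilter types. */
struct frag {
    struct frag *next;
    unsigned long ttl;          /* tick at which this entry expires */
};

struct frag_stats {
    unsigned long expired;      /* entries removed by the expiry walk */
    unsigned long inuse;        /* entries currently on the list */
};

struct frag_cache {
    struct frag *head;
    struct frag_stats stats;
};

/* Allocate an entry and push it on the list. Returns 0 on success. */
static int frag_add(struct frag_cache *c, unsigned long ttl)
{
    struct frag *f = malloc(sizeof(*f));

    if (f == NULL)
        return -1;
    f->ttl = ttl;
    f->next = c->head;
    c->head = f;
    c->stats.inuse++;
    return 0;
}

/* Free one already-unlinked entry and update the counters.
 * The assert fires on a double free of bookkeeping (inuse already 0)
 * instead of letting the unsigned counter wrap around silently. */
static void frag_free(struct frag_cache *c, struct frag *f)
{
    assert(c->stats.inuse > 0);
    free(f);
    c->stats.expired++;
    c->stats.inuse--;
}

/* Unlink and free every entry with ttl <= now. Returns how many went. */
static unsigned long frag_expire(struct frag_cache *c, unsigned long now)
{
    unsigned long n = 0;
    struct frag **pp = &c->head;

    while (*pp != NULL) {
        struct frag *f = *pp;

        if (f->ttl <= now) {
            *pp = f->next;      /* unlink before freeing */
            frag_free(c, f);
            n++;
        } else {
            pp = &f->next;
        }
    }
    return n;
}
```

With three entries added at ttl 5, 10 and 15, calling frag_expire() with
now = 10 removes two of them and leaves inuse at 1. Whether an equivalent
check in the kernel would catch anything useful here is an open question;
if the softc itself is being corrupted, an assertion like this would only
narrow down when, not why.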

