tech-net archive


Re: UDP_ENCAP_ESPINUDP_NON_IKE



Ryota,

I discovered that this crash on NetBSD current kernels is triggered by a rarely used routing configuration for IPv6.

Here are the results of decoding the backtrace when it crashes (if_mcast_op is there):

address                            function            File:line number

?() at ffffffff80205845        breakpoint        ??:?
?() at ffffffff804ee4e8        vpanic src/sys/kern/subr_prf.c:343
?() at ffffffff805ec155        kern_assert        ??:?
?() at ffffffff8057ef93        if_mcast_op src/sys/net/if.c:3595 (discriminator 1)
?() at ffffffff802ce97c        in6_addmulti src/sys/netinet6/mld6.c:747 (discriminator 3)
?() at ffffffff802d333f        nd6_rtrequest src/sys/netinet6/nd6.c:1585
?() at ffffffff805af696        rtrequest1 src/sys/net/route.c:1292
?() at ffffffff805b2aac        route_output src/sys/net/rtsock.c:759
?() at ffffffff805b0c8a        route_send src/sys/net/rtsock.c:473
?() at ffffffff80520989        sosend src/sys/kern/uipc_socket.c:1075
?() at ffffffff80505e28        soo_write src/sys/kern/sys_socket.c:122
?() at ffffffff804fa433        dofilewrite src/sys/kern/sys_generic.c:350
?() at ffffffff804fa539        sys_write src/sys/kern/sys_generic.c:320
?() at ffffffff8020f2bc        sy_call src/sys/sys/syscallvar.h:66
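
For anyone who wants to repeat the decoding, a table like the one above can be produced by feeding each address from the ddb bt output to addr2line, as Ryota suggested. The following is only a minimal sketch: the file name bt.txt, the kernel name netbsd.gdb, and the function names extract_addrs and decode_bt are my own placeholders, and it assumes the kernel binary carries debug symbols.

```shell
#!/bin/sh
# Sketch: decode the addresses printed by ddb's "bt" using addr2line.
# Assumptions (not from the thread): bt.txt is a saved copy of the ddb
# output and netbsd.gdb is a kernel image built with debug symbols.

# Read ddb output on stdin and print one 0x-prefixed address per
# "?() at <hex>" line; every other line is ignored.
extract_addrs() {
    sed -n 's/^?() at \([0-9a-f]\{1,\}\).*$/0x\1/p'
}

# Decode every address found on stdin against the kernel given as $1.
# addr2line -f prints the function name, -e selects the binary.
decode_bt() {
    kernel=$1
    extract_addrs | while read -r addr; do
        printf '%s  ' "$addr"
        addr2line -f -e "$kernel" "$addr" | tr '\n' ' '
        printf '\n'
    done
}

# Hypothetical usage:
#   decode_bt ./netbsd.gdb < bt.txt
```

Each output line then holds the address followed by the function name and file:line that addr2line reports for it.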

After further testing I discovered the crash does not occur unless IPv6 routing to the VPN client is configured a certain way.

After looking at this decoding of the backtrace which involves routing and IPv6, I learned the crash is triggered by the -proxy modifier to the route command I was using in the ipv6-up script. Keep in mind I am using a NetBSD 7 userland and the NetBSD 7 version of the route command. I do not know if current's version of the route command can also use the -proxy modifier.

More details:

A while ago I discovered IPv6 connectivity to the VPN client requires that a route be added to the peer in the ipv6-up script of pppd, which is called when ppp0 comes up after phase1 and phase2 are established if the +ipv6 option is set in /etc/ppp/options. So I included this line in my ipv6-up script (In ipv6-up, $4 is the local IPv6 address on the ppp link, $5 is the remote IPv6 address on the ppp link, and $1 is the ppp interface name):

/sbin/route add -inet6 $5%$1 $4%$1 -interface -proxy

This provided connectivity between the peers on the ppp link. I added the -proxy modifier hoping to make the VPN client appear to be on the link-local ethernet network (just as pppd's proxyarp option does in IPv4, proxy ndp can theoretically do this in IPv6). Neither the -proxy modifier to the route command nor the ndp proxy command worked to provide proxy ndp for IPv6 on NetBSD. The -proxy modifier did not cause a system crash on NetBSD 7 or 8 kernels, but it is what triggers the crash on NetBSD current kernels. I did find a solution for proxy ndp on NetBSD 7, but it required a patch to the NetBSD 7 kernel and use of the -proxy modifier in the route command.

When I do this instead in ipv6-up I do not see a crash:

/sbin/route add -inet6 $5%$1 $4%$1 -interface

Without the -proxy modifier to the route command, there is no crash and IPv4 connectivity for the VPN client works fine using the proxyarp option in pppd. For IPv6, I only have connectivity on the link-local ppp link, as expected when only using link-local addresses without proxy ndp.
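
To make the positional arguments easier to follow, the working variant can be written out as a small ipv6-up sketch. This is only an illustration under my own assumptions: the function name add_peer_route and the ROUTE override variable are hypothetical and are there only so the command can be exercised without touching the routing table. The route arguments themselves are the ones shown above, without -proxy.

```shell
#!/bin/sh
# /etc/ppp/ipv6-up sketch (illustrative only).
# pppd passes: $1 = ppp interface name, $4 = local IPv6 address on the
# ppp link, $5 = remote IPv6 address on the ppp link.

# Add an interface route to the peer. ROUTE is a hypothetical override
# for dry runs; by default the real /sbin/route is used.
# Note: -proxy is deliberately left out, since it is the modifier that
# triggers the panic on current kernels.
add_peer_route() {
    ifname=$1
    local_addr=$2
    remote_addr=$3
    "${ROUTE:-/sbin/route}" add -inet6 "${remote_addr}%${ifname}" \
        "${local_addr}%${ifname}" -interface
}

# Only act when pppd supplied the expected arguments.
if [ $# -ge 5 ]; then
    add_peer_route "$1" "$4" "$5"
fi
```

Setting ROUTE=echo before calling add_peer_route prints the arguments that would be passed to route instead of changing the routing table.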

According to route's man page, the -proxy modifier sets the RTF_ANNOUNCE flag. As far as I can tell from the web interface for route's man page, -proxy is still valid for NetBSD 8.0. It may no longer be available in NetBSD current, in which case this crash would never be seen on ordinary systems using current's route command. But if you use NetBSD 7's route command with the -proxy modifier on a current kernel, you will see this crash.

Chuck


On 05/29/2018 09:26 PM, Ryota Ozaki wrote:
On Wed, May 30, 2018 at 7:02 AM Chuck Zmudzinski <frchuckz%gmail.com@localhost> wrote:
Ryota,

Here is what I am getting with the crash. I do not know how to decode it.
Please do
   addr2line -f -e <kernel_binary> <address>
for each address.

Or
   objdump -d  <kernel_binary>
and search functions containing each address from the output by hand.

Or, if you can, build a kernel with 'makeoptions    DEBUG="-g"'
and use it, then you can get a backtrace with symbols on a panic.

Thanks,
   ozaki-r

I type bt and just get a bunch of hex numbers that I do not know how
to interpret. I try sync and get a message that dumping to dev 142,1
(offset=6291455, size=0): not possible. After reboot, there is no core
dump in /var/crash. Maybe it is somewhere else. I checked that I do have
a dump device configured and I think I am still using the default values
for savecore. What else can I try to decode this? I tried using a
separate larger partition for /var/crash but that didn't make any
difference.

Chuck

Here is the output from bt and sync from the db prompt:

db{1}> bt
?() at ffffffff80205845
?() at ffffffff804ee4e8
?() at ffffffff805ec155
?() at ffffffff8057ef93
?() at ffffffff802ce97c
?() at ffffffff802d333f
?() at ffffffff805af696
?() at ffffffff805b2aac
?() at ffffffff805b0c8a
?() at ffffffff80520989
?() at ffffffff80505e28
?() at ffffffff804fa433
?() at ffffffff804fa539
?() at ffffffff8020f2bc
db{1}> sync

[ 1634.8391410] dumping to dev 142,1 (offset=6291455, size=0): not possible
[ 1634.8391410] rebooting...


On 05/29/2018 04:42 AM, Ryota Ozaki wrote:
On Fri, May 25, 2018 at 5:20 AM Maxime Villard <max%m00nbsd.net@localhost> wrote:

On 24/05/2018 at 21:13, Chuck Zmudzinski wrote:
Well, the crash is repeatable on the one week old daily snapshot current
kernel. Again, here is the current kernel I am using:

NetBSD 8.99.17 (XEN3_DOMU) #0: Wed May 16 21:54:38 UTC 2018
mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/xen/compile/XEN3_DOMU

What is happening is ... crazy.

With the current kernel, when the remote client connects, we get caught in
an endless loop of creating ipsec security associations. The log shows phase1
is created, then the phase2 associations, then we respond to negotiate a new
phase1 and two new phase 2's, and I think this loop just continued until we
ran out of memory. The windows client actually thought we were connected and
showed it was connected in the network control panel, but the racoon log
never reported that a ppp interface was up. When you look at the attached
snippets from the logs, I bet you will agree that many ppp interfaces and
ipsec SAs were created and when we finally ran out of memory to create
another one, we crashed. I say this because the trace indicated the crash
occurred at this branch. [1]. From the console at the start of the crash
report, I got this:

[ 334.5292103] panic: kernel diagnostic assertion "IFNET_LOCKED(ifp)"
failed: file "/usr/src/sys/net/if.c", line 3595
I don't understand line 3595 because if.c only has 661 lines, unless there
was a mistake in how I copied it from the log.
You're looking at the wrong revision of if.c, yours seems to be [1].
The main issue here is that we reach this place with ifp unlocked. It's
probably not related to the system running out of memory.
That several entries get created in a loop appears to be a separate problem.

I know that several changes were made in netbsd-current for MPification. It
may be that you exercise a particular condition that breaks an assumption
somewhere.
Ryota, Kengo, could you have a look?
I'm sorry, I have only just looked at this mail.

Chuck, could you decode the backtrace of the panic? In this case the path
to the assertion (probably in if_mcast_op) is important.

Thanks,
     ozaki-r


Thanks,
Maxime
[1] https://nxr.netbsd.org/xref/src/sys/net/if.c?r=1.423#3595


