Subject: kern/25320: There is definitely something rotten in mbuf land
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <jlouis@mongers.org>
List: netbsd-bugs
Date: 04/25/2004 21:09:30
>Number: 25320
>Category: kern
>Synopsis: When NetBSD acts as an inet6-router, the kernel locks up
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Apr 25 19:09:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Jesper Louis Andersen
>Release: NetBSD 2.0_BETA 22 April 2004
>Organization:
N/A
>Environment:
System: NetBSD sarah 2.0_BETA NetBSD 2.0_BETA (GENERIC) #0: Sun Apr 18 22:36:13 CEST 2004 root@annah:/usr/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
This document describes what I have tried to do in order to narrow down the
problem with my router.
Layout:
Laptop 2.0Beta <-> Router 2.0Beta via an IPv6 tunnel <-> Internet.
Symptom:
Connect from the laptop through the router via cvsync to grappa.unix-ag.uni-kl.de,
port 7777. The router promptly locks up: no response on the NICs, no response
on the keyboard.
Last known good version of NetBSD: 1.6ZI sources around 8 Feb 2004.
Problem appeared with sources from: 2.0Beta 22 April 2004
Narrowing down the problem:
#0 inet6 works for ssh to another host. I can connect to grappa without
the system locking up. It is only when I try the cvsync that it goes
wrong.
#1 Disabled ALTQD.
Still locks up
#2 Furthermore disabled IPF/IPNAT.
Still locks up
#3 Tried to build kernel with DIAGNOSTIC/DEBUG.
The kernel bombs in the swapper, which is certainly not related
to this problem, so this does not buy me anything.
#4 Tried connecting to grappa via another kind of protocol.
rsync. Works.
#5 Tried connecting to grappa directly from router.
This works perfectly.
#6 Tried building a GENERIC kernel and testing with that.
Works!
#7 Built sources from 28 March and tested...
Not needed anymore.
#8 Diffed kernels and looked at what is weird
ALTQ + IPSEC Enabled
#9 Built kernel with IPSEC
This kernel locks up
#10 Built kernel with ALTQ
This kernel also locks up
Currently unchecked things:
The time frame is long. I could issue a number of kernel compiles to narrow
it down.
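
A minimal sketch of how such a narrowing-down could be organised, assuming a
single regression was introduced somewhere between the last known good sources
(8 Feb 2004) and the first known bad ones (22 Apr 2004). The function names and
the `is_bad` callback are hypothetical; in practice `is_bad` would mean
"check out sources from that date, build the custom kernel, boot it and run
the cvsync", which is a manual step.

```python
from datetime import date, timedelta
import math

GOOD = date(2004, 2, 8)    # last known good sources (from this report)
BAD = date(2004, 4, 22)    # first known bad sources (from this report)


def max_builds(good, bad):
    """Upper bound on the number of kernel builds a bisection needs."""
    return math.ceil(math.log2((bad - good).days))


def bisect_dates(good, bad, is_bad):
    """Return (first_bad_date, builds_used).

    Assumes good < bad, that `good` works, that `bad` locks up, and that
    every date from the regression onwards locks up (one regression only).
    """
    builds = 0
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        builds += 1
        if is_bad(mid):
            bad = mid       # regression is at or before mid
        else:
            good = mid      # regression is after mid
    return bad, builds


if __name__ == "__main__":
    # Hypothetical culprit date, used only to exercise the sketch.
    culprit = date(2004, 3, 15)
    first_bad, builds = bisect_dates(GOOD, BAD, lambda d: d >= culprit)
    print(first_bad, builds, max_builds(GOOD, BAD))
```

With 74 days between the two source dates, this bounds the work at 7 kernel
builds rather than one per day.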
So to conclude:
Between 8 Feb and 22 Apr some bug was introduced which makes the kernel
lock up for me. The current problem is that I do not know how to make
the kernel drop to DDB or force a core dump which I could examine further
under gdb. The router sits in an environment where I can freely play with
it, so ideas are greatly welcomed. I might even learn a bit o' kernel debugging
along the way ;)
References:
kern/25312 seems to address a problem which could be related. I am not sure about
this at all though.
>How-To-Repeat:
Let NetBSD act as a router, and try a cvsync to grappa from behind the router.
I would like to hear of others with the same problems.
>Fix:
Workaround:
Disable IPSEC and ALTQ, but I have a hunch that there is more to the
story than that.
Fix:
Currently unknown to me. The problem is still too broad for me to start
traversing source code.
>Release-Note:
>Audit-Trail:
>Unformatted: