Subject: kern/25320: There is definitely something rotten in mbuf land
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <jlouis@mongers.org>
List: netbsd-bugs
Date: 04/25/2004 21:09:30
>Number:         25320
>Category:       kern
>Synopsis:       When NetBSD acts as an inet6-router, the kernel locks up
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 25 19:09:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Jesper Louis Andersen
>Release:        NetBSD 2.0_BETA 22 April 2004
>Organization:
	N/A
>Environment:
	
	
System: NetBSD sarah 2.0_BETA NetBSD 2.0_BETA (GENERIC) #0: Sun Apr 18 22:36:13 CEST 2004 root@annah:/usr/src/sys/arch/i386/compile/GENERIC i386
	
Architecture: i386
Machine: i386
>Description:

This document describes what I have tried to do in order to narrow down the
problem with my router.

Layout:

  Laptop 2.0Beta <-> Router 2.0Beta via an IPv6 tunnel <-> Internet. 

Symptom:

  Connecting from the laptop through the router via cvsync to
  grappa.unix-ag.uni-kl.de, port 7777, promptly locks up the router. It stops
  responding on its NICs and on the keyboard.

Last known good version of NetBSD: 1.6ZI sources around 8 Feb 2004.
Problem appeared with sources from: 2.0Beta 22 April 2004

Narrowing down the problem:

  #0 inet6 works for ssh to another host. I can connect to grappa without
     the system locking up. It only goes wrong once I try the cvsync.
  #1 Disabled ALTQD.
		 Still locks up
  #2 Furthermore disabled IPF/IPNAT. 
		 Still locks up
  #3 Tried to build kernel with DIAGNOSTIC/DEBUG. 
		 Bombs kernel in the swapper
		 which is certainly not related
		 to this problem. So this does not
		 buy me anything.
  #4 Tried connecting to grappa via another kind of protocol.
	         rsync. Works.
  #5 Tried connecting to grappa directly from router.
		 This works perfectly.
  #6 Tried building a GENERIC kernel and testing with that.
	         Works!
  #7 Built sources from 28 March and tested... 
                 Not needed anymore.
  #8 Diffed kernels and looked at what is weird
                 ALTQ + IPSEC Enabled
  #9 Built kernel with IPSEC
                 This kernel locks up
  #10 Built kernel with ALTQ
	         This kernel also locks up

  Currently unchecked things:
    The time frame is long. I could issue a number of kernel compiles to narrow
    it down.
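
    The narrowing-down by kernel compiles could be done as a bisection over
    source dates. The following is only a sketch of that idea, not something
    I have run against the router: locks_up is a hypothetical callback that
    stands for "check out sources from that date, build and boot the kernel,
    run the cvsync, and report whether the router locked up". It also assumes
    a single change introduced the problem, so every date after it is bad.

```python
from datetime import date, timedelta


def bisect_dates(good, bad, locks_up):
    """Return the first source date that locks up.

    good     -- a date known to work (hypothetical input)
    bad      -- a later date known to lock up (hypothetical input)
    locks_up -- hypothetical callback: date -> bool, True if a kernel
                built from that date's sources locks up
    """
    # Invariant: 'good' works, 'bad' locks up, and good < bad.
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if locks_up(mid):
            bad = mid
        else:
            good = mid
    return bad


def max_builds(good, bad):
    """Upper bound on the number of kernel builds the bisection needs."""
    return ((bad - good).days - 1).bit_length()
```

    With the two dates from this report, 8 Feb 2004 and 22 Apr 2004, the
    range is 74 days, so max_builds gives an upper bound of 7 kernel builds.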

So to conclude: 
  Between 8 Feb and 22 Apr some bug was introduced which makes the kernel
  lock up for me. The current problem is that I do not know how to make
  the kernel drop to DDB or force a core dump which I could examine further
  under gdb. The router is placed in an environment where I can play with
  it, so ideas are greatly welcome. I might even learn a bit o' kernel debugging
  in the process ;)

References:
  kern/25312 seems to address a problem which could be related. I am not sure about
  this at all though.

>How-To-Repeat:
	Let NetBSD act as a router, and try a cvsync to grappa from behind the router. 
	I would like to hear of others with the same problems.
>Fix:
	Workaround: 
		Disable IPSEC and ALTQ, but I have a hunch that there is more to the
                  story than that.

	Fix: 
		Currently unknown to me. The problem is still too broad for me to start
                  traversing source code.

>Release-Note:
>Audit-Trail:
>Unformatted: