Subject: kern/22005: i386 panic running ifconfig with heavy UDP traffic
To: None <gnats-bugs@gnats.netbsd.org>
From: Douglas Wade Needham <dneedham@naapo.org>
List: netbsd-bugs
Date: 06/27/2003 23:19:36
>Number:         22005
>Category:       kern
>Synopsis:       System panic while an ifconfig is running
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jun 28 03:20:01 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Douglas Wade Needham
>Release:        NetBSD 1.6.1 (sources as of Apr 6 09:44:00EDT)
>Organization:
	North American Astrophysical Observatory
>Environment:
System: NetBSD bfc0 1.6.1 NetBSD 1.6.1 (BFC) #0: Fri Jun 27 06:53:05 EDT 2003 root@display:/usr/src/sys/arch/i386/compile/BFC i386
Architecture: i386
Machine: i386
Hardware: Multiple Gigabyte GA7VAX with 256MB of RAM using dual NICs.
>Description:
    
    The machine panics while sustaining an extremely heavy UDP
    traffic flow (~80Mbps) on one of its two rtk interfaces.  The
    traffic consists primarily of packets carrying 4KB of
    application data from a radio telescope.  The console messages
    end with the following:

	uvm_fault(0xd413f17c, 0x0, 0, 1) -> e
	fatal page fault in supervisor mode
	trap type 6 code 0 eip c0240526 cs 8 eflags 10286 cr2 1 cpl c0000000
	panic: trap
	syncing disks... uvm_fault(0xd413f17c, 0x0, 0, 1) -> e
	fatal page fault in supervisor mode
	trap type 6 code 0 eip c0240526 cs 8 eflags 10286 cr2 1 cpl c0000000
	panic: trap

    Inspection of the crash dump indicates that an ifconfig was
    running at the time, and that the kernel stack was as follows:

	#0  0x1 in ?? ()
	#1  0xc029c64b in cpu_reboot (howto=260, bootstr=0x0)
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/machdep.c:2236
	#2  0xc01fc426 in panic ()
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/subr_prf.c:253
	#3  0xc02a36e2 in trap (frame={tf_gs = 16, tf_fs = 16, tf_es = 16, tf_ds = 16, 
	      tf_edi = -1, tf_esi = -757862384, tf_ebp = -736859296, tf_ebx = -1, 
	      tf_edx = -1063837184, tf_ecx = -1064109312, tf_eax = 0, tf_trapno = 6, 
	      tf_err = 0, tf_eip = -1071381210, tf_cs = 8, tf_eflags = 66182, 
	      tf_esp = -1064109312, tf_ss = -1072685334, tf_vm86_es = 4, 
	      tf_vm86_ds = -1071657377, tf_vm86_fs = 10, tf_vm86_gs = 5})
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/trap.c:231
	#4  0xc0100c39 in calltrap ()
	#5  0xc0240253 in ipintr ()
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../netinet/ip_input.c:381
	#6  0xc0101fc4 in Xsoftnet ()
	#7  0xc029c623 in cpu_reboot (howto=256, bootstr=0x0)
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/machdep.c:2223
	#8  0xc01fc426 in panic ()
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/subr_prf.c:253
	#9  0xc02a36e2 in trap (frame={tf_gs = 16, tf_fs = 16, tf_es = 16, tf_ds = 16, 
	      tf_edi = -1, tf_esi = -757759984, tf_ebp = -736858912, tf_ebx = -1, 
	      tf_edx = -1063837184, tf_ecx = -1063525376, tf_eax = 0, tf_trapno = 6, 
	      tf_err = 0, tf_eip = -1071381210, tf_cs = 8, tf_eflags = 66182, 
	      tf_esp = -1063525376, tf_ss = -1072685282, tf_vm86_es = 4, 
	      tf_vm86_ds = 99747316, tf_vm86_fs = -1454069420, tf_vm86_gs = 23708})
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/trap.c:231
	#10 0xc0100c39 in calltrap ()
	#11 0xc0240253 in ipintr ()
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../netinet/ip_input.c:381
	#12 0xc0101fc4 in Xsoftnet ()
	#13 0xc0257eef in udp_usrreq (so=0xc096bd24, req=11, m=0x8040691a, 
	    nam=0xd4146ec0, control=0xc08ba42c, p=0xd4173744)
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../netinet/udp_usrreq.c:963
	#14 0xc0227587 in ifioctl (so=0xc096bd24, cmd=2151704858, 
	    data=0xd4146ec0 "rtk1", p=0xd4173744)
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../net/if.c:1532
	#15 0xc0201e08 in soo_ioctl (fp=0xd413009c, cmd=2151704858, 
	    data=0xd4146ec0 "rtk1", p=0xd4173744)
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/sys_socket.c:139
	#16 0xc01ff50d in sys_ioctl (p=0xd4173744, v=0xd4146f80, retval=0xd4146f78)
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/sys_generic.c:616
	#17 0xc02a31cb in syscall_plain (frame={tf_gs = 31, tf_fs = 31, tf_es = 31, 
	      tf_ds = 31, tf_edi = 134716555, tf_esi = -1077945544, 
	      tf_ebp = -1077945684, tf_ebx = 0, tf_edx = 0, tf_ecx = 134783568, 
	      tf_eax = 54, tf_trapno = 3, tf_err = 2, tf_eip = 134692867, tf_cs = 23, 
	      tf_eflags = 663, tf_esp = -1077945824, tf_ss = 31, tf_vm86_es = 0, 
	      tf_vm86_ds = 0, tf_vm86_fs = 0, tf_vm86_gs = 0})
	    at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/syscall.c:140
	#18 0xc0100d06 in syscall1 ()
	can not access 0xbfbfdaac, invalid translation (invalid PDE)
	can not access 0xbfbfdaac, invalid translation (invalid PDE)
	Cannot access memory at address 0xbfbfdaac
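
    gdb prints the trap frame registers and the ioctl cmd argument
    as decimals, which makes them hard to compare with the hex
    values in the panic message.  The Python sketch below converts
    them.  It assumes the usual BSD ioctl encoding from
    sys/ioccom.h (direction in the top three bits, argument length
    in bits 16-28, group in bits 8-15, number in bits 0-7); the
    helper names to_u32 and decode_ioctl are illustrative only.

```python
def to_u32(value):
    """Reinterpret a signed 32-bit integer (as gdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


def decode_ioctl(cmd):
    """Split an ioctl command word into fields (assumed BSD ioccom.h layout)."""
    cmd = to_u32(cmd)
    direction = []
    if cmd & 0x80000000:
        direction.append("in")    # argument is copied into the kernel
    if cmd & 0x40000000:
        direction.append("out")   # argument is copied back to userland
    if cmd & 0x20000000:
        direction.append("void")  # no argument
    return {
        "direction": "+".join(direction),
        "length": (cmd >> 16) & 0x1FFF,   # size of the argument in bytes
        "group": chr((cmd >> 8) & 0xFF),  # ioctl group letter
        "number": cmd & 0xFF,             # command number within the group
    }


# tf_eip from the trap frames (#3 and #9) in the backtrace
print(hex(to_u32(-1071381210)))
# cmd from the ifioctl and soo_ioctl frames (#14 and #15)
print(hex(2151704858))
print(decode_ioctl(2151704858))
```

    Under those assumptions tf_eip comes out as 0xc0240526, the same
    eip reported in the panic message, and cmd comes out as
    0x8040691a: group 'i', number 26, with a 64-byte argument copied
    in.  That appears to correspond to SIOCAIFADDR in sys/sockio.h,
    which would suggest the ifconfig was setting an address on rtk1,
    but this should be checked against the 1.6.1 headers.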

    The kernel in question is essentially a GENERIC kernel with most
    of the unused NICs/HBAs disabled, and APM enabled.  This may be a
    race condition between the ifconfig and an interrupt.  A more
    complete crash dump analysis is available at the following URL:

        http://cinnion.ka8zrt.com/bfc0_crash_analysis

    Panics happen about once or twice a day, and occur both on a
    system with an Athlon 2400+ and on one with an Athlon 2500+
    (Barton).  In addition, the following related deficiencies have
    been noted:

    - Large numbers of watchdog timeouts can occur on the interface
      handling the data from the telescope.  However, none were seen before
      the latest panic. 
    - Process listings using both ps and the xps gdb macro do not return a 
      valid PPID.

    And finally, though of lesser importance:

     - While other OSes (Linux, UnixWare, HP-UX) permit the
       transmission of UDP datagrams carrying 4KB of application
       data, NetBSD does not permit this even though it is perfectly
       valid per the RFCs (IP should fragment and reassemble).
       Admittedly this is not ideal, but it is what our main
       researcher will use as an argument for Linux.
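
    To make the fragmentation involved concrete, the Python sketch
    below works out how one 4KB UDP datagram would be split.  It
    assumes a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header
    without options and an 8-byte UDP header; none of these values
    were measured on the affected systems, and the function name
    ipv4_fragments is illustrative only.

```python
def ipv4_fragments(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Return the IP payload bytes carried by each fragment of one datagram."""
    remaining = udp_payload + udp_header
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    sizes = []
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)
    return sizes


print(ipv4_fragments(4096))  # [1480, 1480, 1144]
```

    Under those assumptions each 4KB datagram becomes three IP
    fragments on the wire, so the receiver handles roughly three
    times as many packets as the application sends datagrams.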

     Please send email, and I can get you additional information if
     necessary.

>How-To-Repeat:
    
    Subject a system to an extremely heavy UDP/IP load (around 2K
    packets/sec), and run ifconfig (exact arguments currently
    unknown).  It may take the 4KB application payload to get the
    fragmentation and trigger the problem, but I suspect all it would
    do is amplify the problem, not cause it.
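
    A load of this kind could be approximated with something like
    the Python sketch below.  This is not the telescope software; it
    is a minimal paced UDP sender with a default rate and payload
    size matching the figures above.  The function names, the
    sequence-number header and the zero-filled padding are all
    assumptions made for illustration.

```python
import socket
import struct
import time


def make_payload(seq, size=4096):
    """Build a payload of `size` bytes starting with a 32-bit sequence number."""
    if size < 4:
        raise ValueError("size must be at least 4 bytes")
    return struct.pack("!I", seq & 0xFFFFFFFF) + bytes(size - 4)


def send_udp_load(host, port, rate_pps=2000, payload_size=4096, duration=10.0):
    """Send datagrams at roughly rate_pps for duration seconds; return the count."""
    interval = 1.0 / rate_pps
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        start = time.monotonic()
        deadline = start + duration
        while True:
            now = time.monotonic()
            if now >= deadline:
                break
            # Pace against the ideal schedule so sleep jitter does not accumulate.
            due = start + sent * interval
            if now < due:
                time.sleep(due - now)
            sock.sendto(make_payload(sent, payload_size), (host, port))
            sent += 1
    return sent
```

    For example, send_udp_load("192.0.2.1", 5000) would send for ten
    seconds to a placeholder address and port; substitute the address
    of the interface under test, and run ifconfig against that
    interface on the receiving machine while the sender is active.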

>Fix:
	Unknown at this time
>Release-Note:
>Audit-Trail:
>Unformatted: