Subject: kern/27037: ipfilter or ipv6 crash, something to do with fragments, on sparc64
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <carton@Ivy.NET>
List: netbsd-bugs
Date: 09/26/2004 02:50:27
>Number:         27037
>Category:       kern
>Synopsis:       ipfilter or ipv6 crash, something to do with fragments, on sparc64
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 26 02:51:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Miles Nordin
>Release:        NetBSD 2.0_BETA
>Organization:
	
>Environment:
	
	
System: NetBSD lucette 2.0_BETA NetBSD 2.0_BETA (LUCETTE-$Revision: 1.1 $) #4: Sat Sep 11 13:03:44 EDT 2004 carton@castrovalva:/scratch/src/sys/arch/sparc64/compile/LUCETTE sparc64
Architecture: sparc64
Machine: sparc64
>Description:
In the text below, 3ffe:0bc0:206::/48 is my IPv6 prefix.

Script started on Sun Sep 26 02:09:38 2004
$ sudo cu -l ttyC1 -s 9600
Password:
Connected.

db> db mesg 0t400
ansmit underrun; new threshold: 96/256 bytes
tlp1: transmit underrun; new threshold: 128/512 bytes
tlp0: receive ring overrun
tlp1: transmit underrun; new threshold: 160/1024 bytes
arpresolve: can't allocate llinfo on tlp2 for 127.0.0.1
tlp2: transmit underrun; new threshold: 96/256 bytes
tlp2: transmit underrun; new threshold: 128/512 bytes
panic: lockmgr: no context
kdb breakpoint at 130ba04
db> bt
lockmgr(18468a0, 1, 0, 0, 0, 318c000) at netbsd:lockmgr+0x28c
uvmfault_lookup(e00170a0, 0, e0017948, 0, 3ffe0bc0, 0) at netbsd:uvmfault_lookup+0x1c0
uvm_fault(1846898, 0, 2, 2, 180c400, 500) at netbsd:uvm_fault+0x6c
data_access_fault(e00172a0, 30, 1044534, 0, 0, 80080d) at netbsd:data_access_fault+0x418
?(0, e00176c0, 4d0, 1, 3, e0017c80) at 0x100871c
fr_coalesce(e00176c0, e00176f0, ffffffffffffffff, e00176e0, 2, e00176e0) at netbsd:fr_coalesce+0xc
frpr_ipv6hdr(e00176c0, 996fe, 10b1ba0, 0, 2dbc0c17fc0, 1) at netbsd:frpr_ipv6hdr+0x1b8
fr_makefrip(28, c6ba840, e00176c0, fefefefefefefeff, 12ff564, 2ea400) at netbsd:fr_makefrip+0x60
fr_checkicmp6matchingstate(e0017a20, 30, ffffffffffffffff, 0, 0, 318c000) at netbsd:fr_checkicmp6matchingstate+0xdc
fr_stlookup(0, 180c800, e0017948, 0, 3ffe0bc0, 0) at netbsd:fr_stlookup+0x518
fr_checkstate(e0017a20, e0017a1c, e0017a20, 180c400, 180c400, 500) at netbsd:fr_checkstate+0x27c
fr_check(3358880, 10, 30c0078, 0, e0017ba8, a) at netbsd:fr_check+0x6d4
pfil_run_hooks(189d138, e0017d20, 30c0078, 1, 3, e0017c80) at netbsd:pfil_run_hooks+0x54
ip6_input(3358880, fc660000, 3082b80, 0, 7, 512) at netbsd:ip6_input+0xc78
ip6intr(0, 996fe, 10b1ba0, 0, 2dbc0c17fc0, 1) at netbsd:ip6intr+0x54
softnet(1000000, 0, e0017ed0, fefefefefefefeff, 12ff564, 2ea400) at netbsd:softnet+0x88
sparc64_ipi_flush_all(0, 0, 136ba84, 0, ffffffffffffffff, 0) at netbsd:sparc64_ipi_flush_all+0x23c
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 1003          1008     1003        405 2  0x4002    1              ksh   ttyin
 1008          1005     1005        405 2   0x100    1             sshd  select
 1005            96     1005          0 2   0x101    1             sshd   netio
 1428          1079     1428        405 2  0x4002    1              ksh   ttyin
 1079           609      609        405 2   0x100    1             sshd  select
 609             96      609          0 2   0x101    1             sshd   netio
 662           1050     1050         12 2  0x4100    1           pickup  select
 531              1      531          0 2  0x4002    1            getty   ttyin
 295              1      295          0 2   0x101    1             bgpd  select
 172              1      172          0 2       0    1             cron nanosle
 1212             1     1212          0 2       0    1            inetd  kqread
 984              1      984         67 2       0    1             ircd nanosle
 658              1      658          0 2   0x101    1           ospf6d  select
 989              1      989          0 2   0x101    1            ospfd  select
 227           1050     1050         12 2  0x4100    1             qmgr  select
 1050             1     1050          0 2  0x4108    1           master  select
 96               1       96          0 2       0    1             sshd  select
 913              1      913          0 2       0    1           rtadvd    poll
 821              1      821          0 2       0    1            rarpd  select
 893              1      893         15 2   0x100    1             ntpd   pause
 824              1      824          0 2       0    1            dhcpd  select
 537              1      537          0 2       0    1        mount_mfs  mfsidl
 498              1      498          0 2       0    1          rpcbind    poll
 471              1      471         14 2   0x500    3            named       *
 443              1      443          0 2       0    1            ipmon nanosle
 382              1      382          0 2       0    1            altqd  select
 319              1      319          0 2       0    1          syslogd    poll
 374              1      374          0 2   0x101    1            zebra  select
 307              1       16          0 2  0x4002    1          choparp  select
 15               0        0          0 2 0x20200    1         aiodoned aiodone
 14               0        0          0 2 0x20200    1          ioflush  syncer
 13               0        0          0 2 0x20200    1       pagedaemon pgdaemo
 12               0        0          0 2 0x20200    1       lfs_writer lfswrit
 11               0        0          0 2 0x20200    1        atapibus0  sccomp
 10               0        0          0 2 0x20200    1         scsibus1  sccomp
 9                0        0          0 2 0x20200    1         scsibus0  sccomp
 8                0        0          0 2 0x20200    1             usb1  usbevt
 7                0        0          0 2 0x20200    1          atabus1   atath
 6                0        0          0 2 0x20200    1          atabus0   atath
 5                0        0          0 2 0x20200    1          usbtask  usbtsk
 4                0        0          0 2 0x20200    1             usb0  usbevt
 3                0        0          0 2 0x20200    1           sysmon smtaskq
 2                0        0          0 2 0x20200    1        cryptoret crypto_
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
db> reboot
syncing disks... gem1: MAC rx fault, status 3
tlp2: receive ring overrun
tlp1: receive ring overrun
tlp0: receive ring overrun
3 3 2 1 done
rebooting

LOM event: +15d+5h10m35s host reset
Resetting ... 


Netra T1 200 (UltraSPARC-IIe 500MHz), No Keyboard
[...]
Script done on Sun Sep 26 02:17:35 2004
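
A ddb trace captured over an 80-column serial console can arrive with frames wrapped across lines and pager prompts mixed in, which makes the call chain hard to read. The following is a minimal sketch, not part of the original report, that rejoins such frames and lists the function names innermost first. The helper names (rejoin_frames, call_chain) and the SAMPLE lines are illustrative; it assumes every frame begins with a name followed by "(" and that any other non-empty line continues the previous frame.

```python
import re

# Pager prompt that ddb prints in front of the next output line.
PAGER = re.compile(r"^--db_more--\s*")
# A new frame starts with a symbol name (or "?") followed by "(".
FRAME_START = re.compile(r"^[\w?]+\(")
# A complete frame: name(args) at location
FRAME = re.compile(r"^(?P<func>[\w?]+)\((?P<args>.*)\) at (?P<where>\S+)$")


def rejoin_frames(lines):
    """Glue wrapped continuation lines back onto the frame they belong to."""
    frames = []
    for raw in lines:
        line = PAGER.sub("", raw).strip()
        if not line:
            continue
        if FRAME_START.match(line) or not frames:
            frames.append(line)
        else:
            # Continuation of a frame that the terminal wrapped.
            frames[-1] += line
    return frames


def call_chain(lines):
    """Return the function names in trace order (innermost frame first)."""
    chain = []
    for frame in rejoin_frames(lines):
        match = FRAME.match(frame)
        if match:
            chain.append(match.group("func"))
    return chain


# Three frames in the wrapped form a serial console capture produces.
SAMPLE = [
    "fr_coalesce(e00176c0, e00176f0, ffffffffffffffff, e00176e0, 2, e00176e0) at netb",
    "sd:fr_coalesce+0xc",
    "frpr_ipv6hdr(e00176c0, 996fe, 10b1ba0, 0, 2dbc0c17fc0, 1) at netbsd:frpr_ipv6hdr",
    "+0x1b8",
    "--db_more--           ip6intr(0, 996fe, 10b1ba0, 0, 2dbc0c17fc0, 1) at netbsd:ip6intr+0x54",
]

if __name__ == "__main__":
    for name in call_chain(SAMPLE):
        print(name)
```

Applied to the full trace in this report, the chain runs from lockmgr and the fault handlers, through fr_coalesce and the other IPFilter fr_* routines, back to ip6_input and ip6intr.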
>How-To-Repeat:
Mount NFS over IPv4.  The client is 192.168.3.102, on the crashed machine's tlp3;
the server is 216.158.24.196, on the crashed machine's tlp2.

The crashed machine carries a lot of network traffic.  The NFS mount is the 
only thing I know of that I don't usually do, but I can't be absolutely 
sure what caused the crash.

Excerpts from the crashed machine's ipf.conf:

# grimalkin nfs
pass in  quick on tlp2 proto udp from 216.158.24.196/32 port > 1023 to 192.168.3.102/32 port > 1023
pass in  quick on tlp2 proto udp from 216.158.24.196/32 port = nfs to 192.168.3.102/32
pass out quick on tlp2 proto udp from 192.168.3.102/32 to 216.158.24.196/32 with frag
pass in  quick on tlp2 proto udp from 216.158.24.196/32 to 192.168.3.102/32 with frag
#
# outgoing only tcp
pass out quick on tlp2 proto tcp from 192.168.0.0/16 to any flags S/SAFR keep state
block out log on tlp2 proto tcp from 192.168.0.0/16 to any
block return-icmp(filter-prohib) in log on tlp2 proto tcp from any to 192.168.0.0/16
#
# outgoing only udp
# hrm... maybe too permissive.
pass out quick on tlp2 proto udp from 192.168.0.0/16 to any keep state
block out log on tlp2 proto udp from 192.168.0.0/16 to any
block return-icmp(filter-prohib) in log on tlp2 proto udp from any to 192.168.0.0/16
#
# for ICMP_INFOTYPE stuff like echo-request, ask to keep state.  
# not sure if it works for all these.
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type echo
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type timest 
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type inforeq 
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type maskreq
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type echo keep state
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type timest keep state
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type inforeq keep state
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type maskreq keep state
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type echorep 
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type timestrep 
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type inforep 
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type maskrep
#
# no redirs.
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type redir 
#
# to facilitate experimentation, pass what we don't understand.
pass in quick on tlp2 proto icmp from any to 192.168.0.0/16
#
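
The "with frag" rules in the NFS block matter because only the first IPv4 fragment of a UDP datagram carries the UDP header, so port-based rules cannot match the later fragments. The following is a rough sketch, not part of the original report, of how many fragments a UDP datagram produces. The function name is illustrative, and the 1500-byte MTU and 8192-byte payload are assumed figures rather than values taken from this setup.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragment_count(udp_payload_len, mtu=1500):
    """Number of IPv4 fragments needed for one UDP datagram.

    Every fragment except the last must carry a multiple of 8 bytes,
    so the usable space per fragment is rounded down to that boundary.
    """
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    total = UDP_HEADER + udp_payload_len
    return max(1, math.ceil(total / per_fragment))


if __name__ == "__main__":
    # Illustrative payload sizes only.
    for size in (1472, 1473, 8192):
        print(size, ipv4_fragment_count(size))
```

With these assumptions, any UDP payload larger than 1472 bytes is split, so NFS reads and writes of a few kilobytes over UDP would routinely produce fragments.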

>Fix:
Unknown.  The firewall is a semi-production machine, and I'm not sure I can reproduce the crash.

>Release-Note:
>Audit-Trail:
>Unformatted:
 I'm running 2.0 BETA 2004-08-15 with the following files upgraded:
 
 netinet/fil.c                   1.61.2.7           pr#26666
 kern/uipc_mbuf.c                1.80.2.3           pr#26733
 sys/mbuf.h                      1.90.2.3           pr#26733
 netinet/ip_fil_netbsd.c         1.3.2.10           pr#26733
 netinet6/raw_ip6.c              1.63.2.2           pr#26733
 kern/kern_lock.c                1.75.2.1