Subject: kern/27166: ``Invalid argument'' loading ipfilter 4.1.3 rules
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <carton@Ivy.NET>
List: netbsd-bugs
Date: 10/06/2004 17:09:07
>Number:         27166
>Category:       kern
>Synopsis:       ``Invalid argument'' loading ipfilter 4.1.3 rules
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 06 17:10:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Miles Nordin
>Release:        NetBSD 2.0_BETA 2004-08-15
>Organization:
Ivy Ministries
>Environment:
locally pulled up most kernel changes listed in doc/CHANGES-2.0 for 2.0_RC2
netinet/fil.c                   1.61.2.7           pr#26666 t#783, t#888
kern/uipc_mbuf.c                1.80.2.4           pr#26733 t#831, and t#841
sys/mbuf.h                      1.90.2.4           pr#26733 t#831, and t#839
netinet/ip_fil_netbsd.c         1.3.2.10           pr#26733 t#833
netinet6/raw_ip6.c              1.63.2.2           pr#26733 t#836
kern/kern_lock.c                1.75.2.1           t#752
lib/libkern/arc4random.c        1.11.2.1           t#824
nfs/nfs_bio.c                   1.116.2.2          t#858
nfs/nfs_subs.c                  1.132.2.3          t#858, t#889
nfs/nfs_var.h                   1.42.2.2           t#858
nfs/nfsnode.h                   1.46.2.2           t#858
ufs/ufs/ufs_bmap.c              1.28.2.2           t#859
sys/netinet/tcp_input.c         1.190.2.6          t#861
sys/netinet/tcp_subr.c          1.160.2.5          t#861
sys/netinet/tcp_var.h           1.106.2.2          t#861

System: NetBSD lucette 2.0_BETA NetBSD 2.0_BETA (LUCETTE-$Revision: 1.1 $) #4: Mon Oct 4 23:44:38 EDT 2004 carton@castrovalva:/scratch/src/sys/arch/sparc64/compile/LUCETTE sparc64
Architecture: sparc64
Machine: sparc64
>Description:
$ sudo /etc/rc.d/ipfilter reload
Reloading ipfilter rules.
380:ioctl(add/insert rule): Invalid argument
386:ioctl(add/insert rule): Invalid argument
Set 1 now inactive
$ 

0. Note this is not a syntax error in ipf.conf, because /sbin/ipf parsed 
   the file and called into the kernel to load it, and in fact even 
   switched to the new ruleset.

1. If I don't edit /etc/ipf.conf, the lines where it encounters the error 
   don't change if I 'ipfilter reload' over and over.  If I stop ipnat and 
   ipfilter and restart them, the error still doesn't change.

2. If I add a comment to /etc/ipf.conf, the error still happens on the 
   <n>th rule loaded, not the <n>th line of the file.

3. If I comment out a rule above the one where the error occurred, or if 
   I comment out the rule that caused the error, the error still happens 
   on the <n>th rule loaded.  AFAICT it doesn't have to do with the 
   specific content of the rule.

4. If I comment out large numbers of rules, the errors move around 
   erratically, and I can get as many as 5 errors.

5. AFAICT those rules that don't experience errors loading into the kernel 
   are blocking/passing traffic just fine, and they are actually loaded:

$ ( sudo ipfstat -il; sudo ipfstat -ol ) | wc -l
     226
$ sed -e '/^#/d' -e '/^$/d' < /etc/ipf.conf | wc -l
     228

6. Here is /etc/ipf.conf near where the error occurred:

$ awk '{ print FNR, " ", $0 }' < /etc/ipf.conf
[...]
377   pass out quick on gem1 proto icmp from 192.168.0.0/16 to any icmp-type echo keep state
378   pass out quick on tlp0 proto icmp from 192.168.0.0/16 to any icmp-type timest keep state
379   pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type timest keep state
380   pass out quick on gem1 proto icmp from 192.168.0.0/16 to any icmp-type timest keep state
381   pass out quick on tlp0 proto icmp from 192.168.0.0/16 to any icmp-type inforeq keep state
382   pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type inforeq keep state
383   pass out quick on gem1 proto icmp from 192.168.0.0/16 to any icmp-type inforeq keep state
384   pass out quick on tlp0 proto icmp from 192.168.0.0/16 to any icmp-type maskreq keep state
385   pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type maskreq keep state
386   pass out quick on gem1 proto icmp from 192.168.0.0/16 to any icmp-type maskreq keep state
387   block in quick on tlp0 proto icmp from any to 192.168.0.0/16 icmp-type echorep 
388   block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type echorep 
389   block in quick on gem1 proto icmp from any to 192.168.0.0/16 icmp-type echorep 

As you can see, the error doesn't occur on the last rules, and there are 
very similar rules right after the one with the error that get loaded fine.
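
Since observations 2 and 3 tie the error to the position of a rule in the
load order rather than to a line of the file, it can help to translate the
line numbers that ipf prints (380 and 386 above) into rule ordinals.  The
following is only a sketch: the function name rule_ordinals is made up for
illustration, and it assumes, like the sed command in item 5, that every
line that is neither blank nor starts with '#' is exactly one rule.

```shell
# rule_ordinals FILE LINE...
# Print the rule ordinal (position among rule lines) of each given line.
# Assumption: a "rule" is any line that is not empty and does not start
# with '#', the same filter as the sed command used in the report.
rule_ordinals() {
    file=$1
    shift
    awk -v want="$*" '
        BEGIN {
            # Remember which line numbers were asked for.
            n = split(want, w, " ")
            for (i = 1; i <= n; i++)
                target[w[i]] = 1
        }
        /^#/ || /^$/ { next }      # comments and blank lines are not rules
        { ord++ }                  # every remaining line counts as one rule
        (FNR in target) { print "line " FNR " is rule " ord }
    ' "$file"
}
```

For example, `rule_ordinals /etc/ipf.conf 380 386` would report which rule
ordinals the two failing lines correspond to, which can then be compared
across edits of the file.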

>How-To-Repeat:
I am not sure whether this problem will persist after a reboot.  I will amend 
the PR after rebooting, but I can't reboot right now.

The system has been fairly busy in the past: more than 25,000 NAT state 
entries, plus a 'keep state' entry for each of those.
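
If the rule count rather than the rule content is what matters (which item
3 suggests but does not prove), a synthetic ruleset of similar size might
help someone else try to reproduce this.  The sketch below is an
assumption-laden helper, not something I have run against this bug: the
name gen_rules and the 10.x.y.0/24 source networks are invented so that
every generated rule is distinct, and the rule form is copied from the
ipf.conf excerpt above.

```shell
# gen_rules IFACE COUNT
# Print COUNT distinct rules of the same form as the ICMP rules in the
# report.  The 10.<hi>.<lo>.0/24 source networks are hypothetical and only
# serve to make each rule unique.  COUNT should not exceed 65536.
gen_rules() {
    ifc=$1
    count=$2
    n=0
    while [ "$n" -lt "$count" ]; do
        hi=$((n / 256))
        lo=$((n % 256))
        echo "pass out quick on $ifc proto icmp from 10.$hi.$lo.0/24 to any icmp-type echo keep state"
        n=$((n + 1))
    done
}
```

For example, `gen_rules gem1 400 > /tmp/ipf.test` writes 400 rules to a
scratch file that can then be loaded on a test machine to see whether the
same errors appear.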

>Fix:
The workaround is to delete blocks of obsolete rules and move 
order-independent rules around until the error lands on a rule I don't 
care much about.

>Release-Note:
>Audit-Trail:
>Unformatted: