I've finally had some time to work on this. Here is the result so far:

In the ipfilter 5.1.2 code in -current, there are two locking-related bugs (in one case a lock is released too soon, and in the other a lock can be leaked), and the custom-built red-black tree seems to have some bugs.

I haven't looked into what bugs the ipf rb-tree implementation has; at Christos's suggestion, I've just switched it to use our <sys/rbtree.h> implementation, and that solves the problem. The ipf rb-tree is written as cpp macros, so I swapped it out by adding a different set of macros that call <sys/rbtree.h> functions. I believe this is a minimally-invasive change to the ipf code base, and it should maintain compatibility with all the other OSes ipf is built on. That said, the bug will remain on other OSes, so Darren may want to check that out. (The bug manifests as a kernel panic or hard hang during a call to RBI_SEARCH or RBI_INSERT.)

Attached is a patch that keeps my router from panicking or hanging under heavy NAT load. Would anyone like to take a look at it? I think these changes should be incorporated into -current.

After that, there are still a couple of other ipf problems that cause serious issues, although they don't kill the machine. For example, ns_bucketlen, the count of elements in each bucket of the hash table that keeps NAT state, can be decremented below 0. Since it's an unsigned int, it wraps around to a huge value, which makes the bucket look way over-full, and no new state can be tracked between the two hosts in question. I'll try to look into this later today.

- Geoff
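For anyone who wants to see the ns_bucketlen wraparound in isolation, here is a minimal userland sketch. It is not the ipf code: struct bucket_stats, MAX_BUCKET_LEN, and the helper functions are hypothetical names made up for illustration, and the limit of 12 is an arbitrary assumption. It only shows how one extra decrement on an empty unsigned counter makes a "bucket full" test succeed, and how a guarded decrement avoids that.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-bucket limit; the real ipf tunable is not shown here. */
#define MAX_BUCKET_LEN 12u

/* Hypothetical stand-in for one ns_bucketlen entry. */
struct bucket_stats {
	unsigned int len;	/* number of NAT entries hashed to this bucket */
};

/* Unchecked decrement: if len is already 0, it wraps to UINT_MAX. */
static void
bucket_del_unchecked(struct bucket_stats *b)
{
	b->len--;
}

/* Guarded decrement: never goes below 0; returns false if nothing was done. */
static bool
bucket_del_checked(struct bucket_stats *b)
{
	if (b->len == 0)
		return false;
	b->len--;
	return true;
}

/* The kind of test that rejects new state once a bucket looks too long. */
static bool
bucket_is_full(const struct bucket_stats *b)
{
	return b->len >= MAX_BUCKET_LEN;
}

/* Walk through the failure: one decrement too many on an empty bucket. */
static void
demo_wraparound(void)
{
	struct bucket_stats bad = { 0 };
	struct bucket_stats good = { 0 };

	bucket_del_unchecked(&bad);
	assert(bad.len == UINT_MAX);	/* unsigned arithmetic wraps */
	assert(bucket_is_full(&bad));	/* empty bucket now looks full */

	assert(!bucket_del_checked(&good));
	assert(good.len == 0);		/* counter stays sane */
	assert(!bucket_is_full(&good));
}
```

A guard like bucket_del_checked() only hides the symptom; the mismatched decrement itself (an entry removed twice, or removed without having been counted) would still need to be tracked down.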
Attachment:
locking-and-rbtree-patch