NetBSD-Bugs archive


kern/52043: npf kernel panic on sparc64

>Number:         52043
>Category:       kern
>Synopsis:       npf kernel panic on sparc64
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 07 05:50:00 +0000 2017
>Originator:     Dakotah Lambert
>Release:        NetBSD 7.0.2
>Organization:
Earlham College
>Environment:
NetBSD 7.0.2 NetBSD 7.0.2 (GENERIC.DEBUG) #0: Mon Mar 6 19:35:56 EST 2017 root@:/var/src/sys/arch/sparc64/compile/GENERIC.DEBUG sparc64
>Description:
I have a Sun Netra T1 AC200 server (UltraSPARC IIe at 500 MHz, 1 GB RAM, two hard drives).  I run an SSH server on the machine (public-key only, no passwords), and the SSH log tends to fill up with failed authentication attempts from what I assume are bots.  Since SCSI drives are hard to find, I installed fail2ban and configured npf in hopes of reducing the volume of data written to this log.

The contents of /etc/npf.conf are:


set bpf.jit off

table <fail2ban> type tree dynamic

group "external" on $ext_if {
        pass in final from $local_net
        block in final from <fail2ban>
        pass out final all
        pass all
}

group default {
        pass final on lo0 all
        block all
}


The "set bpf.jit off" line was added because npf told me to put it in.  "gem0" (the value of $ext_if) is one of the two built-in Ethernet interfaces of the server.

Before I configured npf and allowed the module (the only LKM I use) to load, the server never went down unexpectedly.  Unfortunately, its reliability fell from "constantly up" to "crashes after a couple of hours to a day" once I made this change.

Since the crash appears in ptree_insert_node_common (backtrace at end of section), I am tempted to believe that changing my table from "tree" to "hash" might act as a work-around, but I have not tested this yet.

$ ident /netbsd | grep ptree.c
     $NetBSD: ptree.c,v 1.10 2012/10/06 22:15:09 matt Exp $
$ ident /stand/sparc64/7.0/modules/npf | grep npf_tableset.c
     $NetBSD: npf_tableset.c,v 1.22 2014/08/11 01:54:12 rmind Exp $
$ ident /stand/sparc64/7.0/modules/npf | grep npf_ctl.c
     $NetBSD: npf_ctl.c,v 2015/06/10 16:57:58 snj Exp $

I am not sure where "line 501" comes from, as the assertion that failed appears to be at line 450 in the actual C code.

But following the backtrace, it looks like npf_table_insert has its third parameter set to 0.  From npf_ctl.c:

        case NPF_CMD_TABLE_ADD:
                error = npf_table_insert(t, nct->nct_data.ent.alen,
                    &nct->nct_data.ent.addr, nct->nct_data.ent.mask);
                break;

If so, "&nct->nct_data.ent.addr" evaluated to 0 (NULL).  Might that be the problem?
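
As a sanity check on that reading, here is a minimal sketch of how the address of a nested member is computed in C.  The structures are hypothetical stand-ins (the names demo_ent and demo_ioctl_table and their layout are invented here, not the real NPF definitions).  The point is only that the expression yields the base pointer plus a fixed offset, so it can be 0 only if nct itself is bogus.  It may also be worth noting, as an assumption rather than a confirmed fact, that the values ddb prints per frame on sparc64 need not be the original call arguments; in the traceback the first value shown for npfctl_table is the same as the first value shown for npf_table_insert.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the NPF ioctl structures.  The field names
 * follow the npf_ctl.c excerpt, but the layout is invented for illustration.
 */
struct demo_ent {
	int		alen;
	uint8_t		addr[16];
	uint8_t		mask;
};

struct demo_ioctl_table {
	int		nct_cmd;
	const char	*nct_name;
	union {
		struct demo_ent	ent;
	} nct_data;
};

/* Byte offset of the address field inside the outer structure. */
static size_t
demo_addr_offset(void)
{
	return offsetof(struct demo_ioctl_table, nct_data.ent.addr);
}

/* Address of the field, computed the same way as at the call site. */
static uintptr_t
demo_addr_of_field(const struct demo_ioctl_table *nct)
{
	return (uintptr_t)&nct->nct_data.ent.addr;
}

/* Returns 1 when the field address is exactly base + offset. */
static int
demo_field_is_base_plus_offset(const struct demo_ioctl_table *nct)
{
	return demo_addr_of_field(nct) == (uintptr_t)nct + demo_addr_offset();
}
```

For any valid object, demo_field_is_base_plus_offset() returns 1 and demo_addr_of_field() is nonzero, which suggests the 0 in the traceback is more likely a stale register value than a NULL address argument.  This does not by itself explain the failed assertion.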


panic: kernel diagnostic assertion "PTN_LEAF_POSITION(ptn) == id.id_parent_slot" failed: file "../../../../../../lib/libkern/../../../common/lib/libc/gen/ptree.c", line 501
cpu0: Begin traceback...
cpu0: End traceback...
Stopped in pid 1426.1 (npfctl) at       netbsd:cpu_Debugger+0x4:        nop
db{0}> bt
db{0}> sync
Frame pointer is at 0x12dbbc411
Call traceback:
 netbsd:cpu_reboot+0x208(a, 1c99748, 0, 1c99400, 1cd4b60, 1c93800) fp = 12dbbc4d1
 netbsd:db_sync_cmd+0x20(100, 0, 1c19c00, 1cb3000, f, 102d3c960) fp = 12dbbc581
 netbsd:db_command+0x94(10f7144, 0, ffffffffffffffff, 12dbbcef8, 2, 73) fp = 12dbbc631
 netbsd:db_command_loop+0x118(1c16be0, 1c16c40, 0, 1c9b000, 1c16800, 16a3fe8) fp = 12dbbc771
 netbsd:db_trap+0x100(10f7148, 0, 18787e0, 1c19c00, 1c16be0, 1c9b000) fp = 12dbbc851
 netbsd:kdb_trap+0xdc(101, 0, 1838ac0, e0048000, 1cb0000, 0) fp = 12dbbc911
 netbsd:trap+0x4a0(101, 12dbbd3c0, 4, 1c19c00, 1c00000, 1cf3400) fp = 12dbbc9c1
 netbsd:1010e40+0(12dbbd3c0, 101, 10f7140, 441d0006, 14bdc60, 1cf36e0) fp = 12dbbcb11
 netbsd:vpanic+0x16c(18787e0, 1cf35b0, 1825548, e0048000, 1c19c00, 1c19c00) fp = 12dbbccf1
 netbsd:kern_assert+0x34(1825548, 12dbbd6e8, 1cf2000, 1cf35b0, 1cf3400, 104) fp = 12dbbcda1
 netbsd:ptree_insert_node_common+0x308(1825548, 1825580, 18c00c0, 18bfcb8, 1f5, 10109ef90) fp = 12dbbce61
 npf:npf_table_insert+0x198(100f1c908, 102e43e80, 0, 7fff, 2014000, 16203a0) fp = 12dbbcf41
 npf:npfctl_table+0xc8(100f1c908, 4, 12dbbdc94, ff, 0, 16) fp = 12dbbd001
 netbsd:cdev_ioctl+0x68(12dbbdc80, 80284e67, 12dbbdc80, 1, 102d3c960, 0) fp = 12dbbd0d1
 netbsd:VOP_IOCTL+0x38(c600, 80284e67, 12dbbdc80, 1, 102d3c960, 203bad0) fp = 12dbbd181
 netbsd:vn_ioctl+0xa4(1019553a0, 80284e67, 12dbbdc80, 1, 100ee3ec0, 0) fp = 12dbbd261
 netbsd:sys_ioctl+0x254(10270c400, 80284e67, 12dbbdc80, 12dbba000, 1, 1019553a0) fp = 12dbbd3c1
 netbsd:syscall+0x3a8(0, 12dbbdde0, 1020907d0, 0, 10270c400, 80284e67) fp = 12dbbd501
 netbsd:101106c+0(12dbbded0, 4e, fffffffffe559700, 36, 12dbbdf40, 102d3c960) fp = 12dbbd621
 netbsd:10cca0+0(3, 80284e67, ffffffffffffbac8, ffffffffffffbadc, 2c, ffffffffffffbadc) fp = ffffffffffffb1c1

dumping to dev 7,1 offset 2098887
dump succeeded
cpu0: rebooting
>How-To-Repeat:
1) Boot server
2) Enable npf and fail2ban
3) Wait
4) Within a few hours to a day, the system crashes
>Fix:
Workaround: Do not load the npf module.  This is not satisfactory.
