NetBSD-Bugs archive


kern/57136: NPF panic probably on a NPF table list call



>Number:         57136
>Category:       kern
>Synopsis:       NPF panic probably on a NPF table list call
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 23 20:20:00 +0000 2022
>Originator:     brad%anduin.eldar.org@localhost
>Release:        NetBSD 10.0_BETA
>Organization:
	eldar.org
>Environment:
System: NetBSD anduin.eldar.org 10.0_BETA NetBSD 10.0_BETA (ANDUIN) #0: Thu Dec 22 11:07:33 EST 2022 brad%samwise.nat.eldar.org@localhost:/usr/src/sys/arch/amd64/compile/ANDUIN
Architecture: x86_64
Machine: amd64
>Description:

Sorry for the lack of detail.  This is probably a KASSERT() in the npf
code for a table list.  I have had it happen a couple of times in the
last couple of days.

The following was copied from an image, as I could not get a kernel
dump or otherwise save the panic.  I also don't know exactly which
assertion (if that is what it was) fired: the message scrolled off the
screen before I could see it, and it wasn't saved anywhere:

breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x177
kern_assert() at netbsd:kern_assert+0x4b
mi_switch() at netbsd:mi_switch+0x7e2
sleepq_block() at netbsd:sleepq_block+0x13a
mtsleep() at netbsd:mtsleep+0x17f
uvmfault_promote() at netbsd:uvmfault_promote+0x4b2
uvm_fault_internal() at netbsd:uvm_fault_internal+0x1488
trap() at netbsd:trap+0x46a
--- trap (number 6) ---
copyout() at netbsd:copyout+0x33
npf_table_list() at netbsd:npf_table_list+0x57
npfctl_table() at netbsd:npfctl_table+0xba
spec_ioctl() at netbsd:spec_ioctl+0x58
VOP_IOCTL() at netbsd:VOP_IOCTL+0x41
vn_ioctl() at netbsd:vn_ioctl+0xad
sys_ioctl() at netbsd:sys_ioctl+0x555
syscall() at netbsd:syscall+0x9c
--- syscall (number 54) ---

This panic occurred while the system was running as a Xen domU in PVH
mode with pvshim enabled.  It has since been switched to a pure PVH
guest for other reasons.

A cron job runs fairly often on this system.  It pulls the contents of
a particular npf table using npfctl, with something like
"npfctl table badguys list > output_file", and compares that output to
a current list of badguys.  Changes to the table are then made with
"npfctl table badguys add ...." and the corresponding remove command.
After the changes have been made, another "npfctl table badguys list"
is run, and its output is compared to the new list to make sure that
the two are the same.  From the logs, the panic appears to have
happened on this second list call.  I can say with fairly good
certainty that nothing actually changed in the table when this
panicked.  So the sequence would have reduced to a table list, a very
short delay, and then another table list.
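The reconciliation step of the cron job can be sketched as below.  This
is a hypothetical Python illustration, not the actual script.  The
function name plan_changes and the one-entry-per-line listing format are
assumptions.  It shows why, when nothing has changed, no add or remove
commands run and the job reduces to two back-to-back list calls.

```python
from typing import List, Tuple


def plan_changes(current_listing: str, wanted: List[str]) -> Tuple[List[str], List[str]]:
    """Compute which entries to add and which to remove.

    current_listing: text captured from a table list command
    (assumed here to be one entry per line; hypothetical format).
    wanted: the desired set of entries.
    Returns (to_add, to_remove), each sorted for stable output.
    """
    # Entries currently in the table, ignoring blank lines.
    current = {line.strip() for line in current_listing.splitlines() if line.strip()}
    # Entries that should be in the table.
    desired = {entry.strip() for entry in wanted if entry.strip()}
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    return to_add, to_remove
```

When both returned lists are empty, nothing is modified between the
first and second list call.  That matches the situation described above.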

>How-To-Repeat:

I don't know what triggers this.  The system is quite busy doing a LOT
of other things all the time (router, NFS server, RabbitMQ server, LDAP
server, Kerberos slave, etc.).  The only unusual thing the last time was
a copy of a number of large files from a thumb drive block-attached from
dom0.  The previous times did not have anything unusual going on that I
know of.
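Since the trigger is unknown, one way to try to reproduce this is to
issue the table list call back to back many times while the system is
under load.  The following is a hypothetical Python sketch.  The
function hammer_list and its parameters are illustrative assumptions,
and the default command reuses the table name from this report.  If the
bug is hit, the machine will panic, so this belongs on a test system.

```python
import subprocess
import time
from typing import Sequence


def hammer_list(cmd: Sequence[str] = ("npfctl", "table", "badguys", "list"),
                iterations: int = 1000,
                delay: float = 0.0) -> int:
    """Run cmd repeatedly and count outputs that differ from the first run.

    delay is an optional pause (seconds) between calls, mimicking the
    "very short delay" between the two list calls in the cron job.
    Raises CalledProcessError if the command exits non-zero.
    """
    baseline = None
    mismatches = 0
    for _ in range(iterations):
        out = subprocess.run(list(cmd), capture_output=True,
                             text=True, check=True).stdout
        if baseline is None:
            baseline = out          # first listing becomes the reference
        elif out != baseline:
            mismatches += 1         # table contents changed between calls
        if delay:
            time.sleep(delay)
    return mismatches
```

With an unchanging table, the expected return value is 0.  The
interesting outcome on an affected kernel would be a panic during one of
the calls rather than any particular return value.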

>Fix:

Don't know...


