Subject: kern/29839: kernel panic (with the help of pf) on diskless sun4m
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <fanch@enki.dyndns.org>
List: netbsd-bugs
Date: 03/30/2005 19:08:01
>Number:         29839
>Category:       kern
>Synopsis:       kernel panic (with the help of pf) on diskless sun4m
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 30 19:08:00 +0000 2005
>Originator:     fanch
>Release:        NetBSD 3.99.1 (-current 26/03/05)
>Organization:
>Environment:
System: NetBSD bean.reso 3.99.1 NetBSD 3.99.1 (SUN4MC) #1: Tue Mar 29 21:56:44 CEST 2005 root@ptitordi:/usr/tmp/src/usr/src/sys/arch/sparc/compile/SUN4MC
Architecture: sparc
Machine: sparc
>Description:
	On a diskless sun4m (SS10, SuperSPARC II, 96 MB) used for setting up a
	firewall, the kernel crashes a few seconds to a few minutes after pf is
	enabled. pf filters nothing: the ruleset is (pass in/out on le0 all).

	The firewall is intended to run on a sun4c (SS1+, 32 MB), so the kernel
	is stripped to a minimum and is sun4m/sun4c compatible.

	A ddb session:

panic: lockmgr: no context
Stopped at      netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
db> bt
cpu_Debugger(0xf01dd528, 0xf020b634, 0xf04e7000, 0x2, 0x100, 0xf0215000) at netbsd:lockmgr+0x28c
lockmgr(0xf022e2bc, 0x1, 0x0, 0xf048b2a8, 0x0, 0xf052605c) at netbsd:uvmfault_lookup+0x18c
uvmfault_lookup(0xf020b780, 0x0, 0xf020b6b0, 0xf04ac8f0, 0x0, 0xf0526048) at netbsd:uvm_fault+0x58
uvm_fault(0xf022e2b8, 0x0, 0x0, 0x1, 0x1236, 0xeb0) at netbsd:mem_access_fault4m+0x350
mem_access_fault4m(0x9, 0x326, 0x14, 0xf020b8e8, 0x0, 0xf006ec00) at 0xf0006408
0xf0006408(0x0, 0xf04e7000, 0xf020ba04, 0x0, 0x6, 0xf020b8e0) at netbsd:pfil4_wrapper+0x34
pfil4_wrapper(0x0, 0xf020ba04, 0xf04e7000, 0x1, 0xf04c46e0, 0xffff) at netbsd:pfil_run_hooks+0x88
pfil_run_hooks(0xf02263c0, 0xf020bac4, 0xf04e7000, 0x1, 0x0, 0xf0480c64) at netbsd:ip_input+0x228
ip_input(0xf0480c00, 0xf0480c00, 0x440, 0xeedb, 0x100, 0xf0223a98) at netbsd:ipintr+0x88
ipintr(0x0, 0xfe029010, 0x0, 0x6a23, 0x100, 0x400) at netbsd:softnet+0x78
softnet(0xf020bbb0, 0xf019fd68, 0x100, 0x408000e7, 0x37, 0x424ae261) at 0xf0006870
0xf0006870(0x1, 0x0, 0xf00e6fa8, 0xf0232000, 0x200d2950, 0x60) at netbsd:cpu_exit+0xb8
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 436            479      436          0 2  0x4002    1              ksh   ttyin
 479            478      479       1000 2  0x4102    1               su    wait
 478            467      478       1000 2  0x4002    1              ksh   pause
 467            471      467          0 2  0x4103    1            login    wait
 471            441      441          0 2  0x4000    1          telnetd    poll
 438            315      438          0 2  0x4002    1          tcpdump     bpf
 315            428      315          0 2  0x4002    1              ksh   pause
 428              1      428          0 2  0x4103    1            login    wait
 405              1      405          0 2       0    1             cron nanosle
 441              1      441          0 2       0    1            inetd  kqread
 397              1      397          0 2   0x100    1         sendmail  select
 363              1      363          0 2       0    1             sshd  select
 317              1      317          0 2       0    1             ntpd   pause
 122              1      122          0 2       0    1         ifwatchd   netio
 120              1      120          0 2       0    1          syslogd  kqread
 9                0        0          0 2 0x20200    1         aiodoned aiodone
 8                0        0          0 2 0x20200    1          ioflush  syncer
 7                0        0          0 2 0x20200    1       pagedaemon pgdaemo
 6                0        0          0 2 0x20200    1            nfsio  nfsidl
 5                0        0          0 2 0x20200    1            nfsio  nfsidl
 4                0        0          0 2 0x20200    1            nfsio  nfsidl
 3                0        0          0 2 0x20200    1            nfsio  nfsidl
 2                0        0          0 2 0x20200    1         scsibus0  sccomp
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
db> show uvmexp
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  23293 VM pages: 4539 active, 0 inactive, 889 wired, 16380 free
  min  10% (25) anon, 10% (25) file, 5% (12) exec
  max  80% (204) anon, 50% (128) file, 30% (76) exec
  pages  2321 anon, 2028 file, 1417 exec
  freemin=64, free-target=85, inactive-target=0, wired-max=7764
  faults=83541, traps=42953, intrs=156516, ctxswitch=19434
  softint=19057, syscalls=43360, swapins=0, swapouts=0
  fault counts:
    noram=0, noanon=0, pgwait=0, pgrele=0
    ok relocks(total)=704(704), anget(retrys)=9669(0), amapcopy=5585
    neighbor anon/obj pg=8034/60898, gets(lock/unlock)=26345/704
    cases: anon=6264, anoncow=3405, obj=13427, prcopy=12918, przero=6190
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1, nanon=25929, nanonneeded=25929 nfreeanon=24000
    swpages=4096, swpginuse=0, swpgonly=0 paging=0
db>

When I look at the code of cpu_exit, I see curlwp=NULL, which is exactly the
condition that triggers the lockmgr panic message seen above. The offset in
cpu_exit is different at each panic; the other function offsets stay the same.

In case it helps:
$ nm -n netbsd
[...]
f0006324 t memfault_sun4m
f0006444 t normal_mem_fault
[...]
f0006758 t softintr_common
f00068a0 T sparc_interrupt4m
[...]
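
The two raw addresses in the backtrace (0xf0006408 and 0xf0006870) can be
matched against the nm excerpt above. The following is a minimal sketch, not
part of the original session: the helper names parse_nm and resolve are
hypothetical, and it assumes the nearest preceding symbol in the excerpt is
the enclosing one, which only holds where no symbols were elided in between.

```python
import bisect

# Excerpt of `nm -n netbsd` copied from this report; the elided "[...]"
# lines are simply absent, so only these four symbols are known.
NM_OUTPUT = """\
f0006324 t memfault_sun4m
f0006444 t normal_mem_fault
f0006758 t softintr_common
f00068a0 T sparc_interrupt4m
"""


def parse_nm(text):
    """Return a sorted list of (address, name) pairs from nm-style lines."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and "[...]" markers
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Format addr as symbol+0xoffset using the nearest preceding symbol."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


SYMBOLS = parse_nm(NM_OUTPUT)

if __name__ == "__main__":
    for addr in (0xF0006408, 0xF0006870):
        print(f"0x{addr:08x} -> {resolve(SYMBOLS, addr)}")
```

Under that assumption, the frame between mem_access_fault4m and
pfil4_wrapper falls inside memfault_sun4m, and the frame between softnet and
cpu_exit falls inside softintr_common.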

>How-To-Repeat:
	Enable pf on a diskless host?
>Fix:
	- interrupts not disabled?
	- non-pageable text made pageable?