Subject: port-sparc/5641: kernel fault on sun4c machines
To: None <gnats-bugs@gnats.netbsd.org>
From: Brad Spencer <brad@anduin.eldar.org>
List: netbsd-bugs
Date: 06/22/1998 17:00:53
>Number:         5641
>Category:       port-sparc
>Synopsis:       kernel fault on sun4c machines
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 22 14:05:00 1998
>Last-Modified:
>Originator:     Brad Spencer
>Organization:
	Sitting at home
>Release:        Mid to late May 1998
>Environment:
	
NetBSD valinor.eldar.org 1.3E NetBSD 1.3E (VALINOR) #3: Sat May 23 10:30:37 EDT 1998     brad@elrond.eldar.org:/usr/src/sys/arch/sparc/compile/VALINOR sparc

>Description:

I have a SPARCstation 2 (SS2) that panics with a 'kernel fault' under
load.  The machine runs Majordomo and typically has a number of
sendmail daemons running.  It is doing little else.  Prior to the
SS2, an IPX did the same task.

Here are a couple of crash dump outputs:

valinor% gdb /sys/arch/sparc/compile/VALINOR/netbsd.gdb 
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-netbsd), Copyright 1996 Free Software Foundation, Inc...
(gdb) target kcore netbsd.3.core
panic: kernel fault
#0  mi_switch () at ../../../../kern/kern_synch.c:631
631             cpu_switch(p);
(gdb) where
#0  mi_switch () at ../../../../kern/kern_synch.c:631
#1  0xf00261f0 in bpendtsleep () at ../../../../kern/kern_synch.c:370
#2  0xf009a098 in uvm_scheduler () at ../../../../uvm/uvm_glue.c:421
#3  0xf00192cc in main () at ../../../../kern/init_main.c:412
(gdb) print *p
$1 = {p_forw = 0x0, p_back = 0x0, p_list = {le_next = 0x0, 
    le_prev = 0xf01f7208}, p_cred = 0xf010d180, p_fd = 0xf00ff760, 
  p_stats = 0xf00e434c, p_limit = 0xf010cce0, p_vmspace = 0xf0105380, 
  p_sigacts = 0xf00e4220, p_flag = 516, p_unused = 0 '\000', 
  p_stat = 3 '\003', p_pad1 = "\000", p_pid = 0, p_hash = {le_next = 0x0, 
    le_prev = 0x0}, p_pglist = {le_next = 0xf01f7000, le_prev = 0xf010b458}, 
  p_pptr = 0x0, p_sibling = {le_next = 0x0, le_prev = 0x0}, p_children = {
    lh_first = 0xf01f7000}, p_oppid = 0, p_dupfd = 0, p_estcpu = 0, 
  p_cpticks = 0, p_pctcpu = 0, p_wchan = 0xf0107ff8, 
  p_wmesg = 0xf0099fa8 "scheduler", p_swtime = 41738, p_slptime = 7, 
  p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {
      tv_sec = 0, tv_usec = 0}}, p_rtime = {tv_sec = 0, tv_usec = 357198}, 
  p_uticks = 0, p_sticks = 35, p_iticks = 0, p_traceflag = 0, p_tracep = 0x0, 
  p_siglist = 0, p_textvp = 0x0, p_locks = 0, p_simple_locks = 0, 
  p_holdcnt = 0, p_emul = 0xf00e6ef4, p_spare = {0}, p_sigmask = 0, 
  p_sigignore = 407404544, p_sigcatch = 0, p_priority = 4 '\004', 
  p_usrpri = 50 '2', p_nice = 20 '\024', 
  p_comm = "swapper\000\000\000\000\000\000\000\000\000", p_pgrp = 0xf010b450, 
  p_thread = 0x0, p_addr = 0xf00e4000, p_md = {md_tf = 0x0, md_fpstate = 0x0, 
    md_flags = 0}, p_xstat = 0, p_acflag = 0, p_ru = 0x0}
(gdb) quit


.... and ....


valinor% gdb /sys/arch/sparc/compile/VALINOR/netbsd.gdb
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-netbsd), Copyright 1996 Free Software Foundation, Inc...
(gdb) target kcore netbsd.4.core
panic: kernel fault
#0  mi_switch () at ../../../../kern/kern_synch.c:631
631             cpu_switch(p);
(gdb) where
#0  mi_switch () at ../../../../kern/kern_synch.c:631
#1  0xf00261f0 in bpendtsleep () at ../../../../kern/kern_synch.c:370
#2  0xf00410bc in biowait (bp=0xf02034d0) at ../../../../kern/vfs_bio.c:811
#3  0xf00a5eb8 in uvm_swap_io (pps=0x3, startslot=-268169216, npages=1, 
    flags=1048576) at ../../../../uvm/uvm_swap.c:1763
#4  0xf00a5c24 in uvm_swap_get (page=0xf0186b94, swslot=1007, flags=2)
    at ../../../../uvm/uvm_swap.c:1630
#5  0xf0097968 in uao_get (uobj=0xf00f06e8, offset=4028132244, pps=0xf00e5ce0, 
    npagesp=0xf00e5ce0, centeridx=0, access_type=0, advice=1, flags=0)
    at ../../../../uvm/uvm_aobj.c:929
#6  0xf00993d8 in uvm_fault (orig_map=0x4, vaddr=4052602880, fault_type=2, 
    access_type=7) at ../../../../uvm/uvm_fault.c:1281
#7  0xf0099a2c in uvm_fault_wire (map=0xf00f0730, start=4052598784, 
    end=4052606976) at ../../../../uvm/uvm_fault.c:1692
#8  0xf0099f2c in uvm_swapin (p=0xf0311600) at ../../../../uvm/uvm_glue.c:360
#9  0xf009a0c8 in uvm_scheduler () at ../../../../uvm/uvm_glue.c:438
#10 0xf00192cc in main () at ../../../../kern/init_main.c:412
(gdb) print *p
$1 = {p_forw = 0xf0401a00, p_back = 0xf0105b30, p_list = {le_next = 0x0, 
    le_prev = 0xf01f7208}, p_cred = 0xf010d180, p_fd = 0xf00ff760, 
  p_stats = 0xf00e434c, p_limit = 0xf010cce0, p_vmspace = 0xf0105380, 
  p_sigacts = 0xf00e4220, p_flag = 516, p_unused = 0 '\000', 
  p_stat = 2 '\002', p_pad1 = "\000", p_pid = 0, p_hash = {le_next = 0x0, 
    le_prev = 0x0}, p_pglist = {le_next = 0xf01f7000, le_prev = 0xf010b458}, 
  p_pptr = 0x0, p_sibling = {le_next = 0x0, le_prev = 0x0}, p_children = {
    lh_first = 0xf01f7000}, p_oppid = 0, p_dupfd = 0, p_estcpu = 0, 
  p_cpticks = 0, p_pctcpu = 0, p_wchan = 0x0, p_wmesg = 0xf0041068 "biowait", 
  p_swtime = 24807, p_slptime = 0, p_realtimer = {it_interval = {tv_sec = 0, 
      tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_rtime = {
    tv_sec = 0, tv_usec = 453156}, p_uticks = 0, p_sticks = 62, p_iticks = 0, 
  p_traceflag = 0, p_tracep = 0x0, p_siglist = 0, p_textvp = 0x0, p_locks = 0, 
  p_simple_locks = 0, p_holdcnt = 0, p_emul = 0xf00e6ef4, p_spare = {0}, 
  p_sigmask = 0, p_sigignore = 407404544, p_sigcatch = 0, 
  p_priority = 17 '\021', p_usrpri = 50 '2', p_nice = 20 '\024', 
  p_comm = "swapper\000\000\000\000\000\000\000\000\000", p_pgrp = 0xf010b450, 
  p_thread = 0x0, p_addr = 0xf00e4000, p_md = {md_tf = 0x0, md_fpstate = 0x0, 
    md_flags = 0}, p_xstat = 0, p_acflag = 0, p_ru = 0x0}
(gdb) quit


The machine usually panics once every couple of days, in pretty much
the same way.  In every dump I have seen, it appears that the
"swapper" process was being scheduled to run.

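For anyone reading the two "print *p" dumps above, the raw p_stat and
p_flag numbers can be decoded with a small script.  This is only a
sketch: the numeric values are assumed from the 4.4BSD-derived
<sys/proc.h> and should be checked against the actual source tree,
and the helper names decode_stat and decode_flags are made up for
illustration.

```python
# Assumed values from the 4.4BSD-derived <sys/proc.h>; verify against
# the kernel source that built the crashing kernel.
P_STAT = {1: "SIDL", 2: "SRUN", 3: "SSLEEP", 4: "SSTOP", 5: "SZOMB"}

# Only the two flag bits needed for these dumps are listed (assumption).
P_FLAGS = {0x004: "P_INMEM", 0x200: "P_SYSTEM"}


def decode_stat(p_stat):
    """Map a p_stat number to its symbolic name."""
    return P_STAT.get(p_stat, "unknown(%d)" % p_stat)


def decode_flags(p_flag):
    """Return the names of the known bits set in p_flag.

    Any bits not in P_FLAGS are reported as one trailing hex value.
    """
    names = [name for bit, name in sorted(P_FLAGS.items()) if p_flag & bit]
    rest = p_flag & ~sum(P_FLAGS)
    if rest:
        names.append(hex(rest))
    return names


if __name__ == "__main__":
    # p_stat / p_flag as printed in the netbsd.3.core and netbsd.4.core dumps
    for core, p_stat, p_flag in (("netbsd.3.core", 3, 516),
                                 ("netbsd.4.core", 2, 516)):
        print(core, decode_stat(p_stat), decode_flags(p_flag))
```

Under those assumed values, the first dump shows the process asleep
and the second shows it runnable, in both cases as an in-memory
system process, which matches p_pid = 0 and p_comm = "swapper".
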
>How-To-Repeat:

It is hard to say.  I am not sure that load alone is enough to
trigger it, but the machine only seems to panic when it is busy
working.

>Fix:

Don't know...  However, kernel configs, kernel dumps, or access to the
machine are available upon request.

>Audit-Trail:
>Unformatted: