Subject: Re: Ultra 5 / 2.0 / panic: lockmgr: no context
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Gert Doering <gert@greenie.muc.de>
List: port-sparc64
Date: 01/12/2005 07:39:52
Hi,

On Tue, Jan 11, 2005 at 11:13:45PM +0100, Gert Doering wrote:
> I've just built and booted a new kernel, and will report tomorrow whether
> it crashed (and if yes, whether I could type "bt" in ddb and get a back
> trace, as has been recommended to me today).
> 
> Also I'm running a "systat vmstat 1" in a ssh window right now, so
> maybe that will tell me what the machine did before it died.  "top" didn't...

OK, here we go.  Kernel is CVS "netbsd-2-0" as of yesterday evening,
and the crash looks *different*.

The machine crashed at 05:24 tonight, right in the middle of an amanda dump
(so it was not "idle" when it crashed today).

Console showed the following messages:
--------------------------------------------------------------------
data fault: pc=11b0684 addr=0
kernel fault 30: data access exception
Stopped in pid 7266,1 (dump) at netbsd:unsleep:0x34:    ldx  [%g1 + %g0], %g1
db>
db> bt
kpsignal2(c8c64e0, cbcfc50, 1, 0, 0, 0) at netbsd:kpsignal2+0x334
sys_kill(c8c6270, 1f, cbcfdc0, 0, cbcfdd0, 40530ff8) at netbsd:sys_kill+0x114
syscall(cbcfed0, 25, 40531304, cbcfdd0, 40531304, 40531308) at netbsd:syscall+0xd4
?(2f6d, 1f, 0, 20e000, 2, 21fc00) at 0x1008cb8
db> cont
panic: kernel fault
Begin traceback...
End traceback...
syncing disks... hme0: status=30003<GOTFRAME,RCNTEXP,RXTOHOST,NORXD>
<nothing happened>
~#
kdb breakpoint at 1277b68
Stopped in pid 19.1 (raidio0) at        netbsd:cpu_Debugger+0x4:        nop
db> bt
sab_intr(24c7700, 0, e0017ed0, 8000000000000000, 12482bc, ffffffffffffffff) at netbsd:sab_intr+0xa4
sparc64_ipi_flush_all(1, 5, e0017ed0, ffffffffffffffff, 1245804, 1) at netbsd:sparc64_ipi_flush_all+0x23c
sparc64_ipi_flush_all(26ba0f8, 10, 136ec80, 0, 26ba108, 18aa538) at netbsd:sparc64_ipi_flush_all+0x23c
rf_RaidIOThread(26ba000, 0, 7e0, 41e44512, a, 188d800) at netbsd:rf_RaidIOThread+0xb0
proc_trampoline(0, 0, 0, 0, 0, 0) at netbsd:proc_trampoline+0x4
db> sync
Frame pointer is at 0xe0016b21
Call traceback:
126ccd0(d, ffffffffffffffff, 0, 0, 0, d, e0016be1) fp = e0016be1
117a6f4(100, 0, 0, c719e7c, 8, e0017748, e0016ca1) fp = e0016ca1
1179ffc(1277b6c, 0, ffffffffffffffff, e0017630, 0, 4, e0016d61) fp = e0016d61
1179ce4(18101f8, 0, 0, 0, 0, 1a, e0016ec1) fp = e0016ec1
117dac0(1277b70, 0, 1, c6f8481, 0, 0, e0016fa1) fp = e0016fa1
127790c(0, 0, 0, 0, 0, 1000000, e0017071) fp = e0017071
1274bac(101, e0017b50, 28, 28, 0, 348b140, e0017131) fp = e0017131
1008b64(e0017b50, 101, 1277b68, 820006, 348b140, 3497a94, e00172a1) fp = e00172a1
1248a5c(24e2800, 1840c00, a, 183cdd8, 1d, 4d0, e0017481) fp = e0017481
1248360(24e2800, e0017e0c, ba2a7186000, 8000000000000000, 2, 1, e0017551) fp = e0017551
1009038(24c7700, 0, e0017ed0, 8000000000000000, 12482bc, ffffffffffffffff, e0017621) fp = e0017621
1009038(1, 5, e0017ed0, ffffffffffffffff, 1245804, 1, c7194a1) fp = c7194a1

dumping to dev 12,1 offset 749
dump cmdide0:0: unable to load xfer table DMA map for drive 0, error=-1
wddump: DMA error
starting dump, blkno 752
device not ready
rebooting

Resetting ... 


Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 400MHz), No Keyboard
OpenBoot 3.25, 512 MB (50 ns) memory installed, Serial #16075582.
Ethernet address 8:0:20:f5:4b:3e, Host ID: 80f54b3e.



Initializing Memory
Rebooting with command: boot
Boot device: disk:a  File and args: 
NetBSD IEEE 1275 Bootblock
..>> NetBSD/sparc64 OpenFirmware Boot, Revision 1.5
>> (autobuild@cs20.apochromatic.org, Sun Sep  8 11:34:12 UTC 2002)
loadfile: reading header
elf64_exec: Booting /pci@1f,0/pci@1,1/ide@3/disk@0,0:a/netbsd
3843888@0x1000000+142528@0x1800000+4051776@0x1822cc0 
symbols @ 0xfef88400 149+303336+181925 start=0x1000000
chain: calling OF_chain(800000, e478, 1000000, fffa9a80, 18)
console is /pci@1f,0/pci@1,1/ebus@1/se@14,400000:a
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 2.0 (KIRK) #2: Tue Jan 11 20:49:13 CET 2005
	gert@kirk:/home/sparc64/obj/home/src-2.0/sys/arch/sparc64/compile/KIRK
total memory = 512 MB
avail memory = 491 MB
bootpath: /pci@1f,0/pci@1,1/ide@3,0/disk@0,0
mainbus0 (root): SUNW,Ultra-5_10: hostid 80f54b3e
cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 400 MHz, version 0 FPU
cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 2048K external (64 b/l)
psycho0 at mainbus0 addr 0xfffc4000
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 2; PCI bus 0
DVMA map: c0000000 to e0000000
IOTSB: d06000 to d86000
pci0 at psycho0
pci0: i/o space, memory space enabled
ppb0 at pci0 dev 1 function 1: Sun Microsystems, Inc. Simba PCI bridge (rev. 0x13)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ebus0 at pci1 dev 1 function 0
ebus0: Sun Microsystems, Inc. PCIO Ebus2, revision 0x01
auxio0 at ebus0 addr 726000-726003, 728000-728003, 72a000-72a003, 72c000-72c003, 72f000-72f003
power at ebus0 addr 724000-724003 ipl 37 not configured
SUNW,pll at ebus0 addr 504000-504002 not configured
sab0 at ebus0 addr 400000-40007f ipl 43: rev 3.2
sabtty0 at sab0 port : console i/o
sabtty1 at sab0 port 1
com0 at ebus0 addr 3083f8-3083ff ipl 41: ns16550a, working fifo
kbd0 at com0
com1 at ebus0 addr 3062f8-3062ff ipl 42: ns16550a, working fifo
ms0 at com1
lpt0 at ebus0 addr 3043bc-3043cb, 30015c-30015d, 700000-70000f ipl 34
fdthree at ebus0 addr 3023f0-3023f7, 706000-70600f, 720000-720003 ipl 39 not configured
clock0 at ebus0 addr 0-1fff: mk48t59
flashprom at ebus0 addr 0-fffff not configured
audiocs0 at ebus0 addr 200000-2000ff, 702000-70200f, 704000-70400f, 722000-722003 ipl 35 ipl 36: CS4231A
audio0 at audiocs0: full duplex
0hme0 at pci1 dev 1 function 1: Sun Happy Meal Ethernet, rev. 1
hme0: interrupting at ivec 3021
hme0: Ethernet address 08:00:20:f5:4b:3e
nsphy0 at hme0 phy 1: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ATI Technologies 3D Rage Pro (VGA display, revision 0x5c) at pci1 dev 2 function 0 not configured
cmdide0 at pci1 dev 3 function 0
cmdide0: CMD Technology PCI0646 (rev. 0x03)
cmdide0: bus-master DMA support present
cmdide0: primary channel configured to native-PCI mode
cmdide0: using ivec 1820 for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel configured to native-PCI mode
atabus1 at cmdide0 channel 1
ppb1 at pci0 dev 1 function 0: Sun Microsystems, Inc. Simba PCI bridge (rev. 0x13)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
pcons at mainbus0 not configured
No counter-timer -- using %tick at 400MHz as system clock.
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
wd0 at atabus0 drive 0: <ST340015A>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1 at atabus0 drive 1: <Maxtor 6Y160P0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 152 GB, 317632 cyl, 16 head, 63 sec, 512 bytes/sect x 320173056 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(cmdide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
wd1(cmdide0:0:1): using PIO mode 4, DMA mode 2 (using DMA data transfers)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0: <GCR-8523B, , 1.00> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2
cd0(cmdide0:1:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
root on wd0a dumps on wd0b
root file system type: ffs
Wed Jan 12 06:22:47 GMT 2005
raid0: Component /dev/wd0h being configured at col: 0
         Column: 0 Num Columns: 2
         Version: 2 Serial Number: 20031222 Mod Counter: 379
         Clean: No Status: 0
/dev/wd0h is not clean!
raid0: Component /dev/wd1a being configured at col: 1
         Column: 1 Num Columns: 2
         Version: 2 Serial Number: 20031222 Mod Counter: 379
         Clean: No Status: 0
/dev/wd1a is not clean!
raid0: RAID Level 1
raid0: Components: /dev/wd0h /dev/wd1a
raid0: Total Sectors: 20479360 (9999 MB)
swapctl: adding /dev/wd0b as swap device at priority 0
Checking for botched superblock upgrades: done.
Starting file system checks:
[..]
--------------------------------------------------------------------
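
The "Call traceback" printed after "db> sync" in the console log shows only
raw addresses, no symbol names. Below is a minimal sketch of how such
addresses could be mapped to symbol+offset, assuming a sorted symbol table
taken from nm on the matching kernel binary. The two table entries here are
NOT real nm output. They are back-computed from the console messages, on the
assumption that pc=11b0684 is the location reported as unsleep+0x34 and that
the breakpoint at 1277b68 is cpu_Debugger+0x4. The helper name resolve() is
hypothetical.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
# Both entries are back-computed from the console output, not taken from nm:
#   pc=11b0684 shown as unsleep+0x34      -> unsleep starts at 0x11b0650
#   breakpoint 1277b68 = cpu_Debugger+0x4 -> cpu_Debugger starts at 0x1277b64
SYMBOLS = [
    (0x11B0650, "unsleep"),
    (0x1277B64, "cpu_Debugger"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return 'name+0xoff' for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # address lies below the first known symbol
    start, name = symbols[i]
    return "%s+0x%x" % (name, addr - start)


print(resolve(0x11B0684))  # unsleep+0x34
print(resolve(0x1277B68))  # cpu_Debugger+0x4
```

With only these two entries the result is meaningful just for those two
addresses. Resolving frames such as 126ccd0 or 117a6f4 would need the full
symbol list of the kernel that actually crashed.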

systat vmstat 1 showed this:
--------------------------------------------------------------------
Proc:r  d  s  w    Csw   Trp   Sys  Int  Sof   Flt            PAGING   SWAPPING
        1 19       899   317  2103  437        430            in  out   in  out
                                                      ops                     2
   6.9% Sy  41.6% Us   0.0% Ni   3.0% In  48.5% Id    pages
|    |    |    |    |    |    |    |    |    |    |
===>>>>>>>>>>>>>>>>>>>>>%%                                                forks
                                                                          fkppw
          memory totals (in kB)              437 Interrupts               fksvm
         real   virtual    free              125 intr lev1                pwait
Active 251248    305096    1472              209 intr lev5                relck
All    506312    560160  472280                3 intr lev6                rlkok
                                             100 intr clock               noram
Namei         Sys-cache     Proc-cache                                    ndcpy
    Calls     hits    %     hits     %                                    fltcp
        6        6  100                                                   zfod
                                                                          cow
Disks:   md0   wd0   wd1   cd0 raid0                                   32 fmin
 seeks                                                                 42 ftarg
 xfers         209                                                  15766 itarg
 bytes       1409K                                                    577 wired
 %busy        54.5                                                        pdfre
--------------------------------------------------------------------

So, what should I try next?

gert

-- 
USENET is *not* the non-clickable part of WWW!
                                                           //www.muc.de/~gert/
Gert Doering - Munich, Germany                             gert@greenie.muc.de
fax: +49-89-35655025                        gert@net.informatik.tu-muenchen.de