Subject: kern/13827: fatal page fault in supervisor mode and hang in 1.5.1
To: None <gnats-bugs@gnats.netbsd.org>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 08/29/2001 22:58:32
>Number:         13827
>Category:       kern
>Synopsis:       Kernel panics with fatal page fault in supervisor mode and reboot hangs while syncing disks
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 29 19:54:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Andreas Wrede
>Release:        1.5.1-release
>Organization:
Planix, Inc.
>Environment:
	
System: NetBSD idefix.tvo.org 1.5.1 NetBSD 1.5.1 (TVO) #0: Wed Aug 29 09:25:13 EDT 2001     root@tube.tvo.org:/usr/src/sys/arch/i386/compile/TVO i386


>Description:
After upgrading two (nearly) identical Compaq ProLiant 3000 servers
from NetBSD 1.4.1 to 1.5.1, both servers sometimes panic during
periods of high I/O load, i.e. during /etc/daily runs and/or
amanda/dump backups. The kernel is a GENERIC kernel minus some
file system types and drivers, plus IPsec.

The file systems were mounted with softdep during some of the
crashes and without it during others.

After the panic, the system hangs during 'syncing disks' after
printing 'command timeout' messages for the SCSI devices. Breaking
into the debugger and running sync resets the SCSI bus and proceeds
to dump memory to disk, but savecore does not find the core dump in
the swap partition.


>How-To-Repeat:
	Build a custom kernel (it is unclear whether this is required) and run a high I/O load.
>Fix:

unknown
>Release-Note:
>Audit-Trail:
>Unformatted:
 ------ BOOT -----
 >> NetBSD/i386 BIOS Boot, Revision 2.7
 >> (he@nsa.uninett.no, Mon Jun 18 01:32:10 CEST 2001)
 >> Memory: 639/261120 k
 Use hd1a:netbsd to boot sd0 when wd0 is also installed
 Press return to boot now, any other key for boot menu
 booting wd0a:netbsd - starting in 0 
 3134862+306440+263396 [65+168880+148937]=0x3d7310
 [ preserving 318340 bytes of netbsd ELF symbol table ]
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001
     The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.
 
 NetBSD 1.5.1 (TVO) #0: Wed Aug 29 09:25:13 EDT 2001
     root@tube.tvo.org:/usr/src/sys/arch/i386/compile/TVO
 cpu0: Intel Pentium III (Katmai) (686-class), 498.72 MHz
 total memory = 255 MB
 avail memory = 232 MB
 using 3297 buffers containing 13188 KB of memory
 BIOS32 rev. 0 found at 0xf0000
 mainbus0 (root)
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled
 pchb0 at pci0 dev 0 function 0
 pchb0: Intel 82443BX Host Bridge/Controller (AGP disabled) (rev. 0x03)
 vga1 at pci0 dev 11 function 0: Cirrus Logic CL-GD5446 (rev. 0x45)
 wsdisplay0 at vga1
 ppb0 at pci0 dev 13 function 0: Digital Equipment DECchip 21150 PCI-PCI Bridge (rev. 0x04)
 pci1 at ppb0 bus 1
 pci1: i/o space, memory space enabled
 tl0 at pci1 dev 7 function 0
 tl0: Compaq ProLiant Integrated Netelligent 10/100 TX
 tl0: Ethernet address 00:50:8b:8b:48:e2
 tl0: interrupting at irq 5
 nsphy0 at tl0 phy 1: DP83840 10/100 media interface, rev. 1
 nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 tlphy0 at tl0 phy 31: ThunderLAN 10baseT media interface, rev. 5
 tlphy0: 10base2
 siop0 at pci1 dev 9 function 0: Symbios Logic 53c875 (ultra-wide scsi)
 siop0: using on-board RAM
 siop0: interrupting at irq 9
 scsibus0 at siop0: 16 targets, 8 luns per target
 siop1 at pci1 dev 9 function 1: Symbios Logic 53c875 (ultra-wide scsi)
 siop1: using on-board RAM
 siop1: interrupting at irq 10
 scsibus1 at siop1: 16 targets, 8 luns per target
 Compaq product 0xa0f0 (miscellaneous system) at pci0 dev 14 function 0 not configured
 fxp0 at pci0 dev 15 function 0: Intel i82557 Ethernet, rev 5
 fxp0: interrupting at irq 11
 fxp0: Ethernet address 00:50:8b:65:19:22, 10/100 Mb/s
 inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 0
 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 pcib0 at pci0 dev 20 function 0
 pcib0: Intel 82371AB PCI-to-ISA Bridge (PIIX4) (rev. 0x02)
 pciide0 at pci0 dev 20 function 1: Intel 82371AB IDE controller (PIIX4) (rev. 0x01)
 pciide0: bus-master DMA support present
 pciide0: primary channel wired to compatibility mode
 atapibus0 at pciide0 channel 0
 cd0 at atapibus0 drive 0: <COMPAQ XM-6402B, , 1723> type 5 cdrom removable
 cd0: 32-bit data port
 cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
 pciide0: primary channel interrupting at irq 14
 cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
 pciide0: secondary channel wired to compatibility mode
 pciide0: secondary channel ignored (disabled)
 uhci0 at pci0 dev 20 function 2: Intel 82371AB USB Host Controller (PIIX4) (rev. 0x01)
 uhci0: can't map i/o space
 Intel 82371AB Power Management Controller (PIIX4) (miscellaneous bridge, revision 0x02) at pci0 dev 20 function 3 not configured
 isa0 at pcib0
 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
 com0: console
 com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
 pckbc0 at isa0 port 0x60-0x64
 pckbd0 at pckbc0 (kbd slot)
 pckbc0: using irq 1 for kbd slot
 wskbd0 at pckbd0
 pcppi0 at isa0 port 0x61
 sysbeep0 at pcppi0
 isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
 npx0 at isa0 port 0xf0-0xff: using exception 16
 fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
 fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
 isapnp0: no ISA Plug 'n Play devices found
 biomask f7c5 netmask ffe5 ttymask ffe7
 scsibus0: waiting 2 seconds for devices to settle...
 scsibus1: waiting 2 seconds for devices to settle...
 siop1: target 0 using tagged queuing
 sd0 at scsibus1 target 0 lun 0: <COMPAQ, BD009122C6, B016> SCSI2 0/direct fixed
 siop1: target 0 using 16bit transfers
 siop1: target 0 now synchronous at 20.0Mhz, offset 16
 sd0: 8678 MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
 siop1: target 1 using tagged queuing
 sd1 at scsibus1 target 1 lun 0: <COMPAQ, BD00911934, 3B00> SCSI2 0/direct fixed
 siop1: target 1 using 16bit transfers
 siop1: target 1 now synchronous at 20.0Mhz, offset 15
 sd1: 8678 MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
 siop1: target 2 using tagged queuing
 sd2 at scsibus1 target 2 lun 0: <COMPAQ, BD00911934, 3B00> SCSI2 0/direct fixed
 siop1: target 2 using 16bit transfers
 siop1: target 2 now synchronous at 20.0Mhz, offset 15
 sd2: 8678 MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
 IPsec: Initialized Security Association Processing.
 boot device: sd0
 root on sd0a dumps on sd0b
 root file system type: ffs
 swapctl: adding /dev/sd0b as swap device at priority 0
 
 
 
 --- CRASH---
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c0183574 cs 8 eflags 10246 cr2 3c cpl d000f7c4
 panic: trap
 Begin traceback...
 trap() at trap+0x1ed
 --- trap (number 6) ---
 lockmgr(c04502e4,10012,c0450368) at lockmgr+0x78
 uvm_map(c04502e0,d2a96a6c,1000,c0450280,ffffffff) at uvm_map+0x79
 uvm_km_valloc(c04502e0,1000,c0440c80,c09fe680,c0a7ff00) at uvm_km_valloc+0x37
 _bus_dmamem_map(c0440c80,d2a96ae0,1,1000,c09fe68c) at _bus_dmamem_map+0x2e
 siop_morecbd(c095ba00) at siop_morecbd+0xf9
 siop_scsicmd(c097e094) at siop_scsicmd+0x52
 scsipi_execute_xs(c097e094,0,1009,c0959480,d2a96b98) at scsipi_execute_xs+0x36
 scsi_scsipi_cmd(c0959480,d2a96bec,a,ca65a000,2000) at scsi_scsipi_cmd+0xd3
 scsipi_command(c0959480,d2a96bec,a,ca65a000,2000) at scsipi_command+0x59
 sdstart(c0979400,d000f7c4,c4edc834,d2a96c2c,c02b072b) at sdstart+0x1ea
 scsipi_free_xs(c097e094,1) at scsipi_free_xs+0x8b
 scsipi_done(c097e094,c095ba00,ff00,1,1009) at scsipi_done+0x123
 siop_scsicmd_end(c097f800,c0965c60,d2a874bc,d2a874bc,c095ba00) at siop_scsicmd_end+0x35d
 siop_intr(c095ba00) at siop_intr+0x1370
 Xintr10() at Xintr10+0x7c
 --- interrupt ---
 idle(d2a874bc) at idle+0x21
 bpendtsleep(c4ebbaa8,11,c035fc43,0,0) at bpendtsleep
 getblk(d2a8f680,34b00,2000,0,0) at getblk+0x8c
 bread(d2a8f680,34b00,2000,ffffffff,d2a96df8) at bread+0x2d
 ffs_update(d2a96e2c,d2a8fea0,d2a96ef4,d2a8fdd0,0) at ffs_update+0x1bc
 ffs_full_fsync(d2a96ef4,d2a8fea0,d2a96ef4,d2a8fdd0,1) at ffs_full_fsync+0x224
 ffs_fsync(d2a96ef4) at ffs_fsync+0x3a
 ffs_sync(c098d200,3,c0959f80,d2a874bc) at ffs_sync+0xf3
 sync_fsync(d2a96f68) at sync_fsync+0x53
 sched_sync(d2a874bc) at sched_sync+0x119
 End traceback...
 syncing disks...sd2(siop1:2:0): command timeout
 sd2(siop1:2:0): command timeout
 sd2(siop1:2:0): command timeout
 sd1(siop1:1:0): command timeout
 sd1(siop1:1:0): command timeout
 sd1(siop1:1:0): command timeout
 sd1(siop1:1:0): command timeout
 sd0(siop1:0:0): command timeout
 sd0(siop1:0:0): command timeout
 sd0(siop1:0:0): command timeout
 sd0(siop1:0:0): command timeout
 [...many hours pass...]
 Stopped at      cpu_Debugger+0x4:       leave
 db> 
 db> 
 db> trace
 cpu_Debugger(c0965920,11,ffffffff,2c,c0a0f160) at cpu_Debugger+0x4
 comintr(c095b600) at comintr+0xcd
 Xintr4() at Xintr4+0x78
 --- interrupt ---
 ltsleep(c4ecaae8,11,c035fc9a,0,0) at ltsleep+0x4e
 biowait(c4ecaae8,20d5c0,d2ab0db0,c099b000,c040b9a0) at biowait+0x31
 bread(d2ab1698,20d5c0,2000,ffffffff,d2a967bc) at bread+0x95
 ffs_update(d2a967f0,d2caa924,d2a968b8,d2caa5e4,0) at ffs_update+0x1bc
 ffs_full_fsync(d2a968b8,d2caa924,d2a968b8,d2caa5e4,4) at ffs_full_fsync+0x224
 ffs_fsync(d2a968b8) at ffs_fsync+0x3a
 ffs_sync(c09eae00,2,c0959f80,c0464420,c09eae00) at ffs_sync+0xf3
 sys_sync(c0464420,0,0,100,c03749fb) at sys_sync+0x5c
 vfs_shutdown(d2a9696c,d2a96960,c0190635,100,0) at vfs_shutdown+0x64
 cpu_reboot(100,0,d2a969b0,0,6) at cpu_reboot+0x3b
 panic(c03749fb,c04502e4,0,10012,c02a4261) at panic+0xcd
 trap() at trap+0x1ed
 --- trap (number 6) ---
 lockmgr(c04502e4,10012,c0450368) at lockmgr+0x78
 uvm_map(c04502e0,d2a96a6c,1000,c0450280,ffffffff) at uvm_map+0x79
 uvm_km_valloc(c04502e0,1000,c0440c80,c09fe680,c0a7ff00) at uvm_km_valloc+0x37
 _bus_dmamem_map(c0440c80,d2a96ae0,1,1000,c09fe68c) at _bus_dmamem_map+0x2e
 siop_morecbd(c095ba00) at siop_morecbd+0xf9
 siop_scsicmd(c097e094) at siop_scsicmd+0x52
 scsipi_execute_xs(c097e094,0,1009,c0959480,d2a96b98) at scsipi_execute_xs+0x36
 scsi_scsipi_cmd(c0959480,d2a96bec,a,ca65a000,2000) at scsi_scsipi_cmd+0xd3
 scsipi_command(c0959480,d2a96bec,a,ca65a000,2000) at scsipi_command+0x59
 sdstart(c0979400,d000f7c4,c4edc834,d2a96c2c,c02b072b) at sdstart+0x1ea
 scsipi_free_xs(c097e094,1) at scsipi_free_xs+0x8b
 scsipi_done(c097e094,c095ba00,ff00,1,1009) at scsipi_done+0x123
 siop_scsicmd_end(c097f800,c0965c60,d2a874bc,d2a874bc,c095ba00) at siop_scsicmd_end+0x35d
 siop_intr(c095ba00) at siop_intr+0x1370
 Xintr10() at Xintr10+0x7c
 --- interrupt ---
 idle(d2a874bc) at idle+0x21
 bpendtsleep(c4ebbaa8,11,c035fc43,0,0) at bpendtsleep
 getblk(d2a8f680,34b00,2000,0,0) at getblk+0x8c
 bread(d2a8f680,34b00,2000,ffffffff,d2a96df8) at bread+0x2d
 ffs_update(d2a96e2c,d2a8fea0,d2a96ef4,d2a8fdd0,0) at ffs_update+0x1bc
 ffs_full_fsync(d2a96ef4,d2a8fea0,d2a96ef4,d2a8fdd0,1) at ffs_full_fsync+0x224
 ffs_fsync(d2a96ef4) at ffs_fsync+0x3a
 ffs_sync(c098d200,3,c0959f80,d2a874bc) at ffs_sync+0xf3
 sync_fsync(d2a96f68) at sync_fsync+0x53
 sched_sync(d2a874bc) at sched_sync+0x119
 db> sync
 
 dumping to dev 4,1 offset 500487
 dump siop1: scsi bus reset
 cmd 0xc097fa80 (target 0:0) in reset list
 cmd 0xc097f840 (target 0:0) in reset list
 cmd 0xc097f980 (target 0:0) in reset list
 cmd 0xc097f900 (target 0:0) in reset list
 cmd 0xc0aca000 (target 0:0) in reset list
 cmd 0xc097f8c0 (target 1:0) in reset list
 cmd 0xc097f9c0 (target 1:0) in reset list
 cmd 0xc097fa00 (target 1:0) in reset list
 cmd 0xc097fac0 (target 1:0) in reset list
 cmd 0xc097f940 (target 2:0) in reset list
 cmd 0xc097f880 (target 2:0) in reset list
 cmd 0xc097fa40 (target 2:0) in reset list
 cmd 0xc097f800 (target 2:0) in reset list
 cmd 0xc097fa80 (status 2) about to be processed
 cmd 0xc097f840 (status 2) about to be processed
 cmd 0xc097f980 (status 2) about to be processed
 cmd 0xc097f900 (status 2) about to be processed
 cmd 0xc0aca000 (status 2) about to be processed
 cmd 0xc097f8c0 (status 2) about to be processed
 cmd 0xc097f9c0 (status 2) about to be processed
 cmd 0xc097fa00 (status 2) about to be processed
 cmd 0xc097fac0 (status 2) about to be processed
 cmd 0xc097f940 (status 2) about to be processed
 cmd 0xc097f880 (status 2) about to be processed
 cmd 0xc097fa40 (status 2) about to be processed
 cmd 0xc097f800 (status 0) about to be processed
 siop1: target 0 using 16bit transfers
 siop1: target 0 now synchronous at 20.0Mhz, offset 16
 siop1: target 1 using 16bit transfers
 siop1: target 1 now synchronous at 20.0Mhz, offset 15
 siop1: target 2 using 16bit transfers
 siop1: target 2 now synchronous at 20.0Mhz, offset 15
 255 254 .................
 
 rebooting
 -------