Subject: kern/22363: Recent NetBSD-current kernels lockup or freeze frequently
To: None <gnats-bugs@gnats.netbsd.org>
From: Matthias Scheler <tron@colwyn.zhadum.de>
List: netbsd-bugs
Date: 08/05/2003 18:21:30
>Number:         22363
>Category:       kern
>Synopsis:       Recent NetBSD-current kernels lockup or freeze frequently
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 05 16:22:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     
>Release:        NetBSD 1.6W 2003-08-05 sources
>Organization:
Matthias Scheler                                  http://scheler.de/~matthias/
>Environment:
System: NetBSD lyssa.zhadum.de 1.6W NetBSD 1.6W (LYSSA) #0: Tue Aug 5 16:36:07 CEST 2003 tron@lyssa.zhadum.de:/src/sys/compile/LYSSA i386
Architecture: i386
Machine: i386
>Description:
After upgrading my system from 2003-07-27 to 2003-08-03 sources it started
to panic or lock up frequently. In two cases I was able to get a crash
dump. Here are the panic strings and stack traces:

panic: pool_get(%s): free list modified: magic=%x; page %p; item addr %p

#0  0x1 in ?? ()
(gdb) where
#0  0x1 in ?? ()
#1  0xc02d523b in cpu_reboot ()
#2  0xc025ce53 in panic ()
#3  0xc025b49a in pool_get ()
#4  0xc0278b1b in cache_enter ()
#5  0xc01d9aeb in nfs_lookup ()
#6  0xc0283595 in VOP_LOOKUP ()
#7  0xc027abd4 in lookup ()
#8  0xc027a823 in namei ()
#9  0xc028117e in sys___stat13 ()
#10 0xc02dc6af in syscall_plain ()
#11 0xc0100ab3 in syscall1 ()
can not access 0xbfbff0bc, invalid translation (invalid PDE)
can not access 0xbfbff0bc, invalid translation (invalid PDE)
Cannot access memory at address 0xbfbff0bc


panic: pool_get(%s): free list modified: magic=%x; page %p; item addr %p

#0  0x1 in ?? ()
(gdb) where
#0  0x1 in ?? ()
#1  0xc02d523b in cpu_reboot ()
#2  0xc025ce53 in panic ()
#3  0xc02460cd in lockmgr ()
#4  0xc0284e58 in genfs_lock ()
#5  0xc0283cf6 in VOP_LOCK ()
#6  0xc0283489 in vn_lock ()
#7  0xc027c501 in vget ()
#8  0xc020582b in ffs_sync ()
#9  0xc027ee7e in sys_sync ()
#10 0xc027da79 in vfs_shutdown ()
#11 0xc02d5207 in cpu_reboot ()
#12 0xc025ce53 in panic ()
#13 0xc025b49a in pool_get ()
#14 0xc0278b1b in cache_enter ()
#15 0xc021f25a in ufs_lookup ()
#16 0xc0283595 in VOP_LOOKUP ()
#17 0xc027abd4 in lookup ()
#18 0xc027a823 in namei ()
#19 0xc028117e in sys___stat13 ()
#20 0xc02dc6af in syscall_plain ()
#21 0xc0100ab3 in syscall1 ()
can not access 0xbfbff9a4, invalid translation (invalid PDE)
can not access 0xbfbff9a4, invalid translation (invalid PDE)
Cannot access memory at address 0xbfbff9a4
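
The "free list modified: magic=%x" string in both dumps is the kind of
message an allocator prints when a marker stored in a free item no longer
holds its expected value, i.e. something wrote to the item after it was
freed. Below is a minimal userland sketch of such a check. It is not the
NetBSD pool code: the names toy_pool, free_item and ITEM_MAGIC, and the
marker value itself, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: marker value chosen for this sketch only. */
#define ITEM_MAGIC 0xdeadbeefU

/* Header written into an object while it sits on the free list. */
struct free_item {
	uint32_t magic;			/* must still be ITEM_MAGIC on reuse */
	struct free_item *next;		/* next free object, or NULL */
};

struct toy_pool {
	struct free_item *free_head;	/* singly linked free list */
};

/* Return an object to the pool and stamp it with the marker. */
void
toy_pool_put(struct toy_pool *pp, void *obj)
{
	struct free_item *it = obj;

	assert(obj != NULL);
	it->magic = ITEM_MAGIC;
	it->next = pp->free_head;
	pp->free_head = it;
}

/*
 * Take an object from the pool. If the marker was overwritten while the
 * object was free, set *corrupt to 1 and return NULL; a kernel would
 * panic at this point instead.
 */
void *
toy_pool_get(struct toy_pool *pp, int *corrupt)
{
	struct free_item *it = pp->free_head;

	*corrupt = 0;
	if (it == NULL)
		return NULL;		/* pool is empty */
	if (it->magic != ITEM_MAGIC) {
		*corrupt = 1;		/* free list modified */
		return NULL;
	}
	pp->free_head = it->next;
	return it;
}
```

A stale pointer that writes into an object after toy_pool_put() makes the
next toy_pool_get() report corruption, although the faulty write happened
somewhere else and possibly much earlier.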

Here is one more panic for which I wasn't able to get a crash dump:

uvm_fault(0xc0463120, 0xdeadb000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0278721 cs 8 eflags 10282 cr2 deadbf0f ilevel 0
panic: trap
Begin traceback...
trap() at netbsd:trap+0x216
--- trap (number 6) ---
cache_lookup(e47950a4,e4f93ea8,e4f93ebc,e4f4cb90,e47950a4) at netbsd:cache_lookup+0x8d
ufs_lookup(e4f93da0,30002,e4f93db0,c0283489,e4f93e98) at netbsd:ufs_lookup+0xec
VOP_LOOKUP(e47950a4,e4f93ea8,e4f93ebc,c024ef4f,e47950a4) at netbsd:VOP_LOOKUP+0x35
lookup(e4f93e98,e47ac000,400,e4f93eb0,e4ea9a84) at netbsd:lookup+0x2a4
namei(e4f93e98,8eb20000,e4f91828,0,e4ea9a84) at netbsd:namei+0x31b
sys___stat13(e4ea9a84,e4f93f80,e4f93f78,c02dcda7,0) at netbsd:sys___stat13+0x52
syscall_plain(4836001f,1f,1f,bfbf001f,bfbffce4) at netbsd:syscall_plain+0xab
End traceback...
syncing disks...
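
The two addresses in the trap message above are related by simple
arithmetic, sketched below. The address passed to uvm_fault (0xdeadb000)
is the cr2 value (0xdeadbf0f) rounded down to a 4 KB page boundary, and
the cr2 value lies 0x20 bytes past 0xdeadbeef. Treating 0xdeadbeef as a
fill pattern for freed memory is an assumption, and the helper names are
made up for this sketch. If the assumption holds, the fault would be
consistent with cache_lookup following a pointer read from memory that
had already been freed.

```c
#include <assert.h>
#include <stdint.h>

/* 4 KB pages, the base page size on i386. */
#define PAGE_SIZE_I386 4096u

/* Round an address down to the start of its page. */
uint32_t
page_base(uint32_t addr)
{
	/* The mask trick below only works for power-of-two page sizes. */
	assert((PAGE_SIZE_I386 & (PAGE_SIZE_I386 - 1)) == 0);
	return addr & ~(PAGE_SIZE_I386 - 1);
}

/* Distance of a faulting address from a suspected pointer value. */
uint32_t
offset_from(uint32_t fault_addr, uint32_t suspected_ptr)
{
	return fault_addr - suspected_ptr;
}
```

With the values from the trace, page_base(0xdeadbf0f) gives 0xdeadb000
and offset_from(0xdeadbf0f, 0xdeadbeef) gives 0x20.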

It looks to me like some change checked in after the 27th of July causes
random kernel memory corruption. This is *not* a hardware problem because
the system is rock stable if I downgrade to the old kernel and userland.
Here is the "dmesg" output in case it matters:

NetBSD 1.6W (LYSSA) #0: Tue Aug  5 16:36:07 CEST 2003
	tron@lyssa.zhadum.de:/src/sys/compile/LYSSA
total memory = 1022 MB
avail memory = 944 MB
using 6144 buffers containing 52468 KB of memory
BIOS32 rev. 0 found at 0xf0010
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Pentium 4 (686-class), 2394.11 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu0: L2 cache 512 KB 64b/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: 16 page colors
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20030228
acpi0: X/RSDT: OemId <INTEL ,D865PERL,20030715>, AslId <MSFT,00000097>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
pckbc0 at acpi0 (PNP0303): kbd port
pckbc0: io 0x60,0x64 irq 1
pckbc1 at acpi0 (PNP0F03): aux port
pckbc1: irq 12
PNP0800 [AT-style speaker sound] at acpi0 not configured
npx0 at acpi0 (PNP0C04)
npx0: io 0xf0-0xff irq 13
npx0: using exception 16
ACPI Object Type 'Power' (0x0b) at acpi0 not configured
ACPI Object Type 'Power' (0x0b) at acpi0 not configured
ACPI Object Type 'Power' (0x0b) at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
INT0800 at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
acpibut0 at acpi0 (PNP0C0E-29): ACPI Sleep Button
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82865 Host (rev. 0x02)
pchb0: random number generator enabled
agp at pchb0 not configured
ppb0 at pci0 dev 1 function 0: Intel 82865 AGP (rev. 0x02)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga0 at pci1 dev 0 function 0: ATI Technologies Radeon 8500/8500LE (rev. 0x00)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
ppb1 at pci0 dev 3 function 0: Intel 82801EB Hub-to-PCI Bridge (rev. 0x02)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
Intel i82547EI Gigabit Ethernet (ethernet network) at pci2 dev 1 function 0 not configured
uhci0 at pci0 dev 29 function 0: Intel 82801EB/ER USB UHCI Controller #0 (rev. 0x02)
uhci0: interrupting at irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801EB/ER USB UHCI Controller #1 (rev. 0x02)
uhci1: interrupting at irq 5
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801EB/ER USB UHCI Controller #2 (rev. 0x02)
uhci2: interrupting at irq 10
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3 at pci0 dev 29 function 3: Intel 82801EB/ER USB UHCI Controller #3 (rev. 0x02)
uhci3: interrupting at irq 11
usb3 at uhci3: USB revision 1.0
uhub3 at usb3
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: interrupting at irq 9
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0
uhub4 at usb4
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
ppb2 at pci0 dev 30 function 0: Intel 82801BA Hub-to-PCI Bridge (rev. 0xc2)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
fpa0 at pci3 dev 1 function 0: DEC DEFPA PCI FDDI DAS Controller
fpa0: FDDI address 08:00:2b:b4:15:d0, FW=2.46, HW=0, SMT V7.2
fpa0: FDDI Port[A] = A (PMD = ANSI Multi-Mode), FDDI Port[B] = B (PMD = ANSI Multi-Mode)
fpa0: interrupting at irq 10
wm0 at pci3 dev 2 function 0: Intel i82540EM 1000BASE-T Ethernet, rev. 2
wm0: interrupting at irq 3
wm0: Ethernet address 00:07:e9:0e:b9:a8
makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 3
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
emuxki0 at pci3 dev 3 function 0: Creative Labs SBLive! EMU 10000 (audio multimedia, revision 0x07)
emuxki0: interrupting at irq 5
emuxki0: SigmaTel STAC9708 codec; 18 bit DAC, 18 bit ADC, SigmaTel 3D
emuxki0: surround DAC
audio0 at emuxki0: full duplex, mmap, independent
joy0 at pci3 dev 3 function 1: Creative Labs PCI Gameport Joystick (rev 0x07)
joy0: joystick not connected
bktr0 at pci3 dev 4 function 0
bktr0: interrupting at irq 10
bktr0: Hauppauge Model 60114 C VM
bktr0: Detected a MSP3410D-B4 at 0x80
bktr0: Hauppauge WinCast/TV, Philips PAL I tuner, msp3400c stereo.
fwohci0 at pci3 dev 7 function 0: Lucent Technologies FW322/323 IEEE 1394 OHCI Controller (rev. 0x04)
fwohci0: interrupting at irq 3
fwohci0: OHCI 1.0, 00:07:e9:00:00:67:18:7b, 400Mb/s, 2048 max_rec, 8 ir_ctx, 8 it_ctx
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801EB LPC Interface Bridge (rev. 0x02)
pciide0 at pci0 dev 31 function 1: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <IC35L040AVER07-0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 39266 MB, 79780 cyl, 16 head, 63 sec, 512 bytes/sect x 80418240 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1 at pciide0 channel 0 drive 1: <IC35L060AVER07-0>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 58644 MB, 119150 cyl, 16 head, 63 sec, 512 bytes/sect x 120103200 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
pciide0: primary channel interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 0: <Pioneer DVD-ROM ATAPIModel DVD-120S, , 1.01> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
cd1 at atapibus0 drive 1: <PLEXTOR CD-R   PX-W2410A, 379089, 1.04> cdrom removable
cd1: 32-bit data port
cd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
pciide0: secondary channel interrupting at irq 15
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
cd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
Intel 82801ER Serial ATA Controller (RAID mass storage, interface 0x8f, revision 0x02) at pci0 dev 31 function 2 not configured
Intel 82801EB/ER SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
lpt0 at isa0 port 0x378-0x37b irq 7
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
fw0 at fwohci0: 00:07:e9:00:00:67:18:7b:0a:02:ff:ff:f0:01:00:00
ehci0: handing over full speed device on port 1 to uhci0
uhub4: port 1, device disappeared after reset
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
uhub5 at uhub0 port 1
uhub5: Texas Instruments TUSB2046 hub, class 9/0, rev 1.10/1.25, addr 2
uhub5: 4 ports with 4 removable, self powered
umass0 at uhub5 port 1 configuration 1 interface 0
umass0: Neodio Multi-format Flash Controller, rev 1.10/1.00, addr 3
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 4 luns per target
scsibus0: waiting 2 seconds for devices to settle...
uscanner0 at uhub5 port 2
uscanner0: EPSON EPSON Scanner, rev 1.10/1.00, addr 4
sd0 at scsibus0 target 1 lun 0: <Generic, USB Storage-SMC, 0180> disk removable
sd0: drive offline
sd1 at scsibus0 target 1 lun 1: <Generic, USB Storage-CFC, 0180> disk removable
sd1: drive offline
sd2 at scsibus0 target 1 lun 2: <Generic, USB Storage-MMC, 0180> disk removable
sd2: drive offline
sd3 at scsibus0 target 1 lun 3: <Generic, USB Storage-MSC, 0180> disk removable
sd3: drive offline
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
wsdisplay0: screen 5 added (80x25, vt100 emulation)
wsdisplay0: screen 6 added (80x25, vt100 emulation)
wsdisplay0: screen 7 added (80x25, vt100 emulation)

>How-To-Repeat:
Put the system under heavy load, e.g. a NetBSD build or a rebuild of the
"qt3-libs" package. The system uses NIS and NFS, which might matter.
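
All three traces in the description enter the kernel through
sys___stat13 and namei, so a workload dominated by name lookups might be
a narrower way to generate comparable load than a full rebuild. This is
an untested assumption and not the reproduction used for this report.
The sketch below walks a directory tree with fts(3) so that every entry
is examined with stat(2); walk_tree is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stddef.h>

/*
 * Walk the tree rooted at `root` once and return the number of entries
 * visited, or -1 if the walk could not be started. FTS_LOGICAL makes
 * fts follow symbolic links, so entries are examined with stat(2)
 * rather than lstat(2).
 */
long
walk_tree(const char *root)
{
	char *paths[2];
	FTS *fts;
	FTSENT *ent;
	long visited = 0;

	paths[0] = (char *)root;	/* fts_open() does not modify it */
	paths[1] = NULL;

	fts = fts_open(paths, FTS_LOGICAL, NULL);
	if (fts == NULL)
		return -1;

	while ((ent = fts_read(fts)) != NULL) {
		/* Directories are returned twice; skip the post-order visit. */
		if (ent->fts_info != FTS_DP)
			visited++;
	}

	fts_close(fts);
	return visited;
}
```

Calling walk_tree() in a loop on a large tree, for example one on an NFS
mount and one on a local ffs file system, would cover both lookup paths
seen in the traces (nfs_lookup and ufs_lookup).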

>Fix:
Downgrading to the 2003-07-27 sources avoids the problem.
>Release-Note:
>Audit-Trail:
>Unformatted: