Subject: port-alpha/35448: memory management fault trap during heavy network I/O
To: port-alpha-maintainer@netbsd.org, gnats-admin@netbsd.org
From: agrier@poofygoof.com
List: netbsd-bugs
Date: 01/20/2007 03:55:00
>Number: 35448
>Category: port-alpha
>Synopsis: memory management fault trap during heavy network I/O
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: port-alpha-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 20 03:55:00 +0000 2007
>Originator: agrier@poofygoof.com
>Release: NetBSD 4.99.8
>Organization:
Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
>Environment:
System: NetBSD arwen.poofy.goof.com 4.99.8 NetBSD 4.99.8 (ARWEN) #0: Thu Jan 18 23:03:09 PST 2007 agrier@arwen.poofy.goof.com:/var/obj/ARWEN alpha
Architecture: alpha
Machine: alpha
ARWEN is an AlphaServer 1000A 5/400.
the ARWEN kernel is GENERIC plus a hardcoded line to attach root at ld0.
>Description:
- the trap:
CPU 0: fatal kernel trap:
CPU 0 trap entry = 0x2 (memory management fault)
CPU 0 a0 = 0xfffffe0108266000
CPU 0 a1 = 0x1
CPU 0 a2 = 0x0
CPU 0 pc = 0xfffffc00007ecde0
CPU 0 ra = 0xfffffc000035f9ac
CPU 0 pv = 0x0
CPU 0 curlwp = 0xfffffc000fcd2660
CPU 0 pid = 335, comm = nfsio
panic: trap
Begin traceback...
alpha trace requires known PC =eject=
End traceback...
syncing disks... 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 giving up
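for reference, a rough decode of the trap arguments above. this is only a sketch: the a1/a2 meanings follow the Alpha OSF/1 PALcode memory management fault convention as I understand it (a0 = faulting virtual address, a1 = MMCSR code, a2 = access type) and should be checked against the ALPHA_MMCSR_* definitions in the alpha port headers. the function names are made up for illustration; the page size is the one reported in the dmesg in this report.

```c
#include <stdint.h>

/* Page size as reported by the dmesg: "8192 byte page size". */
#define TRAP_PAGE_SIZE 8192u

/* a1: MMCSR code (assumed values, see the note above). */
const char *mmcsr_name(uint64_t a1)
{
    switch (a1) {
    case 0:  return "invalid translation";
    case 1:  return "access violation";
    case 2:  return "fault on read";
    case 3:  return "fault on execute";
    case 4:  return "fault on write";
    default: return "unknown";
    }
}

/* a2: type of access that faulted (assumed values, see the note above). */
const char *cause_name(int64_t a2)
{
    switch (a2) {
    case -1: return "instruction fetch";
    case 0:  return "load";
    case 1:  return "store";
    default: return "unknown";
    }
}

/* Offset of a virtual address within its page. */
uint64_t page_offset(uint64_t va)
{
    return va & (uint64_t)(TRAP_PAGE_SIZE - 1);
}
```

with the values from the trap, page_offset(0xfffffe0108266000) is 0, and (if the assumed codes are right) a1 = 0x1 reads as "access violation" and a2 = 0x0 as "load". that would be a load from the very first byte of a page, which is at least consistent with something reading sequentially off the end of the previous page.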
- the backtrace:
(gdb) bt
#0 0xfffffc00007df888 in dumpsys ()
at /projects/NetBSD/src/sys/arch/alpha/alpha/machdep.c:1229
#1 0xfffffc00007dfdb0 in cpu_reboot ()
at /projects/NetBSD/src/sys/arch/alpha/alpha/machdep.c:1048
#2 0xfffffc0000644a50 in panic ()
at /projects/NetBSD/src/sys/kern/subr_prf.c:246
#3 0xfffffc00007e7248 in trap ()
at /projects/NetBSD/src/sys/arch/alpha/alpha/trap.c:601
#4 0xfffffc00003003e8 in XentMM ()
at /projects/NetBSD/src/sys/arch/alpha/alpha/locore.s:492
#5 0xfffffc000035f9ac in in_delayed_cksum ()
at /projects/NetBSD/src/sys/netinet/ip_output.c:1123
can not access 0xfffffffd, invalid translation (invalid L1 PTE)
can not access 0xfffffffd, invalid translation (invalid L1 PTE)
Cannot access memory at address 0xfffffffffffffffd
- some poking:
(gdb) frame 5
#5 0xfffffc000035f9ac in in_delayed_cksum ()
at /projects/NetBSD/src/sys/netinet/ip_output.c:1123
1123 csum = in4_cksum(m, 0, offset, ntohs(ip->ip_len) - offset);
(gdb) proc 0xfffffc000fcd2660 # curlwp from the trap
(gdb) bt
#0 0xfffffc000062a730 in mi_switch ()
at /projects/NetBSD/src/sys/kern/kern_synch.c:997
(gdb) list *0xfffffc00007ecde0 # pc from the trap
0xfffffc00007ecde0 is in in4_cksum
(/projects/NetBSD/src/sys/netinet/in4_cksum.c:175).
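to illustrate why the length argument matters here: a minimal sketch of the Internet checksum (RFC 1071) over a contiguous buffer. this is NOT the kernel's in4_cksum(), which walks an mbuf chain and is tuned per architecture; inet_cksum is a name made up for this example. the point is only that the loop reads exactly len bytes, so a length derived from a bad ip_len or offset would walk past the data actually present.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum over a contiguous buffer.
 * Illustrative only; reads exactly `len` bytes starting at `data`.
 */
uint16_t inet_cksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    /* Add up the data as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

in frame 5 the length passed down is ntohs(ip->ip_len) - offset, so it comes straight from the packet header; whether that value or the mbuf itself was bad at the time of the trap is not something this report establishes.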
- dmesg
NetBSD 4.99.8 (ARWEN) #0: Thu Jan 18 23:03:09 PST 2007
agrier@arwen.poofy.goof.com:/var/obj/ARWEN
AlphaServer 1000A 5/400, 400MHz, s/n
8192 byte page size, 1 processor.
total memory = 256 MB
(2016 KB reserved for PROM, 254 MB used by NetBSD)
avail memory = 241 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-2
cpu0: Architecture extensions: 1<BWX>
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (ALCOR/ALCOR2), pass 3
cia0: extended capabilities: 21<DWEN,BWEN>
cia0: using BWX for PCI config access
pci0 at cia0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pceb0 at pci0 dev 7 function 0: Intel 82375EB/SB PCI-EISA Bridge (rev. 0x05)
ppb0 at pci0 dev 8 function 0: Digital Equipment DC21050 PCI-PCI Bridge (rev. 0x02)
pci1 at ppb0 bus 2
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
isp0 at pci1 dev 0 function 0: QLogic 1020 Fast Wide SCSI HBA
isp0: interrupting at dec_1000a irq 0
scsibus0 at isp0: 16 targets, 8 luns per target
tlp0 at pci0 dev 11 function 0: DECchip 21140 Ethernet, pass 1.2
tlp0: interrupting at dec_1000a irq 1
tlp0: DEC DE500-XA, Ethernet address 00:00:f8:02:06:a5
tlp0: 10baseT, 100baseTX, 100baseTX-FDX, 10baseT-FDX
mlx0 at pci0 dev 12 function 0: Mylex RAID (v2 interface)
mlx0: interrupting at dec_1000a irq 3
mlx0: DAC960P/PD, 3 channels, firmware 2.70-0-00, 32MB RAM
ld0 at mlx0 unit 0: RAID5, online
ld0: 16380 MB, 8320 cyl, 64 head, 63 sec, 512 bytes/sect x 33546240 sectors
ld1 at mlx0 unit 1: RAID5, online
ld1: 32768 MB, 8322 cyl, 128 head, 63 sec, 512 bytes/sect x 67108864 sectors
ld2 at mlx0 unit 2: RAID5, online
ld2: 32768 MB, 8322 cyl, 128 head, 63 sec, 512 bytes/sect x 67108864 sectors
ld3 at mlx0 unit 3: RAID5, online
ld3: 4536 MB, 2304 cyl, 64 head, 63 sec, 512 bytes/sect x 9289728 sectors
eisa0 at pceb0
eisa0: can't map I/O space for slot 9
isa0 at pceb0
lpt0 at isa0 port 0x3bc-0x3bf irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
attimer0 at isa0 port 0x40-0x43: AT Timer
vga0 at isa0 port 0x3b0-0x3df iomem 0xa0000-0xbffff
wsdisplay0 at vga0 kbdmux 1
wsmux1: connecting to wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker (CPU-intensive output)
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
pcppi0: attached to attimer0
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <DEC, RZ28M (C) DEC, 0568> disk fixed
sd0: async, 8-bit transfers
sd0: 2007 MB, 3045 cyl, 16 head, 84 sec, 512 bytes/sect x 4110480 sectors
sd0: sync (100.00ns offset 12), 8-bit (10.000MB/s) transfers, tagged queueing
cd0 at scsibus0 target 4 lun 0: <DEC, RRD45 (C) DEC, 1645> cdrom removable
cd0: async, 8-bit transfers
WARNING: can't figure what device matches "RAID 0 12 0 0 0 0 0"
root on ld0a dumps on sd0b
- other misc foo
ps won't grok the coredump:
arwen$ ps -N netbsd.gdb -M /var/crash/netbsd.0.core
ps: can't read proc credentials at 0xfffffc000ade3480: Undefined error: 0
>How-To-Repeat:
it seems to be triggered by syncing a remotely mounted mailbox from
within pine or mutt (presumably NFS traffic, given comm = nfsio in the
trap above).
>Fix:
unknown. the first step is to figure out what is causing the trap. maybe
a stack smash, based on previous port-alpha mailing list entries. perhaps
building a kernel with
options KSTACK_CHECK_MAGIC
is in order?
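as I understand it, that option works by pre-filling the kernel stack with a known pattern and later checking how much of the pattern survived. a minimal userland sketch of that general technique follows; it is not the NetBSD implementation, and STACK_MAGIC and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern; the kernel's actual value may differ. */
#define STACK_MAGIC 0xdeadbeefu

/* Fill a freshly allocated stack area with the magic pattern. */
void stack_fill_magic(uint32_t *stack, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        stack[i] = STACK_MAGIC;
}

/*
 * Assuming the stack grows downward (from the highest index toward
 * index 0), count how many words at the low end still hold the
 * pattern. A result of 0 means the stack was used all the way to
 * its limit, i.e. it may have overflowed.
 */
size_t stack_untouched_words(const uint32_t *stack, size_t nwords)
{
    size_t i = 0;

    while (i < nwords && stack[i] == STACK_MAGIC)
        i++;
    return i;
}
```

a check like this only catches stack exhaustion; it would not by itself catch a stray write into the middle of the stack, so a clean result would not rule out other kinds of corruption.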
>Unformatted:
sources CVSed 2007-01-18