Subject: DDB weirdness and pseudo-crash
To: None <port-i386@NetBSD.ORG>
From: Simon Burge <simonb@telstra.com.au>
List: port-i386
Date: 04/27/1998 23:19:44
Hi folks,
I've got an AMD 586 at home running NetBSD 1.3. It lives downstairs,
usually with no screen attached. I don't run an X server on the machine
- I log in to it from a machine upstairs. Twice now I've started a
program called xcopilot (a PalmPilot emulator) and the machine has
crashed out to the db> prompt with messages like:
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
kernel: page fault trap, code=0
Stopped at 0x402b000:vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
kernel: page fault trap, code=0
Stopped at _db_read_bytes+0x10: movb 0(%ecx),%al
db>
The first time was 11 days ago (16th April), and the second time just
now. The first time, I just hit the reset button (it was late) and
thought nothing of it. This time around, I had a bit of a play with ddb
(which I've never used before) and then decided to try a "reboot 0x100"
in the hope of getting a crash dump. This is what I saw:
db> reboot 0x100
syncing disks... #
# marks the cursor position.
I waited a few minutes - nothing happened. So I decided to go upstairs
and look for some food. When I walked past the X term upstairs, I
noticed that the window manager had popped up an outline of the xcopilot
window. I positioned the window and all seemed fine. Netscape and all
the xterms died with "Connection reset by peer" errors, but I had an
rlogin which was still ok. I can now telnet to the box, but a new rlogin
gives me a "Connection timed out" error.
I went back downstairs, and the console is still sitting at the same
spot (syncing disks), and doesn't respond to anything (but I haven't
tried ctrl-alt-esc yet).
Here's a dmesg from the machine as it is now:
Copyright (c) 1996, 1997 The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 1.3 (THOREAU) #2: Sat Feb 7 17:57:23 EST 1998
root@:/usr/src/sys/arch/i386/compile/THOREAU
cpu0: family 4 model f step 4
cpu0: AMD Am5x86 W/B 133/160 (486-class)
real mem = 33157120
avail mem = 30818304
using 430 buffers containing 1761280 bytes of memory
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
Silicon Integrated System 85C496 (host bridge, revision 0x02) at pci0 dev 5 function 0 not configured
ncr0 at pci0 dev 11 function 0: ncr 53c810a fast10 scsi
ncr0: interrupting at irq 9
ncr0: minsync=25, maxsync=206, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
scsibus0 at ncr0: 8 targets
sd0 at scsibus0 targ 0 lun 0: <DEC, RZ28 (C) DEC, T436> SCSI2 0/direct fixed
sd0: sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
2007MB, 3045 cyl, 16 head, 84 sec, 512 bytes/sect x 4110480 sectors
sd1 at scsibus0 targ 1 lun 0: <DEC, RZ28 (C) DEC, T436> SCSI2 0/direct fixed
sd1: sd1(ncr0:1:0): 10.0 MB/s (100 ns, offset 8)
2007MB, 3045 cyl, 16 head, 84 sec, 512 bytes/sect x 4110480 sectors
cd0 at scsibus0 targ 4 lun 0: <DEC, RRD42 (C) DEC, 4.3d> SCSI2 5/cdrom removable
probe(ncr0:4:1): M_REJECT received (4:8).
probe(ncr0:4:1): M_REJECT received (4:8).
probe(ncr0:4:1): M_REJECT received (4:8).
probe(ncr0:4:1): M_REJECT received (4:8).
probe(ncr0:4:2): M_REJECT received (4:8).
probe(ncr0:4:2): M_REJECT received (4:8).
probe(ncr0:4:2): M_REJECT received (4:8).
probe(ncr0:4:2): M_REJECT received (4:8).
probe(ncr0:4:3): M_REJECT received (4:8).
probe(ncr0:4:3): M_REJECT received (4:8).
probe(ncr0:4:3): M_REJECT received (4:8).
probe(ncr0:4:3): M_REJECT received (4:8).
probe(ncr0:4:4): M_REJECT received (4:8).
probe(ncr0:4:4): M_REJECT received (4:8).
probe(ncr0:4:4): M_REJECT received (4:8).
probe(ncr0:4:4): M_REJECT received (4:8).
probe(ncr0:4:5): M_REJECT received (4:8).
probe(ncr0:4:5): M_REJECT received (4:8).
probe(ncr0:4:5): M_REJECT received (4:8).
probe(ncr0:4:5): M_REJECT received (4:8).
probe(ncr0:4:6): M_REJECT received (4:8).
probe(ncr0:4:6): M_REJECT received (4:8).
probe(ncr0:4:6): M_REJECT received (4:8).
probe(ncr0:4:6): M_REJECT received (4:8).
probe(ncr0:4:7): M_REJECT received (4:8).
probe(ncr0:4:7): M_REJECT received (4:8).
probe(ncr0:4:7): M_REJECT received (4:8).
probe(ncr0:4:7): M_REJECT received (4:8).
st0 at scsibus0 targ 6 lun 0: <DEC, TLZ04 1989(C)DEC, 1615> SCSI2 1/sequential removable
st0: st0(ncr0:6:0): asynchronous.
drive empty
isa0 at mainbus0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
we0 at isa0 port 0x300-0x31f iomem 0xd0000-0xd3fff irq 10
we0: WD8013WC Ethernet (16-bit)
we0: Ethernet address 00:00:c0:4e:2a:43
npx0 at isa0 port 0xf0-0xff: using exception 16
vt0 at isa0 port 0x60-0x6f irq 1
vt0: et4000, 80/132 col, color, 8 scr, mf2-kbd, [R3.32]
spkr0 at vt0 port 0x61
vt0: console
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
biomask 240 netmask 640 ttymask 642
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
vm_fault(0xf086cc00, 0, 1, 0) -> 1
vm_fault(0xf086cc00, 45c70000, 1, 0) -> 1
vm_fault(0xf0301000, ffff0000, 1, 0) -> 1
vm_fault(0xf086cc00, 0, 1, 0) -> 1
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
syncing disks... <4>we0: warning - receiver ring buffer overrun
The machine has been working fine for maybe 18 months running 1.2 - I
only recently upgraded to 1.3.
Any hints, ideas or suggestions? Should I send-pr this? The machine
is still up at the moment, if that's going to help... As soon as this
little ordeal is finished I'll probably put 1.3.1 on the box.
Simon.