Subject: DDB weirdness and pseudo-crash
To: None <port-i386@NetBSD.ORG>
From: Simon Burge <simonb@telstra.com.au>
List: port-i386
Date: 04/27/1998 23:19:44
Hi folks,

I've got an AMD 586 at home running NetBSD 1.3.  It lives downstairs,
usually with no screen attached.  I don't run an X server on the machine
- I log in to it from a machine upstairs.  Twice now I've started a
program called xcopilot (a PalmPilot emulator) and the machine has
crashed out to the db> prompt with messages like:

	vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
	kernel: page fault trap, code=0
	Stopped at      0x402b000:vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
	     kernel: page fault trap, code=0
	Stopped at      _db_read_bytes+0x10:    movb    0(%ecx),%al
	db> 

The first time was 11 days ago (16th April), and the second time just
now.  The first time, I just hit the reset button (it was late) and
thought nothing of it.  This time around, I had a bit of a play with ddb
(which I've never used before) and then decided to try a "reboot 0x100"
to try and get a crash dump.  This is what I saw:

	db> reboot 0x100
	syncing disks... #

	# marks the cursor position.
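
For anyone wanting to check what that 0x100 argument asks for, here is a
rough sketch that decodes a reboot "howto" value into flag names.  The
flag values are assumptions taken from memory of <sys/reboot.h>, and
decode_howto() is a made-up helper, not anything in the tree.  If the
values are right, 0x100 is RB_DUMP on its own, so RB_NOSYNC is not set
and the kernel would be expected to try syncing the disks before dumping.

```python
# Assumed flag values from <sys/reboot.h>; check them against your own tree.
RB_FLAGS = {
    0x001: "RB_ASKNAME",
    0x002: "RB_SINGLE",
    0x004: "RB_NOSYNC",
    0x008: "RB_HALT",
    0x100: "RB_DUMP",
}


def decode_howto(howto):
    """Return the names of the (assumed) reboot flags set in howto."""
    names = [name for bit, name in sorted(RB_FLAGS.items()) if howto & bit]
    # A howto of 0 means a plain automatic reboot.
    return names or ["RB_AUTOBOOT"]


if __name__ == "__main__":
    for value in (0x100, 0x104):
        print(hex(value), decode_howto(value))
```

On the same assumptions, 0x104 would combine RB_DUMP with RB_NOSYNC,
which asks for a dump without the sync step.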

I waited a few minutes - nothing happened.  So I decided to go upstairs
and look for some food.  When I walked past the X term upstairs, I
noticed that the window manager had popped up an outline of the xcopilot
window.  I positioned the window and all seems fine.  Netscape and all
the xterms died with "Connection reset by peer" errors, but I had an
rlogin which was still ok.  I can now telnet to the box, but rlogin
gives me a "Connection timed out" error.

I went back downstairs, and the console is still sitting at the same
spot (syncing disks), and doesn't respond to anything (though I haven't
tried ctrl-alt-esc yet).

Here's a dmesg from the machine as it is now:


	Copyright (c) 1996, 1997 The NetBSD Foundation, Inc.  All rights reserved.
	Copyright (c) 1982, 1986, 1989, 1991, 1993
	    The Regents of the University of California.  All rights reserved.

	NetBSD 1.3 (THOREAU) #2: Sat Feb  7 17:57:23 EST 1998
	    root@:/usr/src/sys/arch/i386/compile/THOREAU
	cpu0: family 4 model f step 4
	cpu0: AMD Am5x86 W/B 133/160 (486-class)
	real mem  = 33157120
	avail mem = 30818304
	using 430 buffers containing 1761280 bytes of memory
	mainbus0 (root)
	pci0 at mainbus0 bus 0: configuration mode 1
	Silicon Integrated System 85C496 (host bridge, revision 0x02) at pci0 dev 5 function 0 not configured
	ncr0 at pci0 dev 11 function 0: ncr 53c810a fast10 scsi
	ncr0: interrupting at irq 9
	ncr0: minsync=25, maxsync=206, maxoffs=8, 16 dwords burst, normal dma fifo
	ncr0: single-ended, open drain IRQ driver
	ncr0: restart (scsi reset).
	scsibus0 at ncr0: 8 targets
	sd0 at scsibus0 targ 0 lun 0: <DEC, RZ28     (C) DEC, T436> SCSI2 0/direct fixed
	sd0: sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
	2007MB, 3045 cyl, 16 head, 84 sec, 512 bytes/sect x 4110480 sectors
	sd1 at scsibus0 targ 1 lun 0: <DEC, RZ28     (C) DEC, T436> SCSI2 0/direct fixed
	sd1: sd1(ncr0:1:0): 10.0 MB/s (100 ns, offset 8)
	2007MB, 3045 cyl, 16 head, 84 sec, 512 bytes/sect x 4110480 sectors
	cd0 at scsibus0 targ 4 lun 0: <DEC, RRD42   (C) DEC, 4.3d> SCSI2 5/cdrom removable
	probe(ncr0:4:1): M_REJECT received (4:8).
	probe(ncr0:4:1): M_REJECT received (4:8).
	probe(ncr0:4:1): M_REJECT received (4:8).
	probe(ncr0:4:1): M_REJECT received (4:8).
	probe(ncr0:4:2): M_REJECT received (4:8).
	probe(ncr0:4:2): M_REJECT received (4:8).
	probe(ncr0:4:2): M_REJECT received (4:8).
	probe(ncr0:4:2): M_REJECT received (4:8).
	probe(ncr0:4:3): M_REJECT received (4:8).
	probe(ncr0:4:3): M_REJECT received (4:8).
	probe(ncr0:4:3): M_REJECT received (4:8).
	probe(ncr0:4:3): M_REJECT received (4:8).
	probe(ncr0:4:4): M_REJECT received (4:8).
	probe(ncr0:4:4): M_REJECT received (4:8).
	probe(ncr0:4:4): M_REJECT received (4:8).
	probe(ncr0:4:4): M_REJECT received (4:8).
	probe(ncr0:4:5): M_REJECT received (4:8).
	probe(ncr0:4:5): M_REJECT received (4:8).
	probe(ncr0:4:5): M_REJECT received (4:8).
	probe(ncr0:4:5): M_REJECT received (4:8).
	probe(ncr0:4:6): M_REJECT received (4:8).
	probe(ncr0:4:6): M_REJECT received (4:8).
	probe(ncr0:4:6): M_REJECT received (4:8).
	probe(ncr0:4:6): M_REJECT received (4:8).
	probe(ncr0:4:7): M_REJECT received (4:8).
	probe(ncr0:4:7): M_REJECT received (4:8).
	probe(ncr0:4:7): M_REJECT received (4:8).
	probe(ncr0:4:7): M_REJECT received (4:8).
	st0 at scsibus0 targ 6 lun 0: <DEC, TLZ04 1989(C)DEC, 1615> SCSI2 1/sequential removable
	st0: st0(ncr0:6:0): asynchronous.
	drive empty
	isa0 at mainbus0
	com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
	com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
	we0 at isa0 port 0x300-0x31f iomem 0xd0000-0xd3fff irq 10
	we0: WD8013WC Ethernet (16-bit)
	we0: Ethernet address 00:00:c0:4e:2a:43
	npx0 at isa0 port 0xf0-0xff: using exception 16
	vt0 at isa0 port 0x60-0x6f irq 1
	vt0: et4000, 80/132 col, color, 8 scr, mf2-kbd, [R3.32]
	spkr0 at vt0 port 0x61
	vt0: console
	fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
	fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
	biomask 240 netmask 640 ttymask 642
	boot device: sd0
	root on sd0a dumps on sd0b
	root file system type: ffs
	vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
	vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
	vm_fault(0xf086cc00, 0, 1, 0) -> 1
	vm_fault(0xf086cc00, 45c70000, 1, 0) -> 1
	vm_fault(0xf0301000, ffff0000, 1, 0) -> 1
	vm_fault(0xf086cc00, 0, 1, 0) -> 1
	vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
	vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
	syncing disks... <4>we0: warning - receiver ring buffer overrun
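
To make the vm_fault lines above easier to compare, here is a small
sketch that pulls them apart and counts the distinct faults.  It assumes
the fields are (map, faulting address, fault type, 0) followed by a
Mach-style return code, with 1 meaning KERN_INVALID_ADDRESS and 2
meaning KERN_PROTECTION_FAILURE, and that kernel addresses start at
0xf0000000 on this port.  All of that is from memory rather than from
the source, and parse_vm_fault() is just a name invented for the example.

```python
import re
from collections import Counter

# Assumed Mach VM return codes; not confirmed against the 1.3 sources.
KERN_CODES = {1: "KERN_INVALID_ADDRESS", 2: "KERN_PROTECTION_FAILURE"}
KERNBASE = 0xF0000000  # assumed start of kernel address space on i386

LINE_RE = re.compile(
    r"vm_fault\((0x[0-9a-f]+), ([0-9a-f]+), (\d+), 0\) -> ([0-9a-f]+)"
)


def parse_vm_fault(line):
    """Parse one vm_fault console line into a dict, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    map_addr, va, ftype, rv = m.groups()
    va = int(va, 16)
    return {
        "map": int(map_addr, 16),
        "va": va,
        "ftype": int(ftype),
        "result": KERN_CODES.get(int(rv, 16), "unknown"),
        "user_va": va < KERNBASE,
    }


DMESG = """\
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
vm_fault(0xf086cc00, 0, 1, 0) -> 1
vm_fault(0xf086cc00, 45c70000, 1, 0) -> 1
vm_fault(0xf0301000, ffff0000, 1, 0) -> 1
vm_fault(0xf086cc00, 0, 1, 0) -> 1
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
vm_fault(0xf086cc00, 402b0000, 1, 0) -> 2
"""

faults = [f for f in map(parse_vm_fault, DMESG.splitlines()) if f]
counts = Counter((hex(f["va"]), f["result"]) for f in faults)

if __name__ == "__main__":
    for (va, result), n in counts.most_common():
        print(n, va, result)
```

Read this way, the original fault at 402b0000 is the only one with
return code 2, and the other addresses (0, 45c70000, ffff0000) look like
the ones I poked at from ddb afterwards.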

The machine has been working fine for maybe 18 months running 1.2 - I
only recently upgraded to 1.3.

Any hints, ideas or suggestions?  Should I send-pr this?  The machine
is still up at the moment, if that's going to help...  As soon as this
little ordeal is finished I'll probably put 1.3.1 on the box.

Simon.