Subject: kernel panic in comstart()
To: None <port-i386@NetBSD.ORG>
From: Marc Unangst <mju@cs.cmu.edu>
List: port-i386
Date: 01/12/1995 01:05:06
I'm running NetBSD/i386 on a 386DX/40 machine, acting as a PPP router.
The Ethernet card is a WD8003, and the serial card is a Hayes ESP card
configured in 16550-emulation mode.  When the system boots up, it
recognizes the card without any problems as a 16550 and enables the
FIFOs.  I'm running the serial port at 115.2Kbps, and so far haven't
seen any serial overruns.

Under heavy load, the system panics regularly (usually about once
every hour or two, sometimes twice in a period of 20 minutes).
[Aside: why don't kernel crash dumps work on the 386?  I had a bitch
of a time tracking down this problem until I compiled a kernel with
DDB...]
After one of the panics, I poked around a bit with DDB.  Here's what I
found:

vm_fault(f8195f34, f7c01000, 1, 0) -> 1
kernel: page fault trap, code=0
Stopped at      _comstart+0x9c: movb    0(%ebx),%al

db> trace
_comstart(f8667900) at _comstart+0x9c
_pppstart(f8667900) at _pppstart+0x67
_comintr(f8657f80) at _comintr+0x215
_Xintr4() at _Xintr4+0x63

The code it crashed in is this loop:

_comstart+0x9c: movb    0(%ebx),%al
                incl    %ebx
                movl    %edi,%edx
                outb    %al,%dx
                decl    %ecx
                jnz     _comstart+0x9c

which appears to correspond to the loop

	do {
		outb(iobase + com_data, *cp++);
	} while(--n);

around line 660 of arch/i386/isa/com.c.  The relevant registers are:

ebx     0xf7c01000
edx     0xf81903f8
ecx     0xffffee20
eax     0x0

edx looks right (the ESP card is configured as COM1, 0x3F8), but ebx
points off somewhere in east hyperspace.  The only plausible
explanation I can think of is that q_to_b (called on the line before
the do loop) returned a negative value, which caused n to start out
negative.  Because of this, the kernel happily walked along outputting
bytes and decrementing ecx, until it fell off the end of the page and
generated a page fault.  But I can't see any reason why q_to_b would
return a negative number.

Any ideas?  Has anyone else seen problems with kernel panics under
heavy system load?  I'm thinking of modifying comstart() to log an
error if q_to_b ever returns a negative number, but before I start
trying further experiments I figured I'd see if this was a known (and
maybe fixed) problem.