Subject: Re: MMU fault
To: Peter Radcliffe <pir@pir.net>
From: Michael Wolfson <mw34@cornell.edu>
List: port-hp300
Date: 07/18/1998 00:09:55
At 2:03 PM -0400 7/17/98, you wrote:
:)=pt%A1bit dma, async, scsi id 7t9=x%x=M
:) ^^^^^ ^^^^^^^ corruption here ?
:)[...]
:)trap: bad kernel read access at 0x71
:)trap type 8, code = 0x4020755, v = 0x71
Well, looking through the archives, Jason Thorpe said something to the effect
that if you're using a serial console you might try turning off xon/xoff
flow control. Then again, the person he said this to (Elmar Kolkman
<kolkmae@apd.dec.com>) tried it and it didn't help. I'm not sure whether
Jason ever implemented a fix for this.
Another thing that apparently leads to this is use of a Quantum SCSI hard
drive. Apparently they handle timeouts poorly, and that mixes in a bad way
with the NetBSD/hp300 SCSI driver (which hasn't been brought up to the same
machine-independent driver all the other platforms use).
And yet another thing reported by Chris Jantzen <chris@cutecute.ml.org> is:
:)Wow. I actually have something to report. Anyways: This happens anytime
:)there is a keystroke in the buffer while starting up (pressing enter twice
:)at the bootloader, press enter too late, etc.). It happens exactly like
:)this every time. This is, of course, one of those "does it hurt when you
:)do that? then don't do that" kind of things, but maybe it will help fix
:)other bugs.
And I had yet another MMU fault (reported a while ago) wherein one of my
two 400s motherboards could never boot a kernel I compiled in one 400s
chassis. But when I tried the same motherboard in a different chassis with
the same kernel, it had no problems.
And various other folks have reported seeing these, but haven't been able
to nail anything down. It seems to be a problem that's been plaguing us all
for a while now, and it seems to be caused by a host of different things.
It'd be cool if we could lick this thing. Here's some advice Jason Thorpe
gives on reporting your MMU fault problems to the list:
:)FWIW, when folks report "MMU fault" bugs, it's vitally important that you
:)include ALL messages that were displayed when the panic occurred. What
:)"MMU fault" means is that there was a fatal page fault, i.e. the MMU
:)encountered an invalid mapping for a virtual address, and the kernel was
:)not able to recover. MMU faults are vital to the proper functioning
:)of demand-paged executables, for example, but when you encounter a panic
:)as a result of one, it usually means a NULL pointer.
:)
:)Can you please provide more information? Build a kernel with DDB,
:)and when this happens, use the "trace" command to discover what function
:)it's crashing in.
And don't forget, it's much easier to use a serial console to grab the info
than to type it in by hand. Also, if you don't have to powercycle, the
messages should still be available if you run dmesg.
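
Putting Jason's DDB suggestion together with the serial console tip, the
steps would look roughly like the sketch below. I'm assuming the usual NetBSD
kernel config syntax here, so check the config file for your own release
before trusting the exact spelling.

```
# in your kernel config file, then rebuild and boot the new kernel
options DDB

# at the debugger prompt after the panic
db> trace
```

With a serial console you can just capture whatever the trace prints and
paste it into your report.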
-- MW