Subject: Re: MMU fault
To: Peter Radcliffe <pir@pir.net>
From: Michael Wolfson <mw34@cornell.edu>
List: port-hp300
Date: 07/18/1998 00:09:55
At 2:03 PM -0400 7/17/98, you wrote:

:)=pt%A1bit dma, async, scsi id 7t9=x%x=M
:) ^^^^^                          ^^^^^^^   corruption here ?
:)[...]
:)trap: bad kernel read access at 0x71
:)trap type 8, code = 0x4020755, v = 0x71

Well, looking through the archives, Jason Thorpe said something to the
effect that if you're using a serial console you might try turning off
xon/xoff flow control.  Then again, the person he said this to (Elmar
Kolkman, kolkmae@apd.dec.com) tried it and it didn't help.  I'm not sure
whether Jason ever implemented a fix for this.

Another thing that apparently leads to this is use of a Quantum SCSI hard
drive.  They reportedly time out poorly, and that mixes in a bad way with
the NetBSD/hp300 SCSI driver (which hasn't yet been converted to the
machine-independent SCSI driver the other platforms use).

And yet another thing reported by Chris Jantzen <chris@cutecute.ml.org> is:
:)Wow. I actually have something to report. Anyways: This happens anytime
:)there is a keystroke in the buffer while starting up (pressing enter twice
:)as the bootloader, press enter too late, etc.). It happens exactly like
:)this every time. This is, of course, one of those "does it hurt when you
:)do that? then don't do that" kind of things, but maybe it will help fix
:)other bugs.

And I had yet another MMU fault (reported a while ago) wherein one of my
two 400s motherboards, installed in one 400s chassis, could never boot a
kernel I had compiled.  But when I tried the same motherboard in a
different chassis with the same kernel, it had no problems.

And various other folks have reported seeing these, but haven't been able
to nail anything down.  It seems to be a problem that's been plaguing us
all for a while now, and it seems to be caused by a host of different
things.

It'd be cool if we could lick this thing.  Here's some advice Jason Thorpe
gives on reporting your MMU fault problems to the list:

:)FWIW, when folks report "MMU fault" bugs, it's vitally important that you
:)include ALL messages that were displayed when the panic occurred.  What
:)"MMU fault" means is that there was a fatal page fault, i.e. the MMU
:)encountered an invalid mapping for a virtual address, and the kernel was
:)not able to recover.  MMU faults are vital to the proper functioning
:)of demand-paged executables, for example, but when you encounter a panic
:)as a result of one, it usually means a NULL pointer.
:)
:)Can you please provide more information?  Build a kernel with DDB,
:)and when this happens, use the "trace" command to discover what function
:)it's crashing in.

And don't forget, it's much easier to use a serial console to grab the info
than to type it in by hand.  Also, if you don't have to power-cycle the
machine, the messages should still be available from dmesg.
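
If you do capture the console output to a file, here is a rough sketch (in
Python, purely as an illustration) of pulling the "trap type" lines out of
the log and flagging fault addresses that sit very close to zero, which is
what you'd expect from the NULL pointer case Jason describes.  The function
names and the 0x1000 cutoff are my own assumptions, not anything from the
kernel sources, and the pattern only covers lines shaped like the one
quoted at the top of this message.

```python
import re

# Matches lines shaped like the one quoted earlier in this message:
#   trap type 8, code = 0x4020755, v = 0x71
TRAP_RE = re.compile(
    r"trap type (?P<type>\d+), "
    r"code = (?P<code>0x[0-9a-fA-F]+), "
    r"v = (?P<v>0x[0-9a-fA-F]+)"
)

# Assumption: an arbitrary cutoff for "close to address zero", chosen
# only for illustration.
NEAR_NULL_LIMIT = 0x1000


def parse_trap(line):
    """Return the trap fields from one console line, or None if absent."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    return {
        "type": int(m.group("type")),
        "code": int(m.group("code"), 16),
        "v": int(m.group("v"), 16),
    }


def looks_like_null_deref(trap):
    """Heuristic only: a tiny fault address hints at NULL plus an offset."""
    return trap["v"] < NEAR_NULL_LIMIT


def traps_in_log(text):
    """Collect every parsable trap line from a captured console log."""
    parsed = (parse_trap(line) for line in text.splitlines())
    return [t for t in parsed if t is not None]
```

Feeding it the line from the report above gives a fault address of 0x71,
which the heuristic flags.  That is only a hint; a DDB "trace" is still
what actually tells you which function it's crashing in.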

  -- MW