port-hp300: Re: MMU fault

Subject: Re: MMU fault
To: None <port-hp300@NetBSD.ORG>
From: Peter Radcliffe <pir@pir.net>
List: port-hp300
Date: 07/19/1998 01:50:12
Michael Wolfson <mw34@cornell.edu> probably said:
> Well, looking thru the archives, Jason Thorpe said something to the effect
> that if you're using a serial console you might try turning off xon/xoff
> flow control.  Then again the person he said this to tried it and it didn't
> help (Elmar Kolkman kolkmae@apd.dec.com).  I'm not sure if Jason
> implemented the fix for this.

I just tried, it doesn't help.

> Another thing that apparently leads to this is use of a Quantum SCSI hard
> drive.  Apparently they time out poorly and that mixes in a bad way with
> the NetBSD/hp300 SCSI driver (which hasn't been brought up to the same
> machine independent driver all the other platforms use).

I'm using an original HP 300MB disk, which came with the machine, and it
boots fine when I use a graphics console.

Thanks to various bits of help from people (including Michael, winter and
Stan Brown) the box is booting as a diskless client and I have the right
disklabel information. (A comment to the effect of "much of the disklabel
information is in /etc/disktab, which is in the etc.tgz set" would have
saved me a fair bit of futzing with that disk.) I can do a real install
once I'm sure Michael doesn't want me to test more things for him.

> And yet another thing reported by Chris Jantzen <chris@cutecute.ml.org> is:
> :)there is a keystroke in the buffer while starting up (pressing enter twice

> And I had yet another MMU fault (reported a while ago) wherein one of my
> two 400s motherboards could never boot a kernel I compiled in one 400s
> chassis.  But when I tried the same motherboard in a different chassis with
> the same kernel, it had no problems.

Bizarre.

> It'd be cool if we could lick this thing.  Here's some advice Jason Thorpe
> gives on reporting your MMU fault problems to the list:

> :)FWIW, when folks report "MMU fault" bugs, it's vitally important that you
> :)include ALL messages that were displayed when the panic occurred.  What
> :)"MMU fault" means is that there was a fatal page fault, i.e. the MMU
> :)encountered an invalid mapping for a virtual address, and the kernel was
> :)not able to recover.  MMU faults are vital to the proper functioning
> :)of demand-paged executables, for example, but when you encounter a panic
> :)as a result of one, it usually means a NULL pointer.

=pt%A1bit dma, async, scsi id 7t9=x%x=M

Anyone got an idea about this corruption?
The same line looks fine on a graphics console.

sd0 at oscsi0 targ 0 lun 0: <HP, 7959S, 8819>
sd0: 1663 cylinders, 12 heads, 630912 blocks, 512 bytes/block
le0 at dio0 scode 21 ipl 5: address 08:00:09:06:11:28
le0: 8 receive buffers, 2 transmit buffers
trap: bad kernel read access at 0x71
trap type 8, code = 0x4020755, v = 0x71
kernel program counter = 0xb7e42
kernel: MMU fault trap
pid = 0, pc = 000B7E42, ps = 2500, sfc = 1, dfc = 1
Registers:
             0        1        2        3        4        5        6        7
dreg: 0000001A 0000006D 00000000 00000000 0000000C 00000000 000D9000 00000000
areg: 000B7DE2 00B40000 00000000 00A00000 0200E180 FF002000 00158F0C FFEFFFFC

Kernel stack (00158E00):
158E00: 000CCDDC 00158E50 00000080 00000000 00000000 0000000C 00000000 000D9000
158E20: 00000000 00000000 00A00000 0200E180 FF002000 00000000 00000000 00158F0C
158E40: 00001904 00000008 04020755 00000071 0000001A 0000006D 00000000 00000000
158E60: 0000000C 00000000 000D9000 00000000 000B7DE2 00B40000 00000000 00A00000
158E80: 0200E180 FF002000 00158F0C FFEFFFFC 00000000 2500000B 7E42B008 0EEC0755
158EA0: 671E4280 00000071 00000071 0000006D 082A0071 000B7E4C 000B7E4A 000B7E48
158EC0: 0000000B 0005FF0D 000FF69F 00050071 000246BA 00000005 00000005 80040000
158EE0: 00A00011 00000000 00000074 000B7E3C 00000000 00000005 00000007 02011E40
158F00: 00158F74 FF150000 FF002000 00158F28 000C9FD4 0200E180 00000224 00000001
158F20: 00000002 00158F78 00158F50 00001D40 00000074 00002704 00000142 00000CBF
158F40: 00B40000 20000000 256A0074 000C6E20 00158F7C 000C8A7E 0000000C FF002000
158F60: FFFFFFFC 0014E000 A0000006 FFEFFFFC 000F83CC 04190000 005DC000 00158FAC
158F80: 000188B0 000DC0DC 0000000C FF002000 FFFFFFFC FFEFFFFC FF003810 0015D000
158FA0: 00000000 00000000 FFFFFFFC FFFFEF20 A0000006 00000000 FFEFFFFC FF003810
158FC0: FF150000 FF002000 00000001 00000000 00001000 FFEFFFFC FF003810 00158FF4
158FE0: 000C8678 0015D000 00001000 0000000C 00000000 00000000 00000000 00000000
panic: MMU fault
Stopped at      _Debugger+0x6:  unlk    a6
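
As a rough cross-check of the NULL-pointer theory quoted above, the fault
address can be pulled out of the trap line and compared against a small
threshold. This is only a sketch: the names parse_trap and
looks_like_null_deref, the regular expression, and the one-page (0x1000)
cutoff are my own assumptions for illustration, not anything taken from the
kernel source.

```python
import re

# Matches the trap line printed in the panic above, e.g.
# "trap type 8, code = 0x4020755, v = 0x71"
TRAP_RE = re.compile(
    r"trap type (\d+), code = 0x([0-9a-fA-F]+), v = 0x([0-9a-fA-F]+)"
)

# Assumed cutoff: a fault address below 0x1000 is treated as
# "a small offset from a NULL pointer". This is a heuristic only.
NULL_PAGE = 0x1000


def parse_trap(line):
    """Return the trap type, code and fault address, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    return {
        "type": int(m.group(1)),
        "code": int(m.group(2), 16),
        "addr": int(m.group(3), 16),
    }


def looks_like_null_deref(trap):
    """True if the fault address is within the assumed NULL page."""
    return trap["addr"] < NULL_PAGE


trap = parse_trap("trap type 8, code = 0x4020755, v = 0x71")
print(hex(trap["addr"]), looks_like_null_deref(trap))
```

A fault address of 0x71 is at least consistent with reading a field at a
small offset through a NULL structure pointer, though the dump alone does not
prove that.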

> :)Can you please provide more information?  Build a kernel with DDB,
> :)and when this happens, use the "trace" command to discover what function
> :)it's crashing in.

db> trace
_Debugger(2504,158e3c,ccdfa,cc9df,0) + 6
_panic(cc9df,0,0,c,0) + 40
_trap(8,4020755,71) + 236
faultstkadj(200e180,224,1,2,158f78) + 0
_intr_dispatch(74) + 7e
_intrhand(?)
_configure(c,ff002000,fffffffc,14e000,a0000006) + a
_cpu_startup(dc0dc,c,ff002000,fffffffc,ffeffffc) + 29e
vm_fault(0x166000, a0000000, 1, 0) -> 1
  type 8, code [mmu,,ssw]: 401074d
trap type 8, code = 0x401074d, v = 0xa0000000
kernel program counter = 0xc847e
kernel: MMU fault trap
Caught exception in ddb.
_main() + 4a
_main() + 4a
db>

This is reproducible and the numbers look exactly the same each time.
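
For comparing captures between boots, a small script can reduce a DDB trace
to its call chain and pick out the first frame that is not part of the
panic/trap path. This is a sketch under assumptions: the frame regular
expression is fitted to the trace lines shown above, and the set of
"trap path" function names is my guess at which frames are just the fault
handling machinery.

```python
import re

# A frame line looks like "_panic(cc9df,0,0,c,0) + 40" or "_intrhand(?)".
# Lines with trailing text (e.g. "vm_fault(...) -> 1") are not frames.
FRAME_RE = re.compile(r"^(_?\w+)\(([^)]*)\)(?:\s*\+\s*([0-9a-fA-F]+))?\s*$")

# Assumption: these frames belong to the panic/trap handling path.
TRAP_PATH = {"_Debugger", "_panic", "_trap", "faultstkadj"}

TRACE = """\
_Debugger(2504,158e3c,ccdfa,cc9df,0) + 6
_panic(cc9df,0,0,c,0) + 40
_trap(8,4020755,71) + 236
faultstkadj(200e180,224,1,2,158f78) + 0
_intr_dispatch(74) + 7e
_intrhand(?)
_configure(c,ff002000,fffffffc,14e000,a0000006) + a
_cpu_startup(dc0dc,c,ff002000,fffffffc,ffeffffc) + 29e
vm_fault(0x166000, a0000000, 1, 0) -> 1
_main() + 4a
_main() + 4a
"""


def parse_frames(text):
    """Return (name, args, offset) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            name, args, off = m.groups()
            frames.append((name, args, int(off, 16) if off else None))
    return frames


def first_interesting(frames):
    """First frame that is not in the assumed panic/trap path."""
    for frame in frames:
        if frame[0] not in TRAP_PATH:
            return frame
    return None


frames = parse_frames(TRACE)
print(first_interesting(frames))
```

On the trace above this picks out _intr_dispatch(74), called while
_configure was still running, which is where I would start looking.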

> And don't forget, it's much easier to use a serial console to grab the info

I am doing so :)
If there's more testing (db info, recompiling kernels with different options,
etc.) I can do, let me know.
The box boots fine with monitor/keyboard, so this one does look like a
specifically serial console problem with the HP :/
I don't want to use the box with a physical console, though; I don't have
the monitor where the real netlink is.

Thanks,
P.

-- 
pir               pir@pir.net      pir@shore.net      pir@leftbank.com