Subject: Re: tlb_handler() fixup
To: Marcus Comstedt <marcus@idonex.se>
From: Jason R Thorpe <thorpej@zembu.com>
List: port-sh3
Date: 02/02/2001 17:19:11
On Fri, Feb 02, 2001 at 10:38:22PM +0100, Marcus Comstedt wrote:

 > Ok, after applying your patch, I no longer get a reboot when I do
 > 
 >   cat /usr/bin/* >/dev/null
 > 
 > Instead I get
 > 
 >   tlb_handler: va 0x0, spc 0x8c0bba80, exptype 0x40 (pagedaemon)
 >   Stopped in pid 2 (pagedaemon) at        0x8c0c1996:     mov     r14,r15
 >   db>

Yah, before it was just printing an error message and then continuing
to use the bad pointer -- "oops".

 > I'm not familiar with the BSD kernel debugger, so I'm not sure how to
 > proceed here.  0x8c0c1996 is in _cpu_Debugger, so it's not
 > interresting.  0x8c0bba80 is in _uvm_pagefree:
 > 
 >   8c0bba70 <_uvm_pagefree>:
 >   8c0bba70:       86 2f           mov.l   r8,@-r15
 >   8c0bba72:       96 2f           mov.l   r9,@-r15
 >   8c0bba74:       a6 2f           mov.l   r10,@-r15
 >   8c0bba76:       b6 2f           mov.l   r11,@-r15
 >   8c0bba78:       e6 2f           mov.l   r14,@-r15
 >   8c0bba7a:       22 4f           sts.l   pr,@-r15
 >   8c0bba7c:       f3 6e           mov     r15,r14
 >   8c0bba7e:       43 6a           mov     r4,r10
 >   8c0bba80:       a7 52           mov.l   @(28,r10),r2    <-----
 >   8c0bba82:       69 d1           mov.l   8c0bbc28 <_uvm_pagefree+0x1b8>,r1 ! 0xdeadbeef
 >   8c0bba84:       10 32           cmp/eq  r1,r2
 >   8c0bba86:       0b 8f           bf.s    8c0bbaa0 <_uvm_pagefree+0x30>
 > 
 > Does it mean that r10 was NULL (i.e. that the pg argument to
 > uvm_pagefree was NULL)?  How do I get a backtrace?

Yah, either that or something else went wrong.  Every fatal fault I've
seen has been "va == 0".  I'm suspecting that because traps are reenabled
before tlb_handler() is called, we have some sort of race condition.  It
seems to happen to me randomly while doing a "make obj" in the source
tree, causing user programs to die.  I suspect the problem surfaces when
you have a lot of TLB invalidation traffic (e.g. lots of fork/exec/exit
operations, lots of UBC operations doing the `cat' in your example).

Probably what needs to happen is the TLB handler needs to be a bit more
intelligent:

	* T_TLBMISSR or T_TLBMISSW -- this should be a light-weight
	  handler, implemented in assembly, running with traps disabled,
	  use P1SEG to access the page tables, so that traps don't *have*
	  to be enabled.  If this handler finds a valid PTE and services
	  the fault, load the TLB and return.  Otherwise, enable traps,
	  push the trap code, call the heavy-weight fault handler.

	  NOTE: On a write-fault of a read-only TLB, the miss handler
	  doesn't care -- when the faulting insn restarts, we'll just
	  get a T_TLBPRIVW, which can be handled in a more intelligent
	  way.

	* For T_TLBPRIVR and T_TLBPRIVW, call a heavy-weight handler
	  that does the uvm_fault() dance (i.e. the latter half of the
	  current tlb_handler()).

I'm thinking that the "level 1 page table" would actually just be an
array of P1SEG addresses of the actual PTE tables... the level 1 table
doesn't actually have to be valid PTEs.  (FWIW, my "spare time" project
right now is to write a new SuperH pmap module...)

I'm envisioning the code to look something like this:

NENTRY(exphandler)
	mov.l	XL_SHREG_EXPEVT, r0
	mov.l	@r0, r0
	cmp/eq	#T_TLBINVALIDR, r0	/* TLB miss on read -- most common */
	bf	1f
3:
	mov.l	XL_tlbmisshandler, r0
	jmp	@r0

2:
	mov.l	XL_tlbfaulthandler, r0
	jmp	@r0

1:
	cmp/eq	#T_TLBINVALIDW, r0	/* TLB miss on write -- more common */
	bt	3b			/* than prot. faults. */

	mov.l	XL_TLBPRIVW, r1		/* TLB write fault -- more common */
	cmp/eq	r0, r1			/* than read faults. */
	bt	2b

	mov.l	XL_TLBPRIVR, r1		/* TLB read fault */
	cmp/eq	r0, r1
	bt	2b

	.
	.
	.

	.align 2
NENTRY(tlbmisshandler)
	INTRENTRY
	ECLI
	mov.l	XL_SHREG_TEA, r2
	mov.l	@r2, r2			/* r2 = va (will be PD offset) */
	mov.l	r2, r3			/* r2 will be the PT offset */

	/* Get the PD offset. */
	mov.l	XL_PD_MASK, r4
	mov.l	@r4, r4
	and	r4, r2			/* r2 &= PD_MASK */
	mov.l	XL_PDOFF_NEG, r4
	mov.l	@r4, r4
	shld	r4, r2			/* r2 >> (PDSHIFT - 2) */

	/* Get the PT offset. */
	mov.l	XL_PT_MASK, r4
	mov.l	@r4, r4
	and	r4, r3			/* r3 &= PT_MASK */
	mov.l	XL_PTOFF_NEG, r4
	mov.l	@r4, r4
	shld	r4, r3			/* r3 >> (PGSHIFT - 2) */

	/* Load the Page Directory base, add the offset, load the PDE. */
	mov.l	_C_LABEL(PDBASE), r1
	add	r2, r1
	mov.l	@r1, r1			/* r1 = PDE */
	cmp/eq	#0, r1			/* r1 == NULL? */
	bt	1f			/* yes, heavyweight fault */

	/* Add the PT offset, load the PTE. */
	add	r3, r1
	mov.l	@r1, r1			/* r1 = PTE */
	mov.l	XL_PG_V, r4
	mov.l	@r4, r4
	tst	r4, r1			/* (r1 & PG_V) == 0? */
	bt	1f			/* yes, heavyweight fault */

	/* Load the PTE into SHREG_PTEL. */
	mov.l	XL_PTEL_VALIDBITS, r4
	and	r4, r1			/* r1 &= PTEL_VALIDBITS */
	mov.l	XL_SHREG_PTEL, r4
	mov.l	r1, @r4			/* SHREG_PTEL = r1 */

	/*
	 * SHREG_PTEH already contains the VPN and ASID.  Load the TLB!
	 */
	bra	2f			/* jump to return block... */
	ldtlb				/* load the TLB in delay slot */
1:
	/* Enable traps to enable address translation. */
	STI
	mov.l	r0, @-r15		/* push trap code */
	mov.l	XL_tlb_fault
	jsr	@r0			/* call tlb_fault() */
	nop
	add	#4			/* pop trap code */
	CLI
	/*
	 * Don't load the TLB here -- the only safe way to do this
	 * is to allow another TLB miss fault.
	 */
2:
	INTRFASTEXIT

	.align 2
NENTRY(tlbfaulthandler)
	INTRENTRY
	ECLI
	/* Enable traps to enable address translation. */
	STI
	mov.l	r0, @-r15	/* push trap code */
	mov.l	XL_tlb_fault_handler, r0
	jsr	@r0
	nop
	add	#4, r15		/* pop trap code */
	CLI
	/*
	 * Don't load the TLB here -- the only safe way to do this
	 * is to allow another TLB miss fault.
	 */
	INTRFASTEXIT

Now, I'm not a SuperH expert (in fact, I'm only just learning the
processor), so I would appreciate a sanity-check on this code (esp
my register usage, etc.).

Also, seems like the trap dispatch could be done more efficiently...
instead of all the comparisons, seems like it could be done with a table,
instead, shifting the trap number right by 5 bits, and using it as an index
into the table... so, actually, shift it right by 3 bits, add it to the base
of the table, load the value at the resulting address, and jump to it.  The
table would only have 45 entires in it right now... that's only 360 bytes.

Comments?

-- 
        -- Jason R. Thorpe <thorpej@zembu.com>