Subject: More on locore.s optimizations...
To: None <port-sparc@NetBSD.ORG>
From: David S. Miller <davem@caip.rutgers.edu>
List: port-sparc
Date: 09/16/1995 21:11:37
Ok, I've been in think mode some more on this issue.  And the few
people that did respond to my previous ideas were all positive.  Here
are some more thoughts...

First, NetBSD uses two values to keep track of the live user windows
in the register file at all times, which is ridiculous.  It would be
more efficient to use a single 32-bit datum containing the mask of all
currently live user windows.  For example, on an 8-window SPARC, if
windows 2 through 5 were live user windows the mask would look like

        0 0 1 1 1 1 0 0  --  pcb->uwindow_mask
        ---------------
        7 6 5 4 3 2 1 0  --  window number

The idea is that the trap handler already has the mask of the window
whose bit must be set or cleared in uwindow_mask: it is either the old
%wim or the %wim about to be set.
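A minimal C model of the single-mask idea, assuming one 32-bit field as in the diagram above (the helper names are hypothetical, not NetBSD's). Windows 2 through 5 give 0x3c, and the count that a separate "number of user windows" field would hold can be recovered from the mask as a population count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposed pcb field: bit N set means
 * register window N currently holds a live user window. */
typedef uint32_t uwinmask_t;

/* Mask for a contiguous run of windows lo..hi (no wraparound),
 * e.g. windows 2 through 5 on an 8-window SPARC. */
static uwinmask_t uwin_range(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 32);
    uwinmask_t m = 0;
    for (unsigned w = lo; w <= hi; w++)
        m |= (uwinmask_t)1 << w;
    return m;
}

/* The window count a second field would track, derived from the mask. */
static unsigned uwin_count(uwinmask_t m)
{
    unsigned n = 0;
    while (m) {
        m &= m - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Live windows can also wrap around the register file (say 7, 0 and 1); the mask represents that case just as easily, only the illustrative range helper does not build it.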

Another optimization would be in the calculation of the new %wim
during a spill/fill window trap.  The only requirement is to have the
constant 'nwindows - 1' (calculated at boot time) available somehow at
trap time.  Then one could do:

1) During a spill, assuming %l3 contains the trap time %wim

	srl	%l3, 0x1, %l4		! %l4 = wim >> 1
	sll	%l3, 7, %l5		! %l5 = wim << (nwindows - 1)
	or	%l4, %l5, %l4		! %l4 = new %wim (rotated right)

2) During a fill, assuming %l3 contains the trap time %wim

	sll	%l3, 0x1, %l4		! %l4 = wim << 1
	srl	%l3, 7, %l5		! %l5 = wim >> (nwindows - 1)
	or	%l4, %l5, %l4		! %l4 = new %wim (rotated left)
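The two sequences above are a one-bit rotate of %wim within nwindows bits: right for a spill, left for a fill. A small C model (function names are hypothetical) makes the wraparound cases easy to check. The final `& wim_all()` stands in for the hardware ignoring %wim bits that correspond to unimplemented windows, which is why the assembly can skip the mask.

```c
#include <assert.h>
#include <stdint.h>

/* All valid %wim bits for a CPU with nwindows register windows. */
static uint32_t wim_all(unsigned nwindows)
{
    assert(nwindows >= 2 && nwindows <= 32);
    return nwindows == 32 ? 0xffffffffu : ((1u << nwindows) - 1u);
}

/* Spill (window overflow): srl 1, sll (nwindows - 1), or. */
static uint32_t wim_after_spill(uint32_t wim, unsigned nwindows)
{
    return ((wim >> 1) | (wim << (nwindows - 1))) & wim_all(nwindows);
}

/* Fill (window underflow): sll 1, srl (nwindows - 1), or. */
static uint32_t wim_after_fill(uint32_t wim, unsigned nwindows)
{
    return ((wim << 1) | (wim >> (nwindows - 1))) & wim_all(nwindows);
}
```

For example, on an 8-window CPU a spill with %wim = 0x01 wraps to 0x80, and a fill with %wim = 0x80 wraps back to 0x01.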

I believe Sprite and/or Xinu also employed this technique.  I have yet
to come up with a quicker calculation for which the pipeline slots
work out correctly in the handlers themselves, although I have a hunch
that it can be done with one logical/arithmetic insn followed by one
'xor %reg, const, %newwim'... I've come close, but there was always
one case where my bitmasks did not work... oh well.

The constant '7' is the (nwindows - 1) I mentioned earlier; on a
7-window SPARC you could easily patch these instructions to use 6 at
boot time.  Regardless, the value now in %l4 can be used later in the
trap handler to set the new %wim *and* to update the live user window
mask described earlier.
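A sketch of the boot-time patch, assuming the kernel simply rewrites the 5-bit shift count of the two immediate-form shifts that carry the constant 7 (the `srl %l3, 0x1` needs no patch). In SPARC V8 format-3 encoding the immediate shift count sits in bits 4:0; the encoder and patch helpers below are hypothetical, and a real kernel would also have to `flush` the patched instruction words so the instruction cache sees them.

```c
#include <assert.h>
#include <stdint.h>

/* SPARC V8 op3 values for the immediate shifts used above. */
#define OP3_SLL 0x25u
#define OP3_SRL 0x26u

/* Encode "sll/srl %rs1, shcnt, %rd" (format 3, op = 2, i = 1). */
static uint32_t encode_shift_imm(uint32_t op3, unsigned rs1,
                                 unsigned shcnt, unsigned rd)
{
    assert(rs1 < 32 && rd < 32 && shcnt < 32);
    return (2u << 30) | ((uint32_t)rd << 25) | (op3 << 19) |
           ((uint32_t)rs1 << 14) | (1u << 13) | shcnt;
}

/* Replace only the shift count of an assembled immediate shift. */
static uint32_t patch_shcnt(uint32_t insn, unsigned shcnt)
{
    assert(shcnt < 32);
    return (insn & ~0x1fu) | shcnt;
}
```

With %l3 = r19 and %l5 = r21, `sll %l3, 7, %l5` encodes as 0xab2ce007, and patching it for a 7-window CPU yields 0xab2ce006.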

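For example, for a spill, once we have determined that the stack is sane:

	wr	%l4, 0x0, %wim			! install the new %wim
	nop; nop; nop				! %wim write delay
	sethi	%hi(_curpcb), %l5		! or whatever the symbol is
	ld	[%l5 + %lo(_curpcb)], %l5	! %l5 = current pcb pointer
	ld	[%l5 + PCB_UWINMASK], %l6
	andn	%l6, %l4, %l6			! spilled window is no longer live
	st	%l6, [%l5 + PCB_UWINMASK]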
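A C model of the mask bookkeeping, assuming the single uwindow_mask field proposed above (`struct pcb_model` and the function names are hypothetical). The spill side mirrors the `andn` in the sequence above. The fill side is an extrapolation not spelled out in the message: the window loaded from the user stack is the one marked invalid by the old %wim, so its bit is set again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pcb; only the proposed field is modelled. */
struct pcb_model {
    uint32_t uwindow_mask;   /* bit N set: window N is a live user window */
};

/* Spill: the window just written to the user stack is the one whose bit
 * is set in the new %wim, so it is no longer live. */
static void uwin_spilled(struct pcb_model *pcb, uint32_t new_wim)
{
    assert(new_wim != 0 && (new_wim & (new_wim - 1)) == 0); /* one bit */
    pcb->uwindow_mask &= ~new_wim;
}

/* Fill (assumed): the window just loaded from the user stack was the
 * invalid window under the old %wim, so it becomes live again. */
static void uwin_filled(struct pcb_model *pcb, uint32_t old_wim)
{
    assert(old_wim != 0 && (old_wim & (old_wim - 1)) == 0); /* one bit */
    pcb->uwindow_mask |= old_wim;
}
```

Either way the handler touches the mask with a single load, one logical op and a single store, reusing the %wim value it has already computed.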

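Next, the check for whether %sp lies in the sun4c address space hole
could be done more quickly via:

	sra	%sp, 29, %l3		! top 3 bits, sign-extended: legal = 0 or -1
	add	%l3, 0x1, %l3		! legal values become 1 or 0
	andncc	%l3, 0x1, %l3		! nonzero iff %sp is in the hole
	bne	user_stack_is_in_vma_hole
	nop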
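The hole check only works with an arithmetic shift: the legal sun4c top-bit patterns are 000 (low addresses) and 111 (high addresses, starting at 0xe0000000), and `sra` turns them into 0 and -1, which the add/andncc pair maps to zero. With a logical shift, 111 becomes 7 + 1 = 8 and legal high addresses would be reported as being in the hole. The C sketch below (hypothetical function names) models both shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "sra %sp, 29": sign-extend the top three address bits (-4..3). */
static int32_t sra29(uint32_t sp)
{
    int32_t top = (int32_t)(sp >> 29);   /* 0..7 */
    return top >= 4 ? top - 8 : top;
}

/* add 1, then andncc 1: zero only when the shifted value was 0 or -1. */
static int in_vma_hole(uint32_t sp)
{
    uint32_t t = (uint32_t)sra29(sp) + 1u;
    return (t & ~1u) != 0;
}

/* Same sequence with a logical shift (srl), for comparison. */
static int in_vma_hole_srl(uint32_t sp)
{
    uint32_t t = (sp >> 29) + 1u;
    return (t & ~1u) != 0;
}
```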

Finally, a good way to do SRMMU stack checking is to do no checking at
all.  (Note that this technique will still work if NetBSD ever becomes
SMP-capable, because there are no races at all: the access itself is
the check.)  You dump the registers onto the stack (or read them from
the stack) unconditionally, except that you set the no_fault bit in
the MMU control register before doing the loads/stores.  Something
like:

	set	SFSR, %g3
	lda	[%g3] ASI_MMU_CONTROL, %g0	! clear fault status
	lda	[%g0] ASI_MMU_CONTROL, %g3	! read MMU control register
	or	%g3, 0x2, %g3			! turn on no_fault
	sta	%g3, [%g0] ASI_MMU_CONTROL
	STORE_WINDOW(%sp)			! or LOAD_WINDOW
	andn	%g3, 0x2, %g3			! clear no_fault bit
	sta	%g3, [%g0] ASI_MMU_CONTROL
	set	SFAR, %g3
	lda	[%g3] ASI_MMU_CONTROL, %g0	! read fault address (discarded)
	set	SFSR, %g3
	lda	[%g3] ASI_MMU_CONTROL, %g3	! read fault status
	andcc	%g3, 0x2, %g3			! did any load/store fault?
	bne	user_stack_is_trashed
	nop
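To make the protocol concrete, here is a toy C simulation of the sequence above: clear the fault status, set no_fault, do all 16 stores of the window (8 locals + 8 ins) without any up-front check, clear no_fault, then test the status bit once. The 0x2 bit values come from the assembly; the struct, the mapping predicate and every function name are hypothetical, and the SFAR read is omitted because its value is discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MMU_CTL_NF      0x2u   /* control register: no_fault bit */
#define MMU_SFSR_FAULT  0x2u   /* fault status bit tested after the stores */

struct mmu_model {
    uint32_t control;          /* MMU control register */
    uint32_t sfsr;             /* fault status register */
};

/* Stand-in for the page tables: is this user address mapped writable? */
typedef bool (*mapped_fn)(uint32_t va);

/* Example mapping: only addresses below 0x10000000 are mapped. */
static bool example_mapped(uint32_t va)
{
    return va < 0x10000000u;
}

/* One store with no_fault set: a bad address is only recorded in the
 * status register; with no_fault clear the hardware would trap instead. */
static void store_nofault(struct mmu_model *m, mapped_fn mapped, uint32_t va)
{
    assert(m->control & MMU_CTL_NF);   /* model covers no_fault mode only */
    if (!mapped(va))
        m->sfsr |= MMU_SFSR_FAULT;
}

/* Spill one window at sp; returns false where the assembly would branch
 * to user_stack_is_trashed. */
static bool spill_window_nofault(struct mmu_model *m, mapped_fn mapped,
                                 uint32_t sp)
{
    m->sfsr = 0;                       /* reading SFSR clears it */
    m->control |= MMU_CTL_NF;          /* turn on no_fault */
    for (uint32_t i = 0; i < 16; i++)
        store_nofault(m, mapped, sp + 4u * i);   /* STORE_WINDOW */
    m->control &= ~MMU_CTL_NF;         /* clear no_fault */
    uint32_t status = m->sfsr;         /* read fault status... */
    m->sfsr = 0;                       /* ...which also clears it */
    return (status & MMU_SFSR_FAULT) == 0;
}
```

The point of the design is that the common case (a sane stack) pays only for the register writes and one status read, and there is no window between a probe and the real access in which another processor could change the mapping.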

These are just some of the possible optimizations I have come up with;
there are tons more...  Now that I have taken a good look at things,
the NetBSD window handling is very inefficient.  Also, it appears that
there are many places where NetBSD leaves kernel information in the
register windows when returning to userland; this is the classic SPARC
operating system security hole.  Does sparc-netbsd honor the 'clean
windows for me' software trap at all?  It does not appear to me that
it does, at least not completely, although I did see proper AST
handling.

Later,
David S. Miller
davem@caip.rutgers.edu