Subject: FP performance
To: None <port-arm32@NetBSD.ORG>
From: Mark Brinicombe <amb@physig4.ph.kcl.ac.uk>
List: port-arm32
Date: 07/02/1996 14:30:51
Hi,
  Over the weekend there were a few Q's about FP performance and why, although
both use the same FPE, the benchmarks said RiscBSD was slower than RiscOS.

The answer ...

Because RiscBSD is a pre-emptive multitasking operating system.
With RiscOS the FPE emulates the instruction and returns to the user program.
With RiscBSD there is an added stage that I just tend to call userret.

All exit points from the kernel to user land call a function called userret().
This function checks for pending signals and delivers them as required.
It also checks to see if the kernel has decided that the current process has
been running for long enough and that it is time to switch to a new process. If
this is the case userret() will call mi_switch() to switch to another runnable
process.
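
For anyone who has not read the kernel source, the two checks can be sketched
roughly as below. This is only an illustration, not the real kernel code: the
struct, the resched flag and every *_sketch name are made up for the example,
and real signal delivery and mi_switch() do far more than bump a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified process state; not the kernel's struct proc. */
struct proc_sketch {
    unsigned pending_signals;  /* bitmask of signals awaiting delivery */
    int signals_delivered;     /* counter, for illustration only */
    int switches;              /* counter, for illustration only */
};

/* Hypothetical flag standing in for "the kernel wants a reschedule". */
static bool want_resched_sketch = false;

/* Stand-in for signal delivery: handle the lowest pending signal. */
static void deliver_one_signal_sketch(struct proc_sketch *p)
{
    assert(p->pending_signals != 0);
    p->pending_signals &= p->pending_signals - 1;  /* clear lowest set bit */
    p->signals_delivered++;
}

/* Stand-in for mi_switch(): here it only records that a switch happened. */
static void mi_switch_sketch(struct proc_sketch *p)
{
    p->switches++;
    want_resched_sketch = false;
}

/* The two checks described above, run on every return to user land. */
static void userret_sketch(struct proc_sketch *p)
{
    while (p->pending_signals != 0)
        deliver_one_signal_sketch(p);

    if (want_resched_sketch)
        mi_switch_sketch(p);
}
```

Even when there is nothing to deliver and no reschedule wanted, both tests
still have to be made on the way out.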

Now this has to be done on every exit to user space and thus on every exit from
the FPE. This adds an additional overhead onto every FP instruction.

Now for the FPE exit to user space, a custom version of userret() is used that
is optimised specifically for this case, rather than the generic one.

Also another thing that has to be considered is that during userret()
mi_switch() could be called. The context switcher must be called in SVC32 mode.
The FPE runs in UND32 mode. This means that the FPE post proc handler has to
copy a trapframe from UND32 mode to SVC32 mode, do the userret stuff and then
copy the resulting trapframe back from SVC32 mode to UND32 mode for return to
the FPE and then userland.
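
In C terms the round trip looks something like the sketch below. The real code
is ARM assembler working on the banked UND32 and SVC32 stacks; the trapframe
layout, the function names and the redirect_pc_sketch variable here are all
assumptions made up to show the shape of it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trapframe layout (r0-r14, pc, spsr); the real one differs. */
struct trapframe_sketch {
    unsigned long regs[15];
    unsigned long pc;
    unsigned long spsr;
};

/* Counts frame copies so the cost of each FPE exit is visible. */
static int frame_copies = 0;

/* Hypothetical: non-zero means the userret work redirects the return pc,
 * as delivering a signal would. */
static unsigned long redirect_pc_sketch = 0;

static void copy_frame(struct trapframe_sketch *dst,
                       const struct trapframe_sketch *src)
{
    memcpy(dst, src, sizeof(*dst));
    frame_copies++;
}

/* Stand-in for the userret work, which must see the frame in SVC32 mode. */
static void userret_on_svc_frame(struct trapframe_sketch *tf)
{
    if (redirect_pc_sketch != 0)
        tf->pc = redirect_pc_sketch;
}

/* One FPE exit: UND32 -> SVC32, userret work, then SVC32 -> UND32. */
static void fpe_post_proc_sketch(struct trapframe_sketch *und_frame,
                                 struct trapframe_sketch *svc_frame)
{
    copy_frame(svc_frame, und_frame);
    userret_on_svc_frame(svc_frame);
    copy_frame(und_frame, svc_frame);  /* pick up any changes made above */
}
```

The copy back is needed because the userret work may have changed the frame,
so each emulated instruction pays for two full frame copies.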

As one can imagine this adds quite an overhead.

For people who are interested, have a look at fpe-arm/armfpe_glue.S
and fpe-arm/armfpe_init.c.

Now there is one major optimisation that could be done that would speed things
up a lot...

If there are no pending signals and a reschedule is not pending then the post
proc handler is transferring the trapframe to and from SVC32 mode for no
reason.
Now I have not bothered to fix this as there are so many other things that
need doing. Currently it is left as an exercise for the user. If someone wants
to do it please feed the change back to me. Basically the post proc handler
needs to check for pending signals or a resched and only copy the trapframe and
go through the full userret() if necessary.
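
As a rough guide, the control flow would be something like the C sketch below.
The real change belongs in the assembler glue, and every name here is
hypothetical; the two flags stand in for however the kernel actually records
pending signals and a wanted reschedule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the post proc handler needs to test. */
struct fpe_exit_state {
    bool signals_pending;
    bool resched_pending;
};

/* Counters so the two paths can be told apart in a test. */
static int slow_path_count;
static int fast_path_count;

/* Stand-in for: copy trapframe to SVC32, full userret(), copy it back. */
static void full_userret_path_sketch(void)
{
    slow_path_count++;
}

/* Proposed exit: only pay for the copies when there is work to do. */
static void fpe_post_proc_fast_sketch(const struct fpe_exit_state *s)
{
    if (!s->signals_pending && !s->resched_pending) {
        fast_path_count++;   /* return straight to the FPE in UND32 mode */
        return;
    }
    full_userret_path_sketch();
}
```

The common case of nothing pending then costs two tests and no trapframe
copies.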

Anyone interested can mail me for more info ...

Cheers,
				Mark

-- 
Mark Brinicombe				amb@physig.ph.kcl.ac.uk
Research Associate			http://www.ph.kcl.ac.uk/~amb/
Department of Physics			tel: 0171 873 2894
King's College London			fax: 0171 873 2716