port-arm32: Re: Speeding up RiscBSD and some other questions

Subject: Re: Speeding up RiscBSD and some other questions
To: None <port-arm32@NetBSD.ORG>
From: Robert Black <r.black@ic.ac.uk>
List: port-arm32
Date: 02/14/1997 12:49:02

On Feb 14, 10:50am, Marc Theisen wrote:
> Subject: Speeding up RiscBSD and some other questions
> Hi Freaks!
>
> I am now "using" (which means installing and testing!) RiscBSD for about
> four weeks. I upgraded my SA-driven RiscPC with a Quantum Fireball TM 2.1GB
> and added 32MB DRAM. So RiscBSD seems to run fine in comparison to these
> old Sun's here at my University. But when I compare the system performance
> in some cases RiscBSD really needs to be improved, I think. Just look at
> Suse's Linux for IBM-PC's: After typing "xterm &" there is a delay of, well
> say just a half of a second and there it is! Under RiscBSD this procedure
> takes several seconds, with a SA and 40MB RAM! I also tested the performance
> by comparing the execution times of unzipping large files with gunzip. There
> the difference was much bigger and I think that the lack of a Math-Coproc is
> not a sensible reason for this. So I assume that there are a lot of routines
> in the kernel which can be improved, or am I wrong?

Okay, there are currently 3 things I know of that hit performance badly.

1) One is lack of hardware FP. This makes quite an incredible difference to
anything under X, particularly font rendering. This has been conclusively
demonstrated by comparing the performance of a 700 with FPA with that of a
StrongARM. About half the cost of processing a floating point instruction is
in calling overheads, so we are hoping that the performance can be improved
here.

2) Shared libraries. Lack of shared libraries means that there is a lot of
common code (all the C-library for starters) which gets loaded in with each
program. This makes programs take a long time to start and wastes disk I/O
bandwidth. Shared libraries either require extremely nasty hacks (a la Linux)
or, for a proper implementation, the compiler must be able to output
position-independent code. The ARM backend for gcc 2.7.2.1 (the latest
released gcc) does not have this implemented. Support for additional
relocation types is also required in the assembler and linker (again, the
latest released versions do not do this).

3) Cache-coherency on the StrongARM. There are probably a lot of places where
unnecessary cache-cleaning is taking place or the caches are being switched
off. Sorting this out requires a rewrite of the memory mapping functions
(pmap). It is not clear how much performance increase this will give.

> Okay, I am not experienced with UNIX-systems but with writing effective
> machine-code which has to run under RISC-OS. To shorten this message:
>
> Could anybody summarize the procedures to build new kernels?
> Do these simple utilities like 'cp' or 'gunzip' take advantage of these
> improved routines or do they have to be recompiled?
> Why does RiscBSD not take advantage of a kind of a "SharedCLibrary" like it
> is used under RISCOS? (As I look at the code-size of the utilities above I
> understand why using RiscBSD needs upgrading your hardware...)

cp uses almost exclusively kernel routines. gunzip uses mainly C-library
routines. Both sets are currently mainly in C and could certainly benefit from
a rewrite in assembler.

To compile a kernel:

1) Download a kernel source (either from a standard NetBSD mirror or one of the
source trees Mark occasionally puts on the ftp site).

2) Change directory to sys/arch/arm32/conf

3) Copy GENERIC to the name you want to call your kernel (typically hostname in
upper case).

4) Edit the file to taste (this is your config file).

5) Type 'config <name of config file>'

6) cd ../compile/<name of config file>

7) make depend

8) make

We don't use an approach like the SharedCLibrary because we don't use RiscOS
format executables. To make it work properly under RiscBSD requires
position-independent code. Note that ELF support has absolutely nothing to do
with it one way or the other (a popular misconception).

> Well, I would be happy if I could give a little of my knowledge to the
> porting team in order to make life easier using RiscBSD. So if there are
> some code sequences which need to be optimised using Assembler code, here
> I am!

Take a look at the routines in the NetBSD source for libc. Most of these could
do with rewriting in assembler. The procedure I normally follow (for libc -
this doesn't work so well for the kernel) is:

1) Write a test harness which produces known results with the routine I'm going
to rewrite.

2) Rewrite the routine and compile it to produce a .o file.

NB: Where possible use NetBSD/RiscBSD source code style-conventions.

3) Run the .o file through the test harness and compare the results. Pay
particular attention to fenceposts, etc.

4) Write and use a benchmark harness to check that there is a performance
increase under normal operating conditions.

5) When I am happy with the replacement I link it into a few normal programs to
test (by including the .o file before the C-library on the link line).

6) I then give it to Mark for inclusion in the distribution C-library.
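
As a rough illustration of steps 1, 3 and 4, here is a minimal sketch of a
combined test and benchmark harness. Everything named in it is an assumption
made for the example and not part of the RiscBSD tree: my_memcpy is a
hypothetical stand-in for the rewritten routine (in real use it would be the
assembler version in its own .o file), memcpy was picked only because the
library copy gives a ready-made reference result, and the buffer sizes and
repeat counts are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the routine under test. In real use this would
 * be the assembler rewrite, built into its own .o file and linked in. */
static void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

#define MAXLEN 64   /* longest copy tried */
#define MAXOFF 4    /* alignment offsets 0..3 for source and destination */
#define GUARD  8    /* guard bytes on either side of the destination */

/* Compare my_memcpy against the library memcpy for every length and
 * alignment combination; return the number of mismatches found. */
static int check_memcpy(void)
{
    unsigned char src[MAXOFF + MAXLEN];
    unsigned char got[GUARD + MAXOFF + MAXLEN + GUARD];
    unsigned char want[sizeof got];
    size_t i, len, soff, doff;
    int failures = 0;

    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (len = 0; len <= MAXLEN; len++) {
        for (soff = 0; soff < MAXOFF; soff++) {
            for (doff = 0; doff < MAXOFF; doff++) {
                unsigned char *d = got + GUARD + doff;
                void *ret;

                memset(got, 0xA5, sizeof got);
                memset(want, 0xA5, sizeof want);
                memcpy(want + GUARD + doff, src + soff, len);
                ret = my_memcpy(d, src + soff, len);

                /* Comparing the whole buffer also checks the guard bytes,
                 * which catches fencepost overruns and underruns. */
                if (ret != (void *)d ||
                    memcmp(got, want, sizeof got) != 0) {
                    failures++;
                    printf("FAIL len=%lu soff=%lu doff=%lu\n",
                        (unsigned long)len, (unsigned long)soff,
                        (unsigned long)doff);
                }
            }
        }
    }
    return failures;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Return the CPU seconds taken by reps copies of n bytes using fn. */
static double time_copy(copy_fn fn, size_t n, long reps)
{
    static unsigned char a[4096], b[4096];
    volatile unsigned char sink = 0;
    clock_t start;
    long i;

    assert(n > 0 && n <= sizeof a);
    start = clock();
    for (i = 0; i < reps; i++) {
        fn(b, a, n);
        sink ^= b[n - 1];   /* use the result so the copy is not elided */
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run the correctness pass, then time both versions; 0 means success. */
static int run_harness(void)
{
    int failures = check_memcpy();

    printf("%d mismatches\n", failures);
    printf("library memcpy: %.3fs\n", time_copy(memcpy, 1024, 100000L));
    printf("my_memcpy:      %.3fs\n", time_copy(my_memcpy, 1024, 100000L));
    return failures != 0;
}
```

A main() that simply returns run_harness() is enough to drive it. The timing
loop here only measures one copy size with warm caches, so it says little
about "normal operating conditions" on its own; a fuller benchmark would vary
the sizes and alignments in the same way the correctness pass does.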

Cheers

Rob