Subject: Re: Request for testers.
To: None <olly@muscat.co.uk, cg110@york.ac.uk>
From: Robert Black <r.black@ic.ac.uk>
List: port-arm32
Date: 04/21/1997 17:09:14
On Apr 21,  2:10pm, olly@muscat.co.uk wrote:
> Subject: Re: Request for testers.
> In article <Pine.SGI.3.95L.970417182500.25801B-100000@tower.york.ac.uk>,
> 	Chris Gilbert wrote:
> >ffs.S		find first set bit, fixed time routine used.
> >		This routine does not loop, and uses conditional
> >		execution.  Faster in cases where the first set bit
> >		is the 5th or higher place.
>
> Here's a version of ffs David Seal posted to comp.sys.acorn.tech (csa.tech
> and armltd.uucp eh?  Ah, those were the days) sometime in Feb 1994:
>
> In article <32975@armltd.uucp> dseal@armltd.co.uk (David Seal) writes:
> >I produced a better variant of the same theme last night:
> >
> >  ; Operand in R0, register R1 is free, R2 addresses a byte table 'Table'
> >
> >    RSB     R1,R0,#0            ;Standard trick to isolate bottom bit in R1,
> >    AND     R1,R1,R0            ; or produce zero in R1 if R0 = 0.
> >
> >    ORR     R1,R1,R1,LSL #4     ;If R1=X with 0 or 1 bits set, R0 = X * &11
> >    ORR     R1,R1,R1,LSL #6     ;R0 = X * &451
> >    RSB     R1,R1,R1,LSL #16    ;R1 = X * &0450FBAF
> >
> >    LDRB    R1,[R2,R1,LSR #26]  ;Look up table entry indexed by top 6 bits
> >                                ; of R1
> >
> >  ; Result in R1
>
> This needs a 64 byte table, which I don't seem to have a copy of.  It
> shouldn't be very hard to generate though (just run enough values through).
> IIRC there may be an unused entry in the table, so simply trying values
> until it's full might take quite a while.

That is truely sick, but I like it and its even faster than my optimised
loopless ffs.S if the first set bit is in 5th or higher place. Of course it
could be subjected to the same tweaking as my ffs.S routine (in which case it
would beat mine on all cases). Whether this is worth doing depends on the
distribution of bits it is usually called with.

> >If there are any routines people have a particular desire to have faster
> >version of let me know.  When finished with the string funs, and libkern,
> >I'm going to start on the in_cksum routine.
>
> The obvious candidate is memcpy/memove, but I wanted to get some stats on
> what usage it gets before putting lots of work into it.  However, I've yet
> to get a kernel source tree on my Risc PC.  Hopefully sometime this week...
> If someone feels keen and wants to generate me some, what I wanted was the
> relative frequencies of each ( src & 3, dest & 3, length ) triplet.

You might want to talk to Mark Brinicombe. Recently he and Neil Carson have
been going through the kernel profiling it and writing optimised assembler
versions of various routines.

Cheers

Rob Black