Subject: Re: Request for testers.
To: Robert Black <r.black@ic.ac.uk>
From: Chris Gilbert <cg110@york.ac.uk>
List: port-arm32
Date: 04/22/1997 11:57:36
hi,

On Mon, 21 Apr 1997, Robert Black wrote:

> On Apr 21,  2:10pm, olly@muscat.co.uk wrote:
> > Subject: Re: Request for testers.
> > In article <Pine.SGI.3.95L.970417182500.25801B-100000@tower.york.ac.uk>,
> > 	Chris Gilbert wrote:
> > >ffs.S		find first set bit, fixed time routine used.
> > >		This routine does not loop, and uses conditional
> > >		execution.  Faster in cases where the first set bit
> > >		is the 5th or higher place.
> >
> > Here's a version of ffs David Seal posted to comp.sys.acorn.tech (csa.tech
> > and armltd.uucp eh?  Ah, those were the days) sometime in Feb 1994:

Are there any more hidden gems like this in some archive somewhere I
wonder...

> > In article <32975@armltd.uucp> dseal@armltd.co.uk (David Seal) writes:
> > >I produced a better variant of the same theme last night:
> > >
> > >  ; Operand in R0, register R1 is free, R2 addresses a byte table 'Table'
> > >
> > >    RSB     R1,R0,#0            ;Standard trick to isolate bottom bit in R1,
> > >    AND     R1,R1,R0            ; or produce zero in R1 if R0 = 0.
> > >
> > >    ORR     R1,R1,R1,LSL #4     ;If R1=X with 0 or 1 bits set, R0 = X * &11
> > >    ORR     R1,R1,R1,LSL #6     ;R0 = X * &451
> > >    RSB     R1,R1,R1,LSL #16    ;R1 = X * &0450FBAF
> > >
> > >    LDRB    R1,[R2,R1,LSR #26]  ;Look up table entry indexed by top 6 bits
> > >                                ; of R1
> > >
> > >  ; Result in R1
> >
> > This needs a 64 byte table, which I don't seem to have a copy of.  It
> > shouldn't be very hard to generate though (just run enough values through).
> > IIRC there may be an unused entry in the table, so simply trying values
> > until it's full might take quite a while.

I'd think that it'd be a case of using a present ffs routine and running
the value through both and storing the result instead of loading, only one
way to find out though, I'll try this later today, see what I can come up
with.

> That is truely sick, but I like it and its even faster than my optimised
> loopless ffs.S if the first set bit is in 5th or higher place. Of course it
> could be subjected to the same tweaking as my ffs.S routine (in which case it
> would beat mine on all cases). Whether this is worth doing depends on the
> distribution of bits it is usually called with.

It's faster than mine, and still maintains a fixed time to execute, I'll
be putting my routines up on the net later today, for people to
examine.

> > >If there are any routines people have a particular desire to have faster
> > >version of let me know.  When finished with the string funs, and libkern,
> > >I'm going to start on the in_cksum routine.
> >
> > The obvious candidate is memcpy/memove, but I wanted to get some stats on
> > what usage it gets before putting lots of work into it.  However, I've yet
> > to get a kernel source tree on my Risc PC.  Hopefully sometime this week...
> > If someone feels keen and wants to generate me some, what I wanted was the
> > relative frequencies of each ( src & 3, dest & 3, length ) triplet.

I'm not sure how to generate these values, but I've not had chance to look
too closely.

> You might want to talk to Mark Brinicombe. Recently he and Neil Carson have
> been going through the kernel profiling it and writing optimised assembler
> versions of various routines.

I hope we've not copied some of their work, most of the routines we've
done are for libkern.

More details on later when I've got the patches onto the net.

Chris G