Subject: Re: Request for testers.
To: Robert Black <r.black@ic.ac.uk>
From: Chris Gilbert <cg110@york.ac.uk>
List: port-arm32
Date: 04/22/1997 11:57:36
hi,
On Mon, 21 Apr 1997, Robert Black wrote:
> On Apr 21, 2:10pm, olly@muscat.co.uk wrote:
> > Subject: Re: Request for testers.
> > In article <Pine.SGI.3.95L.970417182500.25801B-100000@tower.york.ac.uk>,
> > Chris Gilbert wrote:
> > >ffs.S find first set bit, fixed time routine used.
> > > This routine does not loop, and uses conditional
> > > execution. Faster in cases where the first set bit
> > > is the 5th or higher place.
> >
> > Here's a version of ffs David Seal posted to comp.sys.acorn.tech (csa.tech
> > and armltd.uucp eh? Ah, those were the days) sometime in Feb 1994:
Are there any more hidden gems like this in some archive somewhere I
wonder...
> > In article <32975@armltd.uucp> dseal@armltd.co.uk (David Seal) writes:
> > >I produced a better variant of the same theme last night:
> > >
> > > ; Operand in R0, register R1 is free, R2 addresses a byte table 'Table'
> > >
> > > RSB R1,R0,#0 ;Standard trick to isolate bottom bit in R1,
> > > AND R1,R1,R0 ; or produce zero in R1 if R0 = 0.
> > >
> > > ORR R1,R1,R1,LSL #4 ;If R1=X with 0 or 1 bits set, R0 = X * &11
> > > ORR R1,R1,R1,LSL #6 ;R0 = X * &451
> > > RSB R1,R1,R1,LSL #16 ;R1 = X * &0450FBAF
> > >
> > > LDRB R1,[R2,R1,LSR #26] ;Look up table entry indexed by top 6 bits
> > > ; of R1
> > >
> > > ; Result in R1
> >
> > This needs a 64 byte table, which I don't seem to have a copy of. It
> > shouldn't be very hard to generate though (just run enough values through).
> > IIRC there may be an unused entry in the table, so simply trying values
> > until it's full might take quite a while.
I'd think that it'd be a case of using a present ffs routine and running
the value through both and storing the result instead of loading, only one
way to find out though, I'll try this later today, see what I can come up
with.
> That is truely sick, but I like it and its even faster than my optimised
> loopless ffs.S if the first set bit is in 5th or higher place. Of course it
> could be subjected to the same tweaking as my ffs.S routine (in which case it
> would beat mine on all cases). Whether this is worth doing depends on the
> distribution of bits it is usually called with.
It's faster than mine, and still maintains a fixed time to execute, I'll
be putting my routines up on the net later today, for people to
examine.
> > >If there are any routines people have a particular desire to have faster
> > >version of let me know. When finished with the string funs, and libkern,
> > >I'm going to start on the in_cksum routine.
> >
> > The obvious candidate is memcpy/memove, but I wanted to get some stats on
> > what usage it gets before putting lots of work into it. However, I've yet
> > to get a kernel source tree on my Risc PC. Hopefully sometime this week...
> > If someone feels keen and wants to generate me some, what I wanted was the
> > relative frequencies of each ( src & 3, dest & 3, length ) triplet.
I'm not sure how to generate these values, but I've not had chance to look
too closely.
> You might want to talk to Mark Brinicombe. Recently he and Neil Carson have
> been going through the kernel profiling it and writing optimised assembler
> versions of various routines.
I hope we've not copied some of their work, most of the routines we've
done are for libkern.
More details on later when I've got the patches onto the net.
Chris G