Subject: Re: diff to speed up fdalloc using two-level bitmaps
To: Niels Provos <provos@citi.umich.edu>
From: Matt Thomas <matt@3am-software.com>
List: tech-perform
Date: 10/29/2003 00:44:44
On Tuesday, October 28, 2003, at 10:59 PM, Niels Provos wrote:

> On Tue, Oct 28, 2003 at 11:25:59AM -0500, Niels Provos wrote:
>> It could need some fine tuning
>>
>>  http://www.citi.umich.edu/u/provos/benchmark/netbsd-fdalloc-zoom.jpg
>>
>> An inline assembly function for finding the first zero bit in a
>> word would be a great thing to have across architectures.
>
> The same benchmark with the following inline assembly
>
> static __inline__ uint32_t ffz(uint32_t word)
> {
>         __asm__("bsfl %1,%0"
>                 :"=r" (word)
>                 :"r" (~word));
>         return word;
> }
>
> looks much better:
>
>   http://www.citi.umich.edu/u/provos/benchmark/netbsd-fdalloc-zoom.jpg
>
> Now, it is faster than before in all cases.  It would be great if we
> could have a
>
>  <machine/bitops.h>
>
> include file that would define similar inline assembly for all
> architectures or the equivalent C code if there is no instruction
> support.

Note that some architectures (like PowerPC) count bits differently
(bit 0 is the "leftmost" bit) and the cntlzw (count leading zeroes
word) count from left to right.

ffs(3) has not implied bit ordering.  To use the more efficient
instructions, the bitops must deal with both bit orderings.
-- 
Matt Thomas                     email: matt@3am-software.com
3am Software Foundry              www: http://3am-software.com/bio/matt/
Cupertino, CA              disclaimer: I avow all knowledge of this 
message.