Subject: Re: diff to speed up fdalloc using two-level bitmaps
To: None <tech-perform@netbsd.org>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-perform
Date: 10/29/2003 21:33:15
On Wed, Oct 29, 2003 at 08:48:56PM +0900, Bang Jun-Young wrote:
> On Wed, Oct 29, 2003 at 01:59:13AM -0500, Niels Provos wrote:
> > On Tue, Oct 28, 2003 at 11:25:59AM -0500, Niels Provos wrote:
> > > It could need some fine tuning
> > > 
> > >  http://www.citi.umich.edu/u/provos/benchmark/netbsd-fdalloc-zoom.jpg
> > > 
> > > An inline assembly function for finding the first zero bit in a
> > > word would be a great thing to have across architectures.
> > 
> > The same benchmark with the following inline assembly
> > 
> > static __inline__ uint32_t ffz(uint32_t word)
> > {
> >         __asm__("bsfl %1,%0"
> >                 :"=r" (word)
> >                 :"r" (~word));
> >         return word;
> > }
> 
> One caveat is, if the argument word is passed as 0, the result of
> bsfl is undefined (i.e., unpredictable). So there's needed additional
> code to check if it's 0. 

Secondly, unlike ffs(), your ffz() counts from 0. This is not a
real bug, but if consistency mattered...

> 
> BTW, how about making ffs(9) inline as well? Calling overhead seems
> to be quite high on i386...

On second look, I found that ffs() is defined as __builtin_ffs()
for all platforms but vax in sys/systm.h. I'm wondering when
__builtin_ffs() is called and when the real ffs() called, but it might
be an off-topic.

Jun-Young

-- 
Bang Jun-Young <junyoung@NetBSD.org>