Subject: Re: diff to speed up fdalloc using two-level bitmaps
To: None <tech-perform@netbsd.org>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-perform
Date: 10/29/2003 21:33:15
On Wed, Oct 29, 2003 at 08:48:56PM +0900, Bang Jun-Young wrote:
> On Wed, Oct 29, 2003 at 01:59:13AM -0500, Niels Provos wrote:
> > On Tue, Oct 28, 2003 at 11:25:59AM -0500, Niels Provos wrote:
> > > It could need some fine tuning
> > >
> > > http://www.citi.umich.edu/u/provos/benchmark/netbsd-fdalloc-zoom.jpg
> > >
> > > An inline assembly function for finding the first zero bit in a
> > > word would be a great thing to have across architectures.
> >
> > The same benchmark with the following inline assembly
> >
> > static __inline__ uint32_t ffz(uint32_t word)
> > {
> > __asm__("bsfl %1,%0"
> > :"=r" (word)
> > :"r" (~word));
> > return word;
> > }
>
> One caveat is, if the argument word is passed as 0, the result of
> bsfl is undefined (i.e., unpredictable). So there's needed additional
> code to check if it's 0.
Secondly, unlike ffs(), your ffz() counts from 0. This is not a
real bug, but if consistency mattered...
>
> BTW, how about making ffs(9) inline as well? Calling overhead seems
> to be quite high on i386...
On second look, I found that ffs() is defined as __builtin_ffs()
for all platforms but vax in sys/systm.h. I'm wondering when
__builtin_ffs() is called and when the real ffs() called, but it might
be an off-topic.
Jun-Young
--
Bang Jun-Young <junyoung@NetBSD.org>