Subject: Re: diff to speed up fdalloc using two-level bitmaps
To: Jason Thorpe <thorpej@wasabisystems.com>
From: Bang Jun-Young <junyoung@netbsd.org>
List: tech-perform
Date: 10/30/2003 01:04:49
On Wed, Oct 29, 2003 at 06:37:10AM -0800, Jason Thorpe wrote:
> 
> On Wednesday, October 29, 2003, at 03:48  AM, Bang Jun-Young wrote:
> 
> >BTW, how about making ffs(9) inline as well? Calling overhead seems
> >to be quite high on i386...
> 
> GCC will already inline ffs() if the CPU back-end provides the 
> appropriate pattern.  The right answer, if GCC is not doing it on i386, 
> would be to add that pattern to i386.md.

I looked further and found that ffs() was properly inlined in the
kernel:

c01c02a4 <fdalloc>:
[snip]
c01c0348:       83 fa ff                cmp    $0xffffffff,%edx
c01c034b:       0f 84 5e 01 00 00       je     c01c04af <fdalloc+0x20b>
c01c0351:       f7 d2                   not    %edx
c01c0353:       c1 e3 05                shl    $0x5,%ebx
c01c0356:       31 c0                   xor    %eax,%eax
c01c0358:       0f bc d2                bsf    %edx,%edx
c01c035b:       0f 94 c0                sete   %al
c01c035e:       f7 d8                   neg    %eax
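
In C terms, the sequence above looks roughly like a search for the first
clear bit done as ffs() on the inverted word. Here is a minimal sketch of
that idea, not the actual fdalloc() code: the helper name first_zero_bit()
and the -1 "word is full" return value are made up for illustration, and
the mapping of instructions to lines in the comments is my reading of the
listing.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

/*
 * Hypothetical helper: return the 0-based index of the lowest clear bit
 * in a 32-bit bitmap word, or -1 if every bit is set.
 */
static int
first_zero_bit(uint32_t word)
{
	/* Presumably the "cmp $0xffffffff,%edx; je" pair: word is full. */
	if (word == 0xffffffffU)
		return -1;

	/*
	 * "not %edx" followed by "bsf": invert the word so clear bits
	 * become set, then find the lowest set bit.  ffs() is 1-based
	 * and returns 0 for a zero argument, which is the case a
	 * complete implementation has to handle in addition to the
	 * bare bsf.
	 */
	return ffs((int)~word) - 1;
}
```

With this, first_zero_bit(0xff) gives 8 and first_zero_bit(0) gives 0. The
"shl $0x5,%ebx" in the listing would then correspond to multiplying a word
index by 32 before adding the bit index, but that part is snipped, so take
it as a guess.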

So it seems clear that the small speedup from the inlined ffz() shown in
provos' graph was due to its incomplete implementation, isn't it?

Jun-Young

-- 
Bang Jun-Young <junyoung@NetBSD.org>