Subject: RE: namei caching of newly created files?
To: Jason Thorpe <>
From: Gordon Waidhofer <>
List: tech-kern
Date: 01/22/2005 17:02:04
> -----Original Message-----
> From: [] On
> Behalf Of Thor Lancelot Simon
> Sent: Saturday, January 22, 2005 4:44 PM
> To: Jason Thorpe
> Cc:;
> Subject: Re: namei caching of newly created files?
> On Sat, Jan 22, 2005 at 03:06:28PM -0800, Jason Thorpe wrote:
> > 
> > On Jan 22, 2005, at 2:26 PM, Havard Eidnes wrote:
> > 
> > >>I brought FreeBSD's ufs_dirhash into my -current tree and have
> > >>seen some good improvements.
> > >>...
> > >
> > >Whee, this makes a huge difference in my tests, as you indicated
> > >it would.
> > 
> > Might as well check it in, then?
> I dunno -- Gordon's approach seemed a lot better to me -- even if it
> would really give the benefit only for new filesystems.  I also wonder
> if its expanding hash wouldn't give better performance for the truly
> gigantic directories where this version of the dirhash code seems to
> fall down.
> Thor

It should scale very well. And it should lack any
first-access cost. It also has relatively low
memory pressure. Nice for very small machines
wanting to support very large directories.

It's good to have a target. We use 1,000,000 entries.
That might be an insane target when considering a 16-bit
inode link count (we have 32-bit, and I believe so
does FFSv2).

Pick a number for a typical dirent size. Say, 64 bytes.
That works out to 64 MB. Then, pick a block size
for a directory. Say, 8 KB (that's probably dated).
Anyway, you get to the number of directory blocks to
worry about for the target: say, 8192.
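To make the arithmetic above concrete, here is a minimal sketch of the sizing, assuming the round numbers from the text (a 64-byte dirent and an 8 KB directory block are illustrative values, not measurements); 1,000,000 entries works out to 7813 blocks exactly, which rounds up to 8192:

```c
#include <assert.h>

/* Sizing arithmetic for the 1,000,000-entry target, using the
 * assumed round numbers from the text. */
enum {
	TARGET_ENTRIES = 1000000,
	DIRENT_SIZE    = 64,	/* assumed typical dirent size, bytes */
	DIR_BLOCK_SIZE = 8192	/* assumed directory block size, 8 KB */
};

/* Directory bytes needed for the target: 64,000,000, i.e. ~64 MB. */
static long
target_bytes(void)
{
	return (long)TARGET_ENTRIES * DIRENT_SIZE;
}

/* Directory blocks to worry about: 7813 exactly, rounded up to
 * the next power of two, 8192. */
static long
target_blocks(void)
{
	long blocks = (target_bytes() + DIR_BLOCK_SIZE - 1) / DIR_BLOCK_SIZE;
	long p2 = 1;

	while (p2 < blocks)
		p2 <<= 1;
	return p2;
}
```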

The bucket sizes we use "bulge" for the target. I don't
recall our exact numbers offhand, but they aren't important.

int	buckets[] = { 1, 8, 64, 256, 2048, 4096, 1024, 256, 8, 8, 8, 1, 1, /* ... */ };

It keeps "spills" over the target from creating nearly unused
directory blocks.

OBTW, the simple example I posted earlier left boundary
conditions as an exercise for the reader. When you reach
a hole (bmap(d,lb)==0), you stop searching. When the "base"
exceeds the number of directory blocks (d.size/bsize), you
stop. The last wrinkle is teaching getdents() to hop over
holes, which is really easy. All in all, when we first did
it, it took a few days of noodling and about one day to
implement.
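The stopping rules above can be sketched in user-space C. This is one plausible reading of the scheme, not the actual implementation: blocks[] with NULL entries stands in for bmap(d,lb)==0 holes, and the search runs forward from a hashed base block until it hits a hole or runs off the end of the directory:

```c
#include <stddef.h>
#include <string.h>

#define NBLOCKS 8	/* stands in for d.size / bsize */

/* Simulated directory: blocks[lb] == NULL models a hole,
 * i.e. bmap(d, lb) == 0.  Names stand in for whole blocks. */
static const char *blocks[NBLOCKS] = {
	"apple", "pear", NULL, "plum", "fig", NULL, "kiwi", "date"
};

/* Look up `name` starting at its hashed base block.  Returns the
 * block number on success, -1 on failure.  Hypothetical sketch:
 * the hash function and block-search details are stand-ins. */
static int
dir_lookup(const char *name, unsigned hash)
{
	unsigned lb;

	for (lb = hash % NBLOCKS; lb < NBLOCKS; lb++) {
		if (blocks[lb] == NULL)		/* hole: stop searching */
			return -1;
		if (strcmp(blocks[lb], name) == 0)
			return (int)lb;		/* found in this block */
	}
	return -1;	/* base exceeded d.size/bsize: stop */
}
```

A getdents()-style walk over the same array would simply skip the NULL entries rather than stop, which is the "hop over holes" wrinkle.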

All that said, support for large directories is nice
to have but matters much less than it used to. Because
of the 16-bit inode link count, /home with over 32000 users
became problematic for most systems. Hence, things at large
installations were changed from /home/george to /home/ge/george
and from /var/mail/george to /var/mail/ge/george, or something