Subject: Re: RFC: VOP_LOOKUP() speedup
To: None <tech-kern@netbsd.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 05/05/2004 09:12:28

On Wed, May 05, 2004 at 12:47:47AM +0200, Reinoud Zandijk wrote:
> Dear Bill,
>
> On Tue, May 04, 2004 at 12:13:28PM -0700, Bill Studenmund wrote:
> > On Tue, May 04, 2004 at 01:07:10AM +0200, Reinoud Zandijk wrote:
> > > 1) before a VOP_CREATE() / VOP_MKDIR() / ... is called, VOP_LOOKUP() is
> > >    called first to check if the file or directory name is already present
> > >    or not.
> > > 2) VOP_LOOKUP() uses the namecache to check if it's there...
> > > 3) if the namecache fails, VOP_LOOKUP() needs to search the directory from
> > >    disc for the name. _hopefully_ it's still in the cache....
>
> So this is correct? I'm not missing anything?

With the second cache == block cache, yes, I think that's right.
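
A minimal sketch of that lookup order, assuming a toy model rather than the
real namecache or buffer cache code. All names here (toy_dir, toy_lookup,
lookup_src) are hypothetical and only illustrate the three steps: check the
name cache first, and on a miss pay for a scan of the directory, even when
the name turns out not to exist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; not the NetBSD namecache or buffer cache. */
enum lookup_src { FROM_NAMECACHE, FROM_DIRSCAN, NOT_FOUND };

struct toy_dir {
    const char **cached;   /* names currently in the toy name cache */
    size_t ncached;
    const char **ondisk;   /* names stored in the directory itself */
    size_t nondisk;
    unsigned long scans;   /* number of full directory scans performed */
};

static int name_in(const char **names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Step 2: try the name cache. Step 3: fall back to scanning the directory. */
enum lookup_src toy_lookup(struct toy_dir *d, const char *name)
{
    if (name_in(d->cached, d->ncached, name))
        return FROM_NAMECACHE;
    d->scans++;  /* a cache miss always costs a scan in this model */
    return name_in(d->ondisk, d->nondisk, name) ? FROM_DIRSCAN : NOT_FOUND;
}
```

Note that in this model a create of a new name always takes the slow path,
since a name that does not exist can never be found in the cache.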

> > > 1) every vnode/inode describing a directory gets a `number of directory
> > > entries in cache' counter.
> > >
> > > 2) on addition or removal of an entry from the cache, this number is
> > > maintained. Also when an entry is purged from the LRU.
>
> I've implemented this part in my test-kernel on my Alpha, and it works
> fine.

I still don't understand why you want a number. Isn't a boolean (all here
or not all here) enough?

> > > 3) with this provision, VOP_LOOKUP() now can check the number of directory
> > > entries in the directory with the `number of directory entries in cache'
> > > and see that the answer of the namecache is authoritative if the two are the
> > > same.
>
> Only filing systems that want to use this feature need explicit coding;
> others just ignore it and go the normal way.
>=20
> > However, I think you can do everything you want in your own file system.
>=20
> :)

Perhaps I'm misunderstanding you, but my objection is to changing struct
vnode or the name cache structures. I think all you need to do is either
keep a flag or count in your fs's structure off the vnode.
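
As a rough sketch of what I mean, assuming a hypothetical per-filesystem
node structure (toyfs_node and the toyfs_* helpers below are made up for
illustration and are not existing kernel interfaces), both the flag and the
count can live in the fs's private data, and either one answers the same
question: is a name cache miss for this directory final?

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fs-private data, reached through the vnode's private pointer. */
struct toyfs_node {
    bool   all_present;  /* flag variant: every entry is in the name cache */
    size_t nentries;     /* entries in the directory on disk */
    size_t ncached;      /* counter variant: entries currently cached */
};

/* Counter variant: a miss is authoritative when the two counts match. */
bool toyfs_miss_is_final_count(const struct toyfs_node *dir)
{
    return dir->ncached == dir->nentries;
}

/* Flag variant: a miss is authoritative while the flag is set. */
bool toyfs_miss_is_final_flag(const struct toyfs_node *dir)
{
    return dir->all_present;
}

/* Call after a full linear scan that entered every name into the cache. */
void toyfs_scan_done(struct toyfs_node *dir)
{
    dir->ncached = dir->nentries;
    dir->all_present = true;
}

/* Call whenever a cached entry of this directory is dropped. */
void toyfs_entry_evicted(struct toyfs_node *dir)
{
    if (dir->ncached > 0)
        dir->ncached--;
    dir->all_present = false;
}
```

Neither variant touches struct vnode or the name cache structures; only the
file system that wants the shortcut carries the extra state.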

> > All you need is one extra bit per node. When you do a linear search of the
> > dir, create vnodes and name cache entries for all files. Then flag the dir
> > as all-present. Then in vop_revoke, in addition to blowing away cache
> > entries, clear the "All present" bit in the parent vnode.
>
> but wouldn't this give a `false hit' when say a file is deleted? since the
> VOP_REVOKE() will be called after the VOP_REMOVE()?

No. You won't have a parent after the VOP_REMOVE(), so the VOP_REVOKE()
won't clear anything.

Also, if we find issues in the proposal that need tweaking, we can tweak
things. ;-)

> > One thing you might need is a way to search the name cache for all parent
> > vnodes of a given vnode. Not sure if we have this interface. If we don't,
> > it'd be a fine thing to add to the general code. Then on revoke, you clear
> > the bit in all present parents.
>
> I don't get this honestly; if one focuses on the namecache itself and lets it
> handle the counters itself, then the filing system doesn't need to search
> around etc. and the reference count is always correct by definition.

The problem is that using the namecache means having vnodes for all of
these entities. So just doing an ls on a directory creates vnodes for all
the contents.

Given the costs of reading a directory for UDF, especially while writing,
I think the cost of the extra vnodes is less than the cost of not having
them. But for all other file systems, I do not expect the savings in
directory reads to be worth the extra vnodes.

As an example, consider my build machines. I keep a few kernel builds
around. If we used the name cache like you describe, each time I do an ls,
we gain a lot of vnodes. Those vnodes push out the vnodes of files that we
were using. If I'm not actually compiling a kernel, I just gained a lot of
useless vnodes.
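
To make the effect concrete, here is a toy LRU model of a fixed-size vnode
table. It is only a sketch under assumptions: the real vnode recycling code
is more involved, and toy_lru, toy_lru_touch and TOY_CAP are hypothetical
names invented for this example.

```c
#include <stddef.h>

#define TOY_CAP 8  /* hypothetical size of the toy vnode table */

struct toy_lru {
    int    ids[TOY_CAP];  /* ids[0] is the most recently used vnode */
    size_t n;             /* number of slots in use */
};

/*
 * Touch a vnode id: move it to the front of the list. On a miss the least
 * recently used entry is evicted if the table is full.
 * Returns 1 on a hit, 0 on a miss.
 */
int toy_lru_touch(struct toy_lru *c, int id)
{
    size_t i, pos = c->n;
    int hit = 0;

    for (i = 0; i < c->n; i++) {
        if (c->ids[i] == id) {
            pos = i;
            hit = 1;
            break;
        }
    }
    if (!hit) {
        if (c->n < TOY_CAP)
            c->n++;
        pos = c->n - 1;  /* new slot, or the LRU slot when the table is full */
    }
    /* Shift the more recent entries down by one and put id at the front. */
    for (i = pos; i > 0; i--)
        c->ids[i] = c->ids[i - 1];
    c->ids[0] = id;
    return hit;
}
```

If four "hot" vnodes are in the table and an ls then instantiates eight
vnodes for a directory nobody is working in, every hot vnode has been
evicted by the time the ls finishes, and the next access to each of them
is a miss.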

The fact that we don't always create vnodes when we read a directory is a
feature for a lot of workloads. I think we need to keep it (for everything
other than UDF).

Take care,

Bill
