Subject: changing VOP_REMOVE(), was Re: ufs-ism in lookup(9)
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 03/30/2004 17:45:24
--cmJC7u66zC7hs+87
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Mar 31, 2004 at 08:53:05AM +0900, YAMAMOTO Takashi wrote:
> hi,
>=20
> > > > However that would mean that each mount point sits on two vnodes, o=
ne for=20
> > > > the node and one for the parent.
> > >=20
> > > is it a problem?
> >=20
> > It seems wasteful to me. Removing vnodes is not a common event, so I do=
n't=20
> > see why we need to have an extra vnode lying around just to see if we a=
re=20
> > removing a mount point.
>=20
> removing vnodes is a common event, IMO.

How so? While I agree there are work loads that remove a lot of files, I=20
look at the files on my system, and most of them sit around.

Also, even if we decide removing files is a common event (which I won't=20
object to much), I don't think the (dvp, component) cache will work.

> > Well, we still have to teach the NFS server to scan the mount point cac=
he.
>=20
> exactly.

[snip]

> > While we could accept NFS's attempt to remove this directory (we'd trig=
ger=20
> > silly-rename issues as the mount structure has a reference to the=20
> > directory, so it's not a free vnode), I think that would be sub-optimal=
.=20
> > So to stay consistent, we have to move a test into nfs_remove to check =
and=20
> > see if we have a vnode in-core (which we'd need anyway as we have to za=
p=20
> > said node), and check to see if it's mounted-on.
>=20
> do you really want to support such a weird situation? :-)
> if so, i think you want to eliminate f_mntonname and do getcwd-like thing
> to get the path of the mountpoint.

I certainly don't want to get bitten by it. :-) I'm concerned it'd be a=20
new way to make the kernel fall over.. So in that way, I want to "support"=
=20
not dying.

Also, when exactly would we be doing this getcwd-like thing to get the=20
mountpoint path? How will we know when to (and not to) do the parent=20
directory lookup?

My point is that we are talking about (with the (dvp, component) cache)
doing mount-point detection using something that is fragile for a remote
file system, unless we can put some sort of watch (or lock or guard) on
each path component.  Otherwise either the server or another client can
come in and do a rename that will invalidate the usefullness of our cache.
And we wouldn't even know it.

> > Oh, another issue with keeping the path component is that we break an=
=20
> > abstraction we have now. At present, only a file system knows how to=20
> > compare path component names for files on it. Making the (dvp, componen=
t)=20
> > cache would also require a way to have the upper layers request compone=
nt=20
> > comparison. For instance, ffs is 8-bit ascii case sensitive. HFS is cas=
e=20
> > insenitive, as are FAT and NTFS. Short and long FAT file names add anot=
her=20
> > twist to the mix.
>=20
> hm, a good point.
>=20
> > Sounds like the best thing to do for now is leave VOP_RENAME() alone.
>=20
> the current "VOP_LOOKUP for dirop" method is wasteful even for ufs.
> ie. in-core inode is bloated unnecessarily to store a result of VOP_LOOKU=
P.
> i guess it's worse for more complicated directory implementations.

[sorry I mentioned VOP_RENAME(); we're talking about VOP_REMOVE()]

Huh?

For ufs, AFAICT we add two int32_t's to hold slot information. We also=20
use these values for file creation. So it doesn't seem like a waste to me.

Also, how exactly do we eliminate the lookup for NFS? Yes, we could do=20
something like NOLEAF to the lookup, and we could push the parent and=20
component into VOP_REMOVE(). But fundamentally, as part of the remove=20
process, we have to know if we have an in-kernel vnode. How do we find=20
that vnode without doing a lookup? "Lookup" after all is the way we map a=
=20
name to an inode.

And since we have to do the lookup to find the vnode, what's wrong with
what we're doing now, other than that we will always load the vnode into
core even if it's not in-core?

To recap: This thread started a while ago with a proposal to change how we
do caching for DELETE and RENAME operations, so that remote file systems
can use the namei cache correctly. I think that's fine. And assuming it's
the version that changes vfs_cache.c not vfs_lookup.c, I think it should
be checked in (and pulled into 2.0).

This thread then moved onto changing VOP_RENAME() so that you don't have
to do a lookup on the to-be-deleted vnode before calling the rename. Since
you fundamentally have to know if you have an in-core vnode for the
to-be-deleted file and a lookup is the fundamental way to find a vnode
given a name, I don't see how we can eliminate the VOP_LOOKUP(). We can
move it around, but we can't eliminate it.

Given that we can't eliminate the lookup, I don't see the push to change
our current vnode layer.

Take care,

Bill

--cmJC7u66zC7hs+87
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD8DBQFAaiK0Wz+3JHUci9cRAvZ9AJ9aWzKg0wsKCif5w8b6vpuvkx1ykQCfS67p
KnKZvmItvEB+GXOe3m/xhDU=
=+ZE3
-----END PGP SIGNATURE-----

--cmJC7u66zC7hs+87--