Subject: Re: ufs-ism in lookup(9)
To: None <wrstuden@netbsd.org>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: tech-kern
Date: 03/31/2004 08:53:05
hi,

> > > However that would mean that each mount point sits on two vnodes, one for 
> > > the node and one for the parent.
> > 
> > is it a problem?
> 
> It seems wasteful to me. Removing vnodes is not a common event, so I don't 
> see why we need to have an extra vnode lying around just to see if we are 
> removing a mount point.

removing vnodes is a common event, IMO.

> > > Also, is there any way NFS Exporting a file system could cause a mount 
> > > point to get renamed, thus invalidating the above cache?
> > 
> > i see no reason to keep the current behaviour for such a weird usage.
> > it'd be enough to return eg. EPERM.
> 
> Well, we still have to teach the NFS server to scan the mount point cache.

exactly.

> Actually this discussion has reminded me why we really can't change
> things. Fundamentally we mount file systems on vnodes, not directory-
> parent-plus-name tuples. So while we can come up with hint caches, at some 
> point we positively have to look at the vnode.
> 
> Consider a system with an NFS file system mounted, and a file system 
> mounted over one of the subdirectories. root-on-nfs and either /usr and/or 
> /var on separate file systems would be an example. Now say someone on the 
> NFS server renames the directory on which we have this other fs mounted. 
> Since the local file system is mounted on the vnode, it just moved.
> 
> Now say someone on this system goes to remove the newly-named directory.
> The current code will do the lookup, find the NFS client vnode on which we
> have the mount point, and complain. If however we went with parent dir + 
> name in rename, we would not get a match. Thus we would get into the NFS 
> code's remove routine and attempt to remove a mounted-on vnode.
> 
> While we could accept NFS's attempt to remove this directory (we'd trigger 
> silly-rename issues as the mount structure has a reference to the 
> directory, so it's not a free vnode), I think that would be sub-optimal. 
> So to stay consistent, we have to move a test into nfs_remove to check and 
> see if we have a vnode in-core (which we'd need anyway as we have to zap 
> said node), and check to see if it's mounted-on.

do you really want to support such a weird situation? :-)
if so, i think you want to eliminate f_mntonname and do a getcwd-like thing
to get the path of the mountpoint.
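
to illustrate what i mean by "getcwd-like", here is a minimal userland
sketch. it assumes a toy node that remembers its current name and parent;
struct toy_node and toy_getpath() are made-up names, not kernel interfaces.
a real vnode stores no name, so a kernel version would have to find each
component by looking up ".." and searching the parent directory.

```c
#include <stddef.h>
#include <string.h>

/* toy stand-in for a directory vnode; NOT the real struct vnode */
struct toy_node {
	const char *name;		/* current name in parent; "" for root */
	struct toy_node *parent;	/* NULL for the root */
};

/*
 * build the absolute path of np by walking parent links and filling
 * buf from the end, the way a getcwd-style walk does.
 * returns a pointer into buf, or NULL if buf is too small.
 */
static char *
toy_getpath(const struct toy_node *np, char *buf, size_t len)
{
	char *bp;

	if (len < 2)
		return NULL;
	bp = buf + len;
	*--bp = '\0';
	if (np->parent == NULL) {
		/* the root itself */
		*--bp = '/';
		return bp;
	}
	for (; np->parent != NULL; np = np->parent) {
		size_t n = strlen(np->name);

		/* need room for the name plus its leading '/' */
		if ((size_t)(bp - buf) < n + 1)
			return NULL;
		bp -= n;
		memcpy(bp, np->name, n);
		*--bp = '/';
	}
	return bp;
}
```

because the path is computed on demand from the node, a rename of the
mounted-on directory shows up in the result, which a string saved at
mount time can't do.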

> Oh, another issue with keeping the path component is that we break an 
> abstraction we have now. At present, only a file system knows how to 
> compare path component names for files on it. Making the (dvp, component) 
> cache would also require a way to have the upper layers request component 
> comparison. For instance, ffs is 8-bit ascii case sensitive. HFS is case 
> insensitive, as are FAT and NTFS. Short and long FAT file names add another 
> twist to the mix.

hm, a good point.
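
a rough sketch of why, assuming a hypothetical (dvp, component) cache:
each entry would need a comparison hook supplied by the file system.
toy_namecmp_t, toy_mntcache_entry and the other names are invented for
illustration; no such interface exists today, and the ASCII-only folding
below is far simpler than what a real case-insensitive file system does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* hypothetical per-file-system hook: nonzero if the two names match */
typedef int (*toy_namecmp_t)(const char *, size_t, const char *, size_t);

/* byte-for-byte compare, as a case-sensitive file system would do */
static int
toy_cmp_exact(const char *a, size_t alen, const char *b, size_t blen)
{
	return alen == blen && memcmp(a, b, alen) == 0;
}

/* ASCII-only case folding; real file systems have their own rules */
static int
toy_cmp_nocase(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++) {
		if (tolower((unsigned char)a[i]) !=
		    tolower((unsigned char)b[i]))
			return 0;
	}
	return 1;
}

/* one entry of the hypothetical mount point cache */
struct toy_mntcache_entry {
	const void *dvp;	/* parent directory, compared by identity */
	const char *name;	/* component name recorded at mount time */
	toy_namecmp_t cmp;	/* comparison supplied by dvp's file system */
};

static int
toy_mntcache_match(const struct toy_mntcache_entry *e, const void *dvp,
    const char *name)
{
	return e->dvp == dvp &&
	    e->cmp(e->name, strlen(e->name), name, strlen(name));
}
```

with the exact hook, "USR" doesn't match an entry recorded as "usr";
with the folding hook it does. the upper layer can't know which answer
is right without asking the file system.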

> Sounds like the best thing to do for now is leave VOP_RENAME() alone.

the current "VOP_LOOKUP for dirop" method is wasteful even for ufs.
i.e. the in-core inode is bloated unnecessarily to store the result of VOP_LOOKUP.
i guess it's worse for more complicated directory implementations.
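
a toy userland sketch of the two styles, to make the cost concrete.
struct toy_dir and the toy_* functions are made up and much simpler than
the real ufs code; style 2 is only one possible alternative, shown under
the assumption that the dirop can redo or carry the search itself.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NENT 4

/* result of a directory search: which slot matched */
struct toy_lookup_result {
	int slot;
};

/* toy in-core directory; NOT the real ufs struct inode */
struct toy_dir {
	const char *ent[TOY_NENT];	/* NULL marks a free slot */
	struct toy_lookup_result saved;	/* scratch carried by every in-core dir */
};

/* search dp for name; fill *res and return 0 on success, else -1 */
static int
toy_search(const struct toy_dir *dp, const char *name,
    struct toy_lookup_result *res)
{
	int i;

	for (i = 0; i < TOY_NENT; i++) {
		if (dp->ent[i] != NULL && strcmp(dp->ent[i], name) == 0) {
			res->slot = i;
			return 0;
		}
	}
	return -1;
}

/* style 1: lookup leaves its result in the node for a later remove */
static int
toy_lookup_for_delete(struct toy_dir *dp, const char *name)
{
	return toy_search(dp, name, &dp->saved);
}

static void
toy_remove_saved(struct toy_dir *dp)
{
	dp->ent[dp->saved.slot] = NULL;
}

/* style 2: the result lives on the caller's stack, not in the node */
static int
toy_remove_by_name(struct toy_dir *dp, const char *name)
{
	struct toy_lookup_result res;

	if (toy_search(dp, name, &res) != 0)
		return -1;
	dp->ent[res.slot] = NULL;
	return 0;
}
```

in style 1 every in-core directory pays for the saved field whether or
not a dirop is in progress; in style 2 the space exists only for the
duration of the call.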

YAMAMOTO Takashi