Subject: Re: cache_*() case-insensitive searching
To: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
From: Jaromir Dolecek <dolecek@ics.muni.cz>
List: tech-kern
Date: 08/08/1999 08:24:03
Bill Sommerfeld wrote:
> Also, cache_lookup uses a hash table; it's somewhat hard to fold a
> case-insensitive string compare into that as you'd be looking in the
> wrong hash bucket most of the time.

If I rean the code correctly, the hash is over the directory vnode
capability number, so that it shouldn't affect case-[in]sensitivness
of the search.

> You can deal with this in a monocase filesystem by canonicalizing the
> case of the component name before calling cache_lookup() and
> cache_enter(); reverse lookups in the cache (for sys_getcwd) will get
> the canonical form of the name, which is presumably the right thing..

Monocase fs in tree is filecorefs only, right ?

> also, creates would have to keep around the canonical and
> non-canonical forms. 

Why ?

> However *case-preserving* case-insensitive filesystems makes life even
> more interesting..  and would require a mod to the lookup cache to
> store the canonical form and the form to return via the reverse-lookup
> code;

Something like:

--- sys/namei.h	Thu Jul  8 13:26:49 1999
+++ sys/namei.h.new	Sun Aug  8 07:21:56 1999
@@ -87,6 +87,7 @@ struct nameidata {
 		 */
 		char	*cn_pnbuf;	/* pathname buffer */
 		const char *cn_nameptr;	/* pointer to looked up name */
+		const char *cn_cnameptr; /* pointer to canonified name */
 		long	cn_namelen;	/* length of looked up component */
 		u_long	cn_hash;	/* hash value of looked up name */
 		long	cn_consume;	/* chars to consume in lookup() */
@@ -171,6 +172,7 @@ struct	namecache {
 	u_long	nc_vpid;		/* capability number of nc_vp */
 	char	nc_nlen;		/* length of name */
 	char	nc_name[NCHNAMLEN];	/* segment name */
+	char	*nc_cname;		/* canonical segment name */
 };
 
 #ifdef _KERNEL
--- kern/vfs_cache.c	Tue Mar 23 13:15:24 1999
+++ kern/vfs_cache.c.new	Sun Aug  8 07:49:57 1999
@@ -102,6 +102,7 @@ cache_lookup(dvp, vpp, cnp)
 {
 	register struct namecache *ncp;
 	register struct nchashhead *ncpp;
+	const char *cname;
 
 	if (!doingcache) {
 		cnp->cn_flags &= ~MAKEENTRY;
@@ -113,11 +114,12 @@ cache_lookup(dvp, vpp, cnp)
 		return (0);
 	}
 	ncpp = &nchashtbl[(cnp->cn_hash ^ dvp->v_id) & nchash];
+	cname = (cnp->cn_cnameptr) ? cnp->cn_cnameptr : cnp->cn_nameptr;
 	for (ncp = ncpp->lh_first; ncp != 0; ncp = ncp->nc_hash.le_next) {
 		if (ncp->nc_dvp == dvp &&
 		    ncp->nc_dvpid == dvp->v_id &&
 		    ncp->nc_nlen == cnp->cn_namelen &&
-		    !memcmp(ncp->nc_name, cnp->cn_nameptr, (u_int)ncp->nc_nlen))
+		    !memcmp(ncp->nc_cname, cnname, (u_int)ncp->nc_nlen))
 			break;
 	}
 	if (ncp == 0) {
@@ -281,6 +283,8 @@ cache_enter(dvp, vp, cnp)
 			LIST_REMOVE(ncp, nc_vhash);
 			ncp->nc_vhash.le_prev = 0;
 		}
+		if (ncp->nc_cname != ncp->nc_name)
+			FREE(ncp->nc_cname, M_NAMEI);
 	} else
 		return;
 	/* grab the vnode we just found */
@@ -299,6 +303,15 @@ cache_enter(dvp, vp, cnp)
 	ncp->nc_dvpid = dvp->v_id;
 	ncp->nc_nlen = cnp->cn_namelen;
 	memcpy(ncp->nc_name, cnp->cn_nameptr, (unsigned)ncp->nc_nlen);
+	/* if canonical name has been specified, copy it into dynamically
+	 * allocated buffer; otherwise, just make nc_cname point to nc_name */
+	if (cnp->cn_cnameptr) {
+		char *cname = MALLOC(ncp->nc_nlen, M_NAMEI);
+		memcpy(cname, cnp->cn_cnameptr, (unsigned int)ncp->nc_nlen);
+		ncp->nc_cname = cname;
+	}
+	else
+		ncp->nc_cname = ncp->nc_name;
 	TAILQ_INSERT_TAIL(&nclruhead, ncp, nc_lru);
 	ncpp = &nchashtbl[(cnp->cn_hash ^ dvp->v_id) & nchash];
 	LIST_INSERT_HEAD(ncpp, ncp, nc_hash);
@@ -391,6 +404,8 @@ cache_purgevfs(mp)
 			LIST_REMOVE(ncp, nc_vhash);
 			ncp->nc_vhash.le_prev = 0;
 		}
+		if (ncp->nc_cname != ncp->nc_name)
+			FREE(ncp->nc_cname, M_NAMEI);
 		/* cause rescan of list, it may have altered */
 		nxtcp = nclruhead.tqh_first;
 		TAILQ_INSERT_HEAD(&nclruhead, ncp, nc_lru);

Well, the fixed size string for nc_cname (or whatever, I'm not
attached to that name) may be more comfortable to work with, but
wastes space for very common case when the canonified name is same
as the non-canonified.

Abother one: I've seen a comment in sys/namei.h over definition
of struct namecache:
/*
 * This structure describes the elements in the cache of recent
 * names looked up by namei. NCHNAMLEN is sized to make structure
 * size a power of two to optimize malloc's. Minimum reasonable
 * size is 15.
 */
That comment doesn't seem to be correct even for 32bit archs
(size of that structure is 56+1+31 == 88) and even more wrong
for 64bit ones (112+1+31 == 154, assuming u_long is 8 bytes long).
The NCHNAMLEN should be adjusted so that the extra space is not wasted.
Optionally, the now-wasted-space can be used to store the canonified
name.

Opinions ?

Jaromir