Subject: bin/5224: locate.updatedb could really use an exclusion feature
To: None <gnats-bugs@gnats.netbsd.org>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: netbsd-bugs
Date: 03/28/1998 10:01:04
>Number:         5224
>Category:       bin
>Synopsis:       locate.updatedb could really use an exclusion feature
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Sat Mar 28 07:05:00 1998
>Last-Modified:
>Originator:     John F. Woods
>Organization:
Misanthropes-R-Us
>Release:        NetBSD 1.3E
>Environment:
	
System: NetBSD jfwhome.funhouse.com 1.3E NetBSD 1.3E (JFW) #1: Sat Mar 14 21:56:11 EST 1998 jfw@jfwhome.funhouse.com:/usr/src/sys/arch/i386/compile/JFW i386


>Description:
	Last night, my console displayed two disturbing messages:
Mar 28 03:48:00 jfwhome /netbsd: Accounting suspended
Mar 28 03:48:15 jfwhome /netbsd: Accounting resumed
Eventually, I tracked it down to /usr/libexec/locate.updatedb having almost
but not quite used up the 40MB of free space on my /var partition.  How did
it manage that?  I have an archive of talk.bizarre articles which currently
has about 240000 article files, so a "find / -print" on my system generates
a *lot* of output.
I suspect there are a lot of sites which have filesystems with lots of files
which no one is ever likely to want to locate (certainly any news partition
qualifies); it would be convenient if there were a defined mechanism to tell
locate.updatedb not to index certain files or hierarchies.  (Shutting off
world-access to my news archive would mean that my HTTP server would have
to be privileged in order to access the files.)

>How-To-Repeat:
	Run netnews with a lot of newsgroups.  Leave 40MB free on /var.
	Let /etc/weekly run.  Lather, rinse, repeat.
>Fix:
	The obvious insertion point for such a feature is the find command
of locate.updatedb:

  find ${SRCHPATHS} \( ! -fstype local -o -fstype fdesc -o -fstype kernfs \
                       -o -path '/var/spool/news/archive/*' \) -a \
  		-prune -o -print | \

but it's annoying to have to edit the actual script (and re-edit it the next
time I install an updated build).  Perhaps some script could be devised to
take a list of exception directories from some file, massage it into a
"-o -path" list, and slam it into the find command.  It might be more flexible,
though, to allow for inserting an arbitrary filter command between find and
tr (someone might well want to be able to locate files under /var/spool/news
whose names are non-numeric).
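
A minimal sketch of the exclusion-file idea, for illustration only.  The
function name list_indexable, the variable FSPRUNE, and the file name
/etc/locate.exclude are all hypothetical; none of them exists in the
shipped script.  The sketch assumes a POSIX-style sh in which a redirected
while loop runs in the current shell, so that "set --" inside the loop
survives it.

```sh
#!/bin/sh
# Sketch only: list_indexable, FSPRUNE and /etc/locate.exclude are
# hypothetical names, not part of the shipped locate.updatedb.

# Filesystem tests copied from the find command quoted above; kept in a
# variable so they can be adjusted where find(1) behaves differently.
: "${FSPRUNE:=! -fstype local -o -fstype fdesc -o -fstype kernfs}"

# list_indexable EXCLUDE_FILE PATH...
# Print every file under PATH..., pruning the filesystems selected by
# FSPRUNE plus each directory listed (one per line) in EXCLUDE_FILE.
# A pruned directory is skipped entirely, including its own name.
list_indexable() {
	excl=$1
	shift
	# Append the opening of the prune expression to the search paths.
	# $FSPRUNE is deliberately unquoted so it splits into words.
	set -- "$@" \( $FSPRUNE
	if [ -r "$excl" ]; then
		while IFS= read -r dir; do
			# Skip blank lines and comment lines.
			case $dir in
			''|'#'*) continue ;;
			esac
			# find treats the -path argument as a pattern.
			set -- "$@" -o -path "$dir"
		done < "$excl"
	fi
	find "$@" \) -a -prune -o -print
}
```

With something like this in place, the find stage of the pipeline might
become "list_indexable /etc/locate.exclude ${SRCHPATHS} | \", leaving the
rest of the script alone, and a missing exclusion file would simply give
today's behavior.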
>Audit-Trail:
>Unformatted: