Subject: Re: Recursive grep (where is limfree defined?)
To: Charles M. Hannum <mycroft@NetBSD.ORG>
From: Don Lewis <gdonl@gv.ssi1.com>
List: current-users
Date: 02/02/1996 03:12:35
On Jan 30,  3:00am, "Charles M. Hannum" wrote:
} Subject: Re: Recursive grep (where is limfree defined?)
} 
}    If you're going to port something, _port_ it.  Make it work EVERYwhere.
}    Otherwise it's as useless to the general populace as the scripts I keep
}    around for doing my job.
} 
} By that logic, we would be compelled to remove large chunks of our
} source tree.  Such an attitude is why it often takes years for a
} better tool to be adopted (c.f. compress vs. gzip wars), and if
} anything only stunts progress.
} 
} On the other hand, I have mixed feeling about a `recursive grep'.
} This is not an endorsement.

I'm also not sure that `recursive grep' is an improvement.

  1) It is argued that beginning users won't know that they can use
     find to execute grep recursively, whereas the grep man page
     would tell them about -R.  OK, beginning users get off easy until
     they need to run some other command recursively, at which point
     they have to learn about find anyway.

  2) The new options to grep that control recursion are slightly different
     from the recursion options on chmod, chown, etc., due to a collision
     with an existing grep option.  If recursion is added to other
     commands, how many different variations will there be?  Is it easier
     to memorize N variations on the options for recursion (or look
     them up each time), or just to learn find, which always works the
     same way?

  3) There's no way to just grep *.c files or whatever (unless you
     add another option to grep to specify a glob pattern).  If you
     want to do this, you have to learn how to use find.

  4) Even with the new grep option to exclude binary files, a recursive
     grep of a directory tree containing a substantial percentage of
     binary files will run slower than the find/xargs method, provided
     the file type can be distinguished with a glob pattern: the
     recursive grep must open and read a portion of each binary file,
     while the find/xargs method skips those files entirely.

The disadvantages of find/xargs are:

  1) It's not obvious to new users.

  2) It's cumbersome to type.

  3) The standard versions of find/xargs are not safe and robust.

  4) grep -R is probably more efficient than find/xargs if you truly
     want to grep everything in the tree, since its filesystem
     references will have better locality.

Problem 3 can be fixed with the -print0 and -0 flags, at the expense of
worsening problems 1 and 2.
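For example (demo tree invented for illustration; -print0 and -0 are
the flags discussed above), a file name containing a space is the
classic case that breaks a plain find | xargs pipeline, but survives
NUL delimiting intact:

```shell
tmp=$(mktemp -d)
# A file name with an embedded space -- fatal to unquoted xargs parsing.
printf 'needle\n' > "$tmp/has space.txt"

# -print0 emits NUL-terminated names; xargs -0 splits only on NUL,
# so the space in the name is passed through unharmed.
out=$(find "$tmp" -type f -print0 | xargs -0 grep -l 'needle')
printf '%s\n' "$out"

rm -rf "$tmp"
```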

I would propose that, in addition to the -print0 and -0 flags, a
flag be added to find that is similar to -exec, but which gathers
a number of file names together before running the command.  This
would have the efficiency of xargs, but could be made safe, since the
arguments could be passed directly to the exec() call rather than
being parsed, avoiding problems with nasty characters embedded in
the arguments.  This enhancement to find would help a bit with
problem 2.
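For what it's worth, POSIX find provides exactly this behavior as
-exec command {} +, which batches names like xargs but hands them
straight to exec().  A sketch against an invented demo tree:

```shell
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/one.txt"
printf 'beta\n'  > "$tmp/two.txt"

# -exec ... {} + gathers as many file names as fit into one grep
# invocation, passed directly via exec() -- no shell or whitespace
# parsing, so no character in a name can be misinterpreted.
out=$(find "$tmp" -type f -exec grep -l '.' {} +)
printf '%s\n' "$out"

rm -rf "$tmp"
```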

Unfortunately, neither of these fixes helps if you want to
	find whatever -print | filter | xargs command
since most off-the-shelf filter commands don't operate on NUL-delimited
strings.
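One partial mitigation, assuming GNU tools (several GNU filters have
since grown NUL-aware modes, e.g. sort -z), is to keep the whole
pipeline NUL-delimited end to end.  Demo tree invented for
illustration:

```shell
tmp=$(mktemp -d)
printf 'x\n' > "$tmp/b name.txt"
printf 'x\n' > "$tmp/a name.txt"

# GNU sort's -z flag makes it read and write NUL-terminated records,
# so it can sit between -print0 and xargs -0 without mangling names
# that contain spaces or newlines.
out=$(find "$tmp" -type f -print0 | sort -z | xargs -0 grep -l 'x')
printf '%s\n' "$out"

rm -rf "$tmp"
```

A filter without such a mode would still split on the embedded
spaces, which is the author's point.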

			---  Truck