Subject: Re: AW: Recursive grep (where is limfree defined?)
To: None <current-users@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: current-users
Date: 02/02/1996 08:45:53
> The whole matter might be moot, however.  I have been trying to crash
> grep (rgrep, without the -a argument) on large binary files, and I
> can't do it.  The largest binary file that I have on my system is a
> 30M gzipped dump file, and grep happily searches through that without
> problems.  In fact, it searches through the whole directory tree
> containing 170M of similar files.

> Maybe I am just lucky, or maybe grep isn't as broken as people think.

I think you may be lucky.  Try cloning that tree and running each file
through tr to delete every newline, then try it again.  I just now
tried an experiment:

	[Daily-Planet] 143> cd /usr/src
	[Daily-Planet] 144> tar cf - [a-z]* | gzip --fast | tr -d \\n > .foo &
...wait until .foo is up to about 25 megabytes...
	[Daily-Planet] 145> grep SUBDIR Makefile lib/Makefile
	Makefile:SUBDIR+= lib include bin libexec sbin usr.bin usr.sbin share games
	Makefile:SUBDIR+= gnu
	Makefile:SUBDIR+= sys
	Makefile:SUBDIR+= domestic
	Makefile:SUBDIR+= regress
	lib/Makefile:SUBDIR=	csu libarch libc libcompat libcrypt libcurses libedit libkvm libl \
	[Daily-Planet] 146> grep SUBDIR Makefile .foo lib/Makefile
	Makefile:SUBDIR+= lib include bin libexec sbin usr.bin usr.sbin share games
	Makefile:SUBDIR+= gnu
	Makefile:SUBDIR+= sys
	Makefile:SUBDIR+= domestic
	Makefile:SUBDIR+= regress
	grep: memory exhausted
	[Daily-Planet] 147> 

Note that it did not find SUBDIR in lib/Makefile the second time, which
counts as broken in the original context: using grep with find|xargs
to search an entire hierarchy, where there will almost certainly be
more filenames after the putative problem file.
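
For concreteness, a minimal sketch of that failure mode (the pathname
and pattern here are only illustrative):

	find /usr/src -type f | xargs grep SUBDIR

xargs packs many filenames into each grep invocation, so when grep
dies with "memory exhausted" partway through an argument list, every
file after the problem file in that batch silently goes unsearched.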

As for being lucky, not much luck is needed.  Assuming gzip's output is
high-entropy (which is the goal of any compression program), it will
contain a newline approximately every 256 bytes on average.  Runs of
more than, say, 64K without a newline will be extremely rare.  And grep
doesn't get upset until well above that, unless you're quite short of
virtual memory.
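
To put a number on "extremely rare" (back-of-the-envelope, assuming
each output byte is independently uniform over all 256 values): the
chance that any given byte is not a newline is 255/256, so the chance
of a 64K run containing no newline at all is (255/256)^65536 =
((255/256)^256)^256, roughly e^-256, or about 10^-111.  You will not
stumble over such a run in compressed output by accident.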

This isn't much of a danger with gzip output, because it tends to be
fairly random and thus has a reasonable density of newlines.  But if
you have files generated in some way that leaves them very
newline-sparse, this can be a problem.
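
If you do have to search such files, one cheap workaround (my
suggestion, not anything grep promises) is to filter them through
strings(1) first, so grep sees ordinary newline-terminated text rather
than one enormous line:

	strings suspect-file | grep pattern

You lose the filename prefix on matches, but grep's line buffer stays
sane.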

					der Mouse

			    mouse@collatz.mcrcim.mcgill.edu