Subject: Re: bin/8681: grep may bomb out with "memory exhausted"
To: Simon Burge <simonb@netbsd.org>
From: R. C. Dowdeswell <elric@mabelode.imrryr.org>
List: netbsd-bugs
Date: 10/27/1999 22:32:20
On 941071178 seconds since the Beginning of the UNIX epoch
Simon Burge wrote:
>
>This is what the maintainer of grep has to say:
>
>	GNU grep needs to buffer a line of input in main memory.  If your
>	input lines are too long to fit into your main memory, then the right
>	thing to do is to get more memory.  See the section `Memory Usage' in
>	the GNU coding standards for more.

But be that as it may, I wonder why grep is using all of that memory.
Why not mmap(2) the file MAP_SHARED and PROT_READ?  It's not like
grep is going to modify the pages, and shared read-only pages
shouldn't count towards memory usage, since they are backed by the
file on disk rather than by swap.  Perhaps this gives odd
characteristics if someone else modifies the file, but those already
exist with read(2) and write(2).
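
To make the mmap(2) idea concrete, here is a rough sketch of what I
mean.  It is not taken from grep's source: count_matching_lines() is
a name I made up, it only handles a fixed string rather than a regular
expression, and it assumes the whole file fits in the address space.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not grep's code: map the file read-only and
 * shared, then count the lines that contain the fixed string pat.
 * Returns the number of matching lines, or -1 on error.
 */
long
count_matching_lines(const char *path, const char *pat)
{
	struct stat st;
	size_t patlen, size;
	long count = 0;
	char *base;
	int fd;

	assert(path != NULL && pat != NULL);
	patlen = strlen(pat);

	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	if (fstat(fd, &st) == -1) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {		/* a zero-length mapping is an error */
		close(fd);
		return 0;
	}
	size = (size_t)st.st_size;

	base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping outlives the descriptor */
	if (base == MAP_FAILED)
		return -1;

	const char *p = base, *end = base + size;
	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		const char *eol = nl != NULL ? nl : end;
		size_t len = (size_t)(eol - p);

		/* Naive substring scan of the line, done in place. */
		for (size_t i = 0; i + patlen <= len; i++) {
			if (memcmp(p + i, pat, patlen) == 0) {
				count++;
				break;
			}
		}
		p = nl != NULL ? nl + 1 : end;
	}

	munmap(base, size);
	return count;
}
```

The point is only that the line is scanned where it sits in the
mapping, however long it is, and nothing is copied into a private
buffer.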

I tried it:

$ grep foo big_file
grep: mem exhausted or some such.

and got the message myself.  Even if you just do read/write, this is
a pretty simple state machine, and I can't see any reason that it
would need more than a small fixed buffer -- just save the position
of the beginning of the line and run the line through the state
machine; if it matches and your buffer is too small, seek back to
the start of the line and output from there to the end.  Sure, that
could be slow, but...
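
Roughly, the read/write approach could look like the sketch below.
Everything in it is my own assumption rather than anything from grep:
stream_grep(), copy_range() and CHUNK are made-up names, the "state
machine" is a Knuth-Morris-Pratt matcher for a fixed non-empty string
(not a full regex engine), and the input must be seekable because the
matching line is re-read with pread(2).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096	/* fixed buffer size, independent of line length */

/* Copy bytes [from, to) of fd to outfd through a fixed-size buffer. */
static int
copy_range(int fd, int outfd, off_t from, off_t to)
{
	char buf[CHUNK];

	while (from < to) {
		size_t want = (to - from) < CHUNK ? (size_t)(to - from) : CHUNK;
		ssize_t n = pread(fd, buf, want, from);

		/* A short write is treated as an error to keep this brief. */
		if (n <= 0 || write(outfd, buf, (size_t)n) != n)
			return -1;
		from += n;
	}
	return 0;
}

/*
 * Hypothetical sketch: write every line of path containing pat to
 * outfd.  Returns the number of matching lines, or -1 on error.
 */
long
stream_grep(const char *path, int outfd, const char *pat)
{
	char buf[CHUNK];
	size_t m, k = 0, *fail;
	off_t off = 0, line_start = 0;
	long count = 0;
	int fd, matched = 0;
	ssize_t n;

	assert(path != NULL && pat != NULL && pat[0] != '\0');
	m = strlen(pat);
	if ((fail = calloc(m, sizeof(*fail))) == NULL)
		return -1;
	if ((fd = open(path, O_RDONLY)) == -1) {
		free(fail);
		return -1;
	}

	/* KMP failure table: memory grows with the pattern, not the line. */
	for (size_t i = 1, j = 0; i < m; i++) {
		while (j > 0 && pat[i] != pat[j])
			j = fail[j - 1];
		if (pat[i] == pat[j])
			j++;
		fail[i] = j;
	}

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++, off++) {
			char c = buf[i];

			if (c == '\n') {
				/* End of line: go back and emit it if it matched. */
				if (matched) {
					if (copy_range(fd, outfd, line_start, off + 1) == -1)
						goto fail;
					count++;
				}
				matched = 0;
				k = 0;
				line_start = off + 1;
				continue;
			}
			if (matched)
				continue;	/* already known to match; skip ahead */
			while (k > 0 && c != pat[k])
				k = fail[k - 1];
			if (c == pat[k])
				k++;
			if (k == m)
				matched = 1;
		}
	}
	if (n == -1)
		goto fail;
	/* Last line may lack a trailing newline. */
	if (matched) {
		if (copy_range(fd, outfd, line_start, off) == -1)
			goto fail;
		count++;
	}
	close(fd);
	free(fail);
	return count;
fail:
	close(fd);
	free(fail);
	return -1;
}
```

Calling stream_grep("big_file", STDOUT_FILENO, "foo") would then
behave like the grep invocation above for this simple case.  The only
state kept per line is its starting offset, the matcher state, and a
flag, so a matching line gets read twice but never buffered whole.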

Perhaps some of the more complicated expressions, such as those with
back references, might be hard to do that way.  I haven't given it
enough thought.

In short, I don't think that this should be a problem.  Wasn't there
someone writing a BSD grep to replace GNU grep?  What came of that?
Does that have this limitation?

 == Roland Dowdeswell                      http://www.Imrryr.ORG/~elric/  ==
 == The Unofficial NetBSD Web Pages        http://www.Imrryr.ORG/NetBSD/  ==
 == The NetBSD Project                            http://www.NetBSD.ORG/  ==