Subject: Re: Suggestion: keep binary data out of /etc
To: Bill Studenmund <skippy@macro.stanford.edu>
From: Steven J. Dovich <dovich@lethe.tiac.net>
List: current-users
Date: 02/09/1999 14:19:49
Bill Studenmund wrote:
> On Mon, 8 Feb 1999, Steven J. Dovich wrote:
> 
> > It seems that using grep as a filtering process in a pipeline would
> > be severely hampered by using this "binary file" interpretation.
> > Consider the following case:
> > 
> > 	grep foo ./* | grep bar | ...
> 
> I'm curious, how does that work with binary files? I mean what info do you
> get out of it?
> 
> grep's idea of what is a "line" is weird for a binary file. So I'm unsure
> what good the second grep will be? And what is the end tool going to do
> when it's fed machine code?

The second grep is shorthand for any meaningful processing on the
data filtered by the first grep. Let me make this a bit more concrete.
Imagine an event logging system, similar to syslog, where events are
described by a record that consists of binary data followed by a textual
log message. It is traditional behavior for grep to produce all matching
records (even if they contain binary "junk"), and this filtered data
can then be processed by subsequent tools. As I understand the proposed
behavior, this filtered data stream would be replaced by a single text
line noting that the data source was considered "binary" and that
something matched. This would be a rather drastic change in function.
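
To make the difference concrete, here is a minimal sketch in Python. It
is not grep itself. The record layout (four binary bytes, a text message,
a newline), the names traditional_filter and match_notice, and the exact
wording of the notice are all invented for illustration; the notice is
only my guess at what the proposed behavior would print.

```python
# Hypothetical log: each record is 4 binary bytes, a text message, a newline.
# The binary bytes are chosen so that none of them is a newline (0x0a).
LOG = (
    b"\x00\x01\x02\x03disk foo full\n"
    b"\x00\x01\x02\x04net bar down\n"
    b"\x00\x01\x02\x05disk foo bar ok\n"
)


def records(data: bytes) -> list:
    """Split on newline only, the way a line-oriented filter would."""
    return [part + b"\n" for part in data.split(b"\n") if part]


def traditional_filter(data: bytes, pattern: bytes) -> bytes:
    """Pass every record containing pattern through unchanged, junk and all."""
    return b"".join(rec for rec in records(data) if pattern in rec)


def match_notice(data: bytes, pattern: bytes, name: str) -> bytes:
    """Assumed proposed behavior: a single text line saying the source matched."""
    if any(pattern in rec for rec in records(data)):
        return f"Binary file {name} matches\n".encode()
    return b""


# Rough equivalent of: grep foo log | grep bar
stage1 = traditional_filter(LOG, b"foo")
stage2 = traditional_filter(stage1, b"bar")

# Same pipeline, but with the first stage replaced by the notice.
notice = match_notice(LOG, b"foo", "log")
stage2_notice = traditional_filter(notice, b"bar")
```

In the first pipeline the second stage still sees real records and
passes along the one that contains both strings. In the second it sees
only the notice, so nothing downstream has any data to work with.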

My concern is not with non-line-structured files that contain binary
data. It is that cases where the current behavior is indeed meaningful
stand to be broken if we adopt this sort of change. I worry about tools
that try too hard to second-guess the user, particularly when they use
simple-minded heuristics.

> Just to reiterate: as I understand the proposal, you'll still be able to
> grep "binary" files (as well as "text" files). You'll still get a message
> when a match is found, I assume for each match. If it's a "text" file, you
> see the whole line, just like now. If it's a "binary" file, you get a
> match message.

I guess my puzzlement is over when grep made the transition from a
filtering tool to a logical match detector. Granted, the filtering
capability presumed a certain structure to the data. Data without
that structure gets degraded, though still useful, function.

I would also think that the relevant standards (POSIX.2, SVID, SUS) have
something to say about the behavior expected from grep, particularly
in the default case where no alternate behavior has been specified.
The binary matching message would be acceptable if it were dependent
on an option switch, but not as the default behavior.

> I think the message you'd get was mentioned early in the thread, but I'd
> hope it includes both the filename with a colon (like normal multi-file
> grep) and the matched string. So that if I grep for [aA]untie, I see which
> files have auntie, which Auntie, and which occurrences of both.

This is reasonable, though I might suggest that a match message be
emitted for each matching occurrence, so tallying in a pipeline will
continue to be meaningful. And... (loose cannonball rolls past)  What
about grep -v on binary data w/ binary match message in effect?
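
A small Python sketch of what I mean by one message per occurrence. The
message format ("name: matched text") and the function name
per_match_messages are hypothetical, loosely following Bill's suggestion
of filename, colon, matched string. It does not attempt to answer the -v
question.

```python
import re
from collections import Counter


def per_match_messages(name: str, data: bytes, pattern: bytes) -> list:
    """Emit one hypothetical message per occurrence: '<name>: <matched text>'."""
    return [
        f"{name}: {m.group().decode('ascii')}"
        for m in re.finditer(pattern, data)
    ]


# Invented sample data: binary bytes mixed with text.
DATA = b"\x00\x07Auntie called\n\x01\x02auntie and Auntie\n"

messages = per_match_messages("blob", DATA, rb"[aA]untie")

# A downstream tally still works because each occurrence gets its own line.
tally = Counter(msg.split(": ", 1)[1] for msg in messages)
```

With a single notice per file, the tally would have nothing to count
beyond the fact that something matched.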

I admit, I have torched my term state with binary output; I just
never made the leap to preventing it by changing the behavior of
grep. Reductio ad absurdum: You know, I have wedged my term state
more frequently with cat than with grep. Maybe if we...

Anyway, thanks Bill for prodding me to clarify.

/sjd