tech-userlevel archive


grep and options that print matches

Hi all,
while looking at the BSD grep code again, I stumbled over the
inconsistent behavior of GNU grep when there are overlapping matches.
If either --color or --only-matching is specified, it becomes important
to decide which part of a line matches and, when more than one
expression is used, in which order the matches are reported. A few
examples illustrate the issue:

echo abcde | grep -o -e 'ab' -e 'cde'

This prints two lines, "ab" and "cde". This is expected behavior.

echo abcde | grep -o -e 'abc' -e 'cde'

This prints one line, "abc". IMO this is wrong -- the second pattern
certainly matches the input line and should be printed as well.

echo abcdeabc | grep -o -e 'ab' -e 'cde'

This prints three lines. Newer grep versions justify this by a change
in the man page (-o prints each match on a separate line). That doesn't
exactly explain the order ("ab", "cde", "ab") though. I would consider
"ab", "ab", "cde" a more logical output.

echo abc | grep -o -e '..'

This prints one line, "ab". This means the match is greedy, even though
that is not documented anywhere.

echo abcd | grep -o -e '..' -e '.*'

This prints one line, "abcd". So the longest match wins.

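echo abcd | grep -o -e '..' -e 'b.*'

...but only if the matches start at the same place. Here the shorter
'..' match starting at "a" takes precedence over the longer 'b.*' match
starting at "b".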

Color output uses the same rules.

To summarize, match selection in GNU grep is an earliest-longest match
with no overlap. Now the important question: does that make sense, and
do we want that behavior for BSD grep?
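
For discussion, the rule summarized above can be modeled in a few lines
of Python. This is only a sketch of the observed behavior, not GNU
grep's actual implementation. The names select_matches and longest_end
are made up for illustration, empty matches are skipped, and Python's
re syntax stands in for grep's, which differs for anything beyond
simple patterns like the ones used here.

```python
import re


def longest_end(compiled, line, start):
    """Return the end offset of the longest non-empty match of any
    pattern that begins exactly at `start`, or None if there is none."""
    # Try the longest candidate span first, so the first hit is the longest.
    for end in range(len(line), start, -1):
        if any(rx.fullmatch(line, start, end) for rx in compiled):
            return end
    return None


def select_matches(line, patterns):
    """Model of the observed -o selection: earliest start wins, then the
    longest match at that start, and matches never overlap."""
    compiled = [re.compile(p) for p in patterns]
    matches = []
    pos = 0
    while pos < len(line):
        end = longest_end(compiled, line, pos)
        if end is None:
            pos += 1              # nothing starts here, try the next offset
        else:
            matches.append(line[pos:end])
            pos = end             # resume after the match: no overlap
    return matches


if __name__ == "__main__":
    print(select_matches("abcde", ["ab", "cde"]))     # ['ab', 'cde']
    print(select_matches("abcde", ["abc", "cde"]))    # ['abc']
    print(select_matches("abcdeabc", ["ab", "cde"]))  # ['ab', 'cde', 'ab']
    print(select_matches("abcd", ["..", ".*"]))       # ['abcd']
```

The model reproduces the outputs reported for the examples above. For
the last example ('..' and 'b.*' on "abcd") it yields "ab" followed by
"cd", since the 'b.*' candidate starts later than the first '..' match
and is then ruled out by the no-overlap rule.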

