Subject: Re: bin/30294
To: None <jdolecek@netbsd.org>
From: John Darrow <John.P.Darrow@wheaton.edu>
List: netbsd-bugs
Date: 07/09/2005 14:01:53
On Sat, Jul 02, 2005 at 08:51:52PM +0000, jdolecek@netbsd.org wrote:
> It behaves according to its documentation - RS is described as
> 'input record separator' contrary to e.g. FS 'regular expression used
> to separate fields'. Apparently it's not supposed to be a RE, so
> this doesn't seem to be a bug.

I beg to differ:

1. It's a feature regression.  A script written according to the man
page of "the system awk shipped with 1.6.2 and earlier" no longer
works with "the system awk shipped with 2.0 and later".  _IF_ this
sort of feature regression is acceptable, it should be marked with
BIG WARNINGS in the man page, and the 2.0+ awk should _at least_ print
a warning (if not exit with an error) if a program attempts to assign
more than one character to RS.

2. The phrase "input record separator" does not provide any semantic
information as to the format of such a separator.  I cannot find
anywhere in the 2.0+ awk man page that specifies that RS (or ORS) is
limited to a single character.  Given its position shortly after FS in
the man page, it is very reasonable to assume that the semantics are
identical, and that the description was simply omitted the second time
to avoid long-windedness and redundancy in the man page.

3. It violates the POLA (principle of least astonishment) for the
similarly named FS ("Field Separator") and RS ("Record Separator") to
have such very different semantics.

4. While RS being a regular expression may be considered an
"extension" by purists, awk already implements other "extensions",
such as causing every character to be a separate field if FS (or the
third argument to the split function) is the null string ("").  (The 1.6.2/gawk man
page explicitly explains that such behavior is an extension, and
disables it with --traditional.)

5. awk already implements special case handling for the null RS
case (RS="").  Such special handling is not mentioned in the man page,
though it _was_ in the 1.6.2/gawk man page.  If awk can special-case
that non-single-character RS value, why shouldn't it also be able to
handle other multi-character RS values, _especially_ when it already
does so for FS?