Re: bin/59766: awk does not handle RS="\0"

To: gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,dholland%NetBSD.org@localhost
Subject: Re: bin/59766: awk does not handle RS="\0"
From: "Martin Neitzel via gnats" <gnats-admin%NetBSD.org@localhost>
Date: Mon, 17 Nov 2025 16:00:02 +0000 (UTC)

The following reply was made to PR bin/59766; it has been noted by GNATS.

From: Martin Neitzel <neitzel%hackett.marshlabs.gaertner.de@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/59766: awk does not handle RS="\0"
Date: Mon, 17 Nov 2025 16:49:44 +0100 (CET)

 I'm not at all against fixing things which are easily fixed,
 but for this report the following context should be kept
 in mind:

 (1) POSIX/Single Unix Specification says (in "Shell & Utilities", "awk"):

 	Input files to the awk program from any of the following sources
 	shall be text files:

 with "text files" having a specific meaning, clarified in the "Base
 Definitions", "Text File":

 	A file that contains characters organized into one or more
 	lines.  The lines do not contain NUL characters [...]

 That is, awk(1) is strictly speaking not the proper tool to deal
 with "find -print0" output in the first place, and any support for
 that would be a (non-portable) extension.
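
 One way to stay within the "text file" definition is to translate
 the NUL terminators into newlines before awk sees the data.  The
 following is only a sketch: the helper name nul_to_lines is
 hypothetical, and it assumes that no item contains an embedded
 newline (which is exactly the case "find -print0" exists to handle,
 so this is not a general replacement).

```shell
# Hypothetical helper: turn NUL-terminated items into newline-terminated
# lines, so that awk reads a POSIX text file.
# Assumption: no item contains an embedded newline.
nul_to_lines() {
    tr '\0' '\n'
}

# Simulated NUL-terminated input; each item becomes one awk record.
printf 'one\0two words\0three\0' | nul_to_lines |
    awk '{ print NR ": " $0 }'
```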

 (2) awk's RS has a special meaning when it "is null", i.e. the
 empty string:  paragraphs (separated by empty lines) become the
 records, lines the fields.  This was historically new with nawk
 ("the one true awk"), and POSIX demands it, too:

 RS
 	The first character of the string value of RS shall be the
 	input record separator; a <newline> by default. If RS
 	contains more than one character, the results are unspecified.
 	If RS is null, then records are separated by sequences
 	consisting of a <newline> plus one or more blank lines,
 	leading or trailing blank lines shall not result in empty
 	records at the beginning or end of the input, and a <newline>
 	shall always be a field separator, no matter what the value
 	of FS is.
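
 The rules quoted above can be tried out on a small input.  This is
 a minimal sketch, not a conformance test: the function name
 paragraphs and the sample text are made up for illustration, and FS
 is left at its default so that only the record splitting is
 exercised.

```shell
# Paragraph mode: a null (empty) RS makes blank-line-separated
# paragraphs the records.
paragraphs() {
    awk 'BEGIN { RS = "" } { print "record " NR ": NF=" NF ", first=" $1 }'
}

# Leading and trailing blank lines must not produce empty records,
# and a run of several blank lines counts as a single separator.
printf '\n\nalpha beta\ngamma\n\n\n\ndelta\n\n' | paragraphs
```

 With this input the sketch reports two records: the first has three
 fields (the newline inside the paragraph separates "beta" from
 "gamma"), the second has one.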

 The NetBSD-9-stable awk(1) man page fails to point this out, but
 the implementation handles it just fine:

 $ man awk | awk -v RS= '/split/ {print NR, $0 "\n"}'
 man: Formatting manual page...
 14      An input line is normally made up of fields separated by white space, or
      by regular expression FS.  The fields are denoted $1, $2, ..., while $0
      refers to the entire line.  If FS is null, the input line is split into
      one field per character.

 49      split(s, a, [fs])
              splits the string s into array elements a[1], a[2], ..., a[n],
              and returns n.  The separation is done with the regular
              expression fs or with the field separator FS if fs is not given.
              An empty string as field separator splits the string into one
              array element per character.

 $ 

 Setting   RS=""   is the canonical way to select this within a script,
 and I'd expect   RS="\0"   to behave no differently.

 I haven't looked at the suggested patch, but this "paragraph
 behaviour" should certainly not be broken.

 Martin
