Source-Changes-D archive


Re: CVS commit: src/external/historical/nawk/bin



On Mon, Jul 04, 2022 at 20:06:45 +0200, Roland Illig wrote:

> On 04.07.2022 at 13:14, Valery Ushakov wrote:
> > On Mon, Jul 04, 2022 at 08:04:57 +0200, Roland Illig wrote:
> > 
> > > 04.07.2022 01:54:16 Valery Ushakov <uwe%stderr.spb.ru@localhost>:
> > > > On Mon, Jul 04, 2022 at 00:07:23 +0200, Roland Illig wrote:
> > > > 
> > > > > On 03.07.2022 at 21:55, Valery Ushakov wrote:
> > > > > > On Sun, Jul 03, 2022 at 10:56:22 +0000, Roland Illig wrote:
> > > > > > 
> > > > > > > Module Name:    src
> > > > > > > Committed By:   rillig
> > > > > > > Date:       Sun Jul  3 10:56:22 UTC 2022
> > > > > > > 
> > > > > > > Modified Files:
> > > > > > >      src/external/historical/nawk/bin: awk.1
> > > > > > > 
> > > > > > > Log Message:
> > > > > > > awk.1: remove trailing space in output of 'echo' example program
> > > > > > 
> > > > > > This is a cure worse than the disease.  Please revert.
> > > > > 
> > > > > Why is it worse?
> > > > 
> > > > It's ugly
> > > 
> > > Why is it ugly?
> > > 
> > > > and complicated (for an example),
> > > 
> > > Why is it complicated? It's still only 3 lines of code.
> > > 
> > > > it obscures the point this example tries to make.
> > > 
> > > What is this (single?) point this example tries to make? To me, it
> > > was how to write a BEGIN program that uses ARGV, and my rewritten
> > > code still illustrates this.
> > 
> > You have turned a trivial for loop that requires no mental bandwidth
> > to skim over into a code review errand with complex case analysis and
> > ugly inverted 1 < ARGC to boot.
> 
> I intentionally wrote '1 < ARGC' instead of 'ARGC > 1' to make the
> condition as close as possible to the condition in the 'for' loop.  If
> it weren't for this symmetry, I would of course have written it in the
> subject-first manner.
[...]
> > but this case is not
> > about code doing something specified with aesthetic considerations
> > being secondary to its actually doing the job.  Here taste is an
> > important factor b/c it's not code, but a man page - a literary work
> > really.
> 
> That's exactly my point.  The manual page is our reference
> documentation, and as such, it should demonstrate and teach best
> practices.  Presenting a program that is _almost correct_ misses this
> point, and I don't want to see any code derived from an _almost correct_
> example.  Mixing such code with other _almost correct_ example code will
> quickly lead to programs with bugs everywhere.
> 
> An example for this would be the common usage of <ctype.h> functions,
> which really many people get wrong, either by reading sloppily produced
> teaching material or incomplete documentation or by copying code
> snippets that seem to work.

So this needs the symmetry with the next line and the reader needs to
register and process that symmetry.  Then the fun part comes, the
reader needs to match the two bodies (if and for) and figure out if
and how they differ and find that visually elusive whitespace.  And
this has already snowballed into quite a code review errand.
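
To make the trade-off concrete, here is a minimal sketch of the two shapes being compared.  Neither version is quoted verbatim in this thread, so both bodies are hypothetical reconstructions from the description above (an "if" with 1 < ARGC followed by a "for"), and the function names echo_simple and echo_exact are made up for illustration.

```shell
# Hypothetical reconstruction; the exact man page text is not quoted here.

# Simple loop: every argument is followed by a space, so the output
# line carries a trailing space before the newline.
echo_simple() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\n"
        exit
    }' "$@"
}

# Variant without the trailing space: the first argument is printed
# bare, every later one is preceded by a space.
echo_exact() {
    awk 'BEGIN {
        if (1 < ARGC) printf "%s", ARGV[1]
        for (i = 2; i < ARGC; i++) printf " %s", ARGV[i]
        printf "\n"
    }' "$@"
}

# Append a visible "$" at end of line to expose the trailing space.
echo_simple a b c | sed 's/$/$/'
echo_exact  a b c | sed 's/$/$/'
```

The only difference between the two printf formats is where the space sits ("%s " versus " %s"), which is the visually elusive whitespace the reader has to spot.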

ctype and the C integer promotion rules that it touches (often
inappropriately) go to some pretty dark corners of the C standard.
Proper examples of correct use are essential for that man page and
those examples are kinda the focal point of it.

Here the point of the example is a BEGIN-only program that doesn't
process any input.  Any content of that BEGIN action is rather
unimportant, a kind of "lorem ipsum" filler almost.  So turning it
into an exercise in correct re-implementation of echo(1) draws
attention to the wrong thing.  Consider that extra space in the output
to be poetic license.

POSIX has the echo example as:

  BEGIN  {
          for (i = 1; i < ARGC; ++i)
                  printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ")
  }

which I would still consider ugly :).  I would rather use instead:

  BEGIN { for (i = 1; i < ARGC; ++i) $i = ARGV[i]; print }

that, as a bonus, demonstrates $var field references (that the current
man page doesn't mention at all) and the magic $0 "reassembly".
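
As a rough illustration of that reassembly, the one-liner can be wrapped in a shell function and run with a few arguments.  The name echo_fields is hypothetical, and the OFS variation is an addition here, not something proposed for the man page.

```shell
# Assigning to $i in BEGIN creates fields; awk then rebuilds $0 by
# joining them with OFS, and a bare "print" prints $0.
echo_fields() {
    awk 'BEGIN { for (i = 1; i < ARGC; ++i) $i = ARGV[i]; print }' "$@"
}

echo_fields a b c    # fields joined by the default OFS, a single space

# Setting OFS before the assignments changes the separator used when
# $0 is rebuilt.
echo_dashed() {
    awk 'BEGIN { OFS = "-"; for (i = 1; i < ARGC; ++i) $i = ARGV[i]; print }' "$@"
}

echo_dashed a b c
```

With no arguments at all, no field is assigned and print emits just an empty line, so there is no trailing-space case to handle.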


> > PS: BTW, you also eliminated the "exit" at the end of the BEGIN
> > action.  This is not specified by POSIX and happens to work on all
> > three major implementations - nawk, mawk, gawk - though only gawk
> > seems to document this, and only in the info file, not its man page
> > (though its man page has examples that rely on this behaviour).
> > 
> > E.g. solaris /usr/bin/awk will still wait for input to consume and
> > discard it.
> 
> Ouch, thanks for telling me.  I was so sure that POSIX had specified
> this that I didn't bother to look it up again.  Now I did, and I will
> revert my changes.

Your g/c'ing the "exit" actually piqued my curiosity, b/c that's not
how I remember my awk.  I verified that 2.9BSD has awk that always
processes its input so that explains why I remember it that way.  Of
course it doesn't support ARGC/ARGV.

Anyway, I was wrong about POSIX - I somehow managed to screw up my
search and missed the passage that specifies it:

  If an awk program consists of only actions with the pattern BEGIN,
  and the BEGIN action contains no getline function, awk shall exit
  without reading its input when the last statement in the last BEGIN
  action is executed. If an awk program consists of only actions with
  the pattern END or only actions with the patterns BEGIN and END, the
  input shall be read before the statements in the END actions are
  executed.
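
A quick way to observe both halves of that passage, assuming a conforming awk (an older one, like the Solaris /usr/bin/awk mentioned earlier, may behave differently): run awk and then cat on the same piped input and see what is left over for cat.  The variable names below are made up for the sketch.

```shell
# BEGIN-only program: awk should exit without reading its input, so
# the cat that follows still sees both lines.
begin_only=$(printf 'x\ny\n' | { awk 'BEGIN { print "begin" }'; cat; })

# BEGIN plus END: the input is read before END runs, so NR is 2 there
# and nothing is left over for cat.
begin_end=$(printf 'x\ny\n' | { awk 'BEGIN { n = 0 } END { print NR }'; cat; })

printf '%s\n' "$begin_only" "$begin_end"
```

On an awk that always consumes its input, the first case would lose the x and y lines, which is what an explicit "exit" at the end of the BEGIN action guards against.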

-uwe

