Subject: Dissection of GNU awk bugs on NetBSD-pmax
To: None <arnold@cc.gatech.edu>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-pmax
Date: 12/12/1994 13:40:39
>Just to be clear, there are two "bugs"
>
>a) output of inftest.awk --- seems to be strtod related, definitely a library bug
>b) last few characters of the line, does not go away when recompiled, but
>   ultrix binary works.
>
>Is that a correct summary?

Well, maybe.  I'm far from certain that apparently not seeing the last
few chars on a line is not the same bug...

Below is the result of running the following awk script on the input
file gawk-2.15.5/test/manpage, and diff'ing the Ultrix and NetBSD-pmax
output.  It seems to me like the last word on each line is consistently
being lost.  It turns out that if I change the loop test from
``i <= NF'' to ``i <= NF + 1'' the NetBSD awk gives the
``right'' answer. So it looks like NF is being set incorrectly.
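
One way to check that suspicion directly (a minimal sketch, not part of
the original report; the name check_nf is made up for illustration) is
to have awk count the fields a second time with split() and report any
record where that count disagrees with NF.  It assumes split() is not
hit by the same bug, which is not guaranteed.

```sh
#!/bin/sh
# Hypothetical diagnostic: compare NF against an independent field count.
# On a correctly working awk the two always agree and nothing is printed.
check_nf() {
    awk '{
        n = split($0, parts)    # count fields without consulting NF
        if (n != NF)
            printf "line %d: NF=%d split=%d last=<%s>\n", NR, NF, n, parts[n]
    }' "$@"
}

# Example use on the test input from this message:
#   check_nf gawk-2.15.5/test/manpage
```

If NF really is one too small, every affected line should show up with
split one larger than NF and the missing word in the last=<...> slot.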

Where might I look to see that? Is it plausible that gawk is keeping
its NF in floating-point internally, and a badly-broken floating-point
ascii <-> decimal conversion is causing this bug?  That was my first guess...
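
A quick probe for that guess (only a sketch; probe_nf is a hypothetical
name): print NF through several conversions.  POSIX awk treats numbers
as floating point, so if NF held something like 2.9999999 instead of 3,
the %d and %.17g columns would likely disagree and the last column
would be 0.  If the library's number formatting is itself broken, the
output may be misleading, so treat it as a hint only.

```sh
#!/bin/sh
# Hypothetical probe: show NF as an integer, as a string, at full double
# precision, and whether awk considers it integer-valued (1) or not (0).
probe_nf() {
    awk '{
        printf "%d %s %.17g %d\n", NF, NF, NF, (NF == int(NF))
    }' "$@"
}

# Example use:
#   printf 'a b c\n' | probe_nf
```

On a healthy awk a three-field line should give the same value in the
first three columns and 1 in the fourth.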

I just hope this isn't a bug in the gcc-2.5.8 mips backend!

--Jonathan
PS: anyone know a good way to compose MIME messages with
multiple files included as a multipart MIME document, in MH??



--------begin example awkscript
# From Gawk Manual modified by bug fix and removal of punctuation
# Record every word which is used at least once
{
        for (i = 1; i <= NF; i++) {
                tmp = tolower($i)
                if (0 != (pos = match(tmp, /([a-z]|-)+/)))
                        #used[substr(tmp, pos, RLENGTH)] = 1
                        print tmp
        }
}

--------end example awkscript


Context diff of output of above script on NetBSD-pmax and ultrix:


*** /pescadero/u2/jonathan/TMP/netbsd-out	Mon Dec 12 13:31:14 1994
--- /pescadero/u2/jonathan/TMP/ultrix-out	Mon Dec 12 13:31:32 1994
***************
*** 1,9 ****
--- 1,12 ----
  .ds
  px
+ \s-1posix\s+1
  .ds
  ux
+ \s-1unix\s+1
  .ds
  an
+ \s-1ansi\s+1
  .th
  gawk
  "may
***************
*** 11,17 ****
--- 14,22 ----
  software
  foundation"
  "utility
+ commands"
  .sh
+ name
  gawk
  \-
  pattern
***************
*** 18,50 ****
--- 23,76 ----
  scanning
  and
  processing
+ language
  .sh
+ synopsis
  .b
+ gawk
  .b
+ \-w
  .i
+ gawk-options
  .bi
  \-f\^
+ fs
  .b
+ \-v
  .ir
  var
+ val
  .b
+ \-f
  .i
+ program-file
  .b
+ \-\^\-
  file
  .br
  .b
+ gawk
  .b
+ \-w
  .i
+ gawk-options
  .bi
  \-f\^
+ fs
  .b
+ \-v
  .ir
  var
+ val
  .b
+ \-\^\-
  .i
+ program-text
  file
  .sh
+ description
  .i
+ gawk
  is
  the
  gnu
***************
*** 54,59 ****
--- 80,86 ----
  the
  awk
  programming
+ language.
  it
  conforms
  to
***************
*** 62,67 ****
--- 89,95 ----
  of
  the
  language
+ in
  the
  \*(px
  command
***************
*** 68,73 ****
--- 96,102 ----
  language
  and
  utilities
+ standard
  (draft
  this
  version
***************
*** 78,83 ****
--- 107,113 ----
  on
  the
  description
+ in
  .ir
  "the
  awk
***************
*** 87,92 ****
--- 117,123 ----
  aho,
  kernighan,
  and
+ weinberger,
  with
  the
  additional
***************
*** 97,110 ****
--- 128,145 ----
  system
  v
  release
+ version
  of
+ \*(ux
  .ir
  awk
  .i
+ gawk
  also
  provides
  some
  gnu-specific
+ extensions.
  .pp
  the
  command
***************
*** 112,118 ****
--- 147,155 ----
  consists
  of
  options
+ to
  .i
+ gawk
  itself,
  the
  awk
***************
*** 122,143 ****
--- 159,189 ----
  not
  supplied
  via
+ the
  .b
+ \-f
  option),
  and
  values
  to
  be
+ made
  available
  in
+ the
  .b
+ argc
  and
  .b
+ argv
  pre-defined
  awk
+ variables.
  .sh
+ options
  .pp
  .i
+ gawk
  accepts
  the
  following
***************
*** 148,161 ****
--- 194,211 ----
  available
  on
  any
+ implementation
  of
  the
  awk
+ language.
  .tp
  .bi
  \-f
+ fs
  use
  .i
+ fs
  for
  the
  input
***************
*** 164,180 ****
--- 214,235 ----
  (the
  value
  of
+ the
  .b
+ fs
  predefined
  variable).
  .tp
  \fb\-v\fi
+ var\fr\^=\^\fival\fr
  assign
  the
+ value
  .ir
  val
  to
  the
+ variable
  .ir
  var
  before
***************
*** 182,187 ****
--- 237,243 ----
  of
  the
  program
+ begins.
  such
  variable
  values
***************
*** 188,201 ****
--- 244,261 ----
  are
  available
  to
+ the
  .b
+ begin
  block
  of
  an
  awk
+ program.
  .tp
  .bi
  \-f
+ program-file"
  read
  the
  awk
***************
*** 203,208 ****
--- 263,269 ----
  source
  from
  the
+ file
  .ir
  program-file
  instead
***************
*** 212,224 ****
--- 273,289 ----
  first
  command
  line
+ argument.
  multiple
  .b
+ \-f
  options
  may
  be
+ used.
  .tp
  .b
+ \-\^\-
  signal
  the
  end
***************
*** 232,237 ****
--- 297,303 ----
  further
  arguments
  to
+ the
  awk
  program
  itself
***************
*** 239,244 ****
--- 305,311 ----
  start
  with
  a
+ ``\-''.
  this
  is
  mainly
***************
*** 249,264 ****
--- 316,335 ----
  argument
  parsing
  convention
+ used
  by
  most
  other
  \*(px
+ programs.
  .pp
  following
  the
  \*(px
+ standard,
  .ir
  gawk
+ -specific
  options
  are
  supplied
***************
*** 265,273 ****
--- 336,348 ----
  via
  arguments
  to
+ the
  .b
+ \-w
  option.
+ multiple
  .b
+ \-w
  options
  may
  be
***************
*** 278,283 ****
--- 353,359 ----
  may
  be
  supplied
+ together
  if
  they
  are
***************
*** 289,296 ****
--- 365,374 ----
  in
  quotes
  and
+ separated
  by
  white
+ space.
  case
  is
  ignored
***************
*** 297,322 ****
--- 375,411 ----
  in
  arguments
  to
+ the
  .b
+ \-w
  option.
  .pp
  the
  .b
+ \-w
  option
  accepts
  the
  following
+ arguments:
  .tp
+ \w'\fbcopyright\fr'u+1n
  .b
+ compat
  run
+ in
  .i
+ compatibility
  mode.
  in
  compatibility
+ mode,
  .i
+ gawk
  behaves
  identically
  to
+ \*(ux
  .ir
  awk
  none
***************
*** 325,336 ****
--- 414,428 ----
  gnu-specific
  extensions
  are
+ recognized.
  .tp
  .pd
  .b
+ copyleft
  .tp
  .pd
  .b
+ copyright
  print
  the
  short
***************
*** 341,355 ****
--- 433,451 ----
  copyright
  information
  message
+ on
  the
  error
+ output.
  .tp
  .b
+ lint
  provide
  warnings
  about
  constructs
  that
+ are
  dubious
  or
  non-portable
***************
*** 356,391 ****
--- 452,501 ----
  to
  other
  awk
+ implementations.
  .tp
  .b
+ posix
  this
  turns
+ on
  .i
+ compatibility
  mode,
  with
  the
  following
  additional
+ restrictions:
  .rs
  .tp
+ \w'\(bu'u+1n
  \(bu
  .b
+ \ex
  escape
  sequences
  are
  not
+ recognized.
  .tp
  \(bu
  the
+ synonym
  .b
+ func
  for
  the
+ keyword
  .b
+ function
  is
  not
+ recognized.
  .tp
  \(bu
  the
+ operators
  .b
  and
  .b
***************
*** 394,399 ****
--- 504,510 ----
  used
  in
  place
+ of
  .b
  and
  .br
***************
*** 400,405 ****
--- 511,517 ----
  .re
  .tp
  .b
+ version
  print
  version
  information
***************
*** 407,416 ****
--- 519,531 ----
  this
  particular
  copy
+ of
  .i
+ gawk
  on
  the
  error
+ output.
  this
  is
  useful
***************
*** 421,429 ****
--- 536,547 ----
  the
  current
  copy
+ of
  .i
+ gawk
  on
  your
+ system
  is
  up
  to
***************
*** 435,441 ****
--- 553,561 ----
  the
  free
  software
+ foundation
  is
+ distributing.
  .pp
  any
  other
***************
*** 447,455 ****
--- 567,577 ----
  but
  are
  otherwise
+ ignored.
  .sh
  awk
  program
+ execution
  .pp
  an
  awk
***************
*** 460,481 ****
--- 582,608 ----
  sequence
  of
  pattern-action
+ statements
  and
  optional
  function
+ definitions.
  .rs
  .pp
  \fipattern\fb
  \fiaction
  statements\fb
+ }\fr
  .br
  \fbfunction
  \finame\fb(\fiparameter
  list\fb)
  \fistatements\fb
+ }\fr
  .re
  .pp
  .i
+ gawk
  first
  reads
  the
***************
*** 482,489 ****
--- 609,618 ----
  program
  source
  from
+ the
  .ir
  program-file
+ (s)
  if
  specified,
  or
***************
*** 495,502 ****
--- 624,633 ----
  on
  the
  command
+ line.
  the
  .b
+ \-f
  option
  may
  be
***************
*** 506,512 ****
--- 637,645 ----
  on
  the
  command
+ line.
  .i
+ gawk
  will
  read
  the
***************
*** 515,522 ****
--- 648,657 ----
  as
  if
  all
+ the
  .ir
  program-file
+ s
  had
  been
  concatenated
***************
*** 526,531 ****
--- 661,667 ----
  useful
  for
  building
+ libraries
  of
  awk
  functions,
***************
*** 537,539 ****
--- 673,676 ----
  in
  each
  new
+ awk