Subject: Dissection of GNU awk bugs on NetBSD-pmax
To: None <arnold@cc.gatech.edu>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-pmax
Date: 12/12/1994 13:40:39
>Just to be clear, there are two "bugs"
>
>a) output of inftest.awk --- seems to be strtod related, definitely a library bug
>b) last few characters of the line, does not go away when recompiled, but
> ultrix binary works.
>
>Is that a correct summary?
Well, maybe. I'm far from certain that the apparent loss of the last
few characters on a line is not the same bug...
Below is the result of running the following awk script on the input
file gawk-2.15.5/test/manpage and diffing the Ultrix and NetBSD-pmax output.
It seems to me that the last word on each line is consistently
being lost. It turns out that if I change the loop test from
``i <= NF'' to ``i <= NF + 1'', the NetBSD awk gives the
``right'' answer. So it looks like NF is being set incorrectly.
Where might I look to see that? Is it plausible that gawk is keeping
its NF in floating point internally, and that a badly broken floating-point
ASCII <-> decimal conversion is causing this bug? That was my first guess...
I just hope this isn't a bug in the gcc-2.5.8 mips backend!
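A rough way to tell the two guesses apart, sketched as POSIX sh functions that wrap awk (check_nf and fp_selftest are made-up names, and this assumes the awk on the PATH is the gawk build under test): if NF disagrees with the field count that split() reports, the field splitter is setting NF wrongly; if NF and split() agree but the loop still runs one time short, the ``i <= NF'' comparison itself is suspect. fp_selftest pokes at number <-> string conversion and a loop compare on small integers; the string -> number side presumably goes through the C library's strtod.

```sh
#!/bin/sh
# Sketch only: assumes "awk" on the PATH is the gawk build under test.
# check_nf and fp_selftest are hypothetical helper names.

# For each input line print: NF, the field count from split(), how many
# times a "for (i = 1; i <= NF; i++)" loop runs, and $NF.  On a correct
# awk the three counts agree and the last column is the line's last word.
check_nf() {
    awk '{
        n = split($0, parts)        # splits on FS, like field splitting
        c = 0
        for (i = 1; i <= NF; i++)   # same loop test as the script below
            c++
        print NF, n, c, $NF
    }' "$@"
}

# Exercise number -> string and string -> number conversion, plus a
# loop compare, on small integers (the kind of values NF and i hold).
fp_selftest() {
    awk 'BEGIN {
        bad = 0; iters = 0
        for (k = 1; k <= 20; k++) {
            iters++
            s = k ""                # number -> string
            x = s + 0               # string -> number
            if (s != sprintf("%d", k) || x != k) {
                print "mismatch at " k ": string=" s " number=" x
                bad++
            }
        }
        if (iters != 20) {
            print "loop ran " iters " times, expected 20"
            bad++
        }
        print (bad ? "FAIL" : "ok")
    }'
}
```

Running `check_nf gawk-2.15.5/test/manpage | head` on both machines and comparing the columns should show which of the three counts goes wrong first.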
--Jonathan
PS: does anyone know a good way to compose MIME messages in MH, with
multiple files included as a multipart MIME document??
--------begin example awkscript
# From Gawk Manual modified by bug fix and removal of punctuation
# Record every word which is used at least once
# (for this test each matching word is printed instead of recorded)
{
    for (i = 1; i <= NF; i++) {
        tmp = tolower($i)
        if (0 != (pos = match(tmp, /([a-z]|-)+/)))
            #used[substr(tmp, pos, RLENGTH)] = 1
            print tmp
    }
}
--------end example awkscript
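For a one-line reproduction, here is the same script wrapped in a shell function (print_words is a hypothetical name; the commented-out used[] line is dropped), fed a line rebuilt, already lower-cased, from the first hunk of the diff below.

```sh
#!/bin/sh
# Hypothetical wrapper: the example awk script as a shell function.
print_words() {
    awk '{
        for (i = 1; i <= NF; i++) {
            tmp = tolower($i)
            if (0 != (pos = match(tmp, /([a-z]|-)+/)))
                print tmp
        }
    }' "$@"
}

# Words behind the first hunk of the diff: a correct awk prints all
# three, while the NetBSD-pmax output lacks the last one.
printf '%s\n' '.ds px \s-1posix\s+1' | print_words
```

To regenerate the full diff, run `print_words gawk-2.15.5/test/manpage` on each machine, saving the results as netbsd-out and ultrix-out, then `diff -c netbsd-out ultrix-out`.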
Context diff of the output of the above script on NetBSD-pmax and Ultrix:
*** /pescadero/u2/jonathan/TMP/netbsd-out Mon Dec 12 13:31:14 1994
--- /pescadero/u2/jonathan/TMP/ultrix-out Mon Dec 12 13:31:32 1994
***************
*** 1,9 ****
--- 1,12 ----
.ds
px
+ \s-1posix\s+1
.ds
ux
+ \s-1unix\s+1
.ds
an
+ \s-1ansi\s+1
.th
gawk
"may
***************
*** 11,17 ****
--- 14,22 ----
software
foundation"
"utility
+ commands"
.sh
+ name
gawk
\-
pattern
***************
*** 18,50 ****
--- 23,76 ----
scanning
and
processing
+ language
.sh
+ synopsis
.b
+ gawk
.b
+ \-w
.i
+ gawk-options
.bi
\-f\^
+ fs
.b
+ \-v
.ir
var
+ val
.b
+ \-f
.i
+ program-file
.b
+ \-\^\-
file
.br
.b
+ gawk
.b
+ \-w
.i
+ gawk-options
.bi
\-f\^
+ fs
.b
+ \-v
.ir
var
+ val
.b
+ \-\^\-
.i
+ program-text
file
.sh
+ description
.i
+ gawk
is
the
gnu
***************
*** 54,59 ****
--- 80,86 ----
the
awk
programming
+ language.
it
conforms
to
***************
*** 62,67 ****
--- 89,95 ----
of
the
language
+ in
the
\*(px
command
***************
*** 68,73 ****
--- 96,102 ----
language
and
utilities
+ standard
(draft
this
version
***************
*** 78,83 ****
--- 107,113 ----
on
the
description
+ in
.ir
"the
awk
***************
*** 87,92 ****
--- 117,123 ----
aho,
kernighan,
and
+ weinberger,
with
the
additional
***************
*** 97,110 ****
--- 128,145 ----
system
v
release
+ version
of
+ \*(ux
.ir
awk
.i
+ gawk
also
provides
some
gnu-specific
+ extensions.
.pp
the
command
***************
*** 112,118 ****
--- 147,155 ----
consists
of
options
+ to
.i
+ gawk
itself,
the
awk
***************
*** 122,143 ****
--- 159,189 ----
not
supplied
via
+ the
.b
+ \-f
option),
and
values
to
be
+ made
available
in
+ the
.b
+ argc
and
.b
+ argv
pre-defined
awk
+ variables.
.sh
+ options
.pp
.i
+ gawk
accepts
the
following
***************
*** 148,161 ****
--- 194,211 ----
available
on
any
+ implementation
of
the
awk
+ language.
.tp
.bi
\-f
+ fs
use
.i
+ fs
for
the
input
***************
*** 164,180 ****
--- 214,235 ----
(the
value
of
+ the
.b
+ fs
predefined
variable).
.tp
\fb\-v\fi
+ var\fr\^=\^\fival\fr
assign
the
+ value
.ir
val
to
the
+ variable
.ir
var
before
***************
*** 182,187 ****
--- 237,243 ----
of
the
program
+ begins.
such
variable
values
***************
*** 188,201 ****
--- 244,261 ----
are
available
to
+ the
.b
+ begin
block
of
an
awk
+ program.
.tp
.bi
\-f
+ program-file"
read
the
awk
***************
*** 203,208 ****
--- 263,269 ----
source
from
the
+ file
.ir
program-file
instead
***************
*** 212,224 ****
--- 273,289 ----
first
command
line
+ argument.
multiple
.b
+ \-f
options
may
be
+ used.
.tp
.b
+ \-\^\-
signal
the
end
***************
*** 232,237 ****
--- 297,303 ----
further
arguments
to
+ the
awk
program
itself
***************
*** 239,244 ****
--- 305,311 ----
start
with
a
+ ``\-''.
this
is
mainly
***************
*** 249,264 ****
--- 316,335 ----
argument
parsing
convention
+ used
by
most
other
\*(px
+ programs.
.pp
following
the
\*(px
+ standard,
.ir
gawk
+ -specific
options
are
supplied
***************
*** 265,273 ****
--- 336,348 ----
via
arguments
to
+ the
.b
+ \-w
option.
+ multiple
.b
+ \-w
options
may
be
***************
*** 278,283 ****
--- 353,359 ----
may
be
supplied
+ together
if
they
are
***************
*** 289,296 ****
--- 365,374 ----
in
quotes
and
+ separated
by
white
+ space.
case
is
ignored
***************
*** 297,322 ****
--- 375,411 ----
in
arguments
to
+ the
.b
+ \-w
option.
.pp
the
.b
+ \-w
option
accepts
the
following
+ arguments:
.tp
+ \w'\fbcopyright\fr'u+1n
.b
+ compat
run
+ in
.i
+ compatibility
mode.
in
compatibility
+ mode,
.i
+ gawk
behaves
identically
to
+ \*(ux
.ir
awk
none
***************
*** 325,336 ****
--- 414,428 ----
gnu-specific
extensions
are
+ recognized.
.tp
.pd
.b
+ copyleft
.tp
.pd
.b
+ copyright
print
the
short
***************
*** 341,355 ****
--- 433,451 ----
copyright
information
message
+ on
the
error
+ output.
.tp
.b
+ lint
provide
warnings
about
constructs
that
+ are
dubious
or
non-portable
***************
*** 356,391 ****
--- 452,501 ----
to
other
awk
+ implementations.
.tp
.b
+ posix
this
turns
+ on
.i
+ compatibility
mode,
with
the
following
additional
+ restrictions:
.rs
.tp
+ \w'\(bu'u+1n
\(bu
.b
+ \ex
escape
sequences
are
not
+ recognized.
.tp
\(bu
the
+ synonym
.b
+ func
for
the
+ keyword
.b
+ function
is
not
+ recognized.
.tp
\(bu
the
+ operators
.b
and
.b
***************
*** 394,399 ****
--- 504,510 ----
used
in
place
+ of
.b
and
.br
***************
*** 400,405 ****
--- 511,517 ----
.re
.tp
.b
+ version
print
version
information
***************
*** 407,416 ****
--- 519,531 ----
this
particular
copy
+ of
.i
+ gawk
on
the
error
+ output.
this
is
useful
***************
*** 421,429 ****
--- 536,547 ----
the
current
copy
+ of
.i
+ gawk
on
your
+ system
is
up
to
***************
*** 435,441 ****
--- 553,561 ----
the
free
software
+ foundation
is
+ distributing.
.pp
any
other
***************
*** 447,455 ****
--- 567,577 ----
but
are
otherwise
+ ignored.
.sh
awk
program
+ execution
.pp
an
awk
***************
*** 460,481 ****
--- 582,608 ----
sequence
of
pattern-action
+ statements
and
optional
function
+ definitions.
.rs
.pp
\fipattern\fb
\fiaction
statements\fb
+ }\fr
.br
\fbfunction
\finame\fb(\fiparameter
list\fb)
\fistatements\fb
+ }\fr
.re
.pp
.i
+ gawk
first
reads
the
***************
*** 482,489 ****
--- 609,618 ----
program
source
from
+ the
.ir
program-file
+ (s)
if
specified,
or
***************
*** 495,502 ****
--- 624,633 ----
on
the
command
+ line.
the
.b
+ \-f
option
may
be
***************
*** 506,512 ****
--- 637,645 ----
on
the
command
+ line.
.i
+ gawk
will
read
the
***************
*** 515,522 ****
--- 648,657 ----
as
if
all
+ the
.ir
program-file
+ s
had
been
concatenated
***************
*** 526,531 ****
--- 661,667 ----
useful
for
building
+ libraries
of
awk
functions,
***************
*** 537,539 ****
--- 673,676 ----
in
each
new
+ awk