odd awk memory leak?

To: tech-userlevel%netbsd.org@localhost
Subject: odd awk memory leak?
From: Mouse <mouse%Rodents-Montreal.ORG@localhost>
Date: Sun, 27 Oct 2013 03:48:21 -0400 (EDT)

I'm seeing an odd memory leak in awk.  It shows up on NetBSD 4.0.1 and
5.2, though some difference in the underlying system means that awk
doesn't actually crash on 5.2.  (On 5.2, it blows past its soft
datasize limit and just keeps on growing, whereas on 4.0.1 it fails
with an out-of-memory error when it hits its datasize limit.)

I have a big data blob, which gets processed by an awk program into a
shell script; the details of the shell script don't matter, because
even if I redirect the script to /dev/null I still see the odd
behaviour.  There's nothing secret involved, so I've made the whole
thing available on ftp.rodents-montreal.org, in
/mouse/misc/awk-bug.in.bz2 and awk-bug.sh.  (Look at the shell script -
it just runs awk; this was slightly more convenient in my use case.)

If I just run "bunzip2 < awk-bug.in.bz2 | sh awk-bug.sh > /dev/null" on
4.0.1, I see (for example)

[Chip] 24> date; bunzip2 < awk-bug.in.bz2 | sh awk-bug.sh > /dev/null; date
Sun Oct 27 02:37:58 EDT 2013
awk: out of memory in array
 input record number 1788264, file 
 source line number 3
Sun Oct 27 02:38:57 EDT 2013
[Chip] 25> 

top indicates that awk is growing by tens of megabytes a second during
this.  On 5.2, awk still grows like mad, but instead of crashing when
it reaches its datasize limit, it just blows past it and keeps on
going.  I even tried building 5.2's awk source on 4.0.1; it misbehaves
the same as 4.0.1's awk (not surprising in view of what top reports).
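
For anyone who would rather log the growth than watch top, here is a
minimal sketch that samples a process's size once a second.  The
function names sample_mem and watch_mem are made up for illustration,
and it assumes a ps that accepts the -o and -p options with vsz and rss
keywords, which may not hold for every ps.

```shell
# Print "VSZ RSS" (as reported by ps, normally in kilobytes) for one
# process ID.  Prints nothing if the process no longer exists.
# Assumption: ps supports "-o vsz= -o rss= -p PID".
sample_mem() {
    ps -o vsz= -o rss= -p "$1" 2>/dev/null
}

# Sample the given process once per second until it exits,
# printing one timestamped line per sample.
watch_mem() {
    while kill -0 "$1" 2>/dev/null; do
        echo "$(date +%H:%M:%S) $(sample_mem "$1")"
        sleep 1
    done
}
```

Start the pipeline in one terminal, look up the awk process's PID, and
run "watch_mem PID" in another; a steadily rising first column would
match what top shows here.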

This wouldn't be a surprise (apart from 5.2 not enforcing the datasize
limit), but, as far as I can see, the awk program does nothing that
should accumulate memory.
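
For comparison, one generic way an awk program can accumulate memory
without any explicit assignment is that merely referencing an array
element creates it.  The sketch below is not taken from awk-bug.sh and
makes no claim that this is what happens there; the array name seen and
both function names are hypothetical.

```shell
# Count the elements that exist in "seen" after each input line has
# only *read* seen[$1].  In awk, referencing an element creates it.
count_by_reference() {
    awk '{ if (seen[$1] == "") n++ }
         END { c = 0; for (k in seen) c++; print c }'
}

# Same check written with the "in" operator, which tests membership
# without creating the element.
count_by_membership() {
    awk '{ if (!($1 in seen)) n++ }
         END { c = 0; for (k in seen) c++; print c }'
}
```

Fed the same input, the first variant ends with one array element per
distinct first field, while the second leaves the array empty.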

A friend tried it with GNU awk on Linux and says it completed just
fine.  I tried it on 1.4T/sparc, whose awk is (according to the
manpage) GNU awk, and, while I didn't wait anything like long enough
for it to finish, top does report the awk process failing to grow (it's
been multiple minutes and awk still hasn't gone above 364K/156K; even
though it's a much slower machine, I would expect _some_ growth if the
problem were present).  This reinforces my impression that the program
shouldn't be gobbling memory.

So, I'm tentatively considering this a bug in awk.  Any confirmation or
rebuttal of that opinion?  If confirmation, does anyone have any
thoughts on tracking it down?  I've never gone grubbing around inside
any awk implementation....

Of course, I don't know whether this behaviour is present post-5.x.
But it might be worth trying it to see; even on a modern fast machine,
it should take long enough to run that top should see the awk process
growing, or not.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
