tech-userlevel archive


performance of shell read loops



Today it came up (in the context of something Christos did) that shell
read loops are horribly, horribly slow. I'd always assumed that this
was because shell read loops tend to fork at least once for every
iteration (so O(n) times), while a more pipeline-oriented approach tends
to fork only O(1) times. While that's probably true, it doesn't appear
to be the whole story.
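To make the fork-count difference concrete, here's a toy sketch (not the
test case below; the temp file and the expr counter are just made up for
illustration): the loop forks a subshell plus an external expr for every
line, while wc handles the whole input with a single fork.

```shell
#!/bin/sh
# Toy illustration of O(n) forks vs. O(1) forks.
tmp=/tmp/forkdemo.$$
printf 'one\ntwo\nthree\n' > "$tmp"

# O(n) forks: each $(expr ...) costs a subshell fork plus an exec of expr.
n=0
while read -r line; do
    n=$(expr "$n" + 1)
done < "$tmp"
echo "loop counted $n lines"

# O(1) forks: one wc process handles the entire input.
wc -l < "$tmp"

rm -f "$tmp"
```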

In the following example the shell read loop doesn't fork, but it's
still monumentally slow compared to either sed or awk. (awk is
probably a fairer comparison as it executes the exact same logic; sed
is specialized for this sort of thing...)

   valkyrie% time ./script-sed.sh 
   8.907u 0.086s 0:09.16 98.0%     0+0k 0+1io 0pf+0w
   valkyrie% time ./script-awk.sh
   15.968u 0.101s 0:16.07 99.9%    0+0k 0+2io 0pf+0w
   valkyrie% time ./script-sh.sh
   91.311u 96.339s 3:07.93 99.8%   0+0k 0+2io 0pf+0w

pretty bad!

Some of this is doubtless because sh is required for stupid standards
reasons not to consume input past the line it hands back, which in
practice means rereading (or reading a byte at a time from) the input
file on every iteration. I wonder if
there's anything else going on... unfortunately I don't have any more
time to look into it this weekend, so I figured I'd post what I've
got.
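For what it's worth, one likely contributor: when the input isn't
seekable, the read builtin has to fetch one byte at a time so it doesn't
swallow data belonging to the next reader of the same descriptor, which
means a read(2) syscall per byte. The non-consuming requirement is
observable from the shell itself (minimal sketch):

```shell
#!/bin/sh
# POSIX requires 'read' to leave everything after the first newline
# available on the descriptor, so cat sees exactly the second line.
# From a pipe the shell can't seek back, so it must read byte by byte.
out=$(printf 'first\nsecond\n' | { read -r a; cat; })
echo "$out"
```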

   --- script-sed.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    sed < $FILE 's/#.*//;/^$/d'
done > /dev/null
   --- script-awk.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    awk < $FILE '{
	sub("#.*", "", $0);
	if (NF == 0) {
	    next;
	}
	print;
    }'
done > /dev/null
   --- script-sh.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    while read -r v; do
	v=${v%%#*}
	if [ -z "$v" ]; then
	    continue
	fi
	echo "$v"
    done < $FILE
done > /dev/null
   ------

-- 
David A. Holland
dholland%netbsd.org@localhost

