tech-userlevel archive


performance of shell read loops



Today it came up (in the context of something Christos did) that shell
read loops are horribly, horribly slow. I'd always assumed that this
was because shell read loops tend to fork at least once for every
iteration (so O(n) times), while a more pipeline-oriented approach tends
to fork only O(1) times. While that's probably true, it doesn't appear
to be the whole story.
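To make the fork-count difference concrete, here's a toy sketch (not the
test case below; the temp file and the expr counter are just made up for
illustration): the loop forks a subshell plus an external expr for every
line, while wc handles the whole input with a single fork.

```shell
#!/bin/sh
# Toy illustration of O(n) forks vs. O(1) forks.
tmp=/tmp/forkdemo.$$
printf 'one\ntwo\nthree\n' > "$tmp"

# O(n) forks: each $(expr ...) costs a subshell fork plus an exec of expr.
n=0
while read -r line; do
    n=$(expr "$n" + 1)
done < "$tmp"
echo "loop counted $n lines"

# O(1) forks: one wc process handles the entire input.
wc -l < "$tmp"

rm -f "$tmp"
```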

In the following example the shell read loop doesn't fork, but it's
still monumentally slow compared to either sed or awk. (awk is
probably a fairer comparison as it executes the exact same logic; sed
is specialized for this sort of thing...)

   valkyrie% time ./script-sed.sh 
   8.907u 0.086s 0:09.16 98.0%     0+0k 0+1io 0pf+0w
   valkyrie% time ./script-awk.sh
   15.968u 0.101s 0:16.07 99.9%    0+0k 0+2io 0pf+0w
   valkyrie% time ./script-sh.sh
   91.311u 96.339s 3:07.93 99.8%   0+0k 0+2io 0pf+0w

pretty bad!

Some of this is doubtless because sh is required for stupid standards
reasons not to consume input past the line it hands back, which in
practice means rereading (or reading a byte at a time from) the input
file on every iteration. I wonder if
there's anything else going on... unfortunately I don't have any more
time to look into it this weekend, so I figured I'd post what I've
got.
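For what it's worth, one likely contributor: when the input isn't
seekable, the read builtin has to fetch one byte at a time so it doesn't
swallow data belonging to the next reader of the same descriptor, which
means a read(2) syscall per byte. The non-consuming requirement is
observable from the shell itself (minimal sketch):

```shell
#!/bin/sh
# POSIX requires 'read' to leave everything after the first newline
# available on the descriptor, so cat sees exactly the second line.
# From a pipe the shell can't seek back, so it must read byte by byte.
out=$(printf 'first\nsecond\n' | { read -r a; cat; })
echo "$out"
```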

   --- script-sed.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    sed < $FILE 's/#.*//;/^$/d'
done > /dev/null
   --- script-awk.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    awk < $FILE '{
	sub("#.*", "", $0);
	if (NF == 0) {
	    next;
	}
	print;
    }'
done > /dev/null
   --- script-sh.sh ---
#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    while read -r v; do
	v=${v%%#*}
	if [ -z "$v" ]; then
	    continue
	fi
	echo "$v"
    done < $FILE
done > /dev/null
   ------

-- 
David A. Holland
dholland%netbsd.org@localhost

