Re: performance of shell read loops

To: David Holland <dholland-tech%netbsd.org@localhost>
Subject: Re: performance of shell read loops
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Mon, 07 Mar 2016 15:52:00 +0700
    Date:        Mon, 7 Mar 2016 05:21:35 +0000
    From:        David Holland <dholland-tech%netbsd.org@localhost>
    Message-ID:  <20160307052135.GA7645%netbsd.org@localhost>

  | Today it came up (in the context of something Christos did) that shell
  | read loops are horribly, horribly slow.

I had been intending to look at shell performance a bit after I finished
fixing bugs (and perhaps add one or two small extensions - more of that
in a later message sometime... [but before it happens of course])

  | I'd always assumed that this
  | was because shell read loops tend to fork at least once for every
  | iteration (so O(n) times) and a more pipeline-oriented approach tends
  | to fork O(1) times. While this is probably true it doesn't appear to
  | be the whole story.

Actually, in the scripts you posted, all three scripts scripts need to fork
once per iteration, and definitely re-read the file once per iteration.

That shouldn't be the difference.  What might be is that the sed and awk
versions will get by with just a vfork()/exec() pair, but the sh version
will need a true fork() - however one would assume that exec of something
like sed or awk (and the startup dynamic linking each time) would
account for more than the cost of a fork of sh.

  | In the following example the shell read loop doesn't fork,

Actually, it does, while anytest; do anything; done < file
always forks - which is why that kind of loop cannot pass
answers back to the parent (outside the loop).   I made a slight
change to the way the script is written, which should avoid that,
but it make no appreciable difference to the results (would need
to run the test more times than I did to convince my self whether
it was just noise or not - but the system time consumer, which is
the truly huge difference between the various tests, descreased
by just over a second - out of about 100 - so a 1-2 % decrease).
[results and modified script are below.]

  | still monumentally slow compared to either sed or awk. (awk is
  | probably a fairer comparison as it executes the exact same logic; sed
  | is specialized for this sort of thing...)

They're both using regular expressions for the matching, sh is not, and
both pre-compile the code (including the regular expressions) and then
just rexecute them - using what I believe is fairly heavily optimized
code.   sh does none of that, it uses glob patterns, and does everything
every time around the loop (other than parsing the script itself - but the
form it uses as its intermediate isn't really picked for speed, rather
because that's the way it first occurred when it was written, 20 years or
so ago, and no-one has changed it since.)   That might account for some of
the user time difference (won't know till I eventually get time to profile it.)

More of the difference is perhaps related to the way that output is done
by "echo" which is certainly not optimised for performance - that will be
done at a rate of at least one write() sys call per echo, and perhaps more.
This is most likely where much of the excess time is being used - particularly
the system time (sed and awk will buffer their output and only write once
for each 8K).

I am leaving your results below so others don't have to jump back and
forth between messages to compare mine with yours.

Can I ask which sh you are using for the sh test?

If it is the sh in current, then it probably is a bit slower because of
the way I did the bug fixes that were incorporated a week or two ago.

I wrote those in a way that would be clean and easy to understand and
modify - but not, as the rest of the shell is/was built, at all fast.
The results for user time in my tests (which test my current sh with mods
that aren't yet in CVS, and with the 7.0 (or really 7.0_BETA but I doubt
that makes a difference) shell, which give comparable results - quite a bit
less than your sh user time result (whereas most of the other numbers are
roughly comparable - my system is probably just a little slower than yours,
but not by a lot.)

That makes me suspect that you tested the current shell at a particularly
bad time in its evolution (which would be anything from when Christos
started incorporating my changes until sometime in the future when I am
satisfied my speedups breaks nothing, and they get incorporated.)
Other speedups that might be possible after profiling would be even later.

(The one I have currently was done because I knew the way I handled the
bug fixes would slow things down, and what the obvious way was to reverse
that effect.)

Now, if you want to see something truly bad, repeat your shell test using
/bin/ksh instead of /bin/sh - in addition to having lots more bugs, it is
also a lot slower (at this anyway) - I suspect it is doing 1 byte at a time
read syscalls as well as small writes, nothing else I can think of would
explain the results.

  |    valkyrie% time ./script-sed.sh 
  |    8.907u 0.086s 0:09.16 98.0%     0+0k 0+1io 0pf+0w
  |    valkyrie% time ./script-awk.sh
  |    15.968u 0.101s 0:16.07 99.9%    0+0k 0+2io 0pf+0w
  |    valkyrie% time ./script-sh.sh
  |    91.311u 96.339s 3:07.93 99.8%   0+0k 0+2io 0pf+0w

  | Some of this is doubtless because sh is required for stupid standards
  | reasons to reread the input file on every iteration.

Umm, they all have to do that, or how would they produce and data?  Were
you expecting sed or awk to remember that they processed the same file last
time and what the answers were?

The modification I made to the script was (ie: this is script-nofork-sh
I did not see a need for trailing ".sh" suffixes on the file names...)
This will still fork (once) for the ">/dev/null" - not worth changing.

#!/bin/sh
FILE=/usr/share/dict/words
for i in $(jot 100 1); do
    exec < "$FILE"
    while read v; do
	v=${v%%#*}
	if [ -z "$v" ]; then
	    continue
	fi
	echo $v
    done
done > /dev/null

And the results I obtained are as below.   I only tested the non-forking
script with my current (optimized somewhat) version of current's sh as
it made so little difference.

The ./sh in the tests below is my (improved) NetBSD current sh,
/bin/sh the 7.0_BETA version of NetBSD /bin/sh,
fbsh is my slightly hacked version of FreeBSD's /bin/sh (more or less their
current version from a week or two ago.)
The awk & sed used for those tests are the versions from 7.0_BETA
I have included the results from ./sh run with the -F flag, to disable
vfork() just to see what difference (if any) that makes, and for completeness
the sed and awk tests run that way too, so the shell does a full fork before
the exec of them each time.

The others (various different shells) should be self explanatory, and are
all just for comparison, they are from some vintage in the past via pkgsrc
(not current versions - probably all versions that are 2-3 years old.)

You can see that it isn't something peculiar to NetBSD's sh that
causes the results, and that our shell is nowhere near the worst.

I have no idea what bash is doing to spend the user time it does, and I
am not sure I care (though its improved sys time suggests that it has a
better algorithm for either reading or writing, which are the only intensive
sys calls that the script performs - they can't have optimised the file
opens very much I don't think...)

Last, it is also worth mentioning, I think, that none of these results
mean anything until the outputs are not sent to /dev/null, but are
collected somewhere and analysed - further someone wanting to make their
sed/awk look really good in benchmarks could look for cases where all the
output was being sent to /dev/null (compare fstat(1) with stat("/dev/null"))
and just exit after parsing the script for syntax errors, and to see no
other output was being sent elsewhere, in that case. Really fast execution!
So a sys call analysis would also be needed to verify that wasn't happening
(but since they is NetBSD's and I am fairly sure no nonsense like that has
been done, I won't bother.)

Oh, don't take too much notice of elapsed time - my system wasn't
completely idle while it was obtaining the results below, and less-so
for some than others.

I will be looking into all of this more (in particular, now I have seen
the results from this, I want to find out why sys time got worse with
my current shell than /bin/sh - none of my changes should have affected
that at all - but perhaps something else changed between 7.0 and current,
I will measure better sometime later - I believe in making it correct
first, and only then looking at making it faster.)

For those looking for a winner (amongst shells) FreeBSD's appears to have
that honour at the minute (it was compiled without history/line-editing
functions though, which, even though none of that stuff would apply in this 
test, just avoiding the check to determine that would give that shell a tiny 
advantage.)

kre

dholland's original results ...

  |    valkyrie% time ./script-sed.sh 
  |    8.907u 0.086s 0:09.16 98.0%     0+0k 0+1io 0pf+0w
  |    valkyrie% time ./script-awk.sh
  |    15.968u 0.101s 0:16.07 99.9%    0+0k 0+2io 0pf+0w
  |    valkyrie% time ./script-sh.sh
  |    91.311u 96.339s 3:07.93 99.8%   0+0k 0+2io 0pf+0w

and mine...  (not using csh, so the "time" format output is different,
but that of "bash" which is my normal command line interactive shell, I
should have used "sh" to run the tests, its time output is cleanest of all.)

andromeda$ time ./sh /tmp/script-sed

real    0m10.552s
user    0m9.173s
sys     0m1.144s

andromeda$ time ./sh /tmp/script-awk

real    0m16.816s
user    0m16.200s
sys     0m0.426s

andromeda$ time ./sh /tmp/script-sh 

real    2m55.068s
user    1m9.898s
sys     1m43.172s

andromeda$ time /bin/sh /tmp/script-sh

real    2m36.267s
user    1m7.690s
sys     1m28.416s

andromeda$ time ./sh /tmp/script-nofork-sh

real    2m54.210s
user    1m11.939s
sys     1m41.820s

andromeda$ time ./sh -F /tmp/script-sh

real    2m38.790s
user    1m11.478s
sys     1m25.473s

andromeda$ time ./sh -F /tmp/script-awk

real    0m16.122s
user    0m15.783s
sys     0m0.248s

andromeda$ time ./sh -F /tmp/script-sed

real    0m8.941s
user    0m8.697s
sys     0m0.183s

andromeda$ time /bin/ksh /tmp/script-sh

real    13m9.823s
user    1m30.408s
sys     11m33.149s

andromeda$ time fbsh /tmp/script-sh

real    2m45.785s
user    1m3.795s
sys     1m39.158s

andromeda$ time bash /tmp/script-sh

real    7m58.711s
user    6m48.652s
sys     1m5.841s

andromeda$ time zsh /tmp/script-sh

real    6m52.243s
user    4m10.064s
sys     2m36.129s

andromeda$ time dash /tmp/script-sh

real    3m24.942s
user    1m35.081s
sys     1m46.683s

andromeda$ time mksh /tmp/script-sh

real    5m11.705s
user    3m34.081s
sys     1m34.150s
References:
- performance of shell read loops
  - From: David Holland
Prev by Date: performance of shell read loops
Next by Date: Re: performance of shell read loops
Previous by Thread: performance of shell read loops
Next by Thread: Re: performance of shell read loops
Indexes:
Home | Main Index | Thread Index | Old Index