Re: Next steps for /bin/sh

To: tech-userlevel%netbsd.org@localhost
Subject: Re: Next steps for /bin/sh
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Mon, 29 Feb 2016 12:45:03 +0700
    Date:        Sun, 28 Feb 2016 17:11:17 -0500
    From:        "James K. Lowden" <jklowden%schemamania.org@localhost>
    Message-ID:  <20160228171117.a160d8ea72909ecdae77857f%schemamania.org@localhost>

  | I must be missing something.  The second false should never execute,
  | whether or not -e is in force.

Absolutely correct.   If it did execute, as in "true && false" then
if -e was set, the shell would exit (and our shell does.)

  | IMO, in the presence of -e, "false && false" should terminate
  | execution, just as "false && true" should.  The entire compound
  | statement is false because the first one is.  

No, it shouldn't.    The original reason for "set -e" was to allow for
lazy shell script programmers.   That is, so you could just write
scripts like

	cd /some/dir
	make my_command
	./my_command some args

without having to care whether /some/dir actually existed, whether the
make completed successfully, or whether the compiled program worked.

If it had been that simple, it would be easy, but that doesn't do what
anyone wants if the script is something like

	if [ -z "$1" ]; then
		# whatever
	fi
	# more stuff here

because if the test fails ($1 is not empty) then the test ('[') command
exits with non-zero status, -e makes the shell exit, and the 'more stuff'
would never be done.

That is clearly not what the script author intended, if it had been
they would have just written

	if [ -n "$1" ]; then exit 1; fi

except that with -e (if it worked this way) that would be equivalent
to
	exit 1

as if the test succeeds the explicit exit is done, and if it fails, then
the one caused by -e would happen.

Since no-one wants that, the rule is that if the script tests the exit
status of a command, -e does not apply to it.  Since in "false && false" the
result of the first false is tested, -e does not apply to it, and as the
second one is never executed, it cannot produce a non-zero exit code,
so neither of the simple commands can make -e apply.
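
A quick way to see the "tested commands are exempt" rule in action is the
sketch below.  It assumes a POSIX-conforming sh on the PATH; the variable
name "out" and the literal strings are just for illustration.

```shell
# Run the snippet in a child shell with -e set.  The '[' command fails
# (the string is not empty), but its status is tested by "if", so -e
# does not apply and the shell carries on to the final echo.
out=$(sh -ec 'if [ -z "nonempty" ]; then echo "empty"; fi; echo "more stuff here"')
echo "$out"
```

Only the last echo produces output, which shows the failing test did not
make the child shell exit.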

Now with "if" "while" and "until" 'commands' (they aren't really commands
but let that slide) that's all we need to worry about, as the exit status
of the "if" (etc) is that of the last command executed in the body
(that is, the code controlled by the test, not the test itself) or 0
if no commands in the body are executed - the exit status of the command(s)
tested is irrelevant.

That is in the code
	while false; do : 'anything at all' ; done; echo $?
the echo is guaranteed to always print "0" and whether -e is set
or not, the shell will never exit when that is executed (unless there
is a syntax error in "anything at all".)
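
The same command can be run under -e in a child shell to check both
claims.  This is a sketch assuming a POSIX-conforming sh on the PATH; "out"
is just an illustrative variable name.

```shell
# The loop condition fails at once, the body never runs, and the exit
# status of the "while" is therefore 0.  The child shell does not exit
# early even though -e is set.
out=$(sh -ec 'while false; do : "anything at all"; done; echo $?')
echo "$out"
```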

But with '&&' and '||' things get messy - the exit status of those is
that of the last of the commands they actually execute (it needs to be
so the result can be tested by if/while/until or more && or || operators).

I believe in really old shells, -e would apply: the result of the '&&'
in a bare command like "false && false" is not tested, and so the shell
would exit.

But people like to write commands like

	[ "$1" = x ] && echo found an x

just as a kind of shorthand for

	if [ "$1" = x ]; then echo found an x; fi

which (without -e) would do exactly the same thing.   But with -e the
way it was initially, the first version would cause the shell to exit
if $1 was not x .. which is clearly not the intent (and which never
happened in the second, supposedly equivalent, version).

That naturally led to complaints, which caused changes, which ended up
leading to a lot of the mess in the specification as people attempted to
explain, in as few words as possible, just what the rule actually was.

Unfortunately, it is not easy to really describe it, or at least not
much better than "DWIM".
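
For what it is worth, the two spellings of the "found an x" test can be
compared under -e in a current shell.  This is a sketch assuming a
POSIX-conforming sh on the PATH; the variable names and the "set -- y"
used to fake a non-matching $1 are illustrative only.

```shell
# Both forms are run with $1 set to "y", so the test fails each time.
# Neither makes the shell exit under -e; what differs is the status
# left behind: the && list reports the failed test, while the "if"
# reports 0 because no command in its body was executed.
and_form=$(sh -ec 'set -- y; [ "$1" = x ] && echo found an x; echo "status $?"')
if_form=$(sh -ec 'set -- y; if [ "$1" = x ]; then echo found an x; fi; echo "status $?"')
echo "&& form: $and_form"
echo "if form: $if_form"
```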

That is, if you want -e to cause your script to exit, you complain when it
does not happen, if you don't want it to exit, you complain when it does,
so it all depends upon what was in the programmer's mind.   Even NetBSD's
sh has not reached that level of sophistication yet, so the current rule
is that if any compound command
	(which really means && and || commands only, as for the others
	 there's nothing for it to apply to, '!' is not a compound command
	 and has its own specific rule, but this wording leaves the door
	 open for more similar "commands" to be invented in the future)
returns a non-zero status, because a command in it returned non-zero, and
the -e was ignored for that command, because its result is tested, then -e is
ignored for the compound command as well (except for sub-shells - purely
because when the command was a sub-shell, we have no way to know what
caused the non-zero exit status.)

With "false && false" the first false exits non-zero, but -e ignores that
because it is tested by the &&.   The && status is also non-zero, but -e
ignores that because that non-zero came from a command for which the -e
was ignored.  So, we do not exit.
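
The two cases, "false && false" and "true && false", can be put side by
side.  This is a sketch assuming a POSIX-conforming sh on the PATH; the
variable names are illustrative.

```shell
# "false && false": the only command that runs is tested by &&, so -e
# is ignored for it and for the && list; the child shell carries on.
survived=$(sh -ec 'false && false; echo "still here, status $?"')

# "true && false": the failing command is the last one in the list and
# nothing tests it, so -e applies and the child shell exits before echo.
exited=$(sh -ec 'true && false; echo "still here"') && child_status=0 || child_status=$?

echo "false && false -> '$survived'"
echo "true && false  -> '$exited' (child exit status $child_status)"
```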

If you really want -e to apply, just do
	( false && false )
the -e won't cause anything to happen inside the sub-shell, it will
execute the first false, ignore -e for that, execute the && (which having
a false left-command never executes the right-command) and so gets status 1
from that, -e ignores that as well, then having reached the end of its
command string, exit with the status of the last command executed,
which is the 1, from &&.

But in the parent shell, that 1 exit status from the sub-shell is tested
by -e and causes the parent shell to exit, because the parent shell has
no idea at all which command inside the sub-shell caused that "1" exit status.
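
A sketch of the sub-shell case, assuming a POSIX-conforming sh on the
PATH (variable names are illustrative):

```shell
# Inside the parentheses -e is ignored exactly as before, so the
# sub-shell simply finishes with status 1.  The parent sees an untested
# command exit non-zero, applies -e, and never reaches the echo.
out=$(sh -ec '( false && false ); echo "parent still running"') && child_status=0 || child_status=$?
echo "output: '$out', exit status: $child_status"
```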

While I am here, one example (which is in the std incidentally, though
here I give more body to make what is happening more plausible) to show
just how useless it is to attempt to rely upon -e:

Suppose I have an environment where standard practice (for some rarely
used commands anyway) is "lazy compilation" - that is, we only want to
compile the command when someone actually wants to run it (even in
NetBSD there are a whole bunch of commands we could apply that to, and
which would probably never be compiled again!)

To make life easy for myself, and being a "lazy programmer" as above,
I decide to write a function to assist me in dealing with this, so I
don't have to deal with all the steps...

build_and_run()
{
	# I am lazy, so don't want to test these commands, so I will use
	set -e

	# imagine we have a command that prints the directory which
	# holds the sources for a command we name (for this purpose, we do!)

	cd $( find-source-directory "$1" )

	make install

	set +e		# clean up again, if I was good, I would do that
			# only if -e was off when the function started, but
			# for present purposes, this will do.

	# we do not want -e to apply to this command by default
	# just run it and return its exit status
	"$@"
}

Then I can do

	build_and_run some-command and args that apply

and it will do the "right thing" - if "some-command" does not exist, it
will not have a source directory, and our mythical find-source-directory
command will exit 1, and the set -e will cause the shell to exit (well,
that's what I expect, it actually won't, but we will "fix" that by having it
print a non-existing directory in that case) - so now the cd command fails,
and the -e will apply to that, and the shell will exit (that is, the
shell which ran the "build_and_run" unless it is interactive, in which
case it will just abort whatever it was doing, in this case running that
function, print an error, and prompt for the next command .. note that
the "set +e" doesn't get executed in that case, but what do we care, we're
an interactive shell...)

Similarly if the "make" fails because some developer has just broken the
build of "some-command".

And then we turn -e back off again (the code to do that only if it had
been off at entry is what would really be there, that is easy enough,
but would just complicate this example.)

And finally we run some-command, with its args, and its exit status
is the exit status of the function.

So now I can get into the habit of just doing

	build_and_run everything I ever want to execute

and I don't have to care if the "everything" was re-compiled or
not, if it can be compiled and work, it will be.  If not, it will
fail (if in a script, that script will exit, if interactive, I'll
get told about it).  All good, right?

Of course, being lazy, my "build_and_run" would actually be called "b"
but never mind...

Unfortunately, no  - one day I am going to want to run a command and
check that the command I'm running exited with zero exit status, and
if not, do something else (like send me mail).

So, I will write something like (probably a little fancier than)

	build_and_run command-that-should-not-fail ||
		mail -s help kre </dev/null

The problem is, that now, build_and_run is being run in a context where
its result is tested, so -e does not apply.  Once we get to a state
where -e is ignored, it is ignored: attempting to turn it on again with
"set -e" doesn't help, ignored is ignored, and every command run while
we are still in the "ignoring -e" environment ignores -e.

So now if my "command-that-should-not-fail" is actually a typo, and
the cd fails, the function does not exit, instead it does "make install"
in whatever directory I happened to be in before I did the build_and_run,
and then whether or not that succeeds, I go ahead and run the command
which has not been compiled, probably does not exist, and so will probably fail.

I might get my e-mail from that, but I certainly never intended to run
that make in the wrong directory!
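
The effect can be reproduced without any of the build machinery.  In this
sketch (assuming a POSIX-conforming sh on the PATH) the function "f" and
the variable names are hypothetical, and a plain "false" stands in for the
failing cd.

```shell
# Called on the left of ||, the function runs with -e ignored, so the
# "set -e" inside it has no effect and execution continues past the
# failure.  The function then returns 0 and the || branch never runs.
tested=$(sh -c '
f() {
	set -e
	false		# stands in for the failing cd
	echo "carried on past the failure"
}
f || echo "f failed"
')

# Called on its own, the same function makes the shell exit at "false".
untested=$(sh -c '
f() {
	set -e
	false
	echo "carried on past the failure"
}
f
') && untested_status=0 || untested_status=$?

echo "tested call:   '$tested'"
echo "untested call: '$untested' (exit status $untested_status)"
```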

Using -e is full of issues like that.  Consider the similar related
case

	sh -ec "if false; true; then echo fail; fi"

in that, the exit status of the "false" is never tested, so you would
expect -e to apply, the shell to exit, and the true to never be run,
and obviously not the echo either.

But no, because we're in the condition commands for the "if", which
means we are testing the result, even though the result of "false" can
never influence that result in this case, -e is being ignored, so the
false just returns its 1 exit status, which no-one cares about, we go
ahead, execute true, the exit status of the test is now 0, so we echo "fail".

So, my advice is, stop caring about how -e works, or whether it works
correctly.  Its only remaining useful purpose seems to be to make work for
the standards writers, and exasperation for authors of shells who have
to try to make it do what the standards writers produce.   Just don't
use it, or expect it to do what you anticipate if you do use it, because
it probably won't.

kre