tech-userlevel: Re: make -j and failure modes

Subject: Re: make -j and failure modes
To: Ben Harris <bjh21@netbsd.org>
From: James Chacon <jmc@netbsd.org>
List: tech-userlevel
Date: 12/09/2003 19:45:44
On Wed, Dec 10, 2003 at 12:40:45AM +0000, Ben Harris wrote:
> In article <m2n.s.1ATpUq-000hYr@chiark.greenend.org.uk> you write:
> >        (cd ${.CURDIR} && make bar)
> >
> >According to SUSE3:
> >
> >-e When this option is on, if a simple command fails for any of the reasons 
> >   listed in Consequences of Shell Errors or returns an exit status value >0,
> >   and is not part of the compound list following a while, until, or if 
> >   keyword, and is not a part of an AND or OR list, and is not a pipeline 
> >   preceded by the ! reserved word, then the shell shall immediately exit. 
> 
> Note "simple command".  A group like "(false)" is not a simple command.  The
> command "false" is a simple command, but it's executed in a subshell
> environment.

Compound commands are groups of simple commands. Nothing in the above
exclusions says the return code of a compound command shouldn't count as far
as -e goes. As a matter of fact, the text goes to great lengths to specify
which constructs are exempt (pipelines and the while/until/if compound lists
aren't "simple commands" either). If "X isn't a simple command" were enough
to exempt something, there would have been no reason to list and exclude
those cases at all.
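
To make the listed exclusions concrete, here is a small sketch that probes a
few of them. It assumes whatever "sh" is on the PATH; the probe function is my
own helper, not something from the standard. Each probe runs a child shell
with -e and prints "survived" only if the failing command did not make that
child exit.

```shell
#!/bin/sh
# probe: run one command under "sh -e" and report whether the shell got
# past it. The "|| true" keeps a failing child from aborting this script.
probe() {
    sh -e -c "$1; echo survived" 2>/dev/null || true
}

plain=$(probe 'false')                  # plain simple command: shell exits
andor=$(probe 'false && true')          # non-final part of an AND list
cond=$(probe 'if false; then :; fi')    # condition of an if
negated=$(probe '! true')               # pipeline preceded by !

echo "plain=[$plain] andor=[$andor] cond=[$cond] negated=[$negated]"
```

Only the first probe should come back empty; the other three are the cases
the quoted text excludes by name.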

> 
> >I'm purely guessing here but bash (/bin/sh on my linux box) and our shell
> >seem to be taking the group command definition to mean -e has no effect.
> 
> Not quite correct.  As an example (bash and NetBSD sh agree here too):
> 
> wraith:~$ bash -posix -e -c '(false; echo foo);(echo bar)'
> bar
> 
> The "-e" option meant that the first "false" caused the subshell it was
> executing in to exit, but the outer shell carried on, because the command
> that it saw fail was a compound command.

However, as I pointed out, it was bash and our sh which seemed to be acting
against convention. 

In my opinion the first subshell returns non-zero, which should cause the
shell to exit rather than continue. This is the historical behavior (in my
experience), and testing on two other Unix systems shows the same: Solaris
xpg4 sh and FreeBSD sh both print nothing for your test and leave $? = 1.
SunOS 4.1.x does too, for that matter.

I'm not saying that remaining historically accurate is always right, or even
that "other systems implement it this way" is reason enough on its own.
However, in this case it's also what someone passing -e to their shell would
expect to happen when any command in the chain fails.
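
For anyone who wants to check their own shell, a rough sketch of the test
above wrapped in a script. SH_UNDER_TEST is a name I made up for this example
(it defaults to plain "sh"), and the script only reports what it sees; it
makes no claim about which result is correct.

```shell
#!/bin/sh
# SH_UNDER_TEST is a hypothetical variable naming the shell to probe.
shell=${SH_UNDER_TEST:-sh}

# Run the test case from this thread and capture output and exit status.
status=0
out=$("$shell" -e -c '(false; echo foo); (echo bar)' 2>/dev/null) || status=$?

case "$out" in
    bar) verdict="outer shell continued after the failed subshell" ;;
    "")  verdict="outer shell exited with status $status" ;;
    *)   verdict="unexpected output: $out" ;;
esac

echo "$shell: $verdict"
```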

> 
> >1. Have make scan the command for parens and if it finds them, exec via
> >   the compat methods.
> >2. Fix sh to deal with group'd commands and -e. Then provide nbsh as a host
> >   tool and tell make to use it.
> >3. Go through all the Makefile's and change (... && ...) into .... && ...
> 
> That will mean that the directory change is still in effect for the next
> command, which is precisely what the parens are there to avoid.  I'd
> suggest:

The only reason the parens are needed is the new make behavior of feeding
the entire command into one shell. I understand the performance gains here,
but it does cause issues like this to come up. In this case, clearer
documentation in the man page is probably in order.
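
A minimal sketch of why the parens matter once every command shares one
shell. It uses "/" as the target directory purely for illustration and
restores the starting directory at the end.

```shell
#!/bin/sh
# All of these commands run in the same shell, as they would when a whole
# recipe is handed to a single sh.
start=$(pwd)

(cd / && true)              # subshell: the cd is gone when the parens close
after_subshell=$(pwd)

cd / && true                # no parens: the cd stays in effect
after_plain=$(pwd)

cd "$start"                 # put things back for whatever runs next

echo "after (cd / && ...): $after_subshell"
echo "after  cd / && ... : $after_plain"
```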

> 
> 4. Change all (foo && bar) to (foo && bar) || exit $?

Ugly, but doable. I'll start looking at how much this will affect.
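
As a sketch of what that rewrite buys us: with the explicit "|| exit $?" the
outer shell stops on a failed group no matter how it treats subshells under
-e. The directory /nonexistent-dir-for-demo is a made-up path that I am
assuming does not exist.

```shell
#!/bin/sh
# Failing group: the explicit "|| exit $?" ends the child shell here.
fail_status=0
fail_out=$(sh -e -c '(cd /nonexistent-dir-for-demo && echo built) || exit $?
echo "next command ran"' 2>/dev/null) || fail_status=$?

# Succeeding group: the "|| exit $?" never fires and execution continues.
ok_status=0
ok_out=$(sh -e -c '(cd / && echo built) || exit $?
echo "next command ran"' 2>/dev/null) || ok_status=$?

echo "failing group:    status=$fail_status output=[$fail_out]"
echo "succeeding group: status=$ok_status output=[$ok_out]"
```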

> 
> >(I'm leaning towards #2 but I need opinions/knowledge on whether sh is
> >doing the right thing or not.)
> 
> I think both behaviours are arguably correct, and since it's not terribly
> hard (if slightly ugly) to write scripts that cope with both, we should just
> do that.

The concern is which behavior we adopt for the system. From what I can tell,
the current behavior wasn't necessarily planned, but is a side effect of
fixing && failures within for loops (which I think was the right thing there).

James