Subject: Re: make -j and failure modes
To: Robert Elz <kre@munnari.OZ.AU>
From: Greywolf <greywolf@starwolf.com>
List: tech-userlevel
Date: 12/10/2003 13:00:56

[Okay, this got longer than I intended.  I've tried to raise or address
a couple of issues herein.  Apologies if I've missed the point.  I tried
not to...]

Thus spake Robert Elz ("RE> ") sometime Tomorrow...

RE> No.   But that is make's problem.
RE> Not the shells - make is using sh incorrectly.

Someone said that sol9 ksh is "properly compliant"; if this is the case, then
we (and bash) are broken.

sol9$ ksh -ec '(false && echo bad); echo ok'
sol9$

This is what I would expect.  I don't understand why the exit status from
within a () should be treated any differently from the exit status of
a simple command.  Inside the (), the lack of output is not because the
subshell aborted before "echo bad"; it is because the && list evaluated
as false, causing the subshell -- the first command in the sequence --
to exit with an error.  Because we provided the -e flag to the shell
running the command, and the first command in the sequence exited non-zero,
the subsequent command did not get executed; hence, no output.
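
One way to look at the status the subshell hands back, separately from
whatever -e then does with it, is to drop -e and print $? directly.  This
is only a minimal sketch, assuming some POSIX-ish sh is on the PATH; the
variable name "out" is just for illustration:

```sh
# Run the same list WITHOUT -e, so the parent shell always survives
# long enough to report what the subshell returned.
out=$(sh -c '(false && echo bad); echo "subshell status: $?"')
echo "$out"
```

The subshell's status here is that of the && list, i.e. the non-zero
status of "false"; the open question is only whether -e should act on it.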

[1] sol9$ ksh -ec '(false || echo bad); echo ok'
[I expect 'bad\nok\n' from this...]
bad
ok
[2] sol9$ ksh -ec '(false || { echo bad; exit 1; } ); echo ok'
[I expect 'bad\n' from this...]
bad
[3] sol9$ ksh -ec '(false || ( echo bad; exit 1)); echo ok'
[I expect 'bad\n' from this...]
bad
sol9$

Of course all I can give is my word that I typed my expectations
before I ran my commands.

[1] proceeded to 'ok' because the conditional inside the () completed
with zero status, thus telling the shell it's ok to continue to the next
command;

[2] did not proceed to 'ok' because the last command within the () exited
non-zero;

[3] did not proceed to 'ok' because the sub-subshell exited non-zero which
caused the subshell to exit non-zero.
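
The three cases above can be checked without -e in the picture at all, by
asking only what status each () returns.  A rough sketch; "status_of" is a
made-up helper, not anything standard, and it assumes sh is on the PATH:

```sh
# status_of (hypothetical helper): run a command string in a fresh sh
# WITHOUT -e, discard its output, and print that shell's exit status.
status_of() {
    rc=0
    sh -c "$1" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

s1=$(status_of '(false || echo bad)')                # [1]
s2=$(status_of '(false || { echo bad; exit 1; })')   # [2]
s3=$(status_of '(false || ( echo bad; exit 1 ))')    # [3]

echo "[1] $s1  [2] $s2  [3] $s3"
```

Case [1] reports zero and cases [2] and [3] report non-zero, which matches
the explanations given for why 'ok' was or was not reached.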

Final return values propagate back up the chain.  You may disagree with
that, but it is the way the Bourne shell has behaved for as far back
as I can remember (1984, for what that's worth among us).  If our sh does
not behave that way, then by comparison to that legacy we have somehow
broken sh, and it needs to be fixed; we should not mangle make in
some nonstandard way to deal with errors that the shell should be handing
back.

As far as -jN goes, all it should be doing is processing up to N sources
of a single target at a time, I would think.  I know there's more to it
than that, but if one says 'make -j5 foo', for example with

foo:	bar.o baz.o qux.o foo.o

bar.o:	someutil
	someutil -f bar.c > bar.mangled.c
	cc -c bar.mangled.c -o bar.o

I would expect all dependencies to be calculated and ordered into a list,
*then* kick off the requisite processes, but only as far out as the
first, deepest unresolved level; i.e. here:

foo:
	bar.o:
		someutil:
			util.o		<-proc[0]
			util_sub.o	<-proc[1]
			util_x.o	<-proc[2]
	baz.o:
	qux.o:
	foo.o:

Since there are 5 procs available, but only 3 simultaneous things to build
for the first thing in line, I would not expect the other two procs to
be doing anything until they got back from that first lowest level of
dependency.  If anything errored out there, the process chain would
die out by attrition.  The leveling would eliminate the need for semaphore
targets, though you could still force them if you really wanted to.

This, of course, loses when all your targets have one source each, so
I'm sure there's more to it than this, but in general, this is my
(possibly misguided) perception of it.
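
To make the "one level at a time" idea concrete, here is a rough shell
sketch of it.  This is not how make actually schedules jobs; "run_level"
is a made-up helper, "true" and "false" stand in for real compile
commands, and the sketch does not cap the number of jobs at N:

```sh
# run_level (hypothetical helper): start every command of one dependency
# level in the background, wait for all of them, and return non-zero if
# any one of them failed.
run_level() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Deepest level: the three objects someutil needs (util.o, util_sub.o,
# util_x.o).  Only if all of them succeed do we move up a level.
if run_level 'true' 'true' 'true'; then first=go; else first=stop; fi

# Same level, but the second job fails: nothing above it gets started.
if run_level 'true' 'false' 'true'; then second=go; else second=stop; fi

echo "all three succeed: $first"
echo "one fails: $second"
```

A failure at the lowest level simply means the next level up is never
started, which is the attrition described above.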

But I digressed a bit.  Enlightenment welcomed.

				--*greywolf;
--
# "Operator Precedence is that which causes statements such as *foo->bar to
# work properly.  It is also that which causes statements such as *foo->bar
# NOT to work properly."
# greywolf@starwolf.com