Subject: Build race conditions, any solution?
To: None <current-users@netbsd.org>
From: Robert Elz <kre@munnari.OZ.AU>
List: current-users
Date: 03/12/2002 20:42:42
The subject doesn't refer to running make with -j N (N > 1), or to any
short-term race conditions like that, but to the slow, long-term race
condition that occurs with a sequence like

	I check out sources (anoncvs, sup, tarballs, shouldn't matter).

	Time passes

	Someone changes the sources (updates them), especially when an
		include file (one which will go in /usr/include/*) is updated
	
	Time passes

	I "make build" (method by which that's done isn't important).
	Success, no problems.

	Time passes

	I update my sources because of some new feature that's now included

	I "make build" (again, method of doing this doesn't matter, the
	problems aren't related to the new toolchain, or build mechanisms)

	Crash & burn

The problem is that when I "update my sources", the mod time of the
include file that was changed earlier gets set to the time the include
file was modified.  Note that's before I did my "make build".  When I
did that, the mod time of the installed include files got set to "now"
(or that's what happened to me anyway).

When I come to "make build" again, the updated include files don't get
installed, as they look to be older than the ones already installed.
Then the rest of the system doesn't compile, because it expects the
include files to match the master sources, even though "make includes"
completed successfully.
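To make the failure concrete, here is a minimal Python sketch of the make-style "is the target older than its source?" test, replaying the timeline above with made-up timestamps.  The file names, `needs_install`, and the times are hypothetical; it assumes, as described above, that the source update stamps the changed header with its modification time rather than the time it was fetched.  It models make's rule in general, not the actual NetBSD install code.

```python
import os
import tempfile


def needs_install(src, dst):
    """make-style check: install src over dst only if dst is missing or older."""
    if not os.path.exists(dst):
        return True
    return os.path.getmtime(src) > os.path.getmtime(dst)


def write(path, text, mtime):
    """Write a file, then force its access and modification times."""
    with open(path, "w") as f:
        f.write(text)
    os.utime(path, (mtime, mtime))


def replay_timeline():
    """Replay the sequence above; return True if make would reinstall the header."""
    # Hypothetical times in seconds; only their ordering matters.
    t_checkout, t_commit, t_build = 1000, 2000, 3000
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "foo.h")        # header in the source tree
        dst = os.path.join(d, "installed.h")  # stands in for the /usr/include copy
        write(src, "old\n", t_checkout)       # initial checkout
        write(dst, "old\n", t_build)          # "make build" installs with mtime "now"
        # Source update: the new contents arrive, but the mtime is the time
        # the file was modified, which is earlier than the install time.
        write(src, "new\n", t_commit)
        return needs_install(src, dst)


print(replay_timeline())  # False: the new header is silently skipped
```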

For the special case of include files, simply always installing them with
the equivalent of "cp -p" should be enough; that way the mod time will always
reflect the time of the actual last change to the file, rather than the
time the file happened to be installed.
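A minimal sketch of the "cp -p" idea, again in Python with hypothetical names and times: `shutil.copy2` copies a file's contents along with its modification time, so the installed header carries the time of its last real change rather than its install time.  It assumes, as above, that an updated source file arrives stamped with a modification time later than that of the version it replaces.

```python
import os
import shutil
import tempfile


def needs_install(src, dst):
    """make-style check: install src over dst only if dst is missing or older."""
    if not os.path.exists(dst):
        return True
    return os.path.getmtime(src) > os.path.getmtime(dst)


def install_header(src, dst):
    """Install like "cp -p": copy2 copies the contents and the mtime."""
    shutil.copy2(src, dst)


def replay_with_preserved_mtime():
    """Replay the timeline; return (reinstalled?, installed contents)."""
    t_checkout, t_commit = 1000, 2000  # hypothetical times; only order matters
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "foo.h")
        dst = os.path.join(d, "installed.h")
        with open(src, "w") as f:
            f.write("old\n")
        os.utime(src, (t_checkout, t_checkout))
        # Installed whenever the build runs, but the copy keeps mtime t_checkout.
        install_header(src, dst)
        # Source update: new contents, mtime set to when the change was made.
        with open(src, "w") as f:
            f.write("new\n")
        os.utime(src, (t_commit, t_commit))
        reinstall = needs_install(src, dst)
        if reinstall:
            install_header(src, dst)
        with open(dst) as f:
            return reinstall, f.read()


print(replay_with_preserved_mtime())  # (True, 'new\n')
```

The same replay that skipped the header before now reinstalls it, because the updated source's mtime is later than the preserved mtime of the installed copy, whatever the wall-clock time of the earlier build was.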

But this is only possible for files that don't have to be modified from the
sources to be installed (some shell scripts, and most include files).
For anything else (objects, constructed include files such as rpcgen'd ones,
binaries, ...) there is no good "earlier" time to pick.  Perhaps, though, as
most of those get regenerated in obj dirs first, the newly constructed one
will always be newer than the old installed one anyway, and the problem
wouldn't occur.

It has also just occurred to me that use of the '-u' flag to build.sh
(UPDATE=yes I think?) may have had some bearing on this.  I'd test that
by doing a build without it, but since finding the problem I have simply
removed *everything* and will start again with clean destination directories.

In any case, this is certainly something to be aware of.

kre

ps: in case no-one is paying particular attention to mail sent to gnats-bugs
(and cc'd nowhere else), someone please go close PR lib/15846; it was
a result of this situation.