NetBSD-Bugs archive


lib/60324: unwind.h build race condition



>Number:         60324
>Category:       lib
>Synopsis:       unwind.h build race condition
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jun 12 08:25:01 +0000 2026
>Originator:     Robert Elz
>Release:        NetBSD 11.99.6
>Organization:
	Spooky Bizarre Co-incidences Inc.
>Environment:
	Irrelevant, the actual problem came from the NetBSD build cluster.
>Machine:	any
>Description:

If this PR would be better in toolchain (or something else) rather than
lib, someone please move it.

I believe this issue has been observed before, but it showed up again in the
2026-06-11 19:51:16 build of the "daily" snapshot builds of HEAD, see:

	https://releng.netbsd.org/cgi-bin/builds.cgi

You'll need to look at that soon, if at all, as within a few days the
relevant entries will have vanished.

The i386 build failed (click on the build date, look at the failed builds,
and select the i386 failure) with:

ln: unwind.h: File exists
--- unwind.h ---

*** Failed target: unwind.h
*** In directory: /home/source/ab/HEAD/src/external/gpl3/gcc/lib/libgcc/libgcov
*** Failed commands:
	${_MKMSG} "symlink " ${.CURDIR:T}/${.TARGET}
	=> @# "symlink " libgcov/unwind.h
	rm -f ${.TARGET}
	=> rm -f unwind.h
	ln -s ${.ALLSRC} ${.TARGET}
	=> ln -s /home/source/ab/HEAD/src/external/gpl3/gcc/dist/libgcc/unwind-generic.h unwind.h
*** [unwind.h] Error code 1

[The sgimips build failed as well, as it did in the previous and subsequent builds,
but that is (was) just a sets list issue, and should be fixed 2 builds later, unless
I botched the change.]

The following build (2026-06-12 00:35:01) had no similar problem, and the i386
port (and all others except sgimips) built just fine.  The previous build (2026-06-11
14:11:52) also had no problems building i386 (just sgimips, all for the same reason).

Needless to say, there have been no commits around this time which could even with
the wildest imagination possibly have caused this error to appear & vanish again.

So (as I believe to be generally understood) this is a build system race condition
that no-one has fixed so far.

I believe I might understand what the problem is, and have a potential solution to
offer, but I am not going to (even attempt to) implement it, as it requires Makefile
changes which I don't feel confident making (the subtleties of make's programming
language are beyond my limited abilities).   Or rather, there is one very crude
"big stick" change which I could make which will probably fix the issue, but that
one I don't believe is the best solution, and another ugly one which might also work.

I believe the issue is "make includes" in .../gcc/lib/libgcc/libgcov racing against
itself, believe it or not.

In src/Makefile we have:

	_SUBDIR=	tools .WAIT lib
	.if ${MKLLVM} != "no"
	_SUBDIR+=	external/bsd/compiler_rt
	.endif
	_SUBDIR+=	include external crypto/external bin
	_SUBDIR+=	games libexec sbin usr.bin
	_SUBDIR+=	usr.sbin share sys etc tests compat
	_SUBDIR+=	.WAIT rescue .WAIT distrib regress

Observe that everything after tools, up to (but not including) rescue, can run in parallel.

(That's the crude big stick fix that I could make, insert an extra .WAIT in there
at the right place, and I believe, issue solved).

The failing code above is from the descent into external (and below that).  The racing
version (the one that won the race in this case, and caused the above to fail) is,
I believe, in lib.   src/lib/Makefile contains:

	SUBDIR=		csu .WAIT

	.if (${MKGCC} != "no")
	SUBDIR+=	../external/gpl3/${EXTERNAL_GCC_SUBDIR}/lib/libgcc .WAIT
	.endif

	SUBDIR+=	libc
	SUBDIR+=	.WAIT

	#
	# The SUBDIRs above are included here for completeness but should be built
	# and installed prior to make(dependall) in this file, as libraries listed
	# below will depend on versions from DESTDIR only.
	#

	SUBDIR+=	i18n_module

(followed by lots more SUBDIR additions, but none of them are relevant).

Note that the .WAIT's in that file only affect its internal operation, and have
no effect at all upon the parallel descent into external.

The issue is:

	SUBDIR+=	../external/gpl3/${EXTERNAL_GCC_SUBDIR}/lib/libgcc .WAIT

which runs the same make operations as the descent into external from src/Makefile,
potentially at the exact same instant.

If the two makes (when making includes) happen to be running these 2 lines
at the exact same time:

	rm -f ${.TARGET}
	ln -s ${.ALLSRC} ${.TARGET}

then both do the rm in parallel (harmless: one actually removes the file, if
it happens to exist, the other just sees it not existing), and then both, at about
the same time, attempt the "ln".  One of the two will succeed, the other will
fail, and when it does, the whole build shuts down.
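
The losing interleaving described above can be replayed one step at a time,
which makes the outcome deterministic.  This is only a sketch: the scratch
directory, the empty unwind-generic.h stand-in and the replay_race function
name are assumptions made for illustration, not part of the NetBSD build.

```shell
#!/bin/sh
# Replay the interleaving sequentially: both rm's run first, then both ln's.
# "A" and "B" stand for the two make processes (hypothetical labels).

replay_race() {
	dir=$(mktemp -d) || return 1
	(
		cd "$dir" || exit 1
		: > unwind-generic.h	# stand-in for the real source header

		rm -f unwind.h		# make A: rm
		rm -f unwind.h		# make B: rm, nothing left to remove, still succeeds

		# make A: the link does not exist yet, so ln succeeds
		if ln -s unwind-generic.h unwind.h 2>/dev/null; then
			echo "A: ln ok"
		else
			echo "A: ln failed"
		fi

		# make B: the link now exists, so ln fails ("File exists")
		if ln -s unwind-generic.h unwind.h 2>/dev/null; then
			echo "B: ln ok"
		else
			echo "B: ln failed"
		fi
	)
	rm -rf "$dir"
}

replay_race
```

Run as written, this prints "A: ln ok" followed by "B: ln failed".  In the real
build the same ordering only occurs when the two makes happen to overlap, which
would explain why the failure is so rare.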

Now another crude (and very poor) fix would be to just make it:

	-ln -s ${.ALLSRC} ${.TARGET}

or perhaps slightly better:

	ln -s ${.ALLSRC} ${.TARGET} || test -e ${.TARGET}

so it fails only if the ln failed for some reason other than ${.TARGET}
already existing (which, given the preceding rm command, can only happen because
some other part of the build is creating it in parallel).
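
As a rough illustration of how the second variant behaves, the sketch below
wraps it in a shell function and runs it through the same sequence of steps.
The link_tolerant and demo_tolerant names, the scratch directory and the
stand-in header are all assumptions made for the example.  Note that
"test -e" follows the symlink, so a link whose source is missing would still
count as a failure; "test -h" would check only for the link itself.

```shell
#!/bin/sh
# Tolerant symlink creation: a failed ln is accepted if the target exists anyway.

link_tolerant() {
	# $1 = link source, $2 = link name
	ln -s "$1" "$2" 2>/dev/null || test -e "$2"
}

demo_tolerant() {
	dir=$(mktemp -d) || return 1
	(
		cd "$dir" || exit 1
		: > unwind-generic.h	# stand-in for the real source header
		rm -f unwind.h

		# First caller creates the link.
		link_tolerant unwind-generic.h unwind.h && echo "A: ok"

		# Second caller loses the race: ln fails, but the link is there.
		link_tolerant unwind-generic.h unwind.h && echo "B: ok"

		# A failure for any other reason (here: missing directory) is still reported.
		link_tolerant unwind-generic.h no-such-dir/unwind.h || echo "C: failed"
	)
	rm -rf "$dir"
}

demo_tolerant
```

This prints "A: ok", "B: ok" and "C: failed", so the loser of the race no longer
stops the build while unrelated ln errors are still caught.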

But while kind of appealing (I like doing stuff in sh code), I don't think that
is the ideal solution either (though maybe the second of those might work if my
actual proposed solution has a flaw I am unaware of).

What I'd prefer to do, if someone can get the Makefile syntax and actual names correct
is to change the .if which guards the build of libgcc in src/lib from:

	.if (${MKGCC} != "no")

to something which achieves what I mean by:

	.if (${MKGCC} != "no") && (${.TARGET} != "includes")

so when making the include files, this branch of the parallel make
doesn't reach out into libgcc at all, but simply allows the build in
external/... to do that.   Of course this will fail if someone were
to attempt to build lib alone (cd src/lib; make).

And naturally, this depends upon there being suitable make syntax to achieve what
I'm suggesting here, and that the descent into external/... when making includes
will always get to libgcov (if MKGCC != no).

Note that the order in which the various include files are added by "make includes"
should be irrelevant, installing them cannot rationally depend upon which other
include files have been installed yet, just as long as all of them get installed,
sometime - and also "make includes" has to be complete before the make system can
start actually making dependencies, or compiling anything.

So, can someone with knowledge of the lib builds, gcc builds, and the make system
take a look at this, and perhaps implement something to make this problem go away,
once and for all ?

In this, I am assuming that the actual builds that happen inside external/.../lib/libgcc
(other than making the include files) are safe from races.  If that's not the case,
and if that run off into libgcc is really needed for the src/lib build, then perhaps
the big stick ".WAIT" added to src/Makefile, somewhere between lib and external
is the correct solution.

>How-To-Repeat:
	Watch the builds and wait, every so often it happens.  Pure (bad) luck.
>Fix:
	See above for potential ways forward.



