NetBSD-Bugs archive
Re: toolchain/57241: mips64el--netbsd-install core dumps randomly
The following reply was made to PR toolchain/57241; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Roland Illig <rillig%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
Subject: Re: toolchain/57241: mips64el--netbsd-install core dumps randomly
Date: Fri, 18 Apr 2025 16:26:30 +0000
Hi rillig, I wonder whether you might be able to help solve a
make(1)-related mystery?
I'm drafting a change to fix the parallel-safety of the foo.debug
recipe in bsd.prog.mk (a little finicky because it has nontrivial
interaction with other makefiles like libexec/ld.elf_so/Makefile).
But before I commit it, I want to make sure I understand the
underlying cause of PR 57241.
The immediate symptom is that, e.g., `mips64el--netbsd-install ...
ipftest ${DESTDIR}/usr/sbin/ipftest' is crashing because its input
file has been truncated between its fstat/mmap and its access to the
file contents.
And it looks like there's a concurrent objcopy from the .debug recipe
which has truncated ipftest to rewrite it in place.
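To make the failure mode concrete, here is a minimal Python sketch, not the real install(1) or objcopy(1) code. The function name `reader_view_after_rewrite` and the 8 KiB stand-in file are hypothetical. The "reader" stands in for install(1) and the "writer" for objcopy(1). A reader holding an open descriptor sees the size shrink when the same inode is truncated and rewritten in place, but keeps seeing the old contents when the writer instead writes a new file and rename(2)s it over the name. In the real failure the reader has also mmap'd the file, so touching pages that now lie past end of file raises SIGBUS ("Bus error"). The sketch only compares fstat(2) sizes so that it cannot crash, and the rename variant is shown only for contrast.

```python
import os
import tempfile


def reader_view_after_rewrite(in_place: bool) -> tuple[int, int]:
    """Return (size at open, size afterwards) as seen through a reader's fd
    while a writer replaces the file's contents.

    Hypothetical model: the reader stands in for install(1), which fstat()s
    and mmap()s its input; the writer stands in for objcopy(1).
    """
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ipftest")
        with open(path, "wb") as f:
            f.write(b"\x7fELF" + b"\0" * 8188)   # 8 KiB stand-in "binary"

        fd = os.open(path, os.O_RDONLY)          # reader opens the file
        try:
            size_at_open = os.fstat(fd).st_size  # ...and records its size
            if in_place:
                # Writer truncates the *same* inode the reader has open.
                with open(path, "r+b") as f:
                    f.truncate(0)
                    f.write(b"\x7fELF")          # new, shorter contents
            else:
                # Writer builds a new file and renames it over the old name;
                # the reader's fd still refers to the old, intact inode.
                tmp = path + ".tmp"
                with open(tmp, "wb") as f:
                    f.write(b"\x7fELF")
                os.replace(tmp, path)
            return size_at_open, os.fstat(fd).st_size
        finally:
            os.close(fd)


print(reader_view_after_rewrite(in_place=True))   # (8192, 4)
print(reader_view_after_rewrite(in_place=False))  # (8192, 8192)
```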
But I can't figure out why the concurrent objcopy is happening only in
the mips64 builds of certain programs like ipftest(8) and crash(8),
which seem to have in common the use of compat/exec.mk. (These are
programs that run with the n64 ABI, in order to read out kernel guts
on mips64 CPUs, in a userland where _most_ programs run with the n32
ABI instead because it's more compact and they usually have <4GB RAM.)
And so I think I need a make(1) wizard to help.
Here's an example:
https://releng.netbsd.org/builds/HEAD/202504161330Z/evbmips-mips64el.build.failed
https://web.archive.org/web/20250418154748/https://releng.netbsd.org/builds/HEAD/202504161330Z/evbmips-mips64el.build.failed
[1] Bus error (core dumped) /home/builds/ab/HEAD/evbmips-mips64el/20250416...
--- /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest ---
...
*** Failed target: /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
*** In directory: /home/source/ab/HEAD/src/external/bsd/ipf/bin/ipftest
*** Failed commands:
    ${_MKTARGET_INSTALL}
    => @# "install " /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
    ${INSTALL_FILE} -o ${BINOWN} -g ${BINGRP} -m ${BINMODE} ${STRIPFLAG} ${.ALLSRC} ${.TARGET}
    => /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-tools/bin/mips64el--netbsd-install -U -M /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/METALOG -D /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest -h sha256 -N /home/source/ab/HEAD/src/etc -c -r -o root -g wheel -m 555 ipftest /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
*** [/home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest] Error code 138
...
/home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-tools/bin/mips64el--netbsd-objcopy: libcrypto.so.15.0.debug: section `.note.netbsd.pax' can't be allocated in segment 0
LOAD: .MIPS.abiflags .reginfo .dynamic .hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rel.dyn .init .text .MIPS.stubs .fini .rodata .eh_frame_hdr .eh_frame .note.netbsd.ident .note.netbsd.pax
The last part -- a warning message about which I just filed another
bug, PR port-mips/59320: objcopy: section `.note.netbsd.pax' can't be
allocated in segment 0 -- is evidence that make(1) is still running
the buggy ipftest.debug recipe which rewrites ipftest in place:
${_PROGDEBUG.${_P}}: ${_P}
        ${_MKTARGET_CREATE}
        ( ${OBJCOPY} --only-keep-debug --compress-debug-sections \
            ${_P} ${_PROGDEBUG.${_P}} && \
          ${OBJCOPY} --strip-debug -p -R .gnu_debuglink \
            --add-gnu-debuglink=${_PROGDEBUG.${_P}} ${_P} \
        ) || (rm -f ${_PROGDEBUG.${_P}}; false)
https://nxr.netbsd.org/xref/src/share/mk/bsd.prog.mk?r=1.355#509
My best guess was that:
1. When doing dependall, the ipftest.debug recipe above:
   (a) creates ipftest.debug with objcopy at time t0, and
   (b) a moment later, modifies ipftest in place with objcopy at time
       t1 = t0 + eps > t0.
2. When doing install, make(1) finds that ${DESTDIR}/usr/sbin/ipftest
   and ${DESTDIR}/usr/libdata/debug/usr/sbin/ipftest.debug are both
   out of date, so it tries to run, _in parallel_:
   (a) mips64el--netbsd-install ... ipftest ${DESTDIR}/usr/sbin/ipftest,
       because ipftest exists and is up-to-date, and
   (b) the .debug recipe above again, because ipftest exists and is
       up-to-date with timestamp t1, but ipftest.debug exists and is
       out-of-date with timestamp t0 < t1.
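The hypothesis can be checked against a toy model of make(1)'s out-of-date rule. This is a simplified sketch, not bmake's actual logic (which has further rules, e.g. for .PHONY targets). The timestamps `T_LINK`, `T0`, `EPS` and the function names are hypothetical. The model shows that, under the hypothesis, ipftest.debug is rebuilt at install time only if step 1(b) leaves ipftest with the newer mtime t1.

```python
# Hypothetical timestamps (arbitrary units) following the hypothesis.
T_LINK = 100     # ipftest is linked
T0 = 200         # step 1(a): ipftest.debug is created
EPS = 1
T1 = T0 + EPS    # step 1(b): ipftest is rewritten in place


def out_of_date(target_mtime, source_mtimes):
    """Simplified make(1) rule: rebuild if the target is missing (None)
    or if any source is strictly newer than it."""
    return target_mtime is None or any(s > target_mtime for s in source_mtimes)


def install_time_rebuilds(step_1b_bumps_mtime: bool) -> list[str]:
    """Targets the toy model considers out of date at install time."""
    mtime = {"ipftest": T_LINK, "ipftest.debug": T0}
    if step_1b_bumps_mtime:
        mtime["ipftest"] = T1  # the in-place rewrite left ipftest newer
    deps = {
        "${DESTDIR}/usr/sbin/ipftest": ["ipftest"],
        "${DESTDIR}/usr/libdata/debug/usr/sbin/ipftest.debug": ["ipftest.debug"],
        "ipftest.debug": ["ipftest"],  # the .debug recipe's own rule
    }
    # Installed copies do not exist yet, so mtime.get() returns None for them.
    return sorted(
        t for t, srcs in deps.items()
        if out_of_date(mtime.get(t), [mtime[s] for s in srcs])
    )


# If ipftest.debug is listed, make -j could run its recipe (which rewrites
# ipftest in place) concurrently with the install of ipftest.
print(install_time_rebuilds(step_1b_bumps_mtime=True))
print(install_time_rebuilds(step_1b_bumps_mtime=False))
```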
Except this hypothesis doesn't make sense, for two reasons:
- The problem empirically _only_ happens in mips64 builds with a few
  programs, and nothing in the hypothesis above is restricted to that.
- We pass `-p' (--preserve-dates) to objcopy(1) in step (1)(b), so it
  restores the mtime of ipftest after truncating and overwriting it --
  and so by the time of make install, ipftest.debug should look
  up-to-date.
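The second objection can be sanity-checked on a real filesystem with a small Python sketch. It assumes that objcopy -p behaves like "record atime/mtime before rewriting, restore them afterwards", modeled here with `os.stat` and `os.utime`; this is not objcopy's code. The file contents, the fixed timestamps `LINK_NS` and `T0_NS`, and the function name are made up. With the timestamps restored, ipftest.debug (t0) stays newer than ipftest, so make(1) should see no reason to rerun the .debug recipe.

```python
import os
import tempfile

# Fixed, well-separated timestamps in nanoseconds (made up), so the result
# does not depend on the filesystem's timestamp granularity.
LINK_NS = 1_000_000_000_000   # ipftest as linked
T0_NS = 2_000_000_000_000     # step 1(a): ipftest.debug created


def debug_up_to_date_after_rewrite(preserve_dates: bool) -> bool:
    """Rewrite 'ipftest' in place, as step 1(b) does, and report whether
    'ipftest.debug' would still look up to date (not older than ipftest).

    Assumption: objcopy -p is modeled as "stat before, os.utime after".
    """
    with tempfile.TemporaryDirectory() as d:
        prog = os.path.join(d, "ipftest")
        dbg = os.path.join(d, "ipftest.debug")
        for path, ns in ((prog, LINK_NS), (dbg, T0_NS)):
            with open(path, "wb") as f:
                f.write(b"\0" * 64)
            os.utime(path, ns=(ns, ns))

        saved = os.stat(prog)           # what -p remembers
        with open(prog, "r+b") as f:    # truncate and overwrite in place
            f.truncate(0)
            f.write(b"\0" * 32)         # this bumps ipftest's mtime to "now"
        if preserve_dates:
            os.utime(prog, ns=(saved.st_atime_ns, saved.st_mtime_ns))

        return os.stat(dbg).st_mtime_ns >= os.stat(prog).st_mtime_ns


print(debug_up_to_date_after_rewrite(preserve_dates=True))   # True
print(debug_up_to_date_after_rewrite(preserve_dates=False))  # False
```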
So I can't figure out why, under these circumstances, make install is
trying to rerun the .debug recipe. And I can't reproduce it on my
laptop.
I tried reading through the `make -d g1' and `make -d m' output, but it's kind
of inscrutable to me (I thought `-d g1' would show a graph, with nodes
and edges for dependency relations, but I can't figure out how to read
the edges in it).