NetBSD-Bugs archive


Re: toolchain/57241: mips64el--netbsd-install core dumps randomly



The following reply was made to PR toolchain/57241; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Roland Illig <rillig%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
Subject: Re: toolchain/57241: mips64el--netbsd-install core dumps randomly
Date: Fri, 18 Apr 2025 16:26:30 +0000

 Hi rillig, I wonder whether you might be able to help solve a
 make(1)-related mystery?
 
 I'm drafting a change to fix the parallel-safety of the foo.debug
 recipe in bsd.prog.mk (a little finicky because it has nontrivial
 interaction with other makefiles like libexec/ld.elf_so/Makefile).
 
 But before I commit it, I want to make sure I understand the
 underlying cause of PR 57241.
 
 The immediate symptom is that, e.g., `mips64el--netbsd-install ...
 ipftest ${DESTDIR}/usr/sbin/ipftest' is crashing because its input
 file has been truncated between fstat/mmap and access to file content.
 And it looks like there's a concurrent objcopy from the .debug recipe
 which has truncated ipftest to rewrite it in place.
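A minimal sketch of that failure mode, in Python rather than anything taken from the build: a child process sizes and maps a file, the file is then truncated (a stand-in for the concurrent objcopy), and the child reads through the mapping. The file contents, the 64 KiB size, and the helper name run_truncation_demo are illustrative assumptions, and doing the truncation from inside the same process is a simplification of the real race.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Child script: map a file, truncate it underneath the mapping, then read it.
CHILD = textwrap.dedent("""
    import mmap, os, sys
    path = sys.argv[1]
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size      # size is sampled once, up front
    view = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    os.truncate(path, 0)             # stand-in for the concurrent objcopy
    view[0]                          # touches a page past the new end of file
""")


def run_truncation_demo():
    """Run the child and return its exit status (negative means a signal)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 65536)        # hypothetical contents, several pages
        path = f.name
    try:
        proc = subprocess.run([sys.executable, "-c", CHILD, path])
        return proc.returncode
    finally:
        os.unlink(path)


if __name__ == "__main__":
    rc = run_truncation_demo()
    print("child status:", rc, "killed by SIGBUS:", rc == -signal.SIGBUS)
```

On systems where touching a mapped page beyond end of file raises SIGBUS, the child should die with a bus error, which would match the `Bus error (core dumped)' in the build log, and the `Error code 138' too if that is 128 plus SIGBUS being signal 10 on the build host.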
 
 But I can't figure out why the concurrent objcopy is happening only in
 the mips64 builds of certain programs like ipftest(8) and crash(8),
 which seem to have in common the use of compat/exec.mk.  (These are
 programs that run with the n64 ABI, in order to read out kernel guts
 on mips64 CPUs, in a userland where _most_ programs run with the n32
 ABI instead because it's more compact and they usually have <4GB RAM.)
 
 And so I think I need a make(1) wizard to help.
 
 
 Here's an example:
 
 https://releng.netbsd.org/builds/HEAD/202504161330Z/evbmips-mips64el.build.failed
 https://web.archive.org/web/20250418154748/https://releng.netbsd.org/builds/HEAD/202504161330Z/evbmips-mips64el.build.failed
 
 [1]   Bus error (core dumped) /home/builds/ab/HEAD/evbmips-mips64el/20250416...
 --- /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest ---
 ...
 *** Failed target: /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
 *** In directory: /home/source/ab/HEAD/src/external/bsd/ipf/bin/ipftest
 *** Failed commands:
 	${_MKTARGET_INSTALL}
 	=> @# "install " /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
 	${INSTALL_FILE} -o ${BINOWN} -g ${BINGRP} -m ${BINMODE}  ${STRIPFLAG} ${.ALLSRC} ${.TARGET}
 	=> /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-tools/bin/mips64el--netbsd-install -U -M /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/METALOG -D /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest -h sha256 -N /home/source/ab/HEAD/src/etc -c  -r -o root -g wheel -m 555   ipftest /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
 *** [/home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest] Error code 138
 ...
 /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-tools/bin/mips64el--netbsd-objcopy: libcrypto.so.15.0.debug: section `.note.netbsd.pax' can't be allocated in segment 0
 LOAD: .MIPS.abiflags .reginfo .dynamic .hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rel.dyn .init .text .MIPS.stubs .fini .rodata .eh_frame_hdr .eh_frame .note.netbsd.ident .note.netbsd.pax
 
 The last part -- a warning message about which I just filed another
 bug, PR port-mips/59320: objcopy: section `.note.netbsd.pax' can't be
 allocated in segment 0 -- is evidence that make(1) is still running
 the buggy ipftest.debug recipe which rewrites ipftest in place:
 
     507 ${_PROGDEBUG.${_P}}: ${_P}
     508 	${_MKTARGET_CREATE}
     509 	( ${OBJCOPY} --only-keep-debug --compress-debug-sections \
     510 	    ${_P} ${_PROGDEBUG.${_P}} && \
     511 	  ${OBJCOPY} --strip-debug -p -R .gnu_debuglink \
     512 		--add-gnu-debuglink=${_PROGDEBUG.${_P}} ${_P} \
     513 	) || (rm -f ${_PROGDEBUG.${_P}}; false)
 
 https://nxr.netbsd.org/xref/src/share/mk/bsd.prog.mk?r=1.355#509
 
 
 My best guess was that:
 
 1. When doing dependall, the ipftest.debug recipe above:
    (a) creates ipftest.debug with objcopy at time t0,
    (b) a moment later, modifies ipftest in place with objcopy, at time
        t1 = t0 + eps > t0.
 
 2. When doing install, make(1) finds that ${DESTDIR}/usr/sbin/ipftest
    and ${DESTDIR}/usr/libdata/debug/usr/sbin/ipftest.debug are both
    out of date, so it tries to run, _in parallel_:
 
    (a) mips64el--netbsd-install ... ipftest ${DESTDIR}/usr/sbin/ipftest,
        because ipftest exists and is up-to-date
 
    (b) the .debug recipe above again, because ipftest exists and is
        up-to-date with timestamp t1, but ipftest.debug exists and is
        out-of-date with timestamp t0 < t1
 
 Except this hypothesis doesn't make sense, for two reasons:
 
 - The problem empirically _only_ happens in mips64 builds with a few
   programs, and nothing in the hypothesis above is restricted to that.
 
 - We pass `-p' (--preserve-dates) to objcopy(1) in step (1), so it
   restores the mtime of the input file after truncating and
   overwriting it -- and so by the time of make install, it should look
   like ipftest.debug is up-to-date.
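The hypothesis and the `-p' objection can be sketched with a deliberately simplified model of the staleness check, assuming a target counts as out of date when it is missing or strictly older than its source. That rule, the helper names out_of_date and simulate, and the timestamps are illustrative assumptions, not make(1)'s actual logic.

```python
import os
import tempfile


def out_of_date(target, source):
    """Simplified model: stale if missing or strictly older than source."""
    if not os.path.exists(target):
        return True
    return os.stat(target).st_mtime_ns < os.stat(source).st_mtime_ns


def simulate(preserve_dates):
    """Replay step 1 of the hypothesis; report whether ipftest.debug looks stale."""
    with tempfile.TemporaryDirectory() as d:
        prog = os.path.join(d, "ipftest")
        debug = os.path.join(d, "ipftest.debug")
        for path in (prog, debug):
            open(path, "wb").close()
        # Arbitrary timestamps in seconds, chosen so that t_link < t0 < t1.
        t_link, t0, t1 = 1000, 2000, 2001
        os.utime(prog, (t_link, t_link))   # ipftest as originally linked
        os.utime(debug, (t0, t0))          # step 1(a): ipftest.debug created
        # Step 1(b): the in-place rewrite moves ipftest to t1, unless the
        # mtime is restored afterwards, as objcopy -p is meant to do.
        t_prog = t_link if preserve_dates else t1
        os.utime(prog, (t_prog, t_prog))
        return out_of_date(debug, prog)


if __name__ == "__main__":
    print("without -p, ipftest.debug stale:", simulate(preserve_dates=False))
    print("with -p, ipftest.debug stale:", simulate(preserve_dates=True))
```

In this model the .debug recipe only looks stale at install time when the mtime is not restored, which is why the hypothesis does not survive the `-p' flag.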
 
 So I can't figure out why, under these circumstances, make install is
 trying to rerun the .debug recipe.  And I can't reproduce it on my
 laptop.
 
 I tried reading out `make -d g1' and `make -d m' output but it's kind
 of inscrutable to me (I thought `-d g1' would show a graph, with nodes
 and edges for dependency relations, but I can't figure out how to read
 the edges in it).
 