NetBSD-Bugs archive


Re: toolchain/57241: mips64el--netbsd-install core dumps randomly



The following reply was made to PR toolchain/57241; it has been noted by GNATS.

From: Christos Zoulas <christos%zoulas.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: toolchain-manager%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost,
 "martin%netbsd.org@localhost" <martin%NetBSD.org@localhost>
Subject: Re: toolchain/57241: mips64el--netbsd-install core dumps randomly
Date: Fri, 18 Apr 2025 14:34:00 -0400

 
 Why not build the prog.debug file as part of the prog target?
 
 christos
 
 --- bsd.prog.mk 28 Nov 2021 15:49:36 -0000      1.340
 +++ bsd.prog.mk 18 Apr 2025 18:33:21 -0000
 @@ -529,6 +529,12 @@
  .if ${MKSTRIPIDENT} != "no"
         ${OBJCOPY} -R .ident ${.TARGET}
  .endif
 +.if defined(_PROGDEBUG.${_P})
 +       (  ${OBJCOPY} --only-keep-debug ${_P} ${_PROGDEBUG.${_P}} \
 +       && ${OBJCOPY} --strip-debug -p -R .gnu_debuglink \
 +               --add-gnu-debuglink=${_PROGDEBUG.${_P}} ${_P} \
 +       ) || (rm -f ${_PROGDEBUG.${_P}}; false)
 +.endif
  
  CLEANFILES+=   ${_P}.d
  .if exists(${_P}.d)
 @@ -554,21 +560,18 @@
  .if ${MKSTRIPIDENT} != "no"
         ${OBJCOPY} -R .ident ${.TARGET}
  .endif
 -.endif # !commands(${_P})
 -.endif # USE_COMBINE
 -
 -${_P}.ro: ${OBJS.${_P}} ${_DPADD.${_P}}
 -       ${_MKTARGET_LINK}
 -       ${CC} ${LDFLAGS:N-pie} -nostdlib -r -Wl,-dc -o ${.TARGET} ${OBJS.${_P}}
 -
  .if defined(_PROGDEBUG.${_P})
 -${_PROGDEBUG.${_P}}: ${_P}
 -       ${_MKTARGET_CREATE}
         (  ${OBJCOPY} --only-keep-debug ${_P} ${_PROGDEBUG.${_P}} \
         && ${OBJCOPY} --strip-debug -p -R .gnu_debuglink \
                  --add-gnu-debuglink=${_PROGDEBUG.${_P}} ${_P} \
         ) || (rm -f ${_PROGDEBUG.${_P}}; false)
  .endif
 +.endif # !commands(${_P})
 +.endif # USE_COMBINE
 +
 +${_P}.ro: ${OBJS.${_P}} ${_DPADD.${_P}}
 +       ${_MKTARGET_LINK}
 +       ${CC} ${LDFLAGS:N-pie} -nostdlib -r -Wl,-dc -o ${.TARGET} ${OBJS.${_P}}
  
  .endif # defined(OBJS.${_P}) && !empty(OBJS.${_P})                     # }
  
 
 
 > On Apr 18, 2025, at 12:30 PM, Taylor R Campbell via gnats <gnats-admin%netbsd.org@localhost> wrote:
 >
 > The following reply was made to PR toolchain/57241; it has been noted by GNATS.
 >
 > From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
 > To: Roland Illig <rillig%NetBSD.org@localhost>
 > Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
 > Subject: Re: toolchain/57241: mips64el--netbsd-install core dumps randomly
 > Date: Fri, 18 Apr 2025 16:26:30 +0000
 >
 > Hi rillig, I wonder whether you might be able to help solve a
 > make(1)-related mystery?
 >
 > I'm drafting a change to fix the parallel-safety of the foo.debug
 > recipe in bsd.prog.mk (a little finicky because it has nontrivial
 > interaction with other makefiles like libexec/ld.elf_so/Makefile).
 >
 > But before I commit it, I want to make sure I understand the
 > underlying cause of PR 57241.
 >
 > The immediate symptom is that, e.g., `mips64el--netbsd-install ...
 > ipftest ${DESTDIR}/usr/sbin/ipftest' is crashing because its input
 > file has been truncated between fstat/mmap and access to file content.
 > And it looks like there's a concurrent objcopy from the .debug recipe
 > which has truncated ipftest to rewrite it in place.
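
A minimal sketch of the failure mode described above, written in Python rather than the C of install(1) and using read instead of mmap so that it stays runnable: a reader records the file size, a second opener then truncates the same path (as an in-place rewrite would), and the reader finds fewer bytes than it measured. The file name and size are made up for illustration. With mmap, touching the now-missing pages would typically raise SIGBUS instead of returning a short read.

```python
import os
import tempfile


def read_after_inplace_rewrite():
    """Return (size seen by fstat, bytes actually read after truncation)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ipftest")  # hypothetical stand-in file
        with open(path, "wb") as f:
            f.write(b"x" * 4096)

        # Reader: open the file and record its size, as install does
        # before it looks at the content.
        reader = open(path, "rb")
        expected = os.fstat(reader.fileno()).st_size

        # Concurrent writer: reopening with "wb" truncates the same inode,
        # which is what an in-place rewrite of the input amounts to.
        writer = open(path, "wb")

        # Reader now tries to consume the bytes it was promised.
        data = reader.read(expected)

        writer.close()
        reader.close()
        return expected, len(data)


if __name__ == "__main__":
    expected, got = read_after_inplace_rewrite()
    print(f"fstat reported {expected} bytes, read returned {got}")
```

On a POSIX system the reader gets none of the bytes that fstat reported, because the truncation happened between the size check and the read.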
 >
 > But I can't figure out why the concurrent objcopy is happening only in
 > the mips64 builds of certain programs like ipftest(8) and crash(8),
 > which seem to have in common the use of compat/exec.mk.  (These are
 > programs that run with the n64 ABI, in order to read out kernel guts
 > on mips64 CPUs, in a userland where _most_ programs run with the n32
 > ABI instead because it's more compact and they usually have <4GB RAM.)
 >
 > And so I think I need a make(1) wizard to help.
 >
 >
 > Here's an example:
 >
 > https://releng.netbsd.org/builds/HEAD/202504161330Z/evbmips-mips64el.build.failed
 > https://web.archive.org/web/20250418154748/https://releng.netbsd.org/builds/HEAD/202504161330Z/evbmips-mips64el.build.failed
 >
 > [1]   Bus error (core dumped) /home/builds/ab/HEAD/evbmips-mips64el/20250416...
 > --- /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest ---
 > ...
 > *** Failed target: /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
 > *** In directory: /home/source/ab/HEAD/src/external/bsd/ipf/bin/ipftest
 > *** Failed commands:
 > 	${_MKTARGET_INSTALL}
 > 	=> @# "install " /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
 > 	${INSTALL_FILE} -o ${BINOWN} -g ${BINGRP} -m ${BINMODE}  ${STRIPFLAG} ${.ALLSRC} ${.TARGET}
 > 	=> /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-tools/bin/mips64el--netbsd-install -U -M /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/METALOG -D /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest -h sha256 -N /home/source/ab/HEAD/src/etc -c  -r -o root -g wheel -m 555   ipftest /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest
 > *** [/home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-dest/usr/sbin/ipftest] Error code 138
 > ...
 > /home/builds/ab/HEAD/evbmips-mips64el/202504161330Z-tools/bin/mips64el--netbsd-objcopy: libcrypto.so.15.0.debug: section `.note.netbsd.pax' can't be allocated in segment 0
 > LOAD: .MIPS.abiflags .reginfo .dynamic .hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rel.dyn .init .text .MIPS.stubs .fini .rodata .eh_frame_hdr .eh_frame .note.netbsd.ident .note.netbsd.pax
 >
 > The last part -- a warning message about which I just filed another
 > bug, PR port-mips/59320: objcopy: section `.note.netbsd.pax' can't be
 > allocated in segment 0 -- is evidence that make(1) is still running
 > the buggy ipftest.debug recipe which rewrites ipftest in place:
 >
 >     507 ${_PROGDEBUG.${_P}}: ${_P}
 >     508 	${_MKTARGET_CREATE}
 >     509 	( ${OBJCOPY} --only-keep-debug --compress-debug-sections \
 >     510 	    ${_P} ${_PROGDEBUG.${_P}} && \
 >     511 	  ${OBJCOPY} --strip-debug -p -R .gnu_debuglink \
 >     512 		--add-gnu-debuglink=${_PROGDEBUG.${_P}} ${_P} \
 >     513 	) || (rm -f ${_PROGDEBUG.${_P}}; false)
 >
 > https://nxr.netbsd.org/xref/src/share/mk/bsd.prog.mk?r=1.355#509
 >
 >
 > My best guess was that:
 >
 > 1. When doing dependall, the ipftest.debug recipe above:
 >    (a) creates ipftest.debug with objcopy at time t0,
 >    (b) a moment later, modifies ipftest in place with objcopy, at time
 >        t1 = t0 + eps > t0.
 >
 > 2. When doing install, make(1) finds that ${DESTDIR}/usr/sbin/ipftest
 >    and ${DESTDIR}/usr/libdata/debug/usr/sbin/ipftest.debug are both
 >    out of date, so it tries to run, _in parallel_:
 >
 >    (a) mips64el--netbsd-install ... ipftest ${DESTDIR}/usr/sbin/ipftest,
 >        because ipftest exists and is up-to-date
 >
 >    (b) the .debug recipe above again, because ipftest exists and is
 >        up-to-date with timestamp t1, but ipftest.debug exists and is
 >        out-of-date with timestamp t0 < t1
 >
 > Except this hypothesis doesn't make sense, for two reasons:
 >
 > - The problem empirically _only_ happens in mips64 builds with a few
 >   programs, and nothing in the hypothesis above is restricted to that.
 >
 > - We pass `-p' (--preserve-dates) to objcopy(1) in step (1), so it
 >   restores the mtime of the input file after truncating and
 >   overwriting it -- and so by the time of make install, it should look
 >   like ipftest.debug is up-to-date.
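
The timestamp reasoning in the hypothesis, and the `-p' objection to it, can be modelled roughly as below. This is only a sketch under stated assumptions: it assumes make(1) treats a target as out of date when it is missing or strictly older than its source, it fakes all timestamps with os.utime instead of running objcopy, and the function names and the chosen times are hypothetical.

```python
import os
import tempfile

SECOND = 1_000_000_000  # nanoseconds


def is_out_of_date(target, source):
    """Simplified model: target is missing or strictly older than source."""
    if not os.path.exists(target):
        return True
    return os.stat(target).st_mtime_ns < os.stat(source).st_mtime_ns


def run_debug_recipe(prog, debug, t0, preserve_dates):
    """Mimic the two objcopy steps with explicit fake timestamps."""
    saved_mtime = os.stat(prog).st_mtime_ns

    # Step (a): create prog.debug at time t0.
    with open(debug, "wb") as f:
        f.write(b"debug info")
    os.utime(debug, ns=(t0, t0))

    # Step (b): rewrite prog in place at time t1 = t0 + eps.
    with open(prog, "wb") as f:
        f.write(b"stripped program")
    t1 = t0 + 10 * SECOND
    # With -p the original mtime of prog is restored after the rewrite.
    mtime = saved_mtime if preserve_dates else t1
    os.utime(prog, ns=(mtime, mtime))


def simulate(preserve_dates):
    """Return True if prog.debug looks out of date after the recipe ran."""
    with tempfile.TemporaryDirectory() as d:
        prog = os.path.join(d, "ipftest")
        debug = os.path.join(d, "ipftest.debug")
        with open(prog, "wb") as f:
            f.write(b"unstripped program")
        link_time = 1_700_000_000 * SECOND  # arbitrary fake link time
        os.utime(prog, ns=(link_time, link_time))
        run_debug_recipe(prog, debug, link_time + 60 * SECOND, preserve_dates)
        return is_out_of_date(debug, prog)


if __name__ == "__main__":
    print("rerun needed without -p:", simulate(False))
    print("rerun needed with -p:   ", simulate(True))
```

In this model the recipe is rerun only when the mtime of the program is not restored, which matches the objection: with `-p' in effect, timestamps alone should not make ipftest.debug look stale at install time.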
 >
 > So I can't figure out why, under these circumstances, make install is
 > trying to rerun the .debug recipe.  And I can't reproduce it on my
 > laptop.
 >
 > I tried reading out `make -d g1' and `make -d m' output but it's kind
 > of inscrutable to me (I thought `-d g1' would show a graph, with nodes
 > and edges for dependency relations, but I can't figure out how to read
 > the edges in it).
 >
 
 
 

