Subject: port-sparc64/20907: [rkb] ld.elf_so with RTLD_DEBUG_RELOC broken on sparc64
To: None <gnats-bugs@gnats.netbsd.org>
From: None <rafal@netbsd.org>
List: netbsd-bugs
Date: 03/27/2003 12:23:33
>Number:         20907
>Category:       port-sparc64
>Synopsis:       sparc64 ld.elf_so broken when RTLD_DEBUG_RELOC enabled
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 27 09:24:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Rafal Boni
>Release:        NetBSD 1.6P
>Organization:
Organized?  Me?  Hah!
>Environment:
	
	
System: NetBSD fearless-vampire-killer.waterside.net 1.6P NetBSD 1.6P (FEARLESS_VAMPIRE_KILLER) #5: Wed Mar 5 17:43:57 EST 2003 rafal@fearless-vampire-killer.waterside.net:/extra/src-current/sys/arch/sparc64/compile/FEARLESS_VAMPIRE_KILLER sparc64
Architecture: sparc64
Machine: sparc64

64-bit kernel and userland.

Sources from ~ Mar 10 15:15 EST (GMT-5); userland rebuilt from those sources.
ident output from /libexec/ld.elf_so is:

/libexec/ld.elf_so:
     $NetBSD: mmap.c,v 1.11 2000/01/22 22:19:20 mycroft Exp $
     $NetBSD: cerror.S,v 1.5 2002/05/07 01:34:22 eeh Exp $
     $NetBSD: rindex.c,v 1.12 2001/02/09 11:47:22 wiz Exp $
     $NetBSD: index.c,v 1.12 2001/02/09 11:47:21 wiz Exp $
     $NetBSD: strspn.c,v 1.9 1999/09/20 04:39:48 lukem Exp $
     $NetBSD: strsep.c,v 1.13 2002/01/31 22:43:41 tv Exp $
     $NetBSD: strncpy.c,v 1.11 1999/09/20 04:39:48 lukem Exp $
     $NetBSD: strlen.S,v 1.4 2002/04/02 22:07:55 eeh Exp $
     $NetBSD: strcspn.c,v 1.9 1999/09/20 04:39:46 lukem Exp $
     $NetBSD: strcpy.c,v 1.12 1999/09/20 04:39:46 lukem Exp $
     $NetBSD: strcmp.c,v 1.12 1999/09/20 04:39:46 lukem Exp $
     $NetBSD: memset.S,v 1.4 2001/08/02 01:17:28 eeh Exp $
     $NetBSD: memcpy.S,v 1.2 2001/08/01 05:52:12 eeh Exp $
     $NetBSD: memcmp.c,v 1.11 1999/09/20 04:39:45 lukem Exp $
     $NetBSD: strdup.c,v 1.12 2000/01/22 22:19:20 mycroft Exp $
     $NetBSD: getenv.c,v 1.16 2003/01/18 11:32:03 thorpej Exp $
     $NetBSD: exit.c,v 1.9 2003/03/01 04:19:38 thorpej Exp $
     $NetBSD: abort.c,v 1.11 1998/10/12 15:56:16 kleink Exp $
     $NetBSD: sysctl.c,v 1.12 2002/12/19 23:31:55 kleink Exp $
     $NetBSD: signal.c,v 1.11 2000/01/22 22:19:12 mycroft Exp $
     $NetBSD: __errlist14.c,v 1.5 2002/11/12 10:28:27 kleink Exp $
     $NetBSD: bcopy.c,v 1.14 2002/07/04 15:48:41 kent Exp $
     $NetBSD: strncmp.c,v 1.12 1999/09/20 04:39:48 lukem Exp $

The only changes I made to the ld.elf_so directory in my sources are to the
Makefile, shown below.  The switch from "-O3 -fomit-frame-pointer" to "-g" is
irrelevant, as things behaved the same before I changed it.

Index: Makefile
===================================================================
RCS file: /cvsroot/src/libexec/ld.elf_so/Makefile,v
retrieving revision 1.61
diff -u -r1.61 Makefile
--- Makefile    2003/03/25 13:11:53     1.61
+++ Makefile    2003/03/27 16:43:42
@@ -40,11 +40,11 @@
 CPPFLAGS+= -I${.CURDIR}
 CPPFLAGS+= -DRTLD_LOADER
 CPPFLAGS+= -D_RTLD_SOURCE
-#CPPFLAGS+= -DDEBUG
+CPPFLAGS+= -DDEBUG
 #CPPFLAGS+= -DRTLD_DEBUG
-#CPPFLAGS+= -DRTLD_DEBUG_RELOC
-#DBG=  -g
-DBG=   -O3 -fomit-frame-pointer
+CPPFLAGS+= -DRTLD_DEBUG_RELOC
+DBG=   -g
+#DBG=  -O3 -fomit-frame-pointer
 
 .if ${SHLIBDIR} != ${LIBDIR}
 CPPFLAGS+= -DRTLD_DEFAULT_LIBRARY_PATH=\"${SHLIBDIR}:${LIBDIR}\"

>Description:
	If I enable RTLD_DEBUG_RELOC in the sparc64 ld.elf_so Makefile,
	certain binaries reliably cause it to crash.  The crash is due
	to a SIGBUS somewhere in memcpy() in _rtld_do_copy_relocation().
	(RTLD_DEBUG_RELOC also requires DEBUG to be enabled, so that's
	 on as well.)

	stty, tput, and tset seem to be frequent candidates, as was /bin/sh
	before I replaced it with /rescue/sh so I could keep using the
	system.

	For one of the stty cores, gdb says:

Core was generated by `stty'.
Program terminated with signal 10, Bus error.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libc.so.12...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.12
#0  0x40307a88 in ?? ()
(gdb) where
#0  0x40307a88 in ?? ()
#1  0x40307c44 in ?? ()
#2  0x40306714 in ?? ()

Matching these addresses against the output of `nm -n /libexec/ld.elf_so',
the relevant symbols are:
	00000000000078ec t _rtld_do_copy_relocation
	0000000000007ad0 T _rtld_do_copy_relocations

	0000000000005d80 T _rtld
	00000000000068b8 T _rtld_die

	so that's:
		_rtld_do_copy_relocation
		_rtld_do_copy_relocations
		_rtld

I added an rdbg() statement before the memcpy() in _rtld_do_copy_relocation,
and the last lines I see with LD_DEBUG=1 set are:

[...]
about to copy COPY stty /lib/libc.so.12 opterr --> src=0x405da398 dst=0x2053b8 size 4
COPY stty /lib/libc.so.12 opterr --> src=0x405da398 dst=0x2053b8 *dst= 0x100000000 size 4
check optind vs rpcbaddr_cache_lock in 0x40208200
check optind vs optind in 0x40208200
about to copy COPY stty /lib/libc.so.12 optind --> src=0x405da39c dst=0x2053bc size 4

>How-To-Repeat:
	Rebuild ld.elf_so with both DEBUG and RTLD_DEBUG_RELOC defined,
	install it, and run a dynamically linked binary such as stty, tput,
	or tset; it dies with a SIGBUS.

>Fix:
	Dunno.

>Release-Note:
>Audit-Trail:
>Unformatted: