Subject: port-powerpc/12938: shlib incompatibility 1.5.head vs 20010408-1.5.1_BETA
To: None <gnats-bugs@gnats.netbsd.org>
From: None <cagney@tpgi.com.au>
List: netbsd-bugs
Date: 05/14/2001 11:00:59
>Number:         12938
>Category:       port-powerpc
>Synopsis:       shlib incompatibility 1.5.head vs 20010408-1.5.1_BETA
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-powerpc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 14 08:55:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     
>Release:        NetBSD localhost 1.5V NetBSD 1.5V (NETLUX)
>Organization:

>Environment:

System: NetBSD localhost 1.5V NetBSD 1.5V (NETLUX) #0: Sat May 12 23:09:17 EDT 2001 boor@localhost:/usr/trunk.src/sys/arch/macppc/compile/NETLUX macppc

G4 Ti, the middle of the range version.


>Description:

	When trying to upgrade a powerpc system from
	20010408-1.5.1_BETA to the head of the 1.5 branch, it develops
	apparently random SIGSEGVs (signal 11); see How-To-Repeat.

	I've seen the effect with cc, ranlib, ar, printf and nm, but that
	is probably because they are all used when building.

	Once it has started happening for ar and ranlib, it is
	reproducible, viz. (using How-To-Repeat):

# rm /tmp/obj/lib/libcrypto/*.a
# make build-install-lib (don't ask)
(cd /usr/src/lib &&  make   MKSHARE=no dependall &&  make  MKSHARE=no install)
....
dependall ===> libcrypto
building standard crypto library
ranlib libcrypto.a
*** Signal 11

	However, for cc it is more sensitive.  Given that it has something
	to do with exec and shared library loading, I'm not surprised.

	It doesn't appear to be affected by machine load.

	The cause of the SIGSEGV is always the same (see below).

	If you cd to the relevant directory and run the commands
	that dumped core from there, the problem goes away, viz.:

bash-2.04# cd lib/libcrypto/
bash-2.04# rm obj/*.a
bash-2.04# make dependall
building standard crypto library
ranlib libcrypto.a
building profiled crypto library
ranlib libcrypto_p.a
building shared object crypto library
ranlib libcrypto_pic.a

	Examining the core dump.

	In the below I'm looking at an unstripped ranlib built,
	installed and linked against the head of 1.5.  The same behaviour
	occurs using a 20010408-1.5.1_BETA ranlib dynamically linked
	against the head of 1.5.  This was simply the easiest way to get
	an unstripped binary.

# /home/scratch/WIP/mi/gdb/gdb ranlib /tmp/obj/lib/libcrypto/ranlib.core
(again don't ask - I've a fix to FSF GDB I need to check in :-)
GNU gdb 5.0 (MI_OUT)
This GDB was configured as "powerpc-apple-netbsd1.5V"...
Core was generated by `ranlib'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/libexec/ld.elf_so...done.
Loaded symbols for /usr/libexec/ld.elf_so
Reading symbols from /usr/lib/libbfd.so.3...done.
Loaded symbols for /usr/lib/libbfd.so.3
Reading symbols from /usr/lib/libc.so.12...done.
Loaded symbols for /usr/lib/libc.so.12
#0  0x419ab150 in _init ()
    at /usr/src/lib/csu/powerpc/../common_elf/crtbegin.c:106
106             if (!initialized) {
(gdb) x/i $pc
0x419ab150 <_init+16>:  lwz     r0,0(r9)
(gdb) disassemble 
Dump of assembler code for function _init:
0x419ab140 <_init>:     stwu    r1,-16(r1)
0x419ab144 <_init+4>:   mflr    r0
0x419ab148 <_init+8>:   stw     r0,20(r1)
0x419ab14c <_init+12>:  lis     r9,0
0x419ab150 <_init+16>:  lwz     r0,0(r9)
0x419ab154 <_init+20>:  cmpwi   r0,0
0x419ab158 <_init+24>:  bne     0x419ab174 <_init+52>
0x419ab15c <_init+28>:  li      r0,1
0x419ab160 <_init+32>:  stw     r0,0(r9)
0x419ab164 <_init+36>:  lis     r9,0
0x419ab168 <_init+40>:  lwz     r0,0(r9)
0x419ab16c <_init+44>:  mtlr    r0
0x419ab170 <_init+48>:  blrl
0x419ab174 <_init+52>:  lwz     r0,20(r1)
0x419ab178 <_init+56>:  mtlr    r0
0x419ab17c <_init+60>:  addi    r1,r1,16
0x419ab180 <_init+64>:  blr
End of assembler dump.

	Note the sequence:

0x419ab14c <_init+12>:  lis     r9,0
0x419ab150 <_init+16>:  lwz     r0,0(r9)

	and compare that to the executable:

# /home/scratch/WIP/mi/gdb/gdb ranlib
(gdb) disassemble _init
...
0x18059c4 <_init+12>:   lis     r9,388
0x18059c8 <_init+16>:   lwz     r0,30900(r9)
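
	The arithmetic behind that comparison can be sketched as
	follows.  This is a minimal illustration in Python, not part of
	the original report; the helper names are hypothetical and come
	from no NetBSD or GDB source.  `lis` places its 16-bit immediate
	in the upper half of the register and `lwz` adds a sign-extended
	16-bit displacement, so the pair in the executable reads from
	0x018478b4, while the pair in the core dump reads from address
	0, which matches the fault at _init+16.  One possible reading
	(an assumption, not something established above) is that the
	immediates in the faulting copy were never filled in.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(lis_imm, lwz_disp):
    """Address read by `lis rN,lis_imm` followed by `lwz rM,lwz_disp(rN)`."""
    high = (sign_extend16(lis_imm) << 16) & 0xFFFFFFFF
    return (high + sign_extend16(lwz_disp)) & 0xFFFFFFFF


def split_ha_lo(address):
    """Split a 32-bit address into the (high-adjusted, low) immediates
    that a lis/lwz pair would need in order to reach it."""
    lo = sign_extend16(address)
    ha = ((address - lo) >> 16) & 0xFFFF
    return ha, lo


if __name__ == "__main__":
    # Immediates exactly as gdb printed them in the two listings.
    print(hex(effective_address(388, 30900)))  # pair from the ranlib executable
    print(hex(effective_address(0, 0)))        # pair from the core dump
    # Going the other way recovers the executable's immediates.
    print(split_ha_lo(0x018478B4))
```

	Run as a script, this prints the two addresses and then the
	immediates recovered from the first one, which can be checked
	against the lis/lwz operands in the listings.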

>How-To-Repeat:

	The obvious thing to do is to give the below a whirl and, if it
	works fine on a similar system, conclude that it is something
	to do with my hardware and not the kernel et al.

	# gzcat base.tgz | ( cd / && tar --unlink -xpf - )
	# gzcat comp.tgz | ( cd / && tar --unlink -xpf - )
	# cd /usr/src && make build
	....
	building standard crypto library
	ranlib libcrypto.a
	*** Signal 11

>Fix:

	Workaround: per How-To-Repeat, revert to the old
	libraries/binaries.
>Release-Note:
>Audit-Trail:
>Unformatted: