tech-toolchain: sparc64 toolchain JMP_SLOT reloc/PLT lossage (aka tcl vs. XFree 4.3)

Subject: sparc64 toolchain JMP_SLOT reloc/PLT lossage (aka tcl vs. XFree 4.3)
To: None <port-sparc64@netbsd.org, tech-toolchain@netbsd.org>
From: Rafal Boni <rafal@attbi.com>
List: tech-toolchain
Date: 03/28/2003 18:28:09
Folks:
	So I've tracked down my "exmh won't run after installing the XF86
	4.3 libraries on my Ultra5" to what looks to be a toolchain issue.
	The short of the story is that wish (the Tcl shell) is killed by
	the kernel fairly early on in the process lifetime as a result of
	an illegal instruction fault (always in the same place).

	Here's the traceback I get from gdb:

$ gdb `which wish` wish8.3.core 
GNU gdb 5.0nb1
[...]
This GDB was configured as "sparc64--netbsd"...(no debugging symbols found)...
Core was generated by `wish8.3'.
Program terminated with signal 4, Illegal instruction.
Reading symbols from /usr/libexec/ld.elf_so...Deprecated bfd_read called at /extra/src-current/gnu/dist/toolchain/gdb/dbxread.c line 2638 in elfstab_build_psymtabs
Deprecated bfd_read called at /extra/src-current/gnu/dist/toolchain/gdb/dbxread.c line 976 in fill_symbuf
done.
Loaded symbols for /usr/libexec/ld.elf_so
Reading symbols from /usr/pkg/lib/libtk83.so.1...done.
Loaded symbols for /usr/pkg/lib/libtk83.so.1
Reading symbols from /usr/pkg/lib/libtcl83.so.1...done.
Loaded symbols for /usr/pkg/lib/libtcl83.so.1
Reading symbols from /usr/X11R6/lib/libX11.so.6...done.
Loaded symbols for /usr/X11R6/lib/libX11.so.6
Reading symbols from /usr/lib/libm.so.0...done.
Loaded symbols for /usr/lib/libm.so.0
Reading symbols from /usr/lib/libc.so.12...done.
Loaded symbols for /usr/lib/libc.so.12
Reading symbols from /usr/X11R6/lib/X11/locale/lib/common/xlcDef.so.2...done.
Loaded symbols for /usr/X11R6/lib/X11/locale/lib/common/xlcDef.so.2
Reading symbols from /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2...done.
Loaded symbols for /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2
#0  0x40dcc980 in __JCR_LIST__ ()
   from /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2
(gdb) where
#0  0x40dcc980 in __JCR_LIST__ ()
   from /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2
#1  0x40cc28b8 in _XimSetLocalIMDefaults ()
   from /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2
#2  0x40cc0068 in _XimLocalOpenIM ()
   from /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2
#3  0x40cbf618 in _XimOpenIM ()
   from /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2
#4  0x40746c04 in _XDynamicOpenIM () from /usr/X11R6/lib/libX11.so.6
#5  0x4071aa0c in XOpenIM () from /usr/X11R6/lib/libX11.so.6
#6  0x40383fc8 in OpenIM () from /usr/pkg/lib/libtk83.so.1
#7  0x403824c8 in GetScreen () from /usr/pkg/lib/libtk83.so.1
#8  0x403821b0 in CreateTopLevelWindow () from /usr/pkg/lib/libtk83.so.1
#9  0x40382a10 in TkCreateMainWindow () from /usr/pkg/lib/libtk83.so.1
#10 0x4039c37c in CreateFrame () from /usr/pkg/lib/libtk83.so.1
#11 0x4039c078 in TkCreateFrame () from /usr/pkg/lib/libtk83.so.1
#12 0x4038468c in Initialize () from /usr/pkg/lib/libtk83.so.1
#13 0x40384128 in Tk_Init () from /usr/pkg/lib/libtk83.so.1
#14 0x101198 in Tcl_AppInit ()
#15 0x40377d00 in Tk_MainEx () from /usr/pkg/lib/libtk83.so.1
#16 0x101164 in main ()
#17 0x100cdc in _init ()
(gdb) x/i  0x40dcc980
0x40dcc980 <__JCR_LIST__+3096>: illtrap  0
(gdb) 
0x40dcc984 <__JCR_LIST__+3100>: illtrap  0
(gdb) 
0x40dcc988 <__JCR_LIST__+3104>: illtrap  0
(gdb) 
0x40dcc98c <__JCR_LIST__+3108>: illtrap  0
(gdb) 
0x40dcc990 <__JCR_LIST__+3112>: illtrap  0
(gdb) 
0x40dcc994 <__JCR_LIST__+3116>: illtrap  0
(gdb) 
0x40dcc998 <__JCR_LIST__+3120>: illtrap  0
(gdb) 
0x40dcc99c <__JCR_LIST__+3124>: illtrap  0

This happens to be in /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2,
which is loaded at 0x40cac000, so the fault is at offset 0x40dcc980 -
0x40cac000 = 0x120980 into the file.  At first, I figured it was either
cache lossage or ld.elf_so lossage (maybe itself cache-related).  But
looking further, it appears that the ximcp.so.2 shared object itself is
actually hosed...
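
For anyone who wants to re-check the offset arithmetic, here is a minimal
sketch in Python.  The two addresses are the ones from the gdb output and
the load address quoted above; the variable names are only illustrative.

```python
# Fault address reported by gdb in frame #0, and the load base of
# ximcp.so.2 as quoted in the text above.
fault_pc = 0x40dcc980
load_base = 0x40cac000

# Offset of the faulting instruction within the shared object.
offset = fault_pc - load_base
print(hex(offset))  # 0x120980
```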

$ objdump -R /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2 | fgrep 120980
0000000000120980 R_SPARC_JMP_SLOT  _XimCheckIMMode

Which makes sense given where this call trace is coming from in the X11
library sources (xsrc/xfree/xc/lib/X11/imRm.c, ~ line 2554).

Poking at the PLT in the ximcp.so.2 shared object, it looks like parts of
the PLT are zorched, though, containing only zeroes instead of the expected
entries, like so:

$ objdump -D --start-address=0x120900 --stop-address=0x120a00 /usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2

/usr/X11R6/lib/X11/locale/lib/common/ximcp.so.2:     file format elf64-sparc

Disassembly of section .hash:
Disassembly of section .dynsym:
Disassembly of section .dynstr:
Disassembly of section .rela.dyn:
Disassembly of section .rela.plt:
Disassembly of section .init:
Disassembly of section .text:
Disassembly of section .fini:
Disassembly of section .rodata:
Disassembly of section .note.netbsd.ident:
Disassembly of section .data:
Disassembly of section .eh_frame:
Disassembly of section .dynamic:
Disassembly of section .ctors:
Disassembly of section .dtors:
Disassembly of section .jcr:
Disassembly of section .plt:

0000000000120900 <.plt+0xb00>:
        ...
  120908:       01 00 00 00     nop 
  12090c:       01 00 00 00     nop 
  120910:       01 00 00 00     nop 
  120914:       01 00 00 00     nop 
  120918:       01 00 00 00     nop 
  12091c:       01 00 00 00     nop 
  120920:       03 00 0b 20     sethi  %hi(0x2c8000), %g1
  120924:       30 6f fd 3f     b,a   %xcc, 11fe20 <_PROCEDURE_LINKAGE_TABLE_+0x20>
  120928:       00 00 00 00     illtrap  0
        ...
Disassembly of section .got:

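So the PLT entry we were looking for, at 0x120980, is indeed all zeroes
(it falls in the zero-filled run objdump elides after 0x120928).  An
all-zero word decodes as "illtrap 0", which is exactly what gdb showed at
the fault address, so it's not surprising the program dies with an
illegal instruction fault.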
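
As a sanity check on the one intact entry in the dump, here is a rough
Python sketch that decodes its two instruction words by hand and works out
which PLT slot each address falls in.  The instruction words and addresses
are copied from the objdump output above; the 32-byte slot size is an
assumption (it matches the 8-word spacing visible in the dump), and the
helper names are made up for illustration.

```python
# Values copied from the objdump output above.
PLT_BASE = 0x11fe00      # 0x120900 is shown as .plt+0xb00
PLT_ENTRY_SIZE = 32      # assumption: 8 instructions per sparc64 PLT slot

def sethi_imm22(word):
    """Return the 22-bit immediate field of a SPARC sethi instruction."""
    return word & 0x3fffff

def bpcc_target(pc, word):
    """Return the target of a SPARC V9 BPcc branch located at pc."""
    disp19 = word & 0x7ffff
    if disp19 & 0x40000:          # sign-extend the 19-bit word displacement
        disp19 -= 0x80000
    return pc + 4 * disp19

def plt_slot(offset):
    """Return the slot index of an offset that lies inside .plt."""
    return (offset - PLT_BASE) // PLT_ENTRY_SIZE

# The intact entry at 0x120920: "sethi" then "b,a %xcc".
print(hex(sethi_imm22(0x03000b20)))             # 0xb20
print(hex(bpcc_target(0x120924, 0x306ffd3f)))   # 0x11fe20
print(plt_slot(0x120920), plt_slot(0x120980))   # 89 92
```

In this dump the sethi immediate (0xb20) equals the entry's own byte offset
from the start of .plt, and the branch lands at
_PROCEDURE_LINKAGE_TABLE_+0x20, so slot 89 looks sane while slot 92 (the
one for _XimCheckIMMode) was never filled in.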

Anyone seen this type of lossage before?  A quick glance through the
Oracle of Google didn't find anything interesting, but it was a really
quick look.

Thanks,
--rafal

----
Rafal Boni                                                     rafal@attbi.com
  We are all worms.  But I do believe I am