NetBSD-Bugs archive


port-alpha/48709: static threaded programs crash



>Number:         48709
>Category:       port-alpha
>Synopsis:       static threaded programs crash
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-alpha-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Apr 04 19:30:00 +0000 2014
>Originator:     Martin Husemann
>Release:        NetBSD 6.99.39
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gemini.duskware.de 6.99.39 NetBSD 6.99.39 (GENERIC-$Revision: 1.358 $) #11: Fri Apr 4 14:06:15 CEST 2014 martin%night-owl.duskware.de@localhost:/usr/src/sys/arch/alpha/compile/GENERIC alpha
Architecture: alpha
Machine: alpha
>Description:

The test program for statically linked TLS access dies on alpha with a segfault.
Try:

        cd /usr/tests/lib/libc/tls && ./t_tls_static -l

This minimal program also dies if compiled with -static -pthread:

--8<--
#include <stdlib.h>
#include <pthread.h>
void *func(void *arg) { return 0; }
int main(int argc, char **argv)
{
        pthread_t t;
        pthread_create(&t, NULL, func, NULL);
        return 0;
}
-->8--

Nick and I traced this to register t12 getting corrupted in _libc_init when
calling __libc_thr_init. However, it is not that simple: the original object
code (with relocations) is fine:

Disassembly of section .text.startup:

0000000000000000 <_libc_init>:
   0:   00 00 bb 27     ldah    gp,0(t12)
                        0: GPDISP       .text.startup+0x4
   4:   00 00 bd 23     lda     gp,0(gp)
   8:   f0 ff de 23     lda     sp,-16(sp)
   c:   00 00 5e b7     stq     ra,0(sp)
[..]
  64:   00 00 7d a7     ldq     t12,0(gp)
                        64: ELF_LITERAL __guard_setup
  68:   00 40 5b 6b     jsr     ra,(t12),6c <_libc_init+0x6c>
                        68: LITUSE      .text.startup+0x3
                        68: HINT        __guard_setup
  6c:   00 00 ba 27     ldah    gp,0(ra)
                        6c: GPDISP      .text.startup+0x4
  70:   00 00 bd 23     lda     gp,0(gp)
  74:   00 00 7d a7     ldq     t12,0(gp)
                        74: ELF_LITERAL __libc_atomic_init
  78:   00 40 5b 6b     jsr     ra,(t12),7c <_libc_init+0x7c>
                        78: LITUSE      .text.startup+0x3
                        78: HINT        __libc_atomic_init
  7c:   00 00 ba 27     ldah    gp,0(ra)
                        7c: GPDISP      .text.startup+0x4
  80:   00 00 bd 23     lda     gp,0(gp)
  84:   00 00 7d a7     ldq     t12,0(gp)
                        84: ELF_LITERAL __libc_static_tls_setup
  88:   00 40 5b 6b     jsr     ra,(t12),8c <_libc_init+0x8c>
                        88: LITUSE      .text.startup+0x3
                        88: HINT        __libc_static_tls_setup
  8c:   00 00 ba 27     ldah    gp,0(ra)
                        8c: GPDISP      .text.startup+0x4
  90:   00 00 bd 23     lda     gp,0(gp)
  94:   00 00 7d a7     ldq     t12,0(gp)
                        94: ELF_LITERAL __libc_thr_init
  98:   00 40 5b 6b     jsr     ra,(t12),9c <_libc_init+0x9c>
                        98: LITUSE      .text.startup+0x3
                        98: HINT        __libc_thr_init

As you can see, t12 is used for the jsr call and set up right before the
function call. I don't see where it is restored, but I might have 
misunderstood something about the abi.

Now binutils is smart and optimizes this for statically linked
(and small) binaries:

   1200467b4:   00 00 fe 2f     unop    
   1200467b8:   43 00 40 d3     bsr     ra,1200468c8 <__guard_setup+0x8>
   1200467bc:   00 00 fe 2f     unop    
   1200467c0:   00 00 fe 2f     unop    
   1200467c4:   00 00 fe 2f     unop    
   1200467c8:   15 00 40 d3     bsr     ra,120046820 <__libc_atomic_init>
   1200467cc:   00 00 fe 2f     unop    
   1200467d0:   00 00 fe 2f     unop    
   1200467d4:   00 00 fe 2f     unop    
   1200467d8:   9b 5a 5f d3     bsr     ra,12001d248 <__libc_static_tls_setup+0x8>
   1200467dc:   00 00 fe 2f     unop    
   1200467e0:   00 00 fe 2f     unop    
   1200467e4:   00 00 fe 2f     unop    
   1200467e8:   b1 f0 5e d3     bsr     ra,120002ab0 <__libc_thr_init>
   1200467ec:   00 00 fe 2f     unop    
   1200467f0:   00 00 fe 2f     unop    
   1200467f4:   00 00 fe 2f     unop    

The pattern is simple: the first two instructions in public functions
set up the gp register for this function. When it is known that the gp value
is not needed, or is the same as in the calling function, the gp setup is
skipped and the jsr can be replaced by a bsr that branches past those two
instructions (i.e. typically to function address + 8).

If you look at the disassembly above, you'll find some functions where the bsr
does not use the +8 offset: __libc_atomic_init and __libc_thr_init.

For the former, this is fine: it is an empty function and does not do
gp setup:

0000000120046820 <__libc_atomic_init>:
   120046820:   01 80 fa 6b     ret
   120046824:   00 00 fe 2f     unop    
   120046828:   1f 04 ff 47     nop     
   12004682c:   00 00 fe 2f     unop    

However, for __libc_thr_init it is not fine:

0000000120002ab0 <__libc_thr_init>:
   120002ab0:   06 00 bb 27     ldah    gp,6(t12)
   120002ab4:   78 f6 bd 23     lda     gp,-2440(gp)
   120002ab8:   c0 ff de 23     lda     sp,-64(sp)

So the
   1200467e8:   b1 f0 5e d3     bsr     ra,120002ab0 <__libc_thr_init>
actually modifies register t12 - and it is never restored.

I wonder if the "hidden" attribute of the function name is declared 
inconsistently somewhere (__libc_thr_init is aliased for libpthread), or
if this is a binutils bug -- or if I am just missing something.

Does anyone have an idea why binutils thinks it is ok to convert the
__libc_thr_init call to bsr without an offset?

>How-To-Repeat:
cd /usr/tests/lib/libc/tls
./t_tls_static -l

>Fix:
n/a


