port-cobalt: Re: Strange segmentation fault trying to run postgresql on

Subject: Re: Strange segmentation fault trying to run postgresql on
To: None <mk@kilbi.de>
From: Alex Pelts <alexp@broadcom.com>
List: port-cobalt
Date: 04/29/2007 20:54:01
You can see that v0 is loaded from the memory location pointed to by gp,
at offset -13948. I guess this is some sort of small data area
optimization. The offset is a bit large for a structure access, unless
it is an array or a large structure.
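To make the register dump quoted below concrete, here is a small C sketch of how a MIPS load/store forms its address and how the exception cause can be decoded. The helper names (mips_effective_address, mips_word_aligned, mips_exc_code) are hypothetical; the values gp = 0x007ed250, v0 = 0x007e9f53 and cause = 0x00000014 are taken from Markus's `info registers` output.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of a MIPS load/store such as
 * "lw rt, offset(base)". The 16-bit immediate is sign-extended and added
 * to the base register, wrapping modulo 2^32. */
static uint32_t mips_effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}

/* Hypothetical helper: lw and sw need an effective address that is a
 * multiple of 4; otherwise the CPU raises an address error exception. */
static int mips_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}

/* Hypothetical helper: ExcCode field (bits 6..2) of the CP0 Cause register.
 * 4 = AdEL (address error on load/fetch), 5 = AdES (address error on store). */
static unsigned mips_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}
```

With those values, mips_effective_address(0x007ed250, -13948) is 0x007e9bd4, a word-aligned slot, so the lw v0,-13948(gp) itself is fine. The pointer it loads, 0x007e9f53, has its low two bits set, so the following sw v1,0(v0) faults. mips_exc_code(0x00000014) is 5 (AdES), consistent with a store to a misaligned address, and the "bad" register in the dump equals v0.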

You also need to post the C source that generates this code. I ran into a
similar problem where code would run on Linux but would raise an exception
on Nucleus and VxWorks. It turned out that the Linux exception handler
recovered from the unaligned access by emulating it (loading the register
itself) and continuing the program. Maybe this is the case for this
application as well. No one has checked this app on MIPS.
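As a sketch of the portable alternative to relying on a kernel fixup handler like the Linux one described above, the C fragment below detects a misaligned pointer and accesses a 32-bit value through memcpy, which makes no alignment assumption. The helper names (is_word_aligned, store_u32_unaligned, load_u32_unaligned) are hypothetical and not taken from PostgreSQL.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is suitably aligned for a plain
 * 32-bit word access. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(uint32_t)) == 0;
}

/* Hypothetical helper: store a 32-bit value at an address that may not be
 * 4-byte aligned. Copying through memcpy means the compiler cannot assume
 * alignment, so it must emit code that is safe for any address instead of
 * a single word store. */
static void store_u32_unaligned(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical helper: the matching load. */
static uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

This only sidesteps the fault. In the postgres case the odd address of a pointer variable is itself the anomaly, and routing the access through memcpy would hide it rather than explain it.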

Regards,
Alex


Markus W Kilbinger wrote:
>>>>>> "Alex" == Alex Pelts <alexp@broadcom.com> writes:
> 
>     Alex> It is not the mapping that is the problem but alignment. It
>     Alex> is trying to store a word (32 bits) at an address that is
>     Alex> only byte-aligned (cf). I don't know much about how NetBSD
>     Alex> translates MIPS exceptions to signals, but from your
>     Alex> register dump I think the problem is alignment.
> 
> I've investigated this a bit further. I can reproduce this problem on
> my Qube 2 (running a -current kernel and userland) with
> pkgsrc/databases/postgresql82(-server). I've recompiled main/main.c
> with -O0, because otherwise the SIGSEGV seems to be 'hidden' in a
> branch delay slot. Starting this postgres binary within gdb yields:
> 
>   (gdb) run
>   Starting program: /usr/obj/pkg/databases/postgresql82-server/work.mipsel/postgresql-8.2.3/src/backend/postgres 
>   
>   Program received signal SIGSEGV, Segmentation fault.
>   0x00573bcc in main ()
> 
>   (gdb) disas
>   [...]
>   0x00573bb8 <main+56>:   jalr    t9
>   0x00573bbc <main+60>:   nop
>   0x00573bc0 <main+64>:   lw      gp,16(s8)
>   0x00573bc4 <main+68>:   move    v1,v0
>   0x00573bc8 <main+72>:   lw      v0,-13948(gp)
>   0x00573bcc <main+76>:   sw      v1,0(v0)
>   0x00573bd0 <main+80>:   lw      v0,-13948(gp)
>   0x00573bd4 <main+84>:   lw      v0,0(v0)
>   0x00573bd8 <main+88>:   move    a0,v0
>   0x00573bdc <main+92>:   lw      v0,-32632(gp)
>   0x00573be0 <main+96>:   addiu   t9,v0,16308
>   0x00573be4 <main+100>:  jalr    t9
> 
>   (gdb) info registers 
>             zero       at       v0       v1       a0       a1       a2       a3
>    R0   00000000 00000000 007e9f53 00830030 00830039 7fffda1d 00000000 7fffda14 
>               t0       t1       t2       t3       t4       t5       t6       t7
>    R8   7fffda1c 00000000 00000008 00000000 8000001f ffffffe0 7fffd978 006f9a34 
>               s0       s1       s2       s3       s4       s5       s6       s7
>    R16  7fffd9c0 7fffd8f8 00000001 7fffd8fc 007e9edc 7fffeff0 7dfb2e80 7dfaa000 
>               t8       t9       k0       k1       gp       sp       s8       ra
>    R24  000007de 7dbf6da8 00000000 00000000 007ed250 7fffd898 7fffd898 00573bc0 
>               sr       lo       hi      bad    cause       pc
>         0000ff13 0009f79c 000000b4 007e9f53 00000014 00573bcc 
>              fsr      fir
>         007b6300 00000000 
> 
> The problematic instruction seems to be
> 
>   0x00573bcc <main+76>:   sw      v1,0(v0)
> 
> and 'v0' contains the address '007e9f53', which is not word-aligned
> and therefore invalid for a word access.
> 
> Astonishingly:
> 
>   (gdb) x 0x007e9f53
>   0x7e9f53 <progname>:    0x00000000
> 
> ... this seems to be the address of
> 
>   const char *progname;
> 
> defined in main/main.c itself!?
> 
> So, how can this happen? Shouldn't any pointer-type variable be
> adequately aligned by cc/as/ld? (A bug in gcc/binutils?)
> 
> Maybe one of the MIPS gurus can help or comment here...
> 
> Markus.
>
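Markus's expectation matches what C requires: every object, including a global such as `const char *progname`, must sit at an address that is a multiple of its type's alignment (4 bytes for a pointer on 32-bit MIPS). A minimal self-check, assuming a C11 compiler for `alignof`; the helper pointer_object_aligned is hypothetical, and progname here is only a stand-in declared the same way as in main/main.c:

```c
#include <stdalign.h>
#include <stdint.h>

/* Stand-in declared the same way as in main/main.c. */
const char *progname;

/* Hypothetical helper: nonzero if addr is a multiple of the alignment
 * the implementation requires for an object of type const char *. */
static int pointer_object_aligned(uintptr_t addr)
{
    return addr % alignof(const char *) == 0;
}
```

In a correctly built program the check on (uintptr_t)&progname passes, while 0x007e9f53 fails on any host because it is odd. If the real symbol's address failed such a check, that would support the suspicion of a toolchain problem rather than a bug in main.c.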