Subject: Re: Strange segmentation fault trying to run postgresql on
To: None <mk@kilbi.de>
From: Alex Pelts <alexp@broadcom.com>
List: port-cobalt
Date: 04/29/2007 20:54:01
You can see that v0 is loaded from a memory location addressed relative
to gp. I guess this is some sort of small-data-area optimization. The
offset is rather large for a structure member access, unless it is an
array or a large structure.
You should also post the C source that generates this code. I ran into a
similar problem where code would run on Linux but would raise an
exception on Nucleus and VxWorks. It turned out that the Linux exception
handler recovered from unaligned accesses by emulating the load into the
register and then continuing the program. Maybe this is the case for
this application as well. No one has checked this app on MIPS.
Regards,
Alex
Markus W Kilbinger wrote:
>>>>>> "Alex" == Alex Pelts <alexp@broadcom.com> writes:
>
> Alex> It is not the mapping that is the problem but the alignment.
> Alex> It is trying to store a word (32 bits) at an address that
> Alex> is only byte-aligned (cf). I don't know much about how NetBSD
> Alex> translates MIPS exceptions to signals, but judging from your
> Alex> register dump I think the problem is alignment.
>
> I've investigated this a bit further. I can reproduce the problem on
> my Qube 2 (running a -current kernel and userland) with
> pkgsrc/databases/postgresql82(-server). I recompiled main/main.c
> with -O0, because otherwise the SIGSEGV seems to be 'hidden' in a
> branch delay slot. Starting that postgres binary within gdb yields:
>
> (gdb) run
> Starting program: /usr/obj/pkg/databases/postgresql82-server/work.mipsel/postgresql-8.2.3/src/backend/postgres
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00573bcc in main ()
>
> (gdb) disas
> [...]
> 0x00573bb8 <main+56>: jalr t9
> 0x00573bbc <main+60>: nop
> 0x00573bc0 <main+64>: lw gp,16(s8)
> 0x00573bc4 <main+68>: move v1,v0
> 0x00573bc8 <main+72>: lw v0,-13948(gp)
> 0x00573bcc <main+76>: sw v1,0(v0)
> 0x00573bd0 <main+80>: lw v0,-13948(gp)
> 0x00573bd4 <main+84>: lw v0,0(v0)
> 0x00573bd8 <main+88>: move a0,v0
> 0x00573bdc <main+92>: lw v0,-32632(gp)
> 0x00573be0 <main+96>: addiu t9,v0,16308
> 0x00573be4 <main+100>: jalr t9
>
> (gdb) info registers
>          zero       at       v0       v1       a0       a1       a2       a3
> R0   00000000 00000000 007e9f53 00830030 00830039 7fffda1d 00000000 7fffda14
>            t0       t1       t2       t3       t4       t5       t6       t7
> R8   7fffda1c 00000000 00000008 00000000 8000001f ffffffe0 7fffd978 006f9a34
>            s0       s1       s2       s3       s4       s5       s6       s7
> R16  7fffd9c0 7fffd8f8 00000001 7fffd8fc 007e9edc 7fffeff0 7dfb2e80 7dfaa000
>            t8       t9       k0       k1       gp       sp       s8       ra
> R24  000007de 7dbf6da8 00000000 00000000 007ed250 7fffd898 7fffd898 00573bc0
>            sr       lo       hi      bad    cause       pc
>      0000ff13 0009f79c 000000b4 007e9f53 00000014 00573bcc
>           fsr      fir
>      007b6300 00000000
>
> The problematic instruction seems to be
>
> 0x00573bcc <main+76>: sw v1,0(v0)
>
> where 'v0' contains the address '007e9f53', which is not aligned for a
> word access.
>
> Astonishingly:
>
> (gdb) x 0x007e9f53
> 0x7e9f53 <progname>: 0x00000000
>
> ... this seems to be the address of
>
> const char *progname;
>
> defined in main/main.c itself!?
>
> So, how can this happen? Shouldn't any pointer-type variable be
> adequately aligned by cc/as/ld? (-> Bug within gcc/binutils?)
>
> Maybe one of the MIPS gurus can help or comment here...
>
> Markus.
>