Subject: Re: Strange segmentation fault trying to run postgresql on
To: Alex Pelts <alexp@broadcom.com>
From: Markus W Kilbinger <mk@kilbi.de>
List: port-cobalt
Date: 04/29/2007 11:16:13
>>>>> "Alex" == Alex Pelts <alexp@broadcom.com> writes:

    Alex> It is not the mapping that is the problem but alignment. It
    Alex> is trying to store word (32 bits) at the address that is
    Alex> aligned on 1 byte (cf). I don't know much about how netbsd
    Alex> translates mips exceptions to signals but I think from your
    Alex> register dump the problem is alignment.

I've investigated this a bit further. I can reproduce the problem on
my Qube 2 (running a -current kernel and userland) with
pkgsrc/databases/postgresql82(-server). I recompiled main/main.c
with -O0, because otherwise the SIGSEGV seems to be 'hidden' in a
branch delay slot. Starting the resulting postgres within gdb yields:

  (gdb) run
  Starting program: /usr/obj/pkg/databases/postgresql82-server/work.mipsel/postgresql-8.2.3/src/backend/postgres 
  
  Program received signal SIGSEGV, Segmentation fault.
  0x00573bcc in main ()

  (gdb) disas
  [...]
  0x00573bb8 <main+56>:   jalr    t9
  0x00573bbc <main+60>:   nop
  0x00573bc0 <main+64>:   lw      gp,16(s8)
  0x00573bc4 <main+68>:   move    v1,v0
  0x00573bc8 <main+72>:   lw      v0,-13948(gp)
  0x00573bcc <main+76>:   sw      v1,0(v0)
  0x00573bd0 <main+80>:   lw      v0,-13948(gp)
  0x00573bd4 <main+84>:   lw      v0,0(v0)
  0x00573bd8 <main+88>:   move    a0,v0
  0x00573bdc <main+92>:   lw      v0,-32632(gp)
  0x00573be0 <main+96>:   addiu   t9,v0,16308
  0x00573be4 <main+100>:  jalr    t9

  (gdb) info registers 
            zero       at       v0       v1       a0       a1       a2       a3
   R0   00000000 00000000 007e9f53 00830030 00830039 7fffda1d 00000000 7fffda14 
              t0       t1       t2       t3       t4       t5       t6       t7
   R8   7fffda1c 00000000 00000008 00000000 8000001f ffffffe0 7fffd978 006f9a34 
              s0       s1       s2       s3       s4       s5       s6       s7
   R16  7fffd9c0 7fffd8f8 00000001 7fffd8fc 007e9edc 7fffeff0 7dfb2e80 7dfaa000 
              t8       t9       k0       k1       gp       sp       s8       ra
   R24  000007de 7dbf6da8 00000000 00000000 007ed250 7fffd898 7fffd898 00573bc0 
              sr       lo       hi      bad    cause       pc
        0000ff13 0009f79c 000000b4 007e9f53 00000014 00573bcc 
             fsr      fir
        007b6300 00000000 

The problematic instruction seems to be

  0x00573bcc <main+76>:   sw      v1,0(v0)

where 'v0' contains the address '007e9f53', which is not aligned for
a word access.
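
As a quick cross-check of that reading, here is a minimal C sketch that
does the arithmetic on the values from the register dump. The function
names are made up for illustration and are not taken from any NetBSD
or gdb source. It assumes the standard MIPS layout in which the ExcCode
field occupies bits 6..2 of the Cause register.

```c
#include <stdint.h>

/* Hypothetical helper: byte offset of addr within its 4-byte word.
 * 0 means the address is suitable for a word (lw/sw) access. */
unsigned word_misalignment(uint32_t addr)
{
    return addr & 3u;
}

/* Hypothetical helper: extract ExcCode (bits 6..2) from a MIPS
 * Cause register value. */
unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}
```

With the dump above, word_misalignment(0x007e9f53) gives 3, and
cause_exccode(0x00000014) gives 5, which is AdES (address error on
store). The 'bad' register also holds 007e9f53, the same value as
'v0', so everything points at the store itself.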

Astonishingly:

  (gdb) x 0x007e9f53
  0x7e9f53 <progname>:    0x00000000

... this seems to be the address of

  const char *progname;

defined in main/main.c itself!?

So, how can this happen? Shouldn't any pointer-type variable be
adequately aligned by cc/as/ld? (-> A bug in gcc/binutils?)
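
To spell out the expectation, a small sketch of the property I would
expect to hold for any file-scope pointer object. It assumes a
C11-capable compiler (for _Alignof); 'progname_example' is a
hypothetical stand-in for the real 'progname', and the function name
is invented for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 'progname' global in main/main.c. */
const char *progname_example;

/* Returns 1 if the pointer object itself sits at an address that is
 * a multiple of the alignment required for its type, else 0. */
int pointer_object_is_aligned(void)
{
    uintptr_t addr = (uintptr_t)&progname_example;
    return addr % _Alignof(const char *) == 0;
}
```

On a sane toolchain this should return 1. The address 0x7e9f53 that
gdb reports for 'progname' would fail the same test for a 4-byte
pointer.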

Maybe one of the MIPS gurus can help/comment here...

Markus.