Subject: Re: Seemingly random SIGILL in SMP
To: Allen Wong <allen@submoron.org>
From: Michael Lorenz <macallan@NetBSD.org>
List: port-macppc
Date: 10/05/2007 11:53:59
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

On Oct 5, 2007, at 11:42, Allen Wong wrote:

> -> in both -current and 4.0 I occasionally see processes die with
> -> SIGILL, apparently at random. Looking at the core files revealed that
> -> the faulting instruction was always part of a PLT table, apparently
> -> they're not always flushed out after writing them. I can't reliably
> -> trigger the fault but building something non-trivial ( like a
> -> userland ) usually runs into it at some point.
> -> So,
> -> - does anyone else see this?
>
> Yes, I do whenever I try to build the userland.  It usually takes me
> five or six tries before completion.  I also get the occasional SIGSTOP
> and SIGBUS, I believe.  It's very random, it will almost never happen
> again at the same place.

To verify it's the same problem, please load the core file into gdb
and disassemble what's at the fault address:
gdb -c whatever.core /path/to/whatever
(gdb) disassemble 0xwhereveritborked

If the disassembly dump looks like this:
li r11,something
b somewhere
li r11, somethingelse
b elsewhere
or something like that ( just a long list of loads and branches )  
then it's the same problem.
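
If you'd rather not eyeball a long dump, here is a rough sketch in Python that counts "li r11,..." lines immediately followed by an unconditional "b ..." line. The function names and the threshold of two pairs are my own assumptions for illustration, and the regular expressions only cover the simple pattern shown above, so treat a positive result as a hint and not as proof.

```python
import re

# Hypothetical helper, not part of gdb or NetBSD: it only recognises the
# simple "li r11,<value>" / "b <target>" pattern described in this mail.
LI_R11 = re.compile(r"\bli\s+r11\s*,")
BRANCH = re.compile(r"\bb\s+\S+")


def count_plt_pairs(lines):
    """Count 'li r11,...' lines that are directly followed by a 'b ...' line."""
    pairs = 0
    prev_was_li = False
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the pasted dump
        if LI_R11.search(line):
            prev_was_li = True
        elif prev_was_li and BRANCH.search(line):
            pairs += 1
            prev_was_li = False
        else:
            prev_was_li = False
    return pairs


def looks_like_plt(lines, min_pairs=2):
    """Guess whether a dump is PLT-like; min_pairs is an arbitrary cutoff."""
    return count_plt_pairs(lines) >= min_pairs


sample = [
    "li r11,something",
    "b somewhere",
    "li r11, somethingelse",
    "b elsewhere",
]
print(looks_like_plt(sample))
```

To use it on a real dump, paste the gdb output into a file and pass its lines (for example open("dump.txt").read().splitlines(), where dump.txt is just a placeholder name) to looks_like_plt().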

> -> - if so, in SMP or in UP as well? I've never seen this with a
> -> uniprocessor kernel.
> ->
>
> I'll build the userland in a UP kernel tonight and let you know.  My
> iMac G3 has never had this problem and is extremely stable.

That does indeed sound like the problem I'm talking about.

> -> I changed the powerpc-specific part of ld.elf_so to flush the cache
> -> in a more consistent way and since then I haven't seen any SIGILL and
> -> my G4's been building stuff from pkgsrc all night.
> ->
> -> If you see those SIGILL on a recent -current please try my patched
> -> ld.elf_so ( just dump it into /libexec, you might have to use install
> -> instead of cp though ) and see if they go away. The binary is here:
> -> ftp://ftp.netbsd.org/pub/NetBSD/misc/macallan/macppc/ld.elf_so
> -> built from yesterday's sources.
> ->
>
> I'll test the new ld.elf_so as well.

It probably won't work in 4.0; I'll build you one that does.

have fun
Michael

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQEVAwUBRwZeF8pnzkX8Yg2nAQJcjwgAm8F9tMOsLw0vfCwZtbr+VeKLOQv7cxpX
GzUFFClkhD6nV9bXBgIAYD0A1+d4yRi2LhZ3w7I6YWblDpCvc8TCxPYKk/BsgA+z
ZteZLY9K9bv+X1NcvVeiMRsWROaDCLc4AnPdCC9f0r8LshyP8AY33HqQN1xmq9Xq
iEPEYfWhBHAq7JEUYqVUS3cC2y7tScbgUzUash1GXuDmOegTiBIh/4bOqbkpYjcC
Wn9oXOBAm90ZcPbvf9paRAgWjGrlUhpXlueJvRK+ElAinAj4Xk53Dqb9l0X4/LsZ
UV3ixjMaQtUehla/Lvo2m3tLMe0GE4ApF8auau+hkFV0GK1SdCd6xA==
=KGXi
-----END PGP SIGNATURE-----