Subject: Re: Seemingly random SIGILL in SMP
To: Allen Wong <allen@submoron.org>
From: Michael Lorenz <macallan@netbsd.org>
List: port-macppc
Date: 10/06/2007 03:46:00
This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--Apple-Mail-9-748679615
Content-Type: multipart/mixed; boundary=Apple-Mail-8-748679290


--Apple-Mail-8-748679290
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=US-ASCII;
	delsp=yes;
	format=flowed

Hello,

On Oct 6, 2007, at 02:49, Allen Wong wrote:

> -> To verify it's the same problem please load the core file into gdb
> -> and disassemble what's at the fault address:
> -> gdb -c whatever.core /path/to/whatever
> -> disassemble 0xwhereveritborked
> ->
> -> If the disassembly dump looks like this:
> -> li r11,something
> -> b somewhere
> -> li r11, somethingelse
> -> b elsewhere
> -> or something like that ( just a long list of loads and branches )
> -> then it's the same problem.
> ->
>
> I don't get anything from gdb:
...
> Program terminated with signal 4, Illegal instruction.
...
> #0  0xeff7cf74 in ?? () from /lib/libc.so.12
> (gdb) disassemble 0xeff7cf74
> No function contains specified address.

Might be a different issue.
Then again, a miscached PLT entry doesn't necessarily give a SIGILL
right away: the cache might still contain valid instructions that could
do pretty much anything before faulting somewhere else. Is the stack
valid (e.g., does the bt command give a useful stack trace)?
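
Since disassemble wants a function symbol, dumping raw instructions
with something like "x/8i 0xeff7cf74" should still work. If you only
have the raw words, the PLT pattern described above can be recognized
by hand. Below is a rough sketch of that check, not anything from gdb
or ld.elf_so: the helper names are made up for illustration, and it
assumes each PLT entry is exactly an "li r11,imm" followed by a
relative "b".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* li r11,imm is addi r11,0,imm: opcode 14, rD = 11, rA = 0. */
static int
is_li_r11(uint32_t insn)
{
	return (insn & 0xffff0000u) == 0x39600000u;
}

/* Relative branch without link: opcode 18, AA = 0, LK = 0. */
static int
is_rel_branch(uint32_t insn)
{
	return (insn & 0xfc000003u) == 0x48000000u;
}

/* Count how many leading "li r11 / b" pairs a dump of n words has. */
static size_t
count_plt_pairs(const uint32_t *words, size_t n)
{
	size_t pairs = 0;

	assert(words != NULL || n == 0);
	while (2 * pairs + 1 < n &&
	    is_li_r11(words[2 * pairs]) &&
	    is_rel_branch(words[2 * pairs + 1]))
		pairs++;
	return pairs;
}
```

A long run of such pairs around the fault address would point at the
PLT; anything else suggests the fault happened in ordinary code.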

> -> >I'll build the userland in a UP kernel tonight and let you know.
> -> >My iMac G3
> -> >has never had this problem and is extremely stable.
> ->
> -> That sounds indeed like the problem I'm talking about.
> ->
>
> A UP kernel builds the userland with no problems.

Yeah, there are probably more cache syncing problems left.

> -> It probably won't work in 4.0, I'll build you one that does.
> ->
>
> Can you please build one for 3.1 as well?  I'm still working on  
> booting 4.0
> on an ide drive.  Thanks!

That might take some time, since I need to download the source first.
If you have the source handy, the patch is small: it changes only a
handful of lines in a single file. See attachment.
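
To illustrate the idea behind the second part of the patch: remember
where the entry starts before writing it, then sync the cache from
that saved pointer rather than from an offset computed after the
writes. The following is only a sketch under assumptions, not the
ld.elf_so code. fake_syncicache() is a made-up stand-in that merely
records the range, write_entry() and its parameters are hypothetical,
and the entry layout (li r11 followed by b) is assumed from the
disassembly pattern mentioned earlier. The sketch also syncs exactly
the bytes it wrote, whereas the patch passes a fixed length of 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __syncicache(); it only records the range. */
static const void *synced_from;
static size_t synced_len;

static void
fake_syncicache(const void *from, size_t len)
{
	synced_from = from;
	synced_len = len;
}

/* Encode a relative "b" from address 'from' to address 'to'. */
static uint32_t
encode_b(uint32_t from, uint32_t to)
{
	uint32_t distance = to - from;

	assert((distance & 3) == 0);	/* instructions are word aligned */
	return 0x48000000u | (distance & 0x03fffffcu);
}

/*
 * Write "li r11,reloff ; b pltcall" at 'where' (which will live at
 * address 'where_addr') and sync starting from the saved entry start.
 */
static uint32_t *
write_entry(uint32_t *where, uint32_t where_addr, uint16_t reloff,
    uint32_t pltcall_addr)
{
	uint32_t *start = where;	/* saved before any write */

	*where++ = 0x39600000u | reloff;
	*where++ = encode_b(where_addr + 4, pltcall_addr);
	fake_syncicache(start, (size_t)((char *)where - (char *)start));
	return where;
}
```

Keeping the start pointer means the synced range cannot drift if the
number of words written per entry ever changes.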

have fun
Michael


--Apple-Mail-8-748679290
Content-Transfer-Encoding: 7bit
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name=ldso.patch
Content-Disposition: attachment;
	filename=ldso.patch

Index: ppc_reloc.c
===================================================================
RCS file: /cvsroot/src/libexec/ld.elf_so/arch/powerpc/ppc_reloc.c,v
retrieving revision 1.40
diff -u -w -r1.40 ppc_reloc.c
--- ppc_reloc.c	23 May 2006 16:27:41 -0000	1.40
+++ ppc_reloc.c	5 Oct 2007 23:12:00 -0000
@@ -239,7 +239,7 @@
 		 * in _rtld_setup_pltgot() after all the entries have been
 		 * initialized.
 		 */
-		/* __syncicache(where - 3, 12); */
+		__syncicache(where - 3, 12);
 	}
 
 	return 0;
@@ -271,6 +271,7 @@
 		__syncicache(where, 4);
 	} else {
 		Elf_Addr *pltcall, *jmptab;
+		Elf_Word *start = where;
 		int N = obj->pltrelalim - obj->pltrela;
 	
 		/* Entries beyond 8192 take twice as much space. */
@@ -294,7 +295,7 @@
 		/* b	pltcall	*/
 		distance = (Elf_Addr)pltcall - (Elf_Addr)where;
 		*where++ = 0x48000000 | (distance & 0x03fffffc);
-		__syncicache(where - 3, 12);
+		__syncicache(start, 12);
 	}
 
 	if (tp)

--Apple-Mail-8-748679290--

--Apple-Mail-9-748679615--