Subject: Netscape - the plot thickens
To: None <port-sparc@NetBSD.ORG>
From: Greg Earle <earle@isolar.Tujunga.CA.US>
List: port-sparc
Date: 11/15/1995 00:05:58
Well, I've gotten a little bit further.  Thanks to a timely hint from Theo,
I instrumented the remaining SIGILLs in trap.c.  I also did the same in
machdep.c (and even in fpu.c, to cover all the bases) to see if I could
catch everything.

The weird thing is that somehow, most of the SIGILLs are still getting
through, undetected.  I have no idea how this is happening.

The good news, though, is that a few times, I was able to catch it at work:

netscape-2.0beta2[pid 212]: T_RWRET read_rw failed: pc=3a44fc npc=3a4500
	psr=90001081<EF,S>

This comes from instrumenting this snippet of trap():

...
#define read_rw(src, dst) \
	copyin((caddr_t)(src), (caddr_t)(dst), sizeof(struct rwindow))

	case T_RWRET:
		/*
		 * T_RWRET is a window load needed in order to rett.
		 * It simply needs the window to which tf->tf_out[6]
		 * (%sp) points.  There are no user or saved windows now.
		 * Copy the one from %sp into pcb->pcb_rw[0] and set
		 * nsaved to -1.  If we decide to deliver a signal on
		 * our way out, we will clear nsaved.
		 */
if (pcb->pcb_uw || pcb->pcb_nsaved) panic("trap T_RWRET 1"); 
if (rwindow_debug)
printf("%s[%d]: rwindow: pcb<-stack: %x\n", p->p_comm, p->p_pid, tf->tf_out[6]);
		if (read_rw(tf->tf_out[6], &pcb->pcb_rw[0]))
			sigexit(p, SIGILL);

The "T_RWRET read_rw failed" message shown earlier comes from a debug printf
that I inserted right before the sigexit(p, SIGILL).
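
For anyone who wants to pick the psr value apart, here is a small userland
sketch in C.  It is not the code I put in the kernel: the helper names
psr_flags() and format_rwret_failure() are made up for illustration, and the
message layout is only guessed from the output line above (I've joined it
onto one line with a space).  The PSR bit positions are the SPARC V8 ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* SPARC V8 PSR fields. */
#define PSR_EC  0x00002000u	/* coprocessor enabled */
#define PSR_EF  0x00001000u	/* FPU enabled */
#define PSR_PIL 0x00000f00u	/* processor interrupt level */
#define PSR_S   0x00000080u	/* supervisor mode */
#define PSR_PS  0x00000040u	/* previous supervisor mode */
#define PSR_ET  0x00000020u	/* traps enabled */
#define PSR_CWP 0x0000001fu	/* current window pointer */

/* Hypothetical helper: write the set flag names as "<A,B,...>" into buf. */
static void
psr_flags(unsigned int psr, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } tab[] = {
		{ PSR_EC, "EC" }, { PSR_EF, "EF" }, { PSR_S, "S" },
		{ PSR_PS, "PS" }, { PSR_ET, "ET" },
	};
	size_t i;
	int first = 1;

	assert(len > 1);
	snprintf(buf, len, "<");
	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		if ((psr & tab[i].bit) == 0)
			continue;
		if (!first)
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, tab[i].name, len - strlen(buf) - 1);
		first = 0;
	}
	strncat(buf, ">", len - strlen(buf) - 1);
}

/* Hypothetical helper: build a one-line message shaped like the one above. */
static void
format_rwret_failure(char *out, size_t len, const char *comm, int pid,
    unsigned int pc, unsigned int npc, unsigned int psr)
{
	char flags[32];

	psr_flags(psr, flags, sizeof(flags));
	snprintf(out, len, "%s[pid %d]: T_RWRET read_rw failed: "
	    "pc=%x npc=%x psr=%x%s", comm, pid, pc, npc, psr, flags);
}
```

Feeding it psr=0x90001081 gives <EF,S>, matching what the kernel printed.
Masking the same value with PSR_CWP gives window 1, and PSR_ET is clear.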

I unstripped a copy of the SunOS binary so I could get some symbols, and
took a look:

...
_PR_Start+0xa0: add     %o2, 0x1, %o1
_PR_Start+0xa4: mov     %o1, %o2
_PR_Start+0xa8: st      %o2, [%o0 + 0x60]
_PR_Start+0xac: ld      [%fp + 0x44], %o0
_PR_Start+0xb0: add     %o0, 0x58, %o1
_PR_Start+0xb4: mov     %o1, %o0
_PR_Start+0xb8: call    _setjmp
_PR_Start+0xbc: nop
_PR_Start+0xc0: tst     %o0			! == PC
_PR_Start+0xc4: be      _PR_Start + 0xd4	! == NPC
_PR_Start+0xc8: nop
_PR_Start+0xcc: call    _HopToadNoArgs
_PR_Start+0xd0: nop

The PC is the instruction right after the call to _setjmp and its delay
slot.  So this smells like a longjmp() returning, with the restoration of
the register windows getting botched somehow.  Is there any more debugging
info I should be trying to print out?
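
In case the pattern in the disassembly isn't familiar, here is a minimal C
sketch of it.  This is not Netscape's code; start_demo() and hop_back() are
names I made up.  setjmp() returns twice: once directly with 0, and once
more with a nonzero value when something calls longjmp().  That second
return resumes at the instruction following the call, which is where the
trap's PC points.  My assumption is that the window reload for that resumed
frame is what goes wrong.

```c
#include <setjmp.h>

static jmp_buf env;
static int hops;

/* Hypothetical stand-in for whatever later jumps back to the saved context. */
static void
hop_back(void)
{
	hops++;
	longjmp(env, 1);	/* does not return; resumes after setjmp() */
}

/* Returns the number of longjmp() round trips observed (1 if all is well). */
int
start_demo(void)
{
	hops = 0;
	if (setjmp(env) == 0) {	/* first return: the direct call */
		hop_back();
		return -1;	/* not reached */
	}
	/* second return: we got here via longjmp() */
	return hops;
}
```

On a working system start_demo() returns 1.  The second return from setjmp()
is the step that corresponds to the failing PC here.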

Thanks to Theo and Charles for their suggestions.

	- Greg