Subject: possible codegen bug?
To: None <port-sparc@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-sparc
Date: 07/27/1999 12:41:09
I've got two NetBSD/sparc machines.  Both are running 1.4 kernels; one
of them (callisto) has a matching userland; the other (sparkle) has a
userland built from supped sources dated 1999-03-11.  (I'm trying to do a
build-from-source upgrade in a chrooted area on sparkle; it's being
difficult.)

I have a program that uses some gccisms (nested functions and nonlocal
gotos, implementing a catch/throw paradigm).  On sparkle, this builds and
works Just Fine.  I copied the same source to callisto, recompiled, and
it fell over hard.  Building the affected file with -save-temps and
examining the .s files, it looks as though some offsets in the nested
function are wrong.  Toolchain folks, does this match anything known to
have been fixed since 1.4?

Specifically,

source code:
	static int dis_wrap(void (*fn)(unsigned char), unsigned char arg)
	{
	 __label__ fail;
	 ADDR addrsave;
	 int ofillsave;
	
	 static void _fail(void)
	  { goto fail;
	  }
	
	 if (0)
	  {
	fail:;
	    addr = addrsave;
	    ofill = ofillsave;
	    return(0);
	  }
	 addrsave = addr;
	 ofillsave = ofill;
	 nodisfn = &_fail;
	 (*fn)(arg);
	 return(1);
	}
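
For anyone who wants to poke at this without the full program, here is
a minimal standalone sketch of the same catch/throw pattern.  It is not
the original code: demo_state, bailout, worker and wrap are hypothetical
stand-ins for addr/ofill, nodisfn, the callback and dis_wrap, and the
nested function is declared without "static", which as far as I know
newer gcc releases reject on nested functions.  It relies on the GNU C
nested-function and __label__ extensions, so it needs gcc.

```c
/* GNU C only: nested functions and __label__ are gcc extensions. */
#include <stddef.h>

/* Hypothetical stand-in for nodisfn: the current "throw" hook. */
void (*bailout)(void) = NULL;

/* Hypothetical stand-in for the addr/ofill state that gets rolled back. */
int demo_state = 0;

/* Callback: modifies the state, then throws if arg is nonzero. */
void worker(unsigned char arg)
{
    demo_state += 100;
    if (arg)
        (*bailout)();   /* does not return here; lands at fail: in wrap() */
}

/*
 * Runs fn(arg).  Returns 1 if fn returned normally, or 0 if fn called
 * the bailout hook, in which case demo_state is restored first.
 */
int wrap(void (*fn)(unsigned char), unsigned char arg)
{
    __label__ fail;     /* local label, visible to the nested function */
    int saved;

    /* Nested function: a nonlocal goto back into the enclosing frame. */
    void do_fail(void)
    {
        goto fail;
    }

    saved = demo_state;
    bailout = &do_fail; /* taking the address makes gcc build a trampoline */
    (*fn)(arg);
    return 1;

fail:
    demo_state = saved;
    return 0;
}
```

Called from a trivial main(), wrap(worker, 0) should return 1 with
demo_state bumped by 100, and wrap(worker, 1) should return 0 with
demo_state back at its value from before the call.  The bailout pointer
is only valid while wrap() is still active, just as with nodisfn above.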

sparkle (working) code for _fail() [L265 is the fail: label]:
		.align 4
		.type	__fail.6,@function
		.proc	020
	__fail.6:
	.stabn 68,0,1101,LM330
	LM330:
		!#PROLOGUE# 0
		save %sp,-112,%sp
		!#PROLOGUE# 1
		st %g2,[%fp-12]
	.stabn 68,0,1101,LM331
	LM331:
		ta 3
		ld [%fp-12],%o0
		add %o0,8,%o1
		mov %o1,%fp
		ld [%fp-16],%o0
		ld [%fp-20],%fp
		sethi %hi(L265),%o1
		or %o1,%lo(L265),%g2
		jmp %o0+0
		restore

callisto (broken) code for _fail() [L265 is the fail: label]:
		.align 4
		.type	 __fail.6,@function
		.proc	020
	__fail.6:
	.stabn 68,0,1101,LM330
	LM330:
		!#PROLOGUE# 0
		save %sp,-112,%sp
		!#PROLOGUE# 1
		st %g2,[%fp-12]
	.stabn 68,0,1101,LM331
	LM331:
		ta 3
		ld [%fp-16],%o0
		add %o0,8,%o1
		mov %o1,%fp
		sethi %hi(L265),%o1
		or %o1,%lo(L265),%o0
		ld [%fp-20],%fp
		ld [%fp-12],%g2
		jmp %o0+0
		restore

Admittedly, some of this might be related to the trampoline, which is
generated very differently on the two machines.  But *something* is
going wrong with the newer toolchain, and the code I quote above looks
wrong to me: the broken version saves %g2 at [%fp-12] but then loads
from [%fp-16], where the working version loads from [%fp-12].  (The
"broken" toolchain identifies itself as "egcs-1.1.1"; the "working"
one, "2.7.2.2+myc1".)  And because I'm already running a new kernel
under the old ("working") userland, I don't think it's a kernel issue.

I can supply full details, including full .c, .i, .s, and even .o
files, if anyone wants them.  If necessary I can even ship someone full
source code to the program in question, since it's all my own code.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B