Subject: Re: Floating Point broken on Alpha 21066?
To: R.o.s.s H.a.r.v.e.y <ross@ghs.com>
From: Werner Backes <werner@bit-1.de>
List: port-alpha
Date: 01/04/2002 14:02:43
> That pretty much settles it. Something in your system is broken,
> and it's probably the cpu chip itself.

Yes, I think this makes sense. During FTP downloads yesterday
I got FPEs. When I repeated the same download, it sometimes
worked and sometimes crashed later in the transfer. This looks
very much like a hardware problem to me. Unfortunately I cannot
replace the CPU because it's soldered.

Being curious, I reinstalled comp.tgz to have a closer look
at the error gcc gave me:

  multia# cc x.c
  /usr/src/gnu/usr.bin/egcs/common/../../../dist/gcc/toplev.c:2263: 
  Internal compiler error in function float_signal

Line 2263 is the abort() in the function below:

/* Signals actually come here.  */

static void
float_signal (signo)
     /* If this is missing, some compilers complain.  */
     int signo ATTRIBUTE_UNUSED;
{
  if (float_handled == 0)
    abort ();
#if defined (USG) || defined (hpux)
  signal (SIGFPE, float_signal);  /* re-enable the signal catcher */
#endif
  float_handled = 0;
  signal (SIGFPE, float_signal);
  longjmp (float_handler, 1);
}

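I'm not sure what to learn from this, but reaching the abort()
means float_signal was entered while float_handled was 0. Either
it was called twice without float_handled being set to 1 in the
meantime, or a SIGFPE arrived at a point where gcc was not
expecting one at all.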
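
To make the flag logic easier to follow, here is a minimal standalone
sketch of the same pattern. It is not the gcc code: guarded_region()
and its trigger argument are made up for illustration, raise() stands
in for a real FP trap, and I use sigsetjmp/siglongjmp instead of
setjmp/longjmp so the signal mask is restored on systems where plain
setjmp does not save it. It assumes the caller sets float_handled to 1
only around code that may legitimately trap.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>

static sigjmp_buf float_handler;                /* recovery point for the handler */
static volatile sig_atomic_t float_handled = 0; /* 1 only inside a guarded region */

static void float_signal(int signo)
{
    (void)signo;
    if (float_handled == 0)
        abort();                  /* SIGFPE outside a guarded region */
    float_handled = 0;
    siglongjmp(float_handler, 1); /* jump back to the recovery point */
}

/* Hypothetical helper: returns 0 if the guarded region completed,
   -1 if a SIGFPE interrupted it. */
static int guarded_region(int trigger)
{
    signal(SIGFPE, float_signal);
    if (sigsetjmp(float_handler, 1) != 0)
        return -1;                /* arrived here via siglongjmp */
    float_handled = 1;            /* a trap is acceptable from here on */
    if (trigger)
        raise(SIGFPE);            /* stand-in for a trapping FP operation */
    float_handled = 0;            /* region finished without a trap */
    return 0;
}
```

With this, guarded_region(0) returns 0 and guarded_region(1) returns -1.
A SIGFPE raised while float_handled is 0 ends in abort(), which is the
situation the compiler reported above.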

> Now, I really shouldn't say this, but it does look like your
> particular individual failure might actually have a possible SW
> workaround, especially with a -current kernel. The -current NetBSD
> kernel has an almost complete software IEEE FP interpreter built into
> it. 

Would I have to build the -current kernel myself to try this?
That isn't possible at the moment because the Multia is the
only Alpha I have. I installed NetBSD on my Sparc-5 last week to
try a cross-compile, but that's not much fun on such a slow machine.
I may get another Multia in the next few weeks and will run
further tests then.

Werner