Subject: Re: fpu emulation on LC040 - did it ever work?
To: <port-m68k@netbsd.org>
From: Tod McQuillin <devin@spamcop.net>
List: port-m68k
Date: 01/04/2001 17:58:28
On Thu, 4 Jan 2001, Ignatios Souvatzis wrote:

> I noticed something strange, which might be a CPU bug on my machine,
> but even with my workaround, it still didn't work.
>
> I now have a nearly working system, which basically ignores the
> hardware speedups in the
> 68LC040/68040V/68LC060/68060V/68060withdisabledFPU floating point
> emulation exception stack frame (only restores the instruction PC onto
> the stack frame).
>
> This makes me very much wonder whether it ever worked on a LC040
> system. Can anybody confirm this? A few positive reports would help
> me, too.

There are (at least) two problems with the FPE on LC040.

The first is that it sometimes generates incorrect results.  This is
because on LC040 the FPE code takes the next-instruction PC from the
stack frame instead of the PC it calculated, since Motorola tells us
it is all nicely precalculated.  However, this fails for instructions
that change the PC, such as the FBcc branches.  As a result, branch
instructions don't branch every time they're supposed to.  At the end of
this email I've attached part of an exchange I had with Ken Nakata where
I pointed this out.  No PR has been submitted.

The second problem is that somehow, somewhere, user programs using the FPE
get (apparently) random segfaults.  This one is a mystery to me.
-- 
Tod McQuillin

Here's the excerpt from my earlier email:

Ken Nakata wrote:

> Did you try with the second if statement commented out?  That makes
> FPE use software-calculated address of next instruction whether or not
> the processor is 68LC040.  If it's really doing something wrong, it
> might generate an interesting result (which I doubt it will, since we
> didn't use to have this statement before, yet it didn't run on LC040
> anyway).

Well, it's very interesting.  It seemed to me that if the FPE worked on
the 030 by calculating the next PC, it should work on the LC040 as
well.  If we are really calculating the PC correctly, our results should
agree with the PC in the stack frame.

So I created a new DL_ constant (DL_PC = 0x2000) and added this code:

    if (frame->f_format == 4) {
#ifdef DEBUG
        /* Report when savedpc and the calculated PC disagree. */
        if ((fpu_debug_level & DL_PC) && frame->f_pc != savedpc)
            printf("  fpu_emulate: savedpc and calc pc differ by %d\n",
                   savedpc - frame->f_pc);
#endif
        frame->f_pc = savedpc;  /* XXX Restore PC -- 68{EC,LC}040 only */
    }

Then I ran a few test programs on an LC040 and watched the sparks fly :-)

Actually, it was a very interesting result.  Our calculated PC matched the
LC040's stack frame almost every time, except when emulating the FBcc
instruction.  With FBcc the two sometimes differed considerably.

Well, this makes sense!  The FBcc instruction is all about changing the
PC.  There's no way the LC040's stack frame can have the proper next PC
without actually knowing how to do an FBcc insn.  The PC in the stack frame
is "the address of the instruction after the faulting instruction", but
that's not necessarily what we want to return to if we are emulating an
FBcc (and maybe a few other opcodes that touch the PC as well, but I didn't
look for those).
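
To make the mismatch concrete, here is a minimal sketch of how the next
PC for an FBcc could be worked out.  This is not the actual FPE code:
fbcc_next_pc() and its parameters are hypothetical names, and it assumes
the usual 68k rule that the branch displacement is relative to the
address of the opcode word plus two.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not from the NetBSD FPE).
 *
 * insn_pc      address of the FBcc opcode word
 * disp         sign-extended branch displacement
 * disp_is_long nonzero for the 32-bit displacement form, zero for 16-bit
 * cond_true    nonzero if the FP condition evaluated true
 */
static uint32_t
fbcc_next_pc(uint32_t insn_pc, int32_t disp, int disp_is_long, int cond_true)
{
    /* Fall through: skip the opcode word plus the displacement field. */
    uint32_t fallthrough = insn_pc + 2 + (disp_is_long ? 4u : 2u);

    /* Taken: displacement is relative to (insn_pc + 2) (assumed). */
    uint32_t target = insn_pc + 2 + (uint32_t)disp;

    return cond_true ? target : fallthrough;
}
```

For an FBcc at 0x1000 with a 16-bit displacement of 0x20, this returns
0x1004 when the branch is not taken and 0x1022 when it is.  Only the
first value is "the address of the instruction after the faulting
instruction", so a PC stacked by the CPU can be right only for the
not-taken case.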

After removing the savedpc code, I get better results: the LC040 now
properly does floating point arithmetic that it failed on before.  There
is still at least one more bug though; I still get random segfaults.  But
I am making progress.