Subject: Re: NetBSD-1.4.2 kernel is slower than 1.4 and 1.4.1
To: Inaki Saez <jisaez@sfe.indra.es>
From: Ignatios Souvatzis <is@jocelyn.rhein.de>
List: port-amiga
Date: 03/31/2000 22:24:17
Inaki Saez wrote:
> Both systems use 68040 cpu.
>
> Anyway most of the rest of the tests have a drop of 3% to 5%. I think they
> should not have that much performance penalty.
OK, let me quote the cvs log, which in turn quotes the M68040 User
Manual:
<cvs log>
revision 1.26
date: 1999/11/25 20:30:38; author: is; state: Exp; lines: +8 -1
From the 68040 User Manual, page 4-10:
"To fully support self-modifying code in any situation, it is imperative that
a CPUSHA instruction is executed before the execution of the first self-modified
instruction. The CPUSHA instruction has the effect of ensuring that there is
no stale data in memory, the pipeline is flushed, and instruction prefetches
are repeated and taken from external memory."
I verified that this is the only way (I can think of) to make the sigtramp
regression test work on 68040. Doing cpushl dc; cinvl ic; over the affected
address range, then nop (to synchronize the pipeline) is not enough; apparently
the nop does not FLUSH the pipeline and prefetch...
Note that the 68060 UM has copied the above cited passage, but in fact it does
not hold for the 68060. This might be connected to the fact that the 68060 does ensure
memory access order under most conditions.
</cvs log>
I cite this here to demonstrate that I tried possible alternatives, back then.
The problem solved is not theoretical; I started to look at it because
our regression tests showed a bug, and I finally found the cause. I tried
to do it smarter than what the 68040 UM _requires_, but nothing worked, so
I finally did it as documented.
Yes, this is evil.
Yes, you should expect a heavy performance hit on exec() benchmarks, and
a noticeable performance hit on other benchmarks that consist of executing
really small test programs. Do you want fast programs or correct programs?
I'm sorry, but I can't help it. Besides: normal programs will have a smaller
performance hit, as they consist of more than starting up.
Regards,
Ignatios