Re: Delay slots
On Tue, 21 Jun 2016, coypu%SDF.ORG@localhost wrote:
> > As for "gas" and reordering, I have always viewed assembler reordering
> > as a very large design error. Assemblers should assemble; if I meant
> > something different from what I wrote I would have written the other
> > thing instead. Reordering by programs belongs in compilers, not
> > assemblers. Note that GCC for years now has used gas in no-reorder
> > mode for this exact reason -- gas does a horrible job, gcc knows far
> > more about what should be done.
> I see your point now, it seems whoever wrote much of the MIPS code in
> NetBSD felt the same - there's set noreorder almost everywhere.
While GAS's reordering may not be ideal as far as the performance of the
generated code is concerned, stating that it does a horrible job is, I
think, unfair and creates FUD, which in turn makes people overuse the
`noreorder' mode in handcoded assembly, which then breaks on one processor
or another. I've seen this happen all too often. You only really need
the `noreorder' mode where you want to squeeze out every cycle and schedule
a delay slot instruction that has a data dependency with the preceding jump
or branch, e.g.:

	move	$4, $2
	beq	$2, $3, foo
	 addiu	$2, $2, 1

(delay-slot instructions indented by convention).
Surely any semi-decent compiler will be better at scheduling useful
instructions into delay slots; however, GAS still gets the basic task of
ensuring machine code correctness by filling delay slots in handcoded
assembly right: it swaps branches and jumps which have a delay slot (not
all do) with the preceding instruction if possible, and otherwise
schedules a NOP into the delay slot or converts a jump to a compact form
if there is one. Similarly it schedules NOPs to fulfil data dependencies
where the producer delivers its result late -- this in particular includes
MIPS I memory loads, MIPS I-III coprocessor moves (including both CP0 and
CP1/FPU), and various corner cases with the HI/LO accumulator registers.
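
To make the above concrete, here is a minimal sketch of handcoded input in
the default `reorder' mode, with comments describing what GAS would
typically do with it. The register numbers and the label `foo' are only
illustrative, and the exact output depends on the GAS version, the options
used and the ISA selected, so treat the comments as expectations rather
than guaranteed output.

```asm
	.set	reorder

	# The addiu touches neither $2 nor $3, so GAS can swap it
	# with the branch and use it to fill the delay slot.
	addiu	$4, $4, 1
	beq	$2, $3, foo

	# Here the addiu writes $2, which the branch reads, so the
	# two cannot be swapped; a NOP goes into the delay slot.
	addiu	$2, $2, 1
	beq	$2, $3, foo

	# When assembling for MIPS I the loaded value is not yet
	# available to the next instruction, so a NOP is placed
	# between the two; from MIPS II up none is needed.
	lw	$2, 0($4)
	addu	$3, $2, $5
foo:
```

Disassembling the resulting object for two different ISA levels is an easy
way to see which NOPs were actually added.
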
Consequently you don't have to handle any of this stuff in handcoded
assembly, which is probably still somewhat better than having to have
conditions sprinkled across your source to put NOPs in various places
depending on what ISA level you assemble for. Of course you can instead
assume the worst and just put the maximum number of NOPs ever required
everywhere, but then those who only use newer ISAs will start demanding
that support for older ISAs be dropped, so that they don't lose
performance and memory space to these extraneous (from their point of
view) NOP fillers.
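
For comparison, a sketch of what the handcoded alternative with a condition
on the ISA level might look like in `noreorder' mode. It assumes the
source is run through the C preprocessor (i.e. a `.S' file) and that the
compiler predefines `__mips' to the ISA level, as GCC does; the registers
are again only illustrative.

```asm
	.set	noreorder
	lw	$2, 0($4)
#if __mips < 2
	nop			# MIPS I only: fill the load delay slot
#endif
	addu	$3, $2, $5
	.set	reorder
```

In `reorder' mode the same two instructions can simply be written back to
back, with no condition at all.
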
Certainly there are complex hazards too where side effects are involved,
such as with poking at the TLB, but that is never handled automatically,
be it with the compiler or the assembler -- you need to handcode this
stuff anyway (or switch to a modern MIPS ISA which has hazard barrier
instructions such as EHB and JR.HB).
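
As an illustration of such a handcoded barrier, a minimal sketch assuming a
MIPS32r2 or later target. The choice of the CP0 EntryHi register, of $4 as
the source and of TLBWI as the consumer is only an example; which hazards
exist and how they must be cleared on a given processor has to be checked
against its manual.

```asm
	.set	push
	.set	mips32r2	# EHB is a MIPS32r2 instruction

	mtc0	$4, $10		# write CP0 EntryHi (register 10)
	ehb			# clear the execution hazard by hand
	tlbwi			# write the TLB entry selected by Index

	.set	pop
```
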
So being able to support older ISAs with no pain for the newer ISAs is
perhaps a worthwhile gain from the "very large design error". Otherwise
we probably wouldn't have modern software support anymore for legacy MIPS
ISAs (anything MIPS IV or below) and consequently computers with those