Port-mips archive
Re: arcbios_calls.S data hazards
On Sun, 7 Dec 2025, Steve Rumble wrote:
> If I remove noreorder, the assembler addresses the data hazards.
Correct, though only the hazards that arise from the instruction stream
itself, and not complex ones, such as the one between writing a TLB entry
and making use of it for virtual address translation.
> However, it also adds a nop after each jump, pushing the intended
> delay slot instruction back, which introduces a bug. Presumably in
> reorder mode one is supposed to write as though the BDS doesn't exist
> (i.e., the instruction after a branch in the asm file will always be
> executed after the branch). Is that correct?
Correct. In reorder mode you need to place the intended delay slot
instruction before its respective jump or branch, and the assembler will
swap every such pair back.
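As a minimal sketch of that swap, the snippet below writes both forms of the same routine into one file for comparison. The file name and the symbol names foo_noreorder and foo_reorder are made up for illustration, and the assemble step only runs if the cross tools used elsewhere in this message happen to be installed.

```shell
# Hypothetical file and symbol names, for illustration only.
cat > swap-demo.s <<'EOF'
        .text
        .globl  foo_noreorder
        .ent    foo_noreorder
foo_noreorder:
        .set    push
        .set    noreorder
        # Explicit scheduling: the delay slot instruction follows the jump.
        jr      $31
         move   $2, $4
        .set    pop
        .end    foo_noreorder

        .globl  foo_reorder
        .ent    foo_reorder
foo_reorder:
        # Reorder mode: written in program order, the assembler
        # is expected to move the preceding instruction into the slot.
        move    $2, $4
        jr      $31
        .end    foo_reorder
EOF

# Assemble and disassemble only where the cross tools are available.
if command -v mips-netbsd-as >/dev/null 2>&1; then
    mips-netbsd-as -o swap-demo.o swap-demo.s &&
        mips-netbsd-objdump -d swap-demo.o
fi
```

If reordering works as described, I would expect both functions to disassemble to the same two machine instructions in the same order.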
The assembler will only add a NOP after a jump or branch if it is unable
to reorder the preceding instruction into the delay slot. This might be
because of a data dependency between the preceding instruction and the
jump or branch, or the preceding instruction itself having a delay slot,
or the preceding instruction being one of the few special instructions the
execution of which is not safe in a delay slot, such as ERET.
You will also want to remove from the source any explicit NOPs that have
become redundant as a result.
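A small sketch of the data dependency case, with a hypothetical file name and symbol name: the instruction before the jump writes the very register the jump reads, so it cannot be moved into the delay slot.

```shell
# Hypothetical file and symbol names, for illustration only.
cat > dep-demo.s <<'EOF'
        .text
        .globl  dep
        .ent    dep
dep:
        # The first instruction writes $25 and the jump reads $25,
        # so the two cannot be swapped.
        move    $25, $4
        jr      $25
        .end    dep
EOF

# Assemble and disassemble only where the cross tools are available.
if command -v mips-netbsd-as >/dev/null 2>&1; then
    mips-netbsd-as -o dep-demo.o dep-demo.s &&
        mips-netbsd-objdump -d dep-demo.o
fi
```

Here I would expect the assembler to keep the source order and fill the delay slot with a NOP, which is the correct outcome rather than a missed optimisation.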
> However, if I move the BDS instruction to before the branch, the
> assembler just keeps the same order and adds a nop rather than
> reordering to exploit the unused delay slot. If a nop is already
> there, it still adds another. I tried passing -O2 to as ("remove
> unneeded NOPs and swap branches"), but that made no difference.
Weird, -O2 is already the default, and things work here just fine, albeit
with a cross-assembler:
$ cat reorder.s
        .text
        .globl  foo
        .ent    foo
foo:
        move    $2, $4
        jr      $31
        .end    foo
$ mips-netbsd-as -o reorder.o reorder.s
$ mips-netbsd-objdump -d reorder.o

reorder.o:     file format elf32-tradbigmips

Disassembly of section .text:

00000000 <foo>:
   0:   03e00008        jr      ra
   4:   00801025        move    v0,a0
        ...
$
It starts making a lot of sense when you switch ISA levels, e.g.:
$ cat mfc0.s
        .text
        .globl  foo
        .ent    foo
foo:
        mfc0    $2, $15
        jr      $31
        .end    foo
$ mips-netbsd-as -mips3 -o mfc0-mips3.o mfc0.s
$ mips-netbsd-objdump -d mfc0-mips3.o

mfc0-mips3.o:     file format elf32-tradbigmips

Disassembly of section .text:

00000000 <foo>:
   0:   40027800        mfc0    v0,c0_prid
   4:   03e00008        jr      ra
   8:   00000000        nop
   c:   00000000        nop
$ mips-netbsd-as -mips4 -o mfc0-mips4.o mfc0.s
$ mips-netbsd-objdump -d mfc0-mips4.o

mfc0-mips4.o:     file format elf32-tradbigmips

Disassembly of section .text:

00000000 <foo>:
   0:   03e00008        jr      ra
   4:   40027800        mfc0    v0,$15
        ...
$
Here, as of the MIPS IV ISA, the MFC0 instruction no longer has a
coprocessor move delay slot, so it is now safe to reorder it into a branch
delay slot (please ignore the peculiarity of c0_prid vs $15 in the dump;
the machine instruction is bitwise the same in both cases).
It also helps when the machine instruction to land in the delay slot
comes from a macro, e.g.:
$ cat li.s
        .text
        .globl  foo
        .ent    foo
foo:
        li      $2, bar
        jr      $31
        .end    foo
$ mips-netbsd-as --defsym bar=0x1234 -o li16.o li.s
$ mips-netbsd-objdump -d li16.o

li16.o:     file format elf32-tradbigmips

Disassembly of section .text:

00000000 <foo>:
   0:   03e00008        jr      ra
   4:   24021234        li      v0,4660
        ...
$ mips-netbsd-as --defsym bar=0x12345678 -o li32.o li.s
$ mips-netbsd-objdump -d li32.o

li32.o:     file format elf32-tradbigmips

Disassembly of section .text:

00000000 <foo>:
   0:   3c021234        lui     v0,0x1234
   4:   03e00008        jr      ra
   8:   34425678        ori     v0,v0,0x5678
   c:   00000000        nop
$
-- which seems relevant here.
There are some restrictions, though, on the expansion of an assembly macro
into a delay slot where relaxation is involved (required because only a
single pass is made over the source). That is the case where the immediate
or offset operand is a symbol that needs to be resolved at link time and
the assembler considers machine code alternatives depending on whether the
symbol turns out to be local or external at the conclusion of assembly, or
where GP-relative addressing is considered.
It is a limitation of the tool, which could possibly be lifted if someone
were willing to invest their resources. It does not appear relevant for
this source code though.
> Optimising for maintenance is a good point and this code is hardly
> performance-critical. But maybe reordering ends up being similarly
> tricky in the end? Of the .S files in sys/arch/mips/mips that aren't
> just macro expansions, about 80% set noreorder.
I don't know. For decades now I have been trying to persuade people to
avoid writing noreorder code unless absolutely necessary, and yet such
code continues to exist and new code keeps appearing. It causes headaches
with missed delay slot fillers, especially when someone used only to newer
ISAs is unaware of the stricter requirements of the earlier ISAs.
I guess you need to figure out what's wrong with your build system that
prevents the assembler from doing its job. I'm only familiar with the
compiler tool chain and not the NetBSD compilation setup.
Maciej