Port-mips archive


Re: arcbios_calls.S data hazards



On Sun, 7 Dec 2025, Steve Rumble wrote:

> If I remove noreorder, the assembler addresses the data hazards.

 Correct.  Obviously it only handles the hazards that come from the 
execution flow itself and not complex ones, such as the one between writing 
a TLB entry and making use of it for virtual address translation.

> However, it also adds a nop after each jump, pushing the intended
> delay slot instruction back, which introduces a bug. Presumably in
> reorder mode one is supposed to write as though the BDS doesn't exist
> (i.e., the instruction after a branch in the asm file will always be
> executed after the branch). Is that correct?

 Correct, you need to swap the intended delay slot instruction with the 
respective preceding jump or branch and the assembler will swap every such 
pair back.
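
 As a minimal sketch of that swap (the labels are made up for illustration 
and this is not taken from arcbios_calls.S), the two fragments below are 
meant to assemble to the same two machine instructions, assuming the 
assembler is able to fill the delay slot in the second case:

```asm
	.text

	# Hand-scheduled form: the delay slot is written out explicitly.
	.set	noreorder
foo_noreorder:				# hypothetical label
	jr	$31
	 move	$2, $4			# sits in the delay slot of the jr above
	.set	reorder

	# Reorder form: written as though the delay slot did not exist.
foo_reorder:				# hypothetical label
	move	$2, $4			# expected to be moved into the delay slot
	jr	$31
```

 In the second fragment no NOP is written after the jump; filling the 
delay slot is left entirely to the assembler.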

 The assembler will only add a NOP after a jump or branch if it is unable 
to reorder the preceding instruction into the delay slot.  This might be 
because of a data dependency between the preceding instruction and the 
jump or branch, or the preceding instruction itself having a delay slot, 
or the preceding instruction being one of the few special instructions the 
execution of which is not safe in a delay slot, such as ERET.
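
 A sketch of the data dependency case, with a made-up label and register 
choice: the instruction before the jump writes the very register the jump 
reads, so the pair cannot be swapped and a NOP would be expected in the 
delay slot instead.

```asm
	.text
	.set	reorder
baz:					# hypothetical label
	move	$25, $4			# writes $25...
	jr	$25			# ...which the jump reads, so no swap
```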

 You obviously want to remove any redundant NOPs then as well.

> However, if I move the BDS instruction to before the branch, the
> assembler just keeps the same order and adds a nop rather than
> reordering to exploit the unused delay slot. If a nop is already
> there, it still adds another. I tried passing -O2 to as ("remove
> unneeded NOPs and swap branches"), but that made no difference.

 Weird, -O2 is already the default, and things work here just fine, albeit 
with a cross-assembler:

$ cat reorder.s
	.text
	.globl	foo
	.ent	foo
foo:
	move	$2, $4
	jr	$31
	.end	foo

$ mips-netbsd-as -o reorder.o reorder.s
$ mips-netbsd-objdump -d reorder.o

reorder.o:     file format elf32-tradbigmips


Disassembly of section .text:

00000000 <foo>:
   0:	03e00008 	jr	ra
   4:	00801025 	move	v0,a0
	...
$ 

 Reorder mode starts making a lot of sense when you switch ISA levels, e.g.:

$ cat mfc0.s
	.text
	.globl	foo
	.ent	foo
foo:
	mfc0	$2, $15
	jr	$31
	.end	foo
$ mips-netbsd-as -mips3 -o mfc0-mips3.o mfc0.s
$ mips-netbsd-objdump -d mfc0-mips3.o

mfc0-mips3.o:     file format elf32-tradbigmips


Disassembly of section .text:

00000000 <foo>:
   0:	40027800 	mfc0	v0,c0_prid
   4:	03e00008 	jr	ra
   8:	00000000 	nop
   c:	00000000 	nop
$ mips-netbsd-as -mips4 -o mfc0-mips4.o mfc0.s
$ mips-netbsd-objdump -d mfc0-mips4.o

mfc0-mips4.o:     file format elf32-tradbigmips


Disassembly of section .text:

00000000 <foo>:
   0:	03e00008 	jr	ra
   4:	40027800 	mfc0	v0,$15
	...
$ 

Here, as of the MIPS IV ISA, the MFC0 instruction has lost its coprocessor 
move delay slot, so it is now safe to reorder it into a branch delay slot 
(please ignore the peculiarity of c0_prid vs $15 in the dump; the machine 
instruction is bitwise the same in both cases).

 It also helps when the machine instruction to land in the delay slot 
comes from a macro, e.g.:

$ cat li.s
	.text
	.globl	foo
	.ent	foo
foo:
	li	$2, bar
	jr	$31
	.end	foo
$ mips-netbsd-as --defsym bar=0x1234 -o li16.o li.s
$ mips-netbsd-objdump -d li16.o

li16.o:     file format elf32-tradbigmips


Disassembly of section .text:

00000000 <foo>:
   0:	03e00008 	jr	ra
   4:	24021234 	li	v0,4660
	...
$ mips-netbsd-as --defsym bar=0x12345678 -o li32.o li.s
$ mips-netbsd-objdump -d li32.o

li32.o:     file format elf32-tradbigmips


Disassembly of section .text:

00000000 <foo>:
   0:	3c021234 	lui	v0,0x1234
   4:	03e00008 	jr	ra
   8:	34425678 	ori	v0,v0,0x5678
   c:	00000000 	nop
$ 

-- which seems relevant here.

 There are some restrictions though on the expansion of an assembly macro 
into a delay slot where relaxation is involved (required because only a 
single pass is made over the source).  This is the case where the immediate 
or offset operand is a symbol that needs to be resolved at link time and 
the assembler considers machine code alternatives depending on whether the 
symbol turns out local or external at the conclusion of assembly, or where 
GP-relative addressing is considered.

 It is a limitation of the tool, which could possibly be lifted if someone 
were willing to invest their resources.  It does not appear relevant for 
this source code though.

> Optimising for maintenance is a good point and this code is hardly
> performance-critical. But maybe reordering ends up being similarly
> tricky in the end? Of the .S files in sys/arch/mips/mips that aren't
> just macro expansions, about 80% set noreorder.

 I don't know.  For decades now I have been trying to persuade people to 
avoid writing noreorder code unless absolutely necessary, and yet such code 
both continues to exist and keeps being newly written, and it causes 
headaches with missing delay slot fillers, especially when someone used 
only to the newer ISAs is unaware of the stricter requirements of the 
earlier ISAs.

 I guess you need to figure out what's wrong with your build system that 
prevents the assembler from doing its job.  I'm only familiar with the 
compiler tool chain and not the NetBSD compilation setup.

  Maciej

