Re: Delay slots

To: <coypu%SDF.ORG@localhost>
Subject: Re: Delay slots
From: <Paul_Koning%Dell.com@localhost>
Date: Mon, 20 Jun 2016 18:04:39 +0000
> On Jun 20, 2016, at 1:04 PM, coypu%SDF.ORG@localhost wrote:
> 
> On Mon, Jun 20, 2016 at 02:44:48PM +0000, Paul_Koning%Dell.com@localhost wrote:
>> 
>>> On Jun 11, 2016, at 1:23 PM, coypu%SDF.ORG@localhost wrote:
>>> 
>>> Hi,
>>> 
>>> I've heard port-pmax has trouble with the mfc0 instruction needing a delay
>>> slot after it, and the generic MIPS code being modified and tested
>>> against newer machines which do not need this.
>>> 
>>> Delay slots are a generic MIPS problem, 
>>> ...
>>> If someone could provide a useful list of problematic instructions and
>>> mention architectures suffering problems, this could be a good tool.
>> 
>> I think "delay slot" is a specific term with a different meaning.  Mostly it refers to the bizarre handling of the instruction immediately following a branch instruction (other than branch-likely).  In MIPS-I it also shows up in loads, but that disappeared a very long time ago.
>> 
>> What you're talking about I know as a "hazard" -- a machine-specific rule that says after some privileged instructions you need to do extra stuff before looking for a result, or expecting the action to take effect, or whatever.  That extra stuff might be NOP instructions, SSNOP instructions, or even weirder things.
>> 
>> This is all specific to individual machine implementations.  A generic list would have to be the union of all hazards, which is going to be quite a long list.  Some machines have very few; others (such as the Sibyte chips) have a rather substantial set.  In some cases, specific chip revs have even more hazards and particularly bizarre rules because you're actually looking at chip bugs and their workarounds.
>> 
>> So my answer would be: read the specific machine data sheet or programming manual for the hazards.  Hopefully they are described correctly and completely.  (Some vendors assign data sheet writing to the marketing department, so the English is pretty good but the technical content is not.)  You may need to find the "errata" for the product in question; access to that information may be hard to get at times.
>> 
>> 	paul
>> 
> 
> Hi Paul!
> 
> I've been playing a bit since, and came across this helpful stockpile of
> docs: http://wiki.prplfoundation.org/wiki/MIPS_documentation
> Which seems pretty decent for any CPU designed in this century.

Nice.  It seems to be almost entirely MIPS Co. designs, though.  Not Sibyte or Raza, for example.

> I was under the impression we've got a big issue with many instructions
> being done without considering MIPS-I hazards. I'm no longer sure that
> is the case, but the unifdef & grep -A trick was really handy for feeling
> more confident about not having missed some cases in our ifdef-heavy
> assembly code.
> 
> The impression I get thus far is that things executed on the same
> pipeline (i.e. 'regular' instructions) are mostly sensible and
> consistent across an ISA version (with MIPS-I being the worst offender,
> and many machines since have interlocks for nearly everything, and gas
> with reorder doing a fairly good job of doing the right thing if we
> just let it try).

I think the best answer for MIPS-I is "just say no".  Beyond that, yes, interlocks seem to be there to take care of things in the non-privileged part of the ISA.  The one oddity is the branch delay slot.  That long ago stopped being about hazards; it's now just a very ugly wart in the architecture that can't be removed because of code compatibility.

I haven't studied all that many MIPS implementations in detail.  The ones I know best are the Sibyte 1250 (alias BCM1250) and the Raza XLR and XLP.  These are all MIPS64 (XLP is rev 2, I think), fully interlocked in all the non-privileged operations both within a single pipeline and across pipelines.  That is as it should be.  But the 1250 has a rather substantial number of privileged instruction hazards that may require SSNOP or other trickery.  Raza has far fewer but there are a couple, at least in the errata.  Writing code that's safe across these machines isn't too painful; just following SB1250 rules is 95% of what's needed.  But if you also want to support a dozen other MIPS implementations it probably gets messier and the code may become somewhat bulky.

As for "gas" and reordering, I have always viewed assembler reordering as a very large design error.  Assemblers should assemble; if I meant something different from what I wrote I would have written the other thing instead.  Reordering by programs belongs in compilers, not assemblers.  Note that GCC for years now has used gas in no-reorder mode for this exact reason -- gas does a horrible job, gcc knows far more about what should be done.

> And that most of the per-machine variance is when doing things like
> waiting for the FPU to be enabled (hopefully we can just wait enough)

"wait long enough" sounds like a bug waiting to happen.  The machine documentation will (well, should) spell out the rules.  It may require NOPs, or SSNOPs, or other stuff like ERET.  But it will be something definite, and the answer is to implement what is required.  If different machines have different rules, the union of the mechanisms will be needed.  So if machine A says "4 SSNOPs" and machine B says "cross an ERET" then you'll need both.  Hopefully that union isn't too excessive, and the requirements aren't in conflict.  I haven't seen problems; we might have SSNOPs that only one of the machines needs, but (apart from the loss of a couple of cycles) they cause no harm on other platforms.

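The "union of the mechanisms" idea above can be sketched in C roughly as follows.  This is only an illustration: the struct, the function name, and the counts are hypothetical and not taken from any machine's documentation, and it assumes that extra SSNOPs are harmless on machines that do not need them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-machine rule for one hazard (e.g. after enabling the FPU).
 * The fields and any numbers used with them are illustrative only; real
 * values must come from each CPU's manual and errata. */
struct hazard_rule {
    unsigned ssnops;   /* SSNOPs required after the privileged instruction */
    bool needs_eret;   /* true if the change only takes effect across an ERET */
};

/* Combine the rules of every supported machine into one rule that satisfies
 * all of them: the largest SSNOP count, plus ERET if any machine needs it.
 * Assumes extra SSNOPs cost a few cycles but are otherwise harmless. */
static struct hazard_rule hazard_union(const struct hazard_rule *rules, size_t n)
{
    struct hazard_rule out = { 0, false };

    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (rules[i].ssnops > out.ssnops)
            out.ssnops = rules[i].ssnops;
        out.needs_eret = out.needs_eret || rules[i].needs_eret;
    }
    return out;
}
```

With machine A as { 4, false } and machine B as { 0, true }, the combined rule is 4 SSNOPs and an ERET, which is the "you'll need both" case.
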
> ...and instructions like 'wait' being broken on some machines which
> we've already got an opt-in quirk table for. (coincidentally came
> across it mentioned on sbmips - BCM4706)
> 
> Perhaps I'll run into more issues once I play with some real hardware ;)
> I've been using an emulator so far, but already found some issues worth
> tackling, e.g. SOSEND_LOAN being broken on some MIPS machines, and I can
> see this issue in emulation, too.

That's interesting.  In general, emulators are likely to tell you very little about hazards.  They might have hazard-like bugs, but actual hazards in the sense we've been talking about are probably only going to appear in real hardware.  And there they might not show up consistently.  It is surprisingly hard to get the whole machine pipeline to be in the same state consistently.  I've seen CPU bugs having to do with mishandled pipeline states that the factory had never seen, and that in our code would only show up under load once every couple of hours or so.

	paul