Port-vax archive


RE: Race in MSCP (ra/rx) driver



On Friday, August 28, 2020 at 4:54 AM, Mouse wrote:
> >> Not entirely true. Many controllers were designed long after the
> >> processors they were connected to, and made use of way more modern
> >> hardware that was much faster.  [...]
> > They might be faster on a per instruction basis, BUT none would be so
> > fast that they could receive a command, probe some arbitrary amount of
> > memory and produce useful results (usually back in memory) before the
> > system CPU started executing the next instruction.
> 
> Not "some arbitrary amount of memory".  The amount of memory accessed
> by the device - at least in the races I dealt with - was fixed, and quite small.
> In the case of the bootblocks' race, the first DMA cycle by the device was all it
> took to break things - and that *could* well arrive before the next instruction
> runs, especially if the setup sequencing is done in a gate array or some such
> instead of firmware.

I don't see this potential.  The devices I'm talking about (MSCP, TMSCP,
network devices), however fast in theory, aren't engaged by simple device
register interactions (as a DL11 or even an RH11 is); they have to reach
into the host's memory and fetch the command information and parameters.
Those memory references run at memory speed, which is closely tied to, and
competes with, the processor, so immediate completion (before the next
instruction) can't actually happen.
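
To make the memory-fetch argument concrete, here is a rough sketch of a
device that must read a ring descriptor and a command packet from host
memory before it can do anything.  The descriptor layout (OWN bit, address
mask), the addresses, and the 12-word packet size are assumptions made up
for illustration; they are not taken from the MSCP specification or from
any driver.

```python
OWN = 0x80000000        # assumed: bit 31 set means "owned by the device"
ADDR_MASK = 0x3FFFFFFF  # assumed: low 30 bits hold the packet address


class HostMemory:
    """Host RAM as seen by the device; counts every DMA cycle."""

    def __init__(self, words):
        self.words = dict(words)
        self.dma_cycles = 0

    def dma_read(self, addr):
        self.dma_cycles += 1
        return self.words.get(addr, 0)

    def dma_write(self, addr, value):
        self.dma_cycles += 1
        self.words[addr] = value


def fetch_command(mem, ring_addr, packet_words):
    """Fetch one command the way a ring-based controller would."""
    desc = mem.dma_read(ring_addr)          # first DMA cycle
    if not desc & OWN:
        return None                         # nothing queued for the device
    packet_addr = desc & ADDR_MASK
    packet = [mem.dma_read(packet_addr + 4 * i) for i in range(packet_words)]
    mem.dma_write(ring_addr, desc & ~OWN)   # hand the descriptor back
    return packet


# Hypothetical layout: descriptor at 0x1000 pointing at a packet at 0x2000.
mem = HostMemory({0x1000: OWN | 0x2000, 0x2000: 0x21})
packet = fetch_command(mem, ring_addr=0x1000, packet_words=12)
print(mem.dma_cycles)
```

With these made-up numbers the device spends 14 memory cycles just picking
up the command, each one competing with the CPU for the same memory, and
that is before any data transfer or response is written.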

On a separate but related issue, your observation about the MicroVAX II's
documented initial Qbus map register state when an MSCP boot is invoked
from the >>> prompt seems to be at least slightly wrong.  The following
Qbus map register states were observed on the simh MicroVAX II and
MicroVAX 3900 simulators, which run the DEC-supplied boot ROMs with no
modifications related to the Qbus map registers:

MicroVAX II (KA630) simulator V4.0-0 Current        git commit id: 5c48229c
sim> show qba map
Qbus-MAP[0000] = 00000000 (Value == Index)
Qbus-MAP[0001] = 00000000
Qbus-MAP[0002 thru 1FFF] same as above
sim> boot
Loading boot code from internal ka630.bin



KA630-A.V1.3

Performing normal system tests.

  5..4..3..

Tests completed.

>>>
Simulation stopped, PC: 20040CD5 (INCL R0)
sim> show qba map
Qbus-MAP[0000] = 00000000 (Value == Index)
Qbus-MAP[0001 thru 1FFF] same as above
sim> show qba map=2
Qbus-MAP[0002] = 00000002 (Value == Index)
sim> c
boot dua0

  2..
?4D DEVOFFLINE, DUA0
HALT instruction, PC: 00000EE6 (MOVL A(R7),R1)
sim> show qba map
Qbus-MAP[0000] = 80000009 (Valid)
Qbus-MAP[0001] = 8000000A (Valid)
Qbus-MAP[0002] = 80000002 (Valid, Value == Index)
Qbus-MAP[0003 thru 1FFF] same as above
sim>

---------------------------------------------------------------------
MicroVAX 3900 simulator V4.0-0 Current        git commit id: 5c48229c
sim> show qba map
Qbus-MAP[0000] = 00000000 (Value == Index)
Qbus-MAP[0001] = 00000000
Qbus-MAP[0002 thru 003F] same as above
Qbus-MAP[0040] = D61DAFD6 (Valid)
Qbus-MAP[0041] = AFC51EAF (Valid)
Qbus-MAP[0042] = AF19AF17 (Valid)
Qbus-MAP[0043] = FFF0311B (Valid)
Qbus-MAP[0044] = 00000000
Qbus-MAP[0045 thru 0047] same as above
Qbus-MAP[0048] = 001312D0
Qbus-MAP[0049 thru 0049] same as above
Qbus-MAP[004A] = CC41E900 (Valid)
Qbus-MAP[004B] = 00000000
Qbus-MAP[004C thru 1FFF] same as above

NOTE: On the MicroVAX 3900, the Qbus map is actually located in system
            RAM, with a register in the Qbus adapter that points to it, and
            those memory pages are specifically excluded from the memory
            descriptor passed from the boot ROM to booted operating systems.
            The unusual values displayed above are the initial state of the
            allocated RAM.  They are not particularly relevant, since once
            the boot ROM runs it properly tests and initializes both RAM
            and the Qbus map.

sim> b
Loading boot code from internal ka655x.bin

KA655-B V5.3, VMB 2.7
Performing normal system tests.
40..39..38..37..36..35..34..33..32..31..30..29..28..27..26..25..
24..23..22..21..20..19..18..17..16..15..14..13..12..11..10..09..
08..07..06..05..04..03..
Tests completed.
>>>
Simulation stopped, PC: 200436B0 (BLBC R0,200436DB)
sim> sh qba map
Qbus-MAP[0000] = 80000000 (Valid, Value == Index)
Qbus-MAP[0001 thru 1FFF] same as above
sim> sh qba map=2
Qbus-MAP[0002] = 80000002 (Valid, Value == Index)
sim> continue

>>>b dua0
(BOOT/R5:0 DUA0
Simulation stopped, PC: 2004D848 (MOVL (R1)[R2],44(R9))
sim> sh qba map
Qbus-MAP[0000] = 00000000 (Value == Index)
Qbus-MAP[0001 thru 1FFF] same as above
sim> continue

  2..
Simulation stopped, PC: 00001961 (ADDL3 34(R9),#42,(SP))
sim> sh qba map
Qbus-MAP[0000] = 8000000C (Valid)
Qbus-MAP[0001] = 8000000D (Valid)
Qbus-MAP[0002] = 80000002 (Valid, Value == Index)
Qbus-MAP[0003 thru 1FFF] same as above
sim> continue

?4D DEVOFFLINE, DUA0
HALT instruction, PC: 00000C1A (MOVL (R11),SP)
sim> sh qba map
Qbus-MAP[0000] = 8000000C (Valid)
Qbus-MAP[0001] = 8000000D (Valid)
Qbus-MAP[0002] = 80000002 (Valid, Value == Index)
Qbus-MAP[0003 thru 1FFF] same as above
sim>

In both the MicroVAX II and the MicroVAX 3900, the boot ROM maps 2 pages
(1KB), covering addresses 0x0-0x3FF in Qbus space, to known good RAM to load
at least the first 1 or 2 blocks of data from the disk.  Both of these boot
ROMs know how to interpret an ODS2 file system and then perform a VMS boot.
If the file system isn't ODS2, then code in the initial sector (maybe 2) is
executed to perform a non-VMS boot.  In both cases, all of the Qbus map
registers are initialized with their valid bits set and, with only 2
exceptions, pointing to the matching physical memory address.
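
As a rough illustration of how to read the map dumps above, here is a small
sketch that translates a Qbus address through the map.  The valid bit (bit
31) and the 512-byte page size match what the transcripts show; the width of
the page-number field (QBMAP_PAGE) is my assumption, and the helper names are
made up for this example.

```python
QBMAP_VALID = 0x80000000  # bit 31, displayed as "Valid" in the dumps
QBMAP_PAGE = 0x000FFFFF   # assumed width of the page-number field
PAGE_SHIFT = 9            # 512-byte pages: 2 pages = 1KB = 0x0-0x3FF
PAGE_MASK = (1 << PAGE_SHIFT) - 1
MAP_ENTRIES = 0x2000      # entries 0000 thru 1FFF


def identity_map():
    """Every entry valid and pointing at the page with its own index."""
    return [QBMAP_VALID | i for i in range(MAP_ENTRIES)]


def qbus_to_phys(qbus_map, qbus_addr):
    """Translate a Qbus address, or return None if the entry is invalid."""
    entry = qbus_map[qbus_addr >> PAGE_SHIFT]
    if not entry & QBMAP_VALID:
        return None
    return ((entry & QBMAP_PAGE) << PAGE_SHIFT) | (qbus_addr & PAGE_MASK)


# State seen on the MicroVAX II after "boot dua0":
qmap = identity_map()
qmap[0] = 0x80000009
qmap[1] = 0x8000000A

for qaddr in (0x000, 0x3FF, 0x400):
    print(hex(qaddr), "->", hex(qbus_to_phys(qmap, qaddr)))
```

Under these assumptions, Qbus addresses 0x0-0x3FF land in physical pages 9
and 0xA, while everything from 0x400 up maps straight through.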

> Also, "before the [host] started executing the next instruction" is not what
> matters.  "Before the host finishes preparing for completion" is what
> matters, and, in each of the case I fixed, that took well over one instruction.
> In the bootblock case, I think it was something like
> 8-to-10 instructions; in the kernel case, I don't know - it was something like
> five instructions before it entered tsleep, and I don't know how long to took
> tsleep to get to the point where the device could interrupt without breaking
> anything.  Looking at the source, I'd estimate at least another five
> instructions (and one of those 10 or so instructions was a CALLS, which is
> notoriously slow).

This clearly demonstrates that many such cases exist, with relatively long
sequences of instructions executing before the original hardware responded,
and yet the authors of the code in question were satisfied that the code
they wrote was good enough.

> None of which strikes me as very relevant.  A race is still a race and arguably
> should be fixed, especially when it's as simple to fix as the ones I ran into
> were, even if the host won't ever lose the race under normal conditions.  In
> addition to all the things already raised, consider someone single-stepping in
> kgdb or ddb.

All good points.

> > All true, but none could possible perform all the necessary steps to
> > interpret->process->saveresults->interrupt BEFORE the system CPU
> > started to interpret the next instruction.
> 
> Even when true, does that excuse the code depending on it for correctness,
> especially when it's so simple to fix?  I think not.

I'm not suggesting that your fix is inappropriate.  

I am suggesting that when you create a simulator that has dramatically 
different timing interactions than what actually happened with the 
original hardware, you will then be burdened with chasing problems like 
this on every operating system (and each historical version of that 
operating system) that ran on the hardware you're simulating.  

Changing the cases where you've got the original source code is
somewhat easy.  Changing all the other cases involves a crazy
amount of digging through binary code and creating non-trivial
ways to patch it (repeated for each version and each different
piece of software).

I am suggesting that it is far easier to change the simulator to reflect
the relative timing interactions that existed on the original hardware.
This change goes in one place (the simulator you're currently
working on), rather than requiring you to dig inside each ancient
black box of code that worked well enough on the old hardware.
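
A toy model of what I mean, not code from any real simulator: time is
counted in host instructions, the device's completion is an event scheduled
some number of instructions after the command is issued, and the host needs
a few more instructions before it can safely take the interrupt.  The
function name and the figure of 10 preparation instructions (taken loosely
from your estimate) are illustrative assumptions only.

```python
import heapq


def host_wins_race(prep_instructions, completion_delay):
    """Return True if the host finished preparing before completion fired.

    prep_instructions: instructions the host still has to run after
        issuing the command before an interrupt is harmless.
    completion_delay: instructions the simulator waits before delivering
        the device's completion event.
    """
    events = [(completion_delay, "completion")]
    heapq.heapify(events)
    clock = 0
    prepared = 0
    while events:
        due, _name = events[0]
        if due <= clock:
            # The event is delivered before the next instruction runs.
            heapq.heappop(events)
            return prepared >= prep_instructions
        clock += 1  # execute one host instruction
        prepared = min(prepared + 1, prep_instructions)
    return True


for delay in (0, 9, 10, 1000):
    print(delay, host_wins_race(prep_instructions=10, completion_delay=delay))
```

A simulator that completes the I/O with no delay loses the race that the
old code quietly depended on; scheduling the completion at least as far out
as the original hardware would have taken keeps unmodified binaries working.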

- Mark

